The first integration is never the problem
A new partner sends data as a CSV. Your engineering team writes a script to parse it, map the fields, load it into your system. It takes a week, maybe two. Everyone moves on.
Then the second partner arrives with XML. The third sends JSON with a completely different schema. The fourth sends PDFs. By partner number five, you have five separate scripts with inconsistent assumptions about field names, date formats, null handling, and error behavior. Nobody remembers why the third script has that weird workaround on line 47.
This is the trajectory of a custom-built integration practice. The initial build is fast and cheap. The long-term cost is where it gets painful.
What “building it ourselves” actually means
The decision to build custom integrations is almost always framed around the first project. “We know our data model, we know the source format, we will just write a parser.” That reasoning is sound for a single, stable integration with a partner whose data never changes.
Here is what that decision does not account for.
Format variation within a single partner. A partner sends files with slightly different column headers depending on which system exported them. One month the date column is “Date,” the next it is “Transaction_Date,” the next it is “dt.” Your script handles the first two because you hardcoded both. The third breaks silently, loading dates into the wrong field.
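That hardcoding tends to look like an alias table that grows by one entry every time a partner invents a new header. A minimal sketch (the alias table and field names here are hypothetical); the important design choice is that an unrecognized header fails loudly instead of silently mapping to the wrong field:

```python
# Hypothetical header aliases accumulated over months of patches.
# Every new variant a partner invents becomes another entry here.
HEADER_ALIASES = {
    "date": "transaction_date",
    "transaction_date": "transaction_date",
    "dt": "transaction_date",
    "amount": "amount",
    "amt": "amount",
}

def normalize_headers(headers):
    """Map raw CSV headers to canonical field names.

    Raises on an unknown header rather than guessing, so a new
    variant surfaces as an error instead of a silent mis-mapping.
    """
    canonical = []
    for h in headers:
        key = h.strip().lower()
        if key not in HEADER_ALIASES:
            raise ValueError(f"Unrecognized column header: {h!r}")
        canonical.append(HEADER_ALIASES[key])
    return canonical
```

The script that "breaks silently" is the one that skips the raise and falls back to positional columns.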
Error handling at scale. A naive custom script is all-or-nothing: it either loads the whole file or dies on the first bad row. What happens when row 4,500 out of 50,000 has a malformed phone number? Do you reject the entire file? Skip the row? Log it somewhere? Those behaviors have to be built, tested, and maintained in the script, and they get reimplemented in the next integration with subtle differences.
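Each of those answers is code someone has to write. A sketch of one common compromise, quarantining bad rows and rejecting the file only past an error-rate threshold (`load_rows`, `parse_row`, and the 1% threshold are illustrative, not a prescribed design):

```python
import csv
import logging

logger = logging.getLogger("partner_ingest")

def load_rows(path, parse_row, max_error_rate=0.01):
    """Load rows from a CSV, quarantining bad ones instead of
    failing the whole file.

    parse_row is a hypothetical per-row validator/transformer
    that raises ValueError on malformed input. If more than
    max_error_rate of rows fail, the entire file is rejected.
    """
    good, bad = [], []
    with open(path, newline="") as f:
        # start=2 because line 1 of the file is the header row
        for lineno, row in enumerate(csv.DictReader(f), start=2):
            try:
                good.append(parse_row(row))
            except ValueError as e:
                bad.append((lineno, row, str(e)))
                logger.warning("row %d rejected: %s", lineno, e)
    total = len(good) + len(bad)
    if total and len(bad) / total > max_error_rate:
        raise RuntimeError(f"{len(bad)}/{total} rows failed; rejecting file")
    return good, bad
```

The point is not that this is hard to write once. It is that every custom script answers the reject/skip/log question slightly differently, and the differences are invisible until they matter.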
Schema drift over time. Partners change their systems. Fields get renamed, added, removed. A column that used to contain integers now contains strings with currency symbols. Your script was not designed for this because at the time it was written, it did not need to be.
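When drift does hit, the usual patch is defensive coercion bolted onto the load path. A sketch, assuming a hypothetical amount column that drifted from bare integers to currency-formatted strings:

```python
import re
from decimal import Decimal, InvalidOperation

# Strip anything that is not a digit, decimal point, or sign.
# Handles hypothetical drifted values like "$1,234.50" or "EUR 99".
_NON_NUMERIC = re.compile(r"[^\d.\-]")

def coerce_amount(value):
    """Best-effort conversion of an amount field to Decimal.

    Strips currency symbols and thousands separators; raises
    ValueError on anything that still is not a number, so drift
    into a genuinely new format surfaces as an error instead of
    silently loading garbage.
    """
    cleaned = _NON_NUMERIC.sub("", str(value))
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        raise ValueError(f"cannot parse amount: {value!r}")
```

Every such patch is reactive: it gets written after the drift has already corrupted or blocked a load, and it lives only in the one script that happened to break.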
Monitoring and alerting. When a scheduled integration fails at 3 AM, who knows? Custom scripts rarely have sophisticated monitoring. The failure surfaces hours or days later when someone notices missing data downstream.
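The minimum viable fix teams reach for is a wrapper around the scheduled job that at least notifies someone on failure. A sketch (the addresses and SMTP host are placeholders, and real setups usually route to a paging service instead of email):

```python
import smtplib
import subprocess
from email.message import EmailMessage

def run_with_alert(cmd, alert_to, smtp_host="localhost"):
    """Run an integration command; email the tail of stderr on failure.

    A hypothetical cron wrapper: the job still fails at 3 AM, but at
    least someone finds out before the downstream data goes missing.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        msg = EmailMessage()
        msg["Subject"] = f"integration failed: {' '.join(cmd)}"
        msg["From"] = "integrations@example.com"  # placeholder address
        msg["To"] = alert_to
        msg.set_content(result.stderr[-2000:] or "no stderr captured")
        with smtplib.SMTP(smtp_host) as s:
            s.send_message(msg)
    return result.returncode
```

Even this thin layer is per-script work: it has to be deployed, the recipient list maintained, and the inevitable flaky-SMTP failure mode handled, multiplied by every integration.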
The hidden cost of in-house integration
Looking at initial build time alone undercounts the real cost. The full cost has seven components that compound over the life of the integration.
| Cost component | What it looks like | Typical volume |
|---|---|---|
| Initial development | Build, test, and deploy a parser per partner | 2-4 weeks of engineering per integration |
| Schema drift rework | Patch the script when a partner renames a column or changes a type | 4-8 hours per quarter per integration |
| Error handling | Build retry logic, logging, and failure routing | 1-2 weeks per integration, partially reusable |
| Monitoring and alerting | Per-integration dashboards, runbooks, on-call coverage | 1-2 weeks setup, continuous operations |
| On-call burden | Middle-of-the-night pages when a partner breaks a column | 1-3 incidents per month at 10+ integrations |
| Knowledge transfer | Time lost when the author of a script leaves the team | Weeks to months per departure |
| Opportunity cost | Product work the engineering team does not do | Hard to measure, consistently large |
One integration, one script, one quiet week of maintenance. Ten integrations, ten scripts written by different people, with ten sets of assumptions that nobody remembers, and a pager that goes off whenever any of them hits an edge case. The cost is not linear.
The multiplication problem
Custom integrations do not scale linearly. They scale multiplicatively: every script multiplies against every schema change, every destination update, and every engineer departure.
With one integration, you maintain one script. With ten integrations, you maintain ten scripts, ten sets of error handling logic, ten monitoring configurations, ten sets of documentation (if documentation exists at all), and ten different approaches to problems solved slightly differently by whoever happened to write the script.
This is the hidden cost engineering teams underestimate. It is not the build time. It is the maintenance surface area.
Consider what happens when your destination schema changes. You add a required field to your internal data model. Now the integrations need updating. With a platform, you update the destination schema once and re-map. With custom scripts, you open ten codebases and make ten changes, and then test and deploy ten times.
Or consider onboarding a new engineer. With a platform, they learn one tool. With custom scripts, they need to understand the conventions (or lack thereof) across the integrations, most of them built by engineers who may no longer be on the team.
The opportunity cost nobody calculates
Engineering time is finite. An hour spent maintaining data plumbing is an hour not spent building product features, improving performance, or reducing technical debt.
This trade-off is invisible in most organizations because integration maintenance is distributed. It is not a line item on anyone’s roadmap. It shows up as “that thing Sarah fixes periodically” or “the script Jake rewrites whenever the partner changes formats.” It never gets prioritized because it never gets measured, but it quietly consumes engineering capacity.
The harder question is: what would your team build if they were not maintaining integration scripts?
Build vs buy, side by side
| Dimension | Custom development | Integration platform |
|---|---|---|
| Time to first integration | 2-4 weeks | Hours to days |
| Time to tenth integration | Still weeks, slowed by maintaining the first nine | Comparable to the first |
| Schema drift handling | Manual code updates | Detected, updated mapping proposed for review |
| Error handling | Built per integration | Consistent across integrations |
| Monitoring | Custom per integration | Built-in |
| On-call surface | Scales with integration count | Single system, single alerting path |
| Documentation quality | Varies by author | Standardized configuration |
| Non-engineer maintenance | Requires engineering for mapping changes | Operations teams modify mappings directly |
| Partner onboarding | Engineering project | Configuration task |
| Cost profile | Salary cost, scales with team | Platform fee, scales with use |
| Knowledge concentration | High risk if the author leaves | Low risk, configuration is platform-readable |
| Long-term trajectory | Maintenance surface area grows | Platform improvements benefit all integrations |
When custom development makes sense
Custom development is the right choice in specific circumstances.
Truly unique processing logic. If the integration requires domain-specific computation that no platform could reasonably support (proprietary algorithms, real-time stream processing with sub-millisecond requirements, or deep integration with internal systems that have no external API), a custom build is justified.
A single, stable, high-volume pipeline. If you have exactly one integration, the source schema never changes, and the volume demands warrant purpose-built infrastructure, a custom solution can outperform a general platform.
Regulatory requirements mandating full code ownership. Some industries require that code processing sensitive data be written, reviewed, and maintained internally. If that is the situation, a platform may not satisfy compliance.
For most companies, the integration problem is not unique. It is the same problem repeated across partners: parse the data, figure out what maps where, transform it into the right shape, validate it, load it, and handle whatever goes wrong.
When a platform makes sense
A platform becomes the better choice when these conditions are true.
You have more than two or three integrations. The maintenance multiplication described above starts compounding quickly. Three integrations are manageable. Ten are a full-time job. Twenty are a team.
Partner count is growing. If your business model involves onboarding new data partners, whether they are customers, suppliers, distributors, or affiliates, new partnerships should not require engineering projects.
Your team is spending more time on maintenance than on new integrations. This is the inflection point. When the backlog of “fix the broken script” tickets outnumbers “build new integration” tickets, the custom approach has hit its ceiling.
Business users need to modify integration logic. When a field mapping change requires an engineering ticket, a code review, and a deployment, the integration process has become a bottleneck. Platforms let operations teams make changes without engineering involvement.
How datathere approaches this problem
datathere treats data integration as a mapping and quality problem, not a coding problem. When a new partner sends data in CSV, JSON, XML, or PDF, the platform reads the schema from the data and drafts the mapping and quality rules. An operations team member reviews and certifies. Once certified, the integration runs on deterministic code in production with a full audit trail.
Onboarding a new partner is measured in hours rather than weeks. Maintaining twenty integrations scales the same way as two. The platform absorbs the commodity work as infrastructure.
The engineering team builds product. The operations team manages integrations. The plumbing stops consuming the people who should be building the house.
FAQ
How long does a custom integration typically take to build?
The initial parser for a straightforward format (CSV with stable columns) takes a senior engineer 1-2 weeks. More complex formats (PDF extraction, XML with variable schemas, file formats with regional variations) extend to 3-6 weeks. Production hardening with monitoring and error handling adds another week or two on top.
What does a data integration platform cost?
Platform costs scale with volume or configuration complexity rather than with engineering team size. Mid-market integration platforms typically range from $20K to $200K per year, with enterprise deployments higher. The breakeven against in-house development usually occurs around 3-5 concurrent integrations for most teams.
Can we start with custom development and switch to a platform later?
Yes, but the transition costs are real. Mapping logic in custom scripts rarely translates directly into a platform. The usual pattern is to freeze new custom integrations once a platform is chosen, migrate the highest-maintenance scripts first, and let the lower-maintenance ones age out. Full migration of 10+ integrations typically takes 2-3 quarters.
What happens to existing custom integrations when we adopt a platform?
They keep running while new integrations move to the platform. Over time, the custom scripts that require the most maintenance get prioritized for migration. Scripts that are genuinely stable and rarely touched may stay as-is indefinitely.
Is a platform worse for very high-volume integrations?
Not inherently. Modern integration platforms handle large volumes well. The question is whether the processing logic is a commodity (mapping, transformation, validation) or something unique that benefits from purpose-built infrastructure. For the commodity case, a platform is usually equivalent or better. For genuinely unique processing at scale, a custom system can outperform.
How do we decide when to switch?
Two practical signals: the maintenance backlog on existing integrations exceeds the new-integration backlog, and a format change from a single partner requires more than a few hours of engineering work. Both indicate the custom approach has hit its scaling ceiling.