The spreadsheet that takes three weeks to build
Every supply chain network design project starts with the same request: “We need a dataset that combines our order history, inventory positions, and transportation costs so we can model scenarios.”
The order data lives in the ERP. The inventory data lives in the WMS. The transit times and freight rates live in the TMS. Three systems, three vendors, three schemas, three definitions of what a “location” or “product” or “order” means. The analyst who needs to model whether opening a new DC in Dallas reduces total landed cost has to extract data from all three, reconcile the differences, and build a unified dataset before any actual analysis begins.
This data preparation phase routinely takes two to four weeks. Not because the analysis is hard, but because the data does not fit together. The ERP uses SAP material numbers. The WMS uses internal SKU codes. The TMS uses carrier-assigned product classifications. The same physical warehouse is DC-04 in the ERP, WAREHOUSE_DALLAS_01 in the WMS, and facility code DAL1 in the TMS. Every join between these systems requires a crosswalk table that someone built in Excel and hopes is still accurate.
The modeling itself (running optimization scenarios, evaluating trade-offs, presenting recommendations) might take a week. The data wrangling takes three. The ratio is backwards, and it recurs every time the analysis needs refreshing.
What each system contributes (and withholds)
Understanding why the integration is painful requires understanding what each system actually provides.
ERP (SAP, Oracle, Microsoft Dynamics, NetSuite). The ERP holds order history: what was ordered, when, by whom, shipping destination, and cost data (COGS, pricing, margin). It also holds supplier information, procurement costs, and sometimes demand forecasts. The schemas are deep and vendor-specific. SAP’s order tables (VBAK/VBAP) bear no resemblance to Oracle’s order management schema. Field naming conventions are cryptic: MATNR, WERKS, LGORT in SAP are material number, plant, and storage location respectively, but nothing in the field names suggests this.
WMS (Manhattan Associates, Blue Yonder, Korber). The WMS holds inventory snapshots, warehouse capacity utilization, pick/pack/ship throughput, and storage costs. It knows how much product is in each location, how fast it moves, and what it costs to handle. The WMS also holds fulfillment data that overlaps with the ERP’s order data but uses different identifiers and different granularity. Where the ERP sees an “order,” the WMS sees individual pick tasks, pack stations, and shipping lanes.
TMS (Oracle Transportation Management, MercuryGate, project44). The TMS holds carrier rate tables, transit time matrices, shipment history, and freight spend. It knows how long it takes to move goods between any two points in the network and what it costs by carrier, mode, and service level. Rate structures are complex: multi-tier pricing with fuel surcharges, accessorial charges, minimum charges, and dimensional weight rules that vary by lane and carrier.
Each system is authoritative for its domain. None of them is designed to talk to the others.
The identifier problem
The most fundamental obstacle to combining ERP, WMS, and TMS data is that they do not agree on what things are called.
Products have different identifiers in each system. The ERP’s material number, the WMS’s SKU, and the TMS’s commodity classification might all refer to the same physical product. Sometimes there is a direct mapping. More often, the relationship is many-to-many. The ERP groups products differently than the WMS, which groups them differently than the TMS.
Locations are identified differently everywhere. The ERP thinks in terms of plants and distribution centers using internal codes. The WMS uses facility identifiers tied to its own configuration. The TMS uses location codes that often correspond to physical addresses but are formatted as carrier-specific facility codes. Joining a shipment’s origin in the TMS to the fulfilling warehouse in the WMS requires a location crosswalk that accounts for the fact that one WMS facility might appear as three different TMS locations (one per dock or shipping area).
Time periods do not align cleanly. The ERP records order date and requested delivery date. The WMS records pick date, pack date, and ship date. The TMS records tender date, pickup date, and delivery date. A network design model needs to know the full order-to-delivery timeline, which means stitching together timestamps from all three systems for the same physical shipment.
Units of measure vary. The ERP tracks order quantities in sales units (eaches, cases, pallets). The WMS tracks inventory in storage units (which might differ from sales units). The TMS tracks shipments in freight units (weight, cube, pallet positions). Modeling total landed cost requires converting between all of them.
Building the crosswalks to resolve these mismatches is the work that takes three weeks. It is manual, error-prone, and needs to be redone every time the underlying systems change, which, in a large enterprise, is constantly.
Multi-source joins: the core of the problem
Network design datasets are inherently multi-source. The value is not in any individual system’s data; it is in the combination.
A useful network design dataset might look like this: for every order in the past 12 months, show the order details (from ERP), the fulfillment location and handling cost (from WMS), and the transportation cost and transit time (from TMS). Add inventory positions by location (from WMS) and demand forecasts by destination (from ERP), and you have the inputs for a facility location optimization model.
Building this dataset requires joining across all three systems on fields that do not match natively. The join logic is not a simple key lookup; it involves fuzzy matching on locations, temporal alignment across different timestamp fields, and unit-of-measure conversions that depend on product attributes stored in yet another system.
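As an illustration only, here is the shape of that three-way join in pandas. The column names, dates, and costs are invented, and real schemas are far messier, but the structure (a key join to the WMS, then a location-plus-time join to the TMS) is the pattern:

```python
import pandas as pd

# Toy extracts; real ones come from the ERP, WMS, and TMS respectively.
erp = pd.DataFrame({
    "order_id": ["O1", "O2"],
    "ship_to_location": ["DAL", "HOU"],
    "cogs": [1200.0, 800.0],
})
wms = pd.DataFrame({
    "order_id": ["O1", "O2"],
    "warehouse_id": ["WAREHOUSE_DALLAS_01"] * 2,
    "ship_date": pd.to_datetime(["2024-01-08", "2024-01-09"]),
    "handling_cost": [45.0, 30.0],
})
tms = pd.DataFrame({
    "destination_facility_code": ["DAL", "HOU"],
    "pickup_date": pd.to_datetime(["2024-01-08", "2024-01-09"]),
    "freight_cost": [310.0, 290.0],
    "transit_days": [1, 2],
})

# ERP <-> WMS share an order key (once the product crosswalk is applied).
unified = erp.merge(wms, on="order_id", validate="one_to_one")

# ERP <-> TMS join on location *and* time: same destination, and the TMS
# pickup matching the WMS ship date. Real pipelines use merge_asof or a
# tolerance window here rather than exact timestamp equality.
unified = unified.merge(
    tms,
    left_on=["ship_to_location", "ship_date"],
    right_on=["destination_facility_code", "pickup_date"],
)

unified["total_landed_cost"] = (
    unified["cogs"] + unified["handling_cost"] + unified["freight_cost"]
)
print(unified[["order_id", "total_landed_cost", "transit_days"]])
```

Even in this toy version, the hard decisions are visible: which fields constitute the join key, and how strictly timestamps must agree before two records count as the same shipment.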
datathere’s multi-source join capability addresses this directly. Each system (ERP, WMS, TMS) is configured as a separate data source with its own schema. The AI analyzes the fields across all three sources and the destination schema (the network design model’s input format), identifies join candidates, and proposes join conditions with confidence scores.
When the AI recognizes that the ERP’s ship_to_location and the TMS’s destination_facility_code refer to the same concept despite different naming, it proposes the join and explains its reasoning. When the relationship is ambiguous (the WMS’s warehouse_id might join to the ERP’s plant_code or its storage_location, depending on the organizational structure), the confidence score drops and the mapping surfaces for human review.
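datathere's scoring is its own; as a crude stand-in, a name-similarity score built on Python's difflib conveys the flavor of confidence-scored join candidates. The `normalize` helper and noise-word list are invented for this sketch:

```python
from difflib import SequenceMatcher

def normalize(field: str) -> str:
    # Strip separators and common noise words before comparing names.
    for noise in ("_", "-", "code", "id"):
        field = field.lower().replace(noise, "")
    return field

def join_confidence(a: str, b: str) -> float:
    """Similarity of two field names in [0, 1]; a toy stand-in for real scoring."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

pairs = [
    ("ship_to_location", "destination_facility_code"),
    ("warehouse_id", "plant_code"),
]
for a, b in pairs:
    print(f"{a} <-> {b}: {join_confidence(a, b):.2f}")
```

A production scorer would weigh far more than names: value distributions, sample overlap, cardinality, and data types all factor into whether two fields plausibly join. The point of the sketch is the output shape: a ranked list of candidate joins with scores that tell a human reviewer where to look.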
Lane pricing consolidation
A specific pain point in TMS data extraction deserves attention: carrier rate consolidation for lane-level cost modeling.
Network design models need to know the cost of moving goods between any two points in the network. This cost data lives in the TMS as rate tables, but the structure of those tables varies dramatically by carrier and contract.
One carrier prices by weight break with fuel surcharges as a percentage. Another prices by pallet position with flat accessorial fees. A third uses dimensional weight pricing with minimum charges per lane. Consolidating these into a consistent cost-per-unit-per-lane structure requires normalizing rate structures that were designed for billing, not for modeling.
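To make the mismatch concrete, here is a sketch with two invented rate structures normalized to cost per pallet on a lane. All rates, breaks, and the pounds-per-pallet assumption are hypothetical; real rate tables carry accessorials, minimums, and dimensional rules that vary by lane and contract:

```python
# Hypothetical carrier rate records, deliberately simplified.
def weight_break_cost(rate, weight_lb):
    """Carrier A: price per cwt by weight break, plus a fuel surcharge %."""
    per_cwt = next(p for brk, p in rate["breaks"] if weight_lb <= brk)
    base = (weight_lb / 100) * per_cwt
    return max(base * (1 + rate["fuel_pct"]), rate["min_charge"])

def pallet_cost(rate, pallets):
    """Carrier B: flat price per pallet position plus a flat accessorial."""
    return pallets * rate["per_pallet"] + rate["accessorial"]

carrier_a = {"breaks": [(500, 38.0), (2000, 30.0), (10000, 22.0)],
             "fuel_pct": 0.18, "min_charge": 125.0}
carrier_b = {"per_pallet": 280.0, "accessorial": 45.0}

# Normalize both to cost per pallet on the lane. The 1 pallet ~= 1200 lb
# factor is a business-rule input, not derivable from the rate tables.
LB_PER_PALLET = 1200
pallets = 4
a = weight_break_cost(carrier_a, pallets * LB_PER_PALLET) / pallets
b = pallet_cost(carrier_b, pallets) / pallets
print(f"carrier A: ${a:.2f}/pallet, carrier B: ${b:.2f}/pallet")
```

Notice that the comparison only works once someone supplies the pallet-to-weight conversion and decides which surcharges count. Those are exactly the business rules that stay with the human reviewer.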
The transformation layer in datathere handles the conversion logic, taking carrier-specific rate structures and producing standardized lane costs. The AI generates the transformation expressions based on the rate table structure and sample data, but the business rules (which surcharges to include, how to handle minimum charges, whether to use contract rates or spot rates) require human input during the mapping review. The confidence scores guide reviewers to the rate structures where the transformation is uncertain, so they can focus their domain expertise where it matters most.
What clean network design data enables
Once the unified dataset exists, the modeling work that was always the point becomes possible.
Facility location analysis. With order demand by destination, fulfillment cost by warehouse, and transportation cost by lane, an optimization model can evaluate where to add, close, or resize facilities. The analysis is only as good as the input data. When that data is assembled manually from three systems with unresolved inconsistencies, the model’s recommendations carry an asterisk. When the data pipeline is repeatable and validated, the recommendations carry conviction.
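At toy scale, the structure of that optimization is easy to see. The sketch below brute-forces every combination of candidate DCs and serves each destination from its cheapest open facility; every number is made up, and a real model would use a MIP solver rather than enumeration:

```python
from itertools import combinations

# Made-up inputs: annual demand by destination (ERP), fixed operating cost
# by candidate DC (WMS), and cost per unit per lane (TMS).
demand = {"ATL": 9000, "PHX": 6000, "CHI": 7000}
fixed_cost = {"DAL": 400_000, "MEM": 350_000, "DEN": 420_000}
lane_cost = {  # (dc, destination) -> cost per unit
    ("DAL", "ATL"): 3.1, ("DAL", "PHX"): 3.4, ("DAL", "CHI"): 3.0,
    ("MEM", "ATL"): 1.9, ("MEM", "PHX"): 4.6, ("MEM", "CHI"): 2.2,
    ("DEN", "ATL"): 4.8, ("DEN", "PHX"): 2.1, ("DEN", "CHI"): 3.3,
}

def network_cost(open_dcs):
    """Fixed cost of the open DCs, plus each destination served
    from its cheapest open DC."""
    transport = sum(
        min(lane_cost[(dc, dest)] for dc in open_dcs) * qty
        for dest, qty in demand.items()
    )
    return sum(fixed_cost[dc] for dc in open_dcs) + transport

best = min(
    (combo for r in range(1, 4) for combo in combinations(fixed_cost, r)),
    key=network_cost,
)
print(best, f"${network_cost(best):,.0f}")
```

The model itself is a few lines. The three input dictionaries are the three-week problem: each one is a cleaned, joined extract from a different system.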
Inventory positioning. Deciding how much stock to hold at each location depends on demand patterns (ERP), current inventory levels (WMS), and replenishment lead times (TMS transit data). A unified dataset lets planners model safety stock scenarios across the network rather than location by location.
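The textbook version of that calculation is the standard safety-stock formula, z * sigma_d * sqrt(L), which assumes demand variability dominates and treats lead time as fixed. A sketch with hypothetical inputs:

```python
from math import sqrt
from statistics import NormalDist

def safety_stock(daily_demand_std: float, lead_time_days: float,
                 service_level: float) -> float:
    """Classic safety-stock formula: z * sigma_d * sqrt(L).
    Assumes fixed lead time; variable lead times need the extended form."""
    z = NormalDist().inv_cdf(service_level)
    return z * daily_demand_std * sqrt(lead_time_days)

# Hypothetical inputs: demand std from ERP order history, lead time from
# TMS transit data for the replenishment lane into this DC.
print(round(safety_stock(daily_demand_std=40, lead_time_days=4,
                         service_level=0.95)))  # ~132 units
```

The formula is trivial; the inputs are not. The demand standard deviation comes from ERP history and the lead time from TMS transit data, which is why the calculation cannot be run network-wide until the systems are joined.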
Mode and carrier optimization. Comparing the total cost of LTL versus truckload versus intermodal on a given lane requires freight rate data normalized to common units. When lane costs are consolidated from multiple carriers in a consistent format, sourcing teams can identify lanes where they are overpaying and model the impact of shifting volume between carriers.
Scenario modeling for disruption. When a port closure, labor disruption, or supplier issue forces network changes, the speed of the response depends on how quickly the team can rerun the analysis with updated constraints. A pipeline that takes three weeks to rebuild is useless for disruption response. A pipeline that refreshes on a schedule or trigger — pulling current data from all three systems, applying validated mappings, and producing a clean dataset — turns scenario modeling from a quarterly project into a responsive capability.
The recurring cost of not solving this
The three-week data preparation cycle is not a one-time cost. Network design is not a one-time exercise. Markets shift, customer bases change, carrier rates update, new facilities come online, and every change warrants reanalysis.
Organizations that treat network design data as a manual assembly project find that the analysis becomes stale before it is complete. The team finishes the model using Q1 data just as Q3 starts. By the time recommendations reach decision-makers, the underlying assumptions have shifted.
The alternative is treating the ERP-WMS-TMS integration as a persistent pipeline rather than a project. Define the mappings once, certify them, and run the pipeline on a schedule. When the ERP schema changes (a new field, a renamed table), update the affected mappings and recertify. When a new carrier is added to the TMS, add its rate structure as a new source and map it to the existing canonical format.
datathere’s certification workflow and production pipeline support this model. Mappings go through review and approval before running in production. The 8-phase pipeline handles extraction, transformation, quality enforcement, and delivery. Triggers (scheduled, manual, or webhook-based) control when the pipeline runs. The result is a network design dataset that stays current without the quarterly fire drill of manual data assembly.
The hard part is not the modeling. It never was. The hard part is getting three systems that were never designed to work together to produce one clean, joined, consistent dataset. Solve that, and the analysis that justifies multi-million-dollar network decisions actually gets done on time.