datathere
Blog | Supply Chain & Logistics

Warehouse Exception Streams: Sensor Feeds and Error Logs in One View

Mert Uzunogullari

Three systems, three stories, one incident

A conveyor belt jams at 2:47 PM. The conveyor’s PLC logs a motor overload fault with a numeric error code. Thirty seconds later, the barcode scanner downstream reports a read failure because packages are not moving through the scan tunnel. A minute after that, the sorting system logs a timeout error because it expected packages that never arrived.

Three systems recorded the same incident. None of them knows about the others. The conveyor log sits in a proprietary monitoring tool. The scanner errors land in a flat CSV export. The sorter writes JSON events to an API endpoint. An operations engineer investigating the fulfillment delay has to pull data from three different interfaces, mentally align the timestamps, and reconstruct the causal chain.

This is a Tuesday. It happens several times a week in any warehouse running automated material handling equipment. The individual system logs are fine. The problem is that no single view connects them.

The data landscape inside a warehouse

A modern fulfillment center generates an enormous volume of machine data, and almost none of it is standardized.

Conveyor systems produce telemetry from motor controllers, photoeye sensors, and PLCs. The data format depends on the equipment manufacturer. Dematic, Honeywell Intelligrated, Vanderlande, and others each use proprietary schemas. Error codes are numeric and manufacturer-specific. A code 4012 means something entirely different on a Dematic system than on an Intelligrated system. Timestamps may be in the PLC’s local clock, which drifts unless synchronized to NTP.

Barcode scanners generate read/no-read events with scan rate metrics. Some scanners output structured data with fields for barcode value, confidence score, scan timestamp, and lane ID. Others output a semicolon-delimited log line. High-end vision systems add dimensional data (length, width, height) and image references. Error formats range from simple binary (read/no-read) to multi-field diagnostic records with laser intensity and focal distance measurements.

Sorting systems (tilt-tray sorters, crossbelt sorters, sliding shoe systems) produce divert confirmations, no-read diverts, recirculation events, and jam alerts. The data schema depends on the sorter’s control software. Some systems report events per individual package. Others report aggregate statistics per time interval. Severity models vary: one system uses numeric levels 1-5, another uses text labels (INFO, WARNING, CRITICAL), and a third uses color codes in its native UI that do not translate to any exportable field.

Weight-in-motion scales, RFID readers, and print-and-apply labelers each add another data format, another timestamp convention, another error taxonomy. A large warehouse might have 15-20 distinct system types generating operational data.
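
To make the variety concrete, here is what three raw events from the same incident might look like when parsed in Python. The record layouts below are invented for illustration; actual vendor schemas differ.

```python
import csv
import io
import json

# Conveyor PLC: flat semicolon-delimited record with a numeric fault code.
# (Hypothetical layout, not an actual vendor format.)
plc_line = "2024-05-14T14:47:02;CONV-07;4012;MOTOR_OVERLOAD"

# Barcode scanner: CSV export with a header row (illustrative fields).
scanner_csv = "ts,lane,result,confidence\n2024-05-14T14:47:32,LANE-3,NO_READ,0.00\n"

# Sorter: JSON event pushed to an API endpoint (illustrative fields).
sorter_json = '{"time": "2024-05-14T14:48:30Z", "unit": "SORT-1", "event": "DIVERT_TIMEOUT", "level": "CRITICAL"}'

def parse_plc(line: str) -> dict:
    ts, equip, code, label = line.split(";")
    return {"ts": ts, "equipment": equip, "code": int(code), "label": label}

def parse_scanner(text: str) -> list[dict]:
    return list(csv.DictReader(io.StringIO(text)))

def parse_sorter(text: str) -> dict:
    return json.loads(text)
```

Three parsers for three formats, and this is before touching timestamps, severity, or equipment identifiers.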

Why correlation matters more than collection

Most warehouse operations already collect this data. The individual monitoring tools work. The gap is not data availability; it is data correlation.

Root-cause analysis requires temporal correlation: what happened, on which system, in what order. When a fulfillment SLA breach occurs, the question is rarely “did something go wrong?” (the answer is obviously yes). The question is “what went wrong first, and what were the downstream effects?”

Answering that question with siloed data is manual and slow. An engineer opens three or four tools, exports data for the relevant time window, pastes it into a spreadsheet, sorts by timestamp, and traces the event chain. This process takes 20-45 minutes per incident. In a high-volume facility processing 100,000+ packages per day, there might be dozens of incidents worth investigating per shift.

The alternative — a unified exception stream where all system events appear in chronological order with normalized severity levels and correlated equipment identifiers — turns a 30-minute investigation into a 3-minute one. The causal chain is visible without manual reconstruction.
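
Once events share a common shape, the merge itself is conceptually simple. A minimal sketch, using hand-written sample events and illustrative field names:

```python
from datetime import datetime

# Events from three systems, already mapped to a shared shape.
# Timestamps and field names are illustrative assumptions.
conveyor = [{"ts": "2024-05-14T14:47:02", "system": "conveyor", "detail": "motor overload, code 4012"}]
scanner = [{"ts": "2024-05-14T14:47:32", "system": "scanner", "detail": "read failure, lane 3"}]
sorter = [{"ts": "2024-05-14T14:48:30", "system": "sorter", "detail": "divert timeout"}]

def unified_stream(*sources: list[dict]) -> list[dict]:
    """Merge per-system event lists into one chronological view."""
    events = [e for src in sources for e in src]
    return sorted(events, key=lambda e: datetime.fromisoformat(e["ts"]))

stream = unified_stream(sorter, scanner, conveyor)
# The first event in the merged view is the conveyor fault -- the root cause.
```

The hard part is not this sort; it is getting every source into that shared shape with trustworthy timestamps, which the next section covers.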

Mapping the chaos into a single schema

Building a unified exception stream is a multi-source integration problem with some specific challenges.

Schema diversity is extreme. The difference between a PLC error log and a barcode scanner event is not just field naming; it is structural. A PLC log might be a flat record with 8 fields. A vision system diagnostic might be a nested JSON object with 40 fields, most of which are irrelevant to exception tracking. Mapping each source to a canonical exception schema requires understanding which fields carry operational meaning and which are diagnostic noise.
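
One way to express such a canonical schema is a small typed record plus one mapper per source. The field names and the severity rule below are assumptions for illustration, not a prescribed layout:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any

@dataclass
class ExceptionEvent:
    """Canonical exception record (illustrative field set)."""
    ts: datetime                # normalized timestamp
    equipment_id: str           # e.g. "CONV-07"
    location: str               # zone or line segment
    severity: int               # normalized 1 (info) .. 5 (critical)
    category: str               # e.g. "jam", "no_read", "timeout"
    raw: dict[str, Any] = field(default_factory=dict)  # original record, kept verbatim

def from_plc(record: dict) -> ExceptionEvent:
    """Map a flat PLC record to the canonical schema.
    The code >= 4000 severity rule is an invented example threshold."""
    return ExceptionEvent(
        ts=datetime.fromisoformat(record["timestamp"]),
        equipment_id=record["unit"],
        location=record.get("zone", "unknown"),
        severity=5 if record["code"] >= 4000 else 2,
        category=record["fault_label"].lower(),
        raw=record,
    )
```

Keeping the raw record alongside the normalized fields means the diagnostic noise is discarded from the unified view without being lost.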

Timestamp alignment is critical and difficult. Equipment clocks drift. A PLC that has been running for six months without an NTP sync might be 15 seconds ahead of the scanner system. Fifteen seconds matters when you are trying to determine whether the conveyor jam caused the scan failure or vice versa. Any integration pipeline needs to account for clock skew, either by normalizing to a reference time source or by flagging events where the temporal ordering is ambiguous.
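
A minimal sketch of skew handling, assuming per-source offsets have been measured against a reference clock. The offsets and the two-second uncertainty window are invented values:

```python
from datetime import datetime, timedelta

# Measured clock offsets per source (illustrative assumptions):
# the PLC runs 15 seconds fast, so subtract 15 seconds.
KNOWN_OFFSETS = {"plc": timedelta(seconds=-15), "scanner": timedelta(0)}

# Residual skew after correction; orderings inside this window are suspect.
UNCERTAINTY = timedelta(seconds=2)

def normalize(ts: datetime, source: str) -> datetime:
    """Shift a source timestamp onto the reference clock."""
    return ts + KNOWN_OFFSETS.get(source, timedelta(0))

def ordering_ambiguous(a: datetime, b: datetime) -> bool:
    """True when two events are too close to order reliably."""
    return abs(a - b) <= UNCERTAINTY

# The PLC logged 14:47:17, but after correction the jam lands at 14:47:02 --
# clearly before the scanner's 14:47:32 read failure.
jam = normalize(datetime(2024, 5, 14, 14, 47, 17), "plc")
```

Flagging ambiguous orderings is as important as correcting the known skew: a causal chain built on a coin-flip ordering is worse than an honest "order unknown".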

Severity normalization requires domain knowledge. Mapping one system’s CRITICAL to another system’s Level 5 to a third system’s Error Code > 4000 requires understanding what each system considers severe. This is not a syntactic mapping; it is a semantic one. An AI mapping layer needs sample data and context to determine that a scanner’s “no-read rate above 5%” and a sorter’s “recirculation count above threshold” represent comparable severity levels.
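
In code, the semantic nature of the mapping shows up as one hand-written rule per source rather than a single lookup table. The thresholds below are illustrative assumptions:

```python
# Fold each source's native severity model into a shared 1-5 scale.
# All thresholds here are invented examples, not vendor specifications.

def severity_sorter(level_text: str) -> int:
    """Sorter uses text labels."""
    return {"INFO": 1, "WARNING": 3, "CRITICAL": 5}[level_text]

def severity_plc(code: int) -> int:
    """PLC uses numeric error codes; ranges imply severity."""
    return 5 if code > 4000 else 3 if code > 2000 else 1

def severity_scanner(no_read_rate: float) -> int:
    """Scanner has no severity field at all; derive one from a metric.
    A no-read rate above 5% is treated as comparable to CRITICAL."""
    return 5 if no_read_rate > 0.05 else 2
```

Note that the scanner rule is a judgment call baked into code: someone with domain knowledge decided that a 5% no-read rate is an emergency. That decision is exactly what cannot be inferred from field names.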

How datathere handles warehouse data integration

datathere treats each equipment system as a separate data source with its own schema. The AI examines the field structure and sample data from each system, then generates mappings to a canonical exception schema: a unified structure that captures timestamp, equipment ID, location, severity, error category, and raw detail.

The confidence scores are particularly useful here because the semantic distance between sources is large. Mapping a scanner’s scan_result field to a canonical error_category field is not obvious from field names alone. The AI examines sample values (NO_READ, PARTIAL_READ, MULTI_READ) and recognizes these as error categories, scoring the mapping accordingly. A low-confidence mapping on a PLC’s cryptic fault_reg_3 field tells the reviewer that this field needs manual interpretation.
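
To make the idea concrete, here is a toy version of value-based confidence scoring. This is an illustration of the concept, not datathere’s actual algorithm:

```python
# Score a candidate field mapping by how many of its sample values look
# like known members of the target category. The category vocabulary
# below is an illustrative assumption.

KNOWN_ERROR_CATEGORIES = {"NO_READ", "PARTIAL_READ", "MULTI_READ", "JAM", "TIMEOUT"}

def mapping_confidence(sample_values: list[str]) -> float:
    """Fraction of sample values recognized as error categories."""
    if not sample_values:
        return 0.0
    hits = sum(1 for v in sample_values if v.upper() in KNOWN_ERROR_CATEGORIES)
    return hits / len(sample_values)

mapping_confidence(["NO_READ", "PARTIAL_READ", "MULTI_READ"])  # high: clearly categorical
mapping_confidence(["0x1F", "0x00", "0x8A"])                   # low: opaque register values
```

The second call is the fault_reg_3 situation: the values carry no recognizable semantics, so the low score routes the field to a human.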

IoT telemetry normalization is core to this workflow. Sensor data arrives in wildly different formats: some systems push data via MQTT, others write to local files, others expose REST endpoints. datathere’s multi-format ingestion (CSV, JSON, XML) handles the variety at the file level, and the AI mapping layer handles the semantic variety at the field level.

Quality enforcement catches the data anomalies that corrupt analysis. A conveyor event with a timestamp from 1970 (a common default when a PLC clock resets) gets quarantined rather than inserted into the exception stream where it would distort temporal analysis. A scanner event with a negative scan rate gets flagged. These rules run automatically in the production pipeline, catching equipment data issues that would otherwise surface as unexplainable anomalies in operational dashboards.
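
A sketch of such rules, with assumed thresholds (the year-2000 cutoff and field names are illustrative):

```python
from datetime import datetime

def check_event(event: dict) -> str:
    """Route an event: quarantine, flag for review, or accept.
    Thresholds are illustrative assumptions."""
    ts = datetime.fromisoformat(event["ts"])
    if ts.year < 2000:
        # PLC clock reset to the 1970 epoch; keep it out of temporal analysis.
        return "quarantine"
    if event.get("scan_rate", 0) < 0:
        # Physically impossible metric value; surface for review.
        return "flag"
    return "ok"

check_event({"ts": "1970-01-01T00:00:12", "scan_rate": 40})   # quarantined
check_event({"ts": "2024-05-14T14:47:32", "scan_rate": -3})   # flagged
```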

From exception tracking to predictive maintenance

A unified exception stream has value beyond incident investigation.

Pattern detection across systems. When conveyor motor overload events correlate with specific SKUs (heavy items on a belt segment rated for lighter loads), the pattern is only visible in a combined dataset that links conveyor telemetry with package data from the WMS. Siloed conveyor logs show the overloads. Siloed WMS data shows the SKUs. Neither shows the connection.

Maintenance scheduling. Equipment degradation shows up as increasing exception frequency before a hard failure occurs. A scanner that averaged 2 no-read events per hour for six months and now averages 8 is signaling a lens contamination or alignment issue. Detecting this trend requires consistent, normalized data over time, not raw logs in three different formats.

Throughput optimization. Sorting system recirculation rates, scanner first-pass read rates, and conveyor uptime percentages are all derived from exception data. When these metrics live in one normalized dataset, facility engineers can identify bottlenecks without assembling ad-hoc reports from multiple systems.
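
The maintenance-scheduling case above can be sketched as a simple baseline comparison over normalized hourly exception counts. The window size and the 3x threshold are illustrative assumptions:

```python
from statistics import mean

def degradation_alert(hourly_no_reads: list[int], recent_hours: int = 24, factor: float = 3.0) -> bool:
    """True when the recent no-read rate exceeds factor x the long-run baseline.
    Window and threshold are illustrative, not tuned values."""
    baseline = mean(hourly_no_reads[:-recent_hours])
    recent = mean(hourly_no_reads[-recent_hours:])
    return recent > factor * baseline

# Roughly six months at ~2 no-reads/hour, then the last day at 8/hour:
history = [2] * 200 + [8] * 24
degradation_alert(history)  # fires: the scanner is degrading
```

The point is not the arithmetic, which is trivial; it is that the input list only exists if every scanner's events have already been normalized into one consistent stream.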

The foundation for all of this is the integration layer — the part that takes 15 different data formats from 15 different equipment types and produces one clean, correlated, time-aligned event stream. The equipment vendors are not going to standardize their data formats. The warehouse management system does not ingest raw sensor data. The gap between what the equipment produces and what the operation needs is a data mapping problem, and it scales with every piece of equipment on the floor.

Building that mapping manually is possible for a small facility with two or three system types. For a large operation with a dozen or more, it is the kind of tedious, high-stakes integration work where AI-assisted mapping pays for itself in the first month.