Supplier catalog onboarding: normalizing inbound formats

Fifty suppliers, fifty formats

A regional grocery chain sources from local farms, national distributors, and specialty importers. A fashion retailer works with domestic manufacturers, overseas factories, and independent designers. A home goods brand buys from artisans, industrial suppliers, and white-label producers.

The common thread is not industry. It is the chaos at the front door.

One supplier sends an Excel file with columns labeled “Item #,” “Desc,” “Whsl Price,” and “Qty Avail.” Another sends a CSV with “SKU,” “Product Description,” “Cost,” and “On Hand.” A third sends a PDF catalog with product details embedded in formatted tables. A fourth has an API that returns JSON, but the field names are abbreviated codes that require a separate reference document to decode.

None of them match your internal catalog structure. All of them need to.

The manual process everyone recognizes

The typical supplier onboarding workflow looks like this:

A buyer negotiates terms with a new supplier. The supplier agrees to provide product data. Someone on the catalog or operations team sends the supplier a template spreadsheet. The supplier fills it out partially, incorrectly, or in a completely different format than requested. The operations team downloads the file, opens it, and starts the real work.

Column headers get renamed. Fields get rearranged. Missing data gets flagged and sent back via email. “What does ‘CS’ mean in the pack size column?” “Is this price per unit or per case?” “Your weight column has some entries in pounds and some in kilograms.” These emails go back and forth for days, sometimes weeks.

Once the data is cleaned up enough to import, someone manually copies it into the catalog system or pastes it into an import template. If the system rejects rows, the errors get triaged one by one. The whole process repeats for the next supplier.

For a retailer onboarding five or ten suppliers a year, this is tedious but survivable. For one onboarding fifty, or managing ongoing data updates from existing suppliers, it consumes the equivalent of a full-time position that never appears on the org chart.

Why the template strategy fails

The obvious solution is to standardize: give every supplier the same template and require them to use it. In theory, this eliminates format variation. In practice, it fails for three reasons.

First, suppliers do not follow templates. They fill in what they understand, leave blank what they do not, and add columns for data they think is important. A supplier who has been sending product data to retailers for twenty years has their own format. Asking them to restructure their data for your template is asking them to do unpaid work. They will do it badly or not at all.

Second, templates cannot accommodate the range of product types across suppliers. A template designed for apparel does not work for electronics. One designed for packaged food does not work for fresh produce. You either create dozens of category-specific templates (which multiplies the maintenance burden) or create a generic template (which captures nothing well).

Third, templates freeze the schema. When your internal catalog structure changes (new fields for sustainability data, updated compliance requirements, additional image specifications), every template needs updating, and every supplier needs re-educating.

The structural alternative

The problem is not that supplier data is messy. It is that the onboarding process treats each supplier’s format as an obstacle to overcome manually, rather than a mapping problem to solve once.

When a supplier sends a spreadsheet, that spreadsheet has a schema, implicit in its column headers, data types, and value patterns. “Whsl Price” is wholesale price. “Qty Avail” is quantity available. “Item #” is the supplier’s product identifier. The meanings are not ambiguous. They are just named differently from your fields.
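To make this concrete, here is a minimal sketch of header matching in Python. It is not how datathere generates mappings (the article describes AI analysis of the file); a hand-written alias table stands in for that step, and the internal field names (`supplier_sku`, `wholesale_price`, and so on) are hypothetical. Headers are normalized before lookup so that “Item #”, “SKU”, and “product_code” all resolve to the same field.

```python
from __future__ import annotations

import re

# Hypothetical internal catalog fields, each paired with supplier header
# variants (already normalized) that carry the same meaning. The platform in
# the article infers these; a hand-written table stands in for that step here.
ALIASES: dict[str, set[str]] = {
    "supplier_sku": {"item #", "item", "sku", "product code"},
    "description": {"desc", "description", "product description"},
    "wholesale_price": {"whsl price", "wholesale price", "cost"},
    "qty_available": {"qty avail", "quantity available", "on hand"},
}


def normalize_header(header: str) -> str:
    """Lowercase, treat underscores as spaces, and collapse runs of whitespace."""
    text = header.strip().lower().replace("_", " ")
    return re.sub(r"\s+", " ", text)


def propose_mapping(headers: list[str]) -> dict[str, str | None]:
    """Propose an internal field for each supplier header (None = needs a human)."""
    lookup = {alias: field for field, aliases in ALIASES.items() for alias in aliases}
    return {header: lookup.get(normalize_header(header)) for header in headers}
```

Called with the first supplier’s headers plus an unexpected “Color” column, `propose_mapping` maps the four known headers and returns `None` for “Color”, which is the kind of gap a reviewer resolves before certifying.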

A platform that can analyze an incoming file, identify the schema, and generate mappings to your catalog structure turns supplier onboarding from a manual reformatting project into a review-and-approve workflow.

datathere handles this by accepting whatever the supplier sends: CSV, Excel, JSON, XML, or PDF. The AI examines the file structure, identifies how its fields relate to your destination catalog schema, and generates a mapping with reasoning for each decision. A catalog manager reviews the proposed mapping, adjusts what the AI got wrong, and certifies. From that point forward, files from that supplier process through the certified mapping automatically.
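Once certified, a mapping is just data that gets applied to every later file from that supplier. The sketch below assumes a hypothetical `CertifiedMapping` record and CSV input read with Python’s standard `csv.DictReader`. It refuses to run a mapping that has not been certified, mirroring the review-then-certify step.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass


@dataclass
class CertifiedMapping:
    """Hypothetical record of a reviewed supplier-column -> catalog-field mapping."""
    supplier: str
    columns: dict[str, str]  # supplier column name -> internal catalog field
    certified: bool = False  # set True only after a catalog manager signs off


def apply_mapping(mapping: CertifiedMapping, csv_text: str) -> list[dict[str, str | None]]:
    """Translate each row of a supplier CSV into internal catalog field names."""
    if not mapping.certified:
        raise ValueError(f"mapping for {mapping.supplier!r} is not certified")
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Unmapped supplier columns are dropped; a mapped column missing from
        # the file yields None so downstream quality rules can catch it.
        records.append({target: row.get(source) for source, target in mapping.columns.items()})
    return records
```

Because the mapping is stored rather than re-derived, the next price update from the same supplier goes through `apply_mapping` with no manual step.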

For PDF catalogs and suppliers who only have print-formatted product sheets, AI vision extraction pulls structured data from formatted tables, product grids, and specification blocks. The extracted data feeds into the same mapping workflow as other formats.

Quality enforcement at the front door

Getting supplier data into a consistent format solves the structure problem. It does not solve the quality problem.

Supplier data has gaps. Products missing weight. Descriptions truncated at 20 characters. Prices that look like they might be in the wrong currency. UPC codes that do not pass check-digit validation.

The manual approach catches some of these on visual inspection and misses many others, especially under time pressure. The errors surface later: a product listed online with no image, a pricing error that costs margin, a missing allergen declaration that creates liability.
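Some of these gaps are cheap to catch mechanically. UPC-A check-digit validation, for example, is a fixed mod-10 calculation: digits in odd positions are weighted by 3, digits in even positions by 1, and the twelfth digit must bring the weighted total up to a multiple of 10. A minimal Python sketch, limited to 12-digit UPC-A codes:

```python
import re


def upc_a_is_valid(code: str) -> bool:
    """Return True if `code` is a 12-digit UPC-A whose last digit is a correct check digit."""
    if not re.fullmatch(r"[0-9]{12}", code):
        return False
    digits = [int(c) for c in code]
    # Odd positions 1, 3, ..., 11 (indexes 0, 2, ..., 10) are weighted by 3;
    # even positions 2, 4, ..., 10 (indexes 1, 3, ..., 9) are weighted by 1.
    total = 3 * sum(digits[0:11:2]) + sum(digits[1:11:2])
    # The check digit brings the weighted total up to a multiple of 10.
    return digits[11] == (10 - total % 10) % 10
```

For example, `upc_a_is_valid("036000291452")` returns `True`, while changing the last digit to 3 makes it return `False`.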

Quality enforcement rules applied during the mapping process catch these problems at ingestion. A product record missing a required field can be quarantined until the supplier provides the data, flagged for review, or set to stop the job entirely, depending on the field’s criticality. The rules are defined once per destination schema and applied consistently to every supplier’s data.
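A minimal sketch of severity-based routing, assuming required-field rules only. The three outcomes (flag, quarantine, stop) come from the paragraph above; the `Severity`, `RequiredField`, and `enforce` names and the example rule set are hypothetical, and a real rule engine would also cover value constraints such as check digits. Each record is routed by the most severe rule it fails.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    FLAG = "flag"              # load the record, but mark it for review
    QUARANTINE = "quarantine"  # hold the record until the supplier fixes it
    STOP = "stop"              # halt the whole job


@dataclass(frozen=True)
class RequiredField:
    field_name: str
    severity: Severity


class JobStopped(Exception):
    """Raised when a record fails a STOP-severity rule."""


# Hypothetical rules attached to the destination schema, not to any supplier.
CATALOG_RULES = [
    RequiredField("supplier_sku", Severity.STOP),
    RequiredField("wholesale_price", Severity.QUARANTINE),
    RequiredField("weight_kg", Severity.FLAG),
]


def enforce(records: list[dict], rules: list[RequiredField]) -> dict[str, list[dict]]:
    """Route each record by the most severe required-field rule it fails."""
    buckets: dict[str, list[dict]] = {"accepted": [], "flagged": [], "quarantined": []}
    for record in records:
        failed = {r.severity for r in rules if record.get(r.field_name) in (None, "")}
        if Severity.STOP in failed:
            raise JobStopped(f"required field missing in record: {record}")
        if Severity.QUARANTINE in failed:
            buckets["quarantined"].append(record)
        elif Severity.FLAG in failed:
            buckets["flagged"].append(record)
        else:
            buckets["accepted"].append(record)
    return buckets
```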

This shifts the burden from the catalog team finding and fixing quality issues to the system preventing them from entering the catalog in the first place.

What changes when onboarding scales

The difference between onboarding supplier number three and supplier number fifty should be negligible. The catalog structure does not change. The quality requirements do not change. The only thing that changes is the source format, and that is exactly what a mapping platform handles.

A new supplier sends their product file. The platform generates mappings. A catalog manager reviews and certifies, correcting any misidentified fields. The supplier’s data flows into the unified catalog. Total elapsed time: hours, not weeks.

When the supplier updates their data (new products, price changes, discontinued items), the same certified mappings apply. No manual reformatting. No template re-education. The supplier sends data the way they always have, and the platform translates it into the structure your catalog requires.

For retailers growing their supplier base, this is the difference between supplier onboarding as a bottleneck and supplier onboarding as a non-event. The buying team negotiates the relationship. The operations team approves the mappings. The catalog stays clean. Nobody spends their week renaming spreadsheet columns.

Manual process vs mapping platform

Dimension	Manual onboarding	Mapping platform
Time per new supplier	2-5 days of operations work	Hours
Format flexibility	Requires template compliance	Any CSV, Excel, JSON, XML, or PDF
Schema drift handling	Manual cleanup per file	Detected, updated mapping proposed for review
Quality enforcement	Inconsistent visual inspection	Rules applied at ingestion
Audit trail	Emails and spreadsheets	Certified mapping history
Scaling to 50+ suppliers	Full-time catalog team	No additional headcount
Recovery when a supplier changes format	Rebuild the mapping	Update and re-certify

FAQ

Do suppliers have to use our template?

No. A mapping platform reads the format the supplier already uses. Columns named “Item #” or “SKU” or “product_code” all map to the same internal field once the mapping is certified. Abandoning the template requirement removes one of the most friction-heavy parts of onboarding.

What about suppliers that only send PDF catalogs?

PDF extraction reads structured data out of tables, grids, and specification blocks. The extracted data goes through the same mapping workflow as a CSV. Print-only suppliers no longer require manual re-keying of product data.

How does the platform handle pricing and unit variations?

A single supplier might list some items in pounds, others in kilograms, with prices per unit or per case. Transformation rules applied during the mapping step normalize units and pricing to the retailer’s internal conventions. The mapping carries the normalization logic, so the supplier can keep sending data the way they always have.
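As an illustration, the sketch below normalizes a record that has already been mapped to hypothetical field names (`weight`, `weight_unit`, `price`, `price_basis`, `case_pack`). It converts pounds to kilograms using the exact definition 1 lb = 0.45359237 kg, divides case prices by the case pack to get a per-unit price, and uses `Decimal` to avoid floating-point rounding in prices. Treating “CS” as “per case” is an assumption about one supplier’s convention, the kind of decision a reviewer would confirm when certifying.

```python
from decimal import ROUND_HALF_UP, Decimal

LB_TO_KG = Decimal("0.45359237")  # exact: the international pound is 0.45359237 kg
POUND_UNITS = {"lb", "lbs", "pound", "pounds"}
KILOGRAM_UNITS = {"kg", "kgs", "kilogram", "kilograms"}
CASE_BASES = {"case", "cs"}  # assumption: this supplier uses "CS" to mean per case


def normalize(record: dict) -> dict:
    """Convert a mapped supplier record to kilograms and per-unit pricing."""
    weight = Decimal(record["weight"])
    unit = record["weight_unit"].strip().lower()
    if unit in POUND_UNITS:
        weight *= LB_TO_KG
    elif unit not in KILOGRAM_UNITS:
        raise ValueError(f"unrecognized weight unit: {record['weight_unit']!r}")

    price = Decimal(record["price"])
    if record["price_basis"].strip().lower() in CASE_BASES:
        price /= int(record["case_pack"])  # spread the case price over its units

    return {
        "supplier_sku": record["supplier_sku"],
        "weight_kg": weight.quantize(Decimal("0.001"), rounding=ROUND_HALF_UP),
        "unit_price": price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP),
    }
```

Unrecognized units raise an error rather than passing through silently, so a new convention surfaces for review instead of corrupting the catalog.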

What happens when a supplier changes their format?

Format changes are a common failure point for manual onboarding. A mapping platform detects the structural change, proposes an updated mapping, and routes it for review. The pipeline keeps running on the existing certified mapping until the updated one is approved. Nothing breaks silently.
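Detecting a structural change can be as simple as comparing the headers of an incoming file with the headers recorded when the mapping was certified. A minimal sketch, assuming the platform stores that header set alongside the mapping (the `DriftReport` and `detect_drift` names are hypothetical):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DriftReport:
    missing: list[str]  # headers seen at certification that are absent now
    added: list[str]    # headers in the new file the certified mapping never saw

    @property
    def has_drift(self) -> bool:
        return bool(self.missing or self.added)


def detect_drift(certified_headers: list[str], incoming_headers: list[str]) -> DriftReport:
    """Compare a new file's headers with those recorded when the mapping was certified."""
    certified, incoming = set(certified_headers), set(incoming_headers)
    return DriftReport(missing=sorted(certified - incoming), added=sorted(incoming - certified))
```

Comparing sets means a reordered file does not count as drift, which is reasonable when columns are read by name. A renamed column shows up as one missing header plus one added header, which is what a reviewer needs to see to confirm the rename.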

How does quality enforcement work at scale?

Quality rules live with the destination schema, not per supplier. A required field is required for every supplier’s data. Value constraints apply to every supplier’s data. Non-compliant records are quarantined, flagged, or rejected according to each rule’s severity. The rules get written once and enforced consistently across the full supplier base.