
Plain English Data Transformations

Mert Uzunogullari

The developer bottleneck

A data analyst knows exactly what needs to happen to the data. The customer’s full name should be split into first and last name. The date should shift from European format to ISO 8601. The address fields should concatenate into a single line with proper comma placement. The currency amount needs to convert from cents to dollars.

The analyst knows the what. But the transformation tool requires JavaScript, Python, or SQL. So the analyst writes a ticket, waits for a developer, explains the requirement, reviews the implementation, requests a change because the edge case with hyphenated last names was not handled, waits again, and finally gets the transformation deployed three days after the need was identified.

This bottleneck is not caused by the complexity of the transformation. It is caused by the gap between knowing what should happen and being able to express it in code.

Describing transformations in plain English

Plain English data transformation closes that gap. Instead of writing value.split(' ').slice(1).join(' '), you describe the intent: “Extract everything after the first space in the full_name field.”

The AI interprets the description, generates the expression, and shows you what the output looks like against your actual data. If the result is not right, you refine the description. “Extract everything after the first space, but if there is no space, use the entire value as-is.” The AI adjusts the expression accordingly.
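The refined description above might compile to something like the following sketch. This is illustrative, not the product's actual generated output; the function name and the standalone form are assumptions for readability.

```javascript
// Hypothetical expression for: "Extract everything after the first space,
// but if there is no space, use the entire value as-is."
function extractAfterFirstSpace(value) {
  const idx = value.indexOf(' ');
  return idx === -1 ? value : value.slice(idx + 1);
}

console.log(extractAfterFirstSpace('Maria Santos'));      // "Santos"
console.log(extractAfterFirstSpace('Cher'));              // "Cher"
console.log(extractAfterFirstSpace('Ana Maria Santos'));  // "Maria Santos"
```

Note how the refinement ("if there is no space, use the entire value") shows up directly as the ternary branch, which is exactly the edge case the first description left unhandled.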

This interaction pattern changes who can build transformations. The person closest to the data — the one who understands the business rules, the edge cases, and the destination requirements — can define the logic directly. No translation layer. No ticket queue.

How the generation process works

When you describe a transformation in plain English, several things happen in sequence:

Intent parsing. The AI identifies what operation you want (split, concatenate, format, convert, extract, calculate) and which fields are involved. “Combine first_name and last_name with a space between them” parses into a concatenation operation on two specific fields.

Expression generation. The AI produces the actual transformation expression. This is real code, not a black box. You can read it, understand it, and modify it directly if you prefer. The plain English interface generates code; it does not replace it.

Sample execution. The generated expression runs against sample rows from your actual data. You see the input values and the corresponding output values side by side. If first_name is “Maria” and last_name is “Santos,” you see the output “Maria Santos” immediately.

Edge case surfacing. The AI examines your sample data for values that might cause problems. Null values, empty strings, unexpected formats, special characters. If your data contains a record where first_name is null, the AI flags this and asks how you want to handle it. The expression adjusts before it ever runs in production.
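The edge-case surfacing step can be pictured as a scan over the sample rows. This is a minimal sketch under assumed names and sample data, not datathere's internal implementation:

```javascript
// Sketch: scan sample rows for values that would break a concatenation
// of first_name and last_name. Sample data and field names are illustrative.
const sampleRows = [
  { first_name: 'Maria', last_name: 'Santos' },
  { first_name: null,    last_name: 'Okafor' },
  { first_name: ' ',     last_name: '' },
];

function surfaceEdgeCases(rows, fields) {
  const issues = [];
  rows.forEach((row, i) => {
    for (const field of fields) {
      const value = row[field];
      if (value === null || value === undefined) {
        issues.push({ row: i, field, issue: 'null' });
      } else if (String(value).trim() === '') {
        issues.push({ row: i, field, issue: 'empty' });
      }
    }
  });
  return issues;
}

console.log(surfaceEdgeCases(sampleRows, ['first_name', 'last_name']));
// Flags the null first_name in row 1 and the blank values in row 2.
```

Each flagged issue becomes a question back to you ("how should a null first_name be handled?") before the expression is finalized.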

Four layers of validation

Generating a transformation expression is the easy part. Trusting it is harder. A syntactically valid expression can still produce wrong results, reference fields that do not exist, or fail on data types it was not designed to handle.

datathere validates every transformation through four layers before it can be used in production:

Syntax validation. The expression is parsed to confirm it is syntactically valid. Missing parentheses, unmatched quotes, and invalid operators are caught immediately. This is table stakes, but it eliminates an entire class of runtime failures.

Type checking. The expression’s input types are compared to the actual field types in the schema. If the expression calls .toUpperCase() on a field that contains numeric values, the type checker flags the mismatch. This catches errors that would pass syntax validation but fail at runtime.

Field reference validation. Every field referenced in the expression is checked against the actual schema. If you rename a source field after writing a transformation, the validator flags every expression that references the old name. No more runtime errors from stale field references.

Sample execution. The expression runs against real sample data from the source. Not synthetic test data, actual values from the dataset. The output is displayed for review. If the expression produces unexpected results on any sample row, you see it before the pipeline runs.

These four layers operate as a pipeline. An expression must pass all four to be considered valid. This means that a certified mapping configuration (see the certification workflow) contains only expressions that have been validated at every level.
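The pipeline of layers can be sketched as sequential checks, each short-circuiting on failure. The checks below are simplified stand-ins (the schema, function signature, and use of the `Function` constructor are assumptions for illustration), not the product's actual validators:

```javascript
// Simplified four-layer validation pipeline. An expression must pass
// every layer; the first failing layer is reported.
const schema = { full_name: 'string', amount: 'number' };

function validate(expr, referencedFields, expectedTypes, sampleRows) {
  // Layer 1 — syntax validation: does the expression parse at all?
  try { new Function('row', `return ${expr};`); }
  catch (e) { return { ok: false, layer: 'syntax', error: e.message }; }

  // Layer 2 — type checking: do the fields have the types the expression expects?
  for (const [field, type] of Object.entries(expectedTypes)) {
    if (schema[field] !== type) return { ok: false, layer: 'type', field };
  }

  // Layer 3 — field reference validation: does every referenced field exist?
  for (const field of referencedFields) {
    if (!(field in schema)) return { ok: false, layer: 'field_reference', field };
  }

  // Layer 4 — sample execution: run against actual sample rows for review.
  const fn = new Function('row', `return ${expr};`);
  return { ok: true, outputs: sampleRows.map(fn) };
}

console.log(validate(
  'row.full_name.toUpperCase()',
  ['full_name'],
  { full_name: 'string' },
  [{ full_name: 'Maria Santos' }]
));
// { ok: true, outputs: [ 'MARIA SANTOS' ] }
```

A stale reference such as `row.old_name` would stop at layer 3 with `layer: 'field_reference'`, which is the rename scenario described above.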

Editing existing transformations

Plain English editing is not limited to creating new transformations. Existing expressions, whether generated by AI or written manually, can be modified by describing the change.

An expression that concatenates first and last name already exists. You need to add a middle initial. Instead of reading the JavaScript expression, understanding its structure, and modifying it correctly, you describe the change: “Add middle_initial between first_name and last_name, with periods after the initial.”

The AI reads the existing expression, understands its structure, applies the described change, and generates an updated expression. The four validation layers run again on the modified version.
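A plausible before-and-after for the middle-initial edit looks like this. Both expressions are illustrative; the field names and the fallback behavior for a missing initial are assumptions, not the product's guaranteed output:

```javascript
// Before: the existing concatenation expression.
const before = (row) => `${row.first_name} ${row.last_name}`;

// After: "Add middle_initial between first_name and last_name, with
// periods after the initial." Falls back to the original shape when
// middle_initial is missing.
const after = (row) =>
  row.middle_initial
    ? `${row.first_name} ${row.middle_initial}. ${row.last_name}`
    : `${row.first_name} ${row.last_name}`;

console.log(after({ first_name: 'Maria', middle_initial: 'L', last_name: 'Santos' }));
// "Maria L. Santos"
console.log(after({ first_name: 'Maria', last_name: 'Santos' }));
// "Maria Santos"
```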

This works particularly well for expressions you did not write yourself. A transformation created by a colleague six months ago uses logic you do not immediately understand. Instead of reverse-engineering the code, you describe the adjustment you need. The AI handles the translation between your intent and the existing implementation.

Common transformation patterns

Certain transformations appear in nearly every integration project. Plain English handles all of them without requiring knowledge of the underlying expression language:

String formatting. “Convert the email address to lowercase.” “Remove leading and trailing spaces from the company_name field.” “Replace all hyphens in the phone number with spaces.”

Date and time operations. “Convert the date from MM/DD/YYYY format to YYYY-MM-DD.” “Extract the year from the created_at timestamp.” “Calculate the number of days between order_date and ship_date.”

Conditional logic. “If the country is ‘US,’ format the phone number as (XXX) XXX-XXXX. Otherwise, keep it as-is.” “If the amount is negative, set the transaction_type to ‘refund.’ Otherwise, set it to ‘charge.’”

Numeric operations. “Convert the amount from cents to dollars by dividing by 100.” “Round the price to two decimal places.” “Calculate the total by multiplying quantity by unit_price.”

Data extraction. “Extract the domain from the email address.” “Get the first three characters of the postal code.” “Parse the city name from the full address string.”

Null handling. “If the middle_name is empty or null, use an empty string instead of null.” “Default the status to ‘active’ when the source field is missing.”

Each of these is a single sentence in plain English. The corresponding code might be simple or complex, but the person defining the transformation does not need to know or care about that distinction.
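To make the single-sentence-to-code point concrete, here is what a few of the patterns above typically compile to as one-line expressions. JavaScript is used here because the article references it; the field names are illustrative:

```javascript
// String formatting: "Convert the email address to lowercase."
const toLowerEmail   = (row) => row.email.toLowerCase();

// String formatting: "Remove leading and trailing spaces from company_name."
const trimCompany    = (row) => row.company_name.trim();

// Date operation: "Convert the date from MM/DD/YYYY format to YYYY-MM-DD."
const isoDate        = (row) => {
  const [mm, dd, yyyy] = row.date.split('/');
  return `${yyyy}-${mm}-${dd}`;
};

// Numeric operation: "Convert the amount from cents to dollars."
const centsToDollars = (row) => row.amount_cents / 100;

// Data extraction: "Extract the domain from the email address."
const emailDomain    = (row) => row.email.split('@')[1];

// Null handling: "If middle_name is empty or null, use an empty string."
const safeMiddle     = (row) => row.middle_name ?? '';
```

None of these is hard to write, but each one requires knowing the expression language; the plain English sentence does not.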

When code is still the right choice

Plain English transformation is not a replacement for code in all situations. Complex business logic with multiple branches, mathematical formulas with specific precision requirements, or performance-critical transformations that need optimization may be better expressed directly in code.

The point is that the choice should be available. A data analyst who needs a simple date format change should not need to write JavaScript. A developer who needs a complex multi-step calculation should not be forced to describe it in natural language.

In practice, most transformations in a typical integration project are straightforward enough for plain English. The 80% of transformations that involve string formatting, date conversion, and simple conditional logic are exactly the kind of work that should not require a developer. The remaining 20% (complex aggregations, custom parsing logic, multi-step calculations) can be written in code directly, with the same four-layer validation applied regardless of how the expression was created.

The compound effect

The impact of plain English transformations is not just speed on individual expressions. It changes the team dynamics around integration work.

When only developers can write transformations, integration projects have a hard dependency on developer availability. Work stalls when the development team is occupied with other priorities. Requirements get lost in translation between the person who understands the data and the person who writes the code.

When the person who understands the data can define transformations directly, the integration project moves at the speed of understanding rather than the speed of development capacity. The feedback loop tightens from days to minutes. The person who notices an edge case can fix it immediately instead of filing a ticket and hoping it gets prioritized.

This does not reduce the value of developers in the integration process. It redirects their time from mechanical expression writing to architecture, optimization, and the genuinely complex transformations where their expertise matters.