
Enterprise Data Quality

AI-Powered Rule Profiling

Writing validation rules manually means making assumptions about what your data looks like. AI analyzes the values, patterns, and anomalies in your data and suggests rules based on what it finds.

Typos that appear once among thousands, casing inconsistencies, impossible ranges, placeholder text, and PII leakage are detected automatically. Each suggestion includes rationale with the failing values and their counts, so you can decide whether to accept, modify, or dismiss it.

  • Complete value distributions, not statistical samples
  • Detects typos, casing inconsistencies, placeholder text, and outliers
  • Email, phone, date, and URL format validation
  • PII leakage detection — SSN patterns, credit card numbers in wrong fields
An example of an AI-suggested rule:

  Field: customer_status
  Rule: must be one of active, inactive, pending
  Rationale: found 3 valid values (active: 8,421; inactive: 2,103; pending: 847). Also found "actve" (2 occurrences) and "ACTIVE" (14 occurrences) — likely typos and casing errors.
  Estimated violations: 16 (error / quarantine)
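The frequency profiling behind a suggestion like this can be sketched in a few lines. Everything below — the `suggest_allowed_values` name, the 1% rarity threshold, the edit-distance-1 typo check — is an illustrative assumption, not the product's actual algorithm:

```python
from collections import Counter

def suggest_allowed_values(values, rare_threshold=0.01):
    """Profile a column's full value distribution and flag rare
    variants (likely typos or casing errors) of common values."""
    counts = Counter(values)
    total = sum(counts.values())
    common = {v for v, c in counts.items() if c / total >= rare_threshold}
    suspects = {}
    for value, count in counts.items():
        if value in common:
            continue
        # A rare value whose lowercase form matches a common value is
        # likely a casing error; a close spelling is a likely typo.
        for canonical in common:
            if value.lower() == canonical.lower() or _close(value, canonical):
                suspects[value] = (canonical, count)
                break
    return sorted(common), suspects

def _close(a, b):
    # Crude edit-distance-1 check: same length with one differing
    # character, or one character inserted/deleted.
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        return sum(x != y for x, y in zip(a, b)) == 1
    shorter, longer = sorted((a, b), key=len)
    return any(longer[:i] + longer[i + 1:] == shorter
               for i in range(len(longer)))
```

Running this over the customer_status column above would surface "actve" and "ACTIVE" as suspected variants of "active", with their counts, which is exactly the evidence the rationale presents.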
Rules created from plain-English descriptions:

  • "must be a valid email address" — rule created, tested against 11,371 rows, 99.2% pass
  • "price must be positive" — rule created, tested against 11,371 rows, 99.8% pass
  • "country code must be exactly 2 characters" — rule created, tested against 11,371 rows, 94.7% pass (602 violations)

Plain English Editing

You never have to think about validation logic. Describe what you want to check — "must be between 0 and 100" or "must not be empty" — and the system creates the rule, tests it against your data, and shows you the results.

When a rule needs to change, you describe the change the same way. The system updates the logic, re-tests, and shows you the new results. You accept or reject. That's it.

  • Validated against your data before you accept
  • See pass rate and sample violations immediately
  • Edit existing rules in natural language — same workflow

Three Enforcement Actions

Each rule specifies what happens when data fails validation. You choose the action per rule, and enforcement applies automatically during processing.

Quarantine

The row is removed from output and saved to a separate quarantine file with the failure reason attached, so you can inspect and reprocess it later.

Flag

The row continues processing but gets marked for your review. Output is unaffected — you decide what to do after the run.

Stop Job

The pipeline halts. Use this for critical rules where bad data should stop output from being produced.
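The three actions amount to a dispatch over per-rule outcomes. The sketch below is a minimal illustration, assuming rules are (field, predicate, action, name) tuples; none of these names come from the product:

```python
from enum import Enum

class Action(Enum):
    QUARANTINE = "quarantine"
    FLAG = "flag"
    STOP_JOB = "stop_job"

class JobStopped(Exception):
    pass

def enforce(rows, rules):
    """Apply each rule's action: quarantined rows leave the output,
    flagged rows stay but are marked, a stop rule halts the run."""
    output, quarantined = [], []
    for row in rows:
        keep, flags = True, []
        for field, predicate, action, name in rules:
            if predicate(row.get(field)):
                continue
            if action is Action.STOP_JOB:
                raise JobStopped(f"rule {name!r} failed on {row!r}")
            if action is Action.QUARANTINE:
                # Saved separately with the failure reason attached.
                quarantined.append({**row, "_reason": name})
                keep = False
            else:  # Action.FLAG: output unaffected, marked for review
                flags.append(name)
        if keep:
            out = dict(row)
            if flags:
                out["_flags"] = flags
            output.append(out)
    return output, quarantined
```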

Test Before Production

Dry-run all your active quality rules against your source data without processing anything. You see per-rule pass rates, violation counts, and sample failures — so you can tune rules before they affect production output.

The same rules run the same way in test and production. What you see in the preview is what happens during a production run.

  • Per-rule pass rates and violation counts
  • Sample violations with row context
  • Consistent behavior between test and production
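A dry run over several rules at once might look like the sketch below. The `dry_run` helper and its report shape are assumptions for illustration; the key point is that nothing is enforced, only measured:

```python
def dry_run(rows, rules):
    """Evaluate every active rule against source data without
    enforcing anything; report per-rule and overall pass rates."""
    report, failing_rows = {}, set()
    for name, field, predicate in rules:
        fails = [i for i, r in enumerate(rows) if not predicate(r.get(field))]
        failing_rows.update(fails)  # a row may fail several rules
        report[name] = {"violations": len(fails),
                        "pass_rate": round(1 - len(fails) / len(rows), 3)}
    report["_overall"] = round(1 - len(failing_rows) / len(rows), 3)
    return report
```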
Example test run results:

  • customer_status allowed values — 99.8% pass
  • email format — 97.2% pass
  • price positive — 94.1% pass
  • country code length — 98.5% pass
  • 11,371 rows evaluated — overall: 97.4% pass

Example violation trends (7-day):

  • email format — 12 violations today
  • price positive — 3 violations today
  • Top violating values: "actve" (2), "N/A" (7), "test" (4)

Violation Tracking and Trends

Violations are recorded with the row data, the value that failed, the rule it failed against, and the action taken. You see the context for each failure — enough to decide whether to fix the data upstream or adjust the rule.

7-day and 30-day trends show which rules are catching more violations over time — a signal that upstream data quality is degrading. The most common violating values per rule help you target cleanup.

  • 7-day and 30-day trends per rule with daily sparklines
  • Most common violating values for targeted cleanup
  • Row context for each violation
  • Run history with pass rates across all autopilot executions
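Deriving a per-rule daily sparkline and top violating values from recorded violations can be sketched as below. The `trend` function and the violation-record shape are hypothetical, chosen only to mirror what the reports show:

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

def trend(violations, days=7, today=None):
    """Daily violation counts per rule over a trailing window, plus
    the most common failing values for targeted cleanup."""
    today = today or date.today()
    window = [today - timedelta(d) for d in range(days - 1, -1, -1)]
    daily = defaultdict(Counter)
    top_values = defaultdict(Counter)
    for v in violations:  # each record: rule, failing value, date
        if v["date"] in window:
            daily[v["rule"]][v["date"]] += 1
            top_values[v["rule"]][v["value"]] += 1
    sparklines = {rule: [c[d] for d in window] for rule, c in daily.items()}
    return sparklines, {rule: c.most_common(3) for rule, c in top_values.items()}
```

A rule whose sparkline trends upward is the degradation signal described above; its top values tell you what to clean up first.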

Aggregation-Aware Enforcement

When your mapping includes aggregation, quality rules apply at the group level. If any row in a group fails a quarantine rule, the group is excluded — preventing partial aggregations from reaching your output.

  • Group-level enforcement for aggregated data
  • Propagated violations tracked separately in reports
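Group-level exclusion boils down to: if any member row fails a quarantine rule, drop the whole group. The `enforce_groups` helper below is an illustrative assumption, not the product's implementation:

```python
def enforce_groups(rows, group_key, field, predicate):
    """If any row in a group fails a quarantine rule, exclude the
    whole group so no partial aggregate reaches the output."""
    groups = {}
    for row in rows:
        groups.setdefault(row[group_key], []).append(row)
    clean, excluded = [], []
    for key, members in groups.items():
        if all(predicate(r.get(field)) for r in members):
            clean.extend(members)
        else:
            # Passing rows in a failing group are the "propagated"
            # violations: excluded because a sibling row failed.
            excluded.append(key)
    return clean, excluded
```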

Flexible Rule Management

Data quality is enabled at the organization level as part of your subscription. Toggle individual rules on or off without deleting them — useful when you're tuning rules or onboarding new data sources. Pause quality checks at the organization level when you need to.

  • Enable or disable individual rules without deleting
  • Organization-level quality toggle