The Legacy Data Quality Detection recipe automatically identifies structural data quality (DQ) issues across the data catalog by analyzing column metadata for inconsistent definitions. It detects data type drift, structural attribute deviations, and schema variance, where the same column name is modeled differently across tables. The recipe generates structured DQ signals that governance teams can use to raise standardized DQ tickets with clear evidence. This helps organizations maintain consistent data models, improve catalog health, and prevent downstream analytics or integration failures.
Many enterprise data quality problems originate from schema-level inconsistencies rather than incorrect values. When the same column name appears with different data types, lengths, nullability rules, or structural contexts across tables, it creates hidden semantic drift. These inconsistencies can cause reporting conflicts, failed integrations, and governance confusion. This recipe systematically scans metadata to detect structural deviations early, so teams can track schema-related issues before they affect analytics or operational workflows.
Step 1 — Identify candidate tables
Tables with enough rows to support meaningful data quality analysis are selected as candidates.
Step 2 — Build unified structural base
Candidate tables are joined with their column metadata to create a consolidated structural dataset.
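Steps 1 and 2 can be sketched in plain Python as a filter followed by a join. This is only an illustration: the `MIN_ROW_COUNT` cutoff, the field names, and the sample tables are assumptions, since the recipe does not state its actual threshold or metadata layout.

```python
# Hypothetical catalog metadata; the real recipe reads this from the data catalog.
MIN_ROW_COUNT = 1000  # assumed cutoff, not specified by the recipe

tables = [
    {"table": "sales.orders", "row_count": 250_000},
    {"table": "crm.accounts", "row_count": 48_000},
    {"table": "tmp.scratch", "row_count": 12},
]

columns = [
    {"table": "sales.orders", "column": "customer_id", "data_type": "INTEGER"},
    {"table": "crm.accounts", "column": "customer_id", "data_type": "VARCHAR"},
    {"table": "tmp.scratch", "column": "customer_id", "data_type": "TEXT"},
]


def select_candidate_tables(tables, min_row_count=MIN_ROW_COUNT):
    """Step 1: keep only tables whose row count meets the cutoff."""
    return [t for t in tables if t["row_count"] >= min_row_count]


def build_structural_base(tables, columns, min_row_count=MIN_ROW_COUNT):
    """Step 2: join candidate tables with their column metadata."""
    row_counts = {
        t["table"]: t["row_count"]
        for t in select_candidate_tables(tables, min_row_count)
    }
    return [
        {**c, "row_count": row_counts[c["table"]]}
        for c in columns
        if c["table"] in row_counts
    ]


structural_base = build_structural_base(tables, columns)
```

Here `tmp.scratch` falls below the assumed cutoff, so its columns never reach the later detection steps.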
Step 3 — Detect data type drift
Columns with the same name across multiple tables are analyzed to identify inconsistent data type definitions.
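One way to picture the drift check is to group column definitions by name and flag any name that maps to more than one data type. The sketch below assumes a simple list of dictionaries as input; the field names and the case-insensitive name matching are illustrative choices, not the recipe's documented behavior.

```python
from collections import defaultdict

# Hypothetical rows from the structural base (table, column, data type).
rows = [
    {"table": "sales.orders", "column": "Customer_ID", "data_type": "INTEGER"},
    {"table": "crm.accounts", "column": "customer_id", "data_type": "VARCHAR"},
    {"table": "sales.orders", "column": "order_total", "data_type": "DECIMAL"},
    {"table": "fin.invoices", "column": "order_total", "data_type": "DECIMAL"},
]


def detect_type_drift(rows):
    """Return column names that appear with more than one data type.

    The result maps each drifting column name to {data_type: [tables]}.
    Names are compared case-insensitively (an assumption).
    """
    tables_by_type = defaultdict(lambda: defaultdict(set))
    for r in rows:
        name = r["column"].lower()
        tables_by_type[name][r["data_type"].upper()].add(r["table"])
    return {
        name: {dtype: sorted(tbls) for dtype, tbls in types.items()}
        for name, types in tables_by_type.items()
        if len(types) > 1
    }


drift = detect_type_drift(rows)
```

With the sample rows, only `customer_id` is reported, because `order_total` uses the same type in both tables.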
Step 4 — Detect structural attribute deviation
Columns sharing the same name are evaluated for differences in data type, length, and nullability attributes.
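The attribute comparison can be illustrated by checking each attribute independently and recording which ones disagree for a given column name. The attribute list, field names, and sample schemas below are assumptions made for the sketch.

```python
from collections import defaultdict

# Attributes compared per column name; assumed to mirror the step description.
ATTRIBUTES = ("data_type", "length", "nullable")

# Hypothetical column definitions.
rows = [
    {"table": "sales.orders", "column": "order_status",
     "data_type": "VARCHAR", "length": 20, "nullable": True},
    {"table": "ops.shipments", "column": "order_status",
     "data_type": "VARCHAR", "length": 50, "nullable": False},
    {"table": "sales.orders", "column": "order_id",
     "data_type": "INTEGER", "length": None, "nullable": False},
    {"table": "ops.shipments", "column": "order_id",
     "data_type": "INTEGER", "length": None, "nullable": False},
]


def detect_attribute_deviation(rows, attributes=ATTRIBUTES):
    """Map each shared column name to the attributes that differ across tables."""
    definitions = defaultdict(list)
    for r in rows:
        definitions[r["column"].lower()].append(r)

    findings = {}
    for name, defs in definitions.items():
        differing = [a for a in attributes if len({d[a] for d in defs}) > 1]
        if differing:
            findings[name] = differing
    return findings


deviations = detect_attribute_deviation(rows)
```

In this sample, `order_status` is flagged for `length` and `nullable` but not `data_type`, which is the kind of evidence a DQ ticket would need to cite.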
Step 5 — Detect logical suffix schema variance
Columns that share a logical name suffix (for example, "_date") are compared across tables to reveal inconsistencies introduced as schemas evolve.
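A suffix-based comparison might look like the following sketch, which groups columns by a configured name suffix and reports suffixes modeled with more than one data type. The suffix list and sample columns are hypothetical; the recipe does not document which patterns it checks beyond the "_date" example.

```python
from collections import defaultdict

# Hypothetical column metadata.
rows = [
    {"table": "sales.orders", "column": "order_date", "data_type": "DATE"},
    {"table": "ops.shipments", "column": "ship_date", "data_type": "VARCHAR"},
    {"table": "crm.accounts", "column": "created_date", "data_type": "TIMESTAMP"},
    {"table": "sales.orders", "column": "customer_id", "data_type": "INTEGER"},
    {"table": "crm.accounts", "column": "account_id", "data_type": "INTEGER"},
]


def detect_suffix_variance(rows, suffixes=("_date", "_id")):
    """Return suffixes whose columns use more than one data type.

    The result maps suffix -> {data_type: ["table.column", ...]}.
    """
    columns_by_type = defaultdict(lambda: defaultdict(list))
    for r in rows:
        name = r["column"].lower()
        for suffix in suffixes:
            if name.endswith(suffix):
                qualified = f'{r["table"]}.{r["column"]}'
                columns_by_type[suffix][r["data_type"].upper()].append(qualified)
    return {
        suffix: dict(types)
        for suffix, types in columns_by_type.items()
        if len(types) > 1
    }


variance = detect_suffix_variance(rows)
```

Only "_date" is reported here, since every "_id" column in the sample uses INTEGER.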
Step 6 — Generate structured DQ signals
All detected issues are consolidated into structured DQ signal tables that support ticket creation and governance workflows.
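As a rough illustration of the consolidation step, findings from each detector can be normalized into one record shape before tickets are raised. The `DQSignal` fields, category labels, and ticket title format below are hypothetical; the recipe's actual signal tables may use a different schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQSignal:
    """One structured data quality signal (hypothetical schema)."""
    category: str
    column_name: str
    evidence: str
    tables: tuple


def consolidate_signals(findings_by_category):
    """Flatten per-detector findings into one sorted list of signals."""
    signals = [
        DQSignal(category, f["column"], f["evidence"], tuple(sorted(f["tables"])))
        for category, findings in findings_by_category.items()
        for f in findings
    ]
    return sorted(signals, key=lambda s: (s.category, s.column_name))


def ticket_title(signal):
    """Build a short, standardized ticket title from a signal."""
    return f"[DQ] {signal.category}: {signal.column_name} ({len(signal.tables)} tables)"


# Hypothetical detector output.
findings = {
    "structural_attribute_deviation": [
        {"column": "order_status", "evidence": "length and nullability differ",
         "tables": ["sales.orders", "ops.shipments"]},
    ],
    "data_type_drift": [
        {"column": "customer_id", "evidence": "INTEGER vs VARCHAR",
         "tables": ["sales.orders", "crm.accounts"]},
    ],
}

signals = consolidate_signals(findings)
titles = [ticket_title(s) for s in signals]
```

Keeping the evidence and affected tables on each signal means a ticket can be generated without re-running the detectors.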
| Insight Category | What the recipe discovered | Business Impact |
|---|---|---|
| Common Data Type Drift | Customer_ID appears as INTEGER in one table but VARCHAR in another. | Inconsistent joins and integration errors across reporting systems. |
| Structural Attribute Deviation | The Order_Status column uses different lengths and nullability rules across schemas. | Ambiguous business meaning and inconsistent validation rules. |
| Logical Schema Variance | Columns ending with "_date" are modeled with different data types across tables. | Breaks standardized date analytics and lineage traceability. |
Make sure the following ingredients are available in your workspace: