Legacy Data Quality Issue Identifier and Reporter

Description

The Legacy Data Quality Detection recipe automatically identifies structural data quality issues across the data catalog by analyzing metadata for inconsistencies in column definitions. It detects problems such as data type drift, structural attribute deviations, and schema variance where the same column name is modeled differently across tables. By generating structured data quality signals, the recipe enables governance teams to proactively identify schema-related issues and raise standardized DQ tickets with clear evidence. This helps organizations maintain consistent data models, improve catalog health, and prevent downstream analytics or integration failures.

Details

Legacy Data Quality Detection

Detect structural inconsistencies across enterprise data schemas

Why this recipe matters

Many enterprise data quality problems originate from schema-level inconsistencies rather than incorrect values. When the same column name appears with different data types, lengths, nullability rules, or structural contexts across tables, it creates hidden semantic drift. These inconsistencies can cause reporting conflicts, failed integrations, and governance confusion. This recipe systematically scans metadata to detect structural deviations early, enabling teams to proactively identify and track schema-related data quality issues before they impact analytics or operational workflows.

Business value

Identifies structural data quality risks before analytics or reporting failures occur
Provides standardized evidence for raising governance-backed DQ tickets
Improves enterprise schema consistency across systems and data domains
Supports proactive catalog health monitoring and metadata-driven governance

Who benefits

Data Quality teams responsible for monitoring catalog health
Data Governance leaders managing schema standards
Data Owners responsible for maintaining data consistency
Analytics and integration teams relying on stable schemas

What you receive

DQ Signal Tables
Issue Classification
DQ Ticket Inputs

How the recipe works (guided flow)

Step 1 — Identify candidate tables
Tables with sufficient row counts are selected to ensure meaningful data quality analysis.

Step 2 — Build unified structural base
Candidate tables are joined with their column metadata to create a consolidated structural dataset.
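
Steps 1 and 2 can be pictured with a minimal Python sketch. It is illustrative only and not the recipe's actual implementation: the `ColumnMeta` record, the `build_structural_base` function, and the `MIN_ROW_COUNT` threshold are hypothetical names, and the recipe does not state what row count counts as "sufficient".

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical threshold: the recipe does not define "sufficient row counts".
MIN_ROW_COUNT = 100


@dataclass(frozen=True)
class ColumnMeta:
    """One column definition as it might appear in catalog metadata (assumed shape)."""
    table: str
    column: str
    data_type: str
    length: Optional[int]
    nullable: bool


def build_structural_base(table_row_counts, column_metadata, min_rows=MIN_ROW_COUNT):
    """Keep only the columns that belong to tables with at least min_rows rows."""
    candidates = {table for table, rows in table_row_counts.items() if rows >= min_rows}
    return [col for col in column_metadata if col.table in candidates]


# Made-up sample metadata for illustration.
table_row_counts = {"orders": 5000, "customers": 1200, "tmp_scratch": 3}
column_metadata = [
    ColumnMeta("orders", "customer_id", "INTEGER", None, False),
    ColumnMeta("customers", "customer_id", "VARCHAR", 20, False),
    ColumnMeta("tmp_scratch", "customer_id", "TEXT", None, True),
]

base = build_structural_base(table_row_counts, column_metadata)
print(sorted({col.table for col in base}))
```

In this sketch the near-empty `tmp_scratch` table is filtered out before any comparison, so it cannot contribute noise to the later checks.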

Step 3 — Detect data type drift
Columns with the same name across multiple tables are analyzed to identify inconsistent data type definitions.
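
One way to sketch the data type drift check in Python is to group column definitions by name and flag any name that maps to more than one data type. The function name `detect_type_drift`, the tuple layout, and the case-insensitive name matching are assumptions made for this example, not documented behavior of the recipe.

```python
from collections import defaultdict


def detect_type_drift(rows):
    """rows: iterable of (table, column, data_type) tuples.

    Returns {column: {data_type: [tables]}} for every column name that is
    defined with more than one data type.
    """
    by_column = defaultdict(lambda: defaultdict(list))
    for table, column, data_type in rows:
        # Assumption: column names are matched case-insensitively.
        by_column[column.lower()][data_type.upper()].append(table)
    return {
        column: {dtype: sorted(tables) for dtype, tables in types.items()}
        for column, types in by_column.items()
        if len(types) > 1
    }


# Made-up sample metadata for illustration.
rows = [
    ("orders", "Customer_ID", "INTEGER"),
    ("customers", "customer_id", "VARCHAR"),
    ("orders", "order_status", "VARCHAR"),
    ("shipments", "order_status", "VARCHAR"),
]

drift = detect_type_drift(rows)
print(drift)
```

Only `customer_id` is reported here, because `order_status` uses the same data type in both tables.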

Step 4 — Detect structural attribute deviation
Columns sharing the same name are evaluated for differences in data type, length, and nullability attributes.
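
A possible sketch of this check in Python compares each column's full structural signature (data type, length, nullability) against the most common signature for that column name. Treating the most common signature as the baseline is an assumption of this example, and `detect_attribute_deviation` is a hypothetical name; the recipe does not say how it chooses a reference definition.

```python
from collections import Counter, defaultdict

FIELDS = ("data_type", "length", "nullable")


def detect_attribute_deviation(rows):
    """rows: iterable of (table, column, data_type, length, nullable) tuples.

    Returns one finding per table whose definition of a column differs from
    the most common definition of that column name.
    """
    by_column = defaultdict(list)
    for table, column, data_type, length, nullable in rows:
        signature = (data_type.upper(), length, nullable)
        by_column[column.lower()].append((table, signature))

    findings = []
    for column, entries in by_column.items():
        counts = Counter(signature for _, signature in entries)
        if len(counts) < 2:
            continue  # every table agrees on this column
        # Assumption: the most frequent signature serves as the baseline.
        baseline = counts.most_common(1)[0][0]
        for table, signature in entries:
            if signature != baseline:
                differs_in = [f for f, a, b in zip(FIELDS, signature, baseline) if a != b]
                findings.append({
                    "column": column,
                    "table": table,
                    "differs_in": differs_in,
                    "baseline": baseline,
                    "observed": signature,
                })
    return findings


# Made-up sample metadata for illustration.
rows = [
    ("orders", "order_status", "VARCHAR", 20, False),
    ("shipments", "order_status", "VARCHAR", 20, False),
    ("returns", "order_status", "VARCHAR", 50, True),
    ("orders", "customer_id", "INTEGER", None, False),
    ("customers", "customer_id", "INTEGER", None, False),
]

findings = detect_attribute_deviation(rows)
for finding in findings:
    print(finding)
```

Recording which attributes differ, rather than only that a difference exists, gives a ticket reviewer concrete evidence to act on.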

Step 5 — Detect logical suffix schema variance
Columns whose names share a logical suffix (for example, "_date") are compared across tables to reveal inconsistencies introduced as schemas evolved.
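
As a rough Python sketch of this step, columns can be grouped by a shared name suffix and each group checked for mixed data types. The suffix list `TRACKED_SUFFIXES` and the function `detect_suffix_variance` are hypothetical; the recipe does not document which name patterns it compares, although its sample insights mention columns ending with "_date".

```python
from collections import defaultdict

# Hypothetical suffix list, chosen for illustration only.
TRACKED_SUFFIXES = ("_date", "_amount")


def detect_suffix_variance(rows, suffixes=TRACKED_SUFFIXES):
    """rows: iterable of (table, column, data_type) tuples.

    Returns {suffix: {data_type: [table.column, ...]}} for every tracked
    suffix whose columns are modeled with more than one data type.
    """
    by_suffix = defaultdict(lambda: defaultdict(set))
    for table, column, data_type in rows:
        name = column.lower()
        for suffix in suffixes:
            if name.endswith(suffix):
                by_suffix[suffix][data_type.upper()].add(f"{table}.{name}")
                break  # a column is counted under the first matching suffix
    return {
        suffix: {dtype: sorted(cols) for dtype, cols in types.items()}
        for suffix, types in by_suffix.items()
        if len(types) > 1
    }


# Made-up sample metadata for illustration.
rows = [
    ("orders", "order_date", "DATE"),
    ("shipments", "ship_date", "VARCHAR"),
    ("invoices", "invoice_date", "DATE"),
    ("orders", "total_amount", "DECIMAL"),
    ("invoices", "net_amount", "DECIMAL"),
]

variance = detect_suffix_variance(rows)
print(variance)
```

Unlike the same-name checks, this comparison spans differently named columns, so it can surface a date stored as text even when no two tables share that exact column name.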

Step 6 — Generate structured DQ signals
All detected issues are consolidated into structured DQ signal tables that support ticket creation and governance workflows.
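
The consolidation step might look like the following Python sketch, which turns findings into uniform records that a ticketing workflow could consume. The `DQSignal` fields and the issue type labels are assumptions made for illustration; the actual layout of the recipe's DQ signal tables is not described here.

```python
from dataclasses import asdict, dataclass


@dataclass
class DQSignal:
    """Hypothetical shape of one row in a DQ signal table."""
    issue_type: str
    subject: str
    evidence: str


def to_signal(issue_type, subject, observed):
    """observed maps each conflicting value to the places where it was seen."""
    evidence = "; ".join(
        f"{value} in {', '.join(where)}" for value, where in sorted(observed.items())
    )
    return DQSignal(issue_type, subject, evidence)


# Made-up findings for illustration.
signals = [
    to_signal(
        "DATA_TYPE_DRIFT",
        "customer_id",
        {"INTEGER": ["orders"], "VARCHAR": ["customers"]},
    ),
    to_signal(
        "SUFFIX_SCHEMA_VARIANCE",
        "*_date",
        {"DATE": ["invoices.invoice_date", "orders.order_date"],
         "VARCHAR": ["shipments.ship_date"]},
    ),
]

# Plain dictionaries are easy to pass on as ticket inputs.
ticket_inputs = [asdict(signal) for signal in signals]
for ticket in ticket_inputs:
    print(ticket)
```

Keeping every issue type in one record shape means a single ticket template can serve all of the checks.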

Sample insights (illustrative)

Insight Category	What the recipe discovered	Business Impact
Common Data Type Drift	Customer_ID appears as INTEGER in one table but VARCHAR in another.	Inconsistent joins and integration errors across reporting systems.
Structural Attribute Deviation	Order_Status column uses different lengths and nullability across schemas.	Ambiguous business meaning and inconsistent validation rules.
Logical Schema Variance	Columns ending with "_date" are modeled with different data types across tables.	Breaks standardized date analytics and lineage traceability.

Before you run this recipe

Make sure the following ingredients are available in your workspace:

Connected datasets are crawled and profiled
Columns have distinct counts and top values generated