This recipe identifies columns across tables that have similar or matching names but are defined with different data types. Such mismatches can create challenges in data integration, reporting, and analytics, as columns that appear to represent the same business concept may not be directly comparable or compatible. By highlighting these discrepancies, the recipe helps data teams enforce consistency and ensures accurate data usage across the organization.
Left unresolved, columns with highly similar names but mismatched data types cause reporting inconsistencies, pipeline failures, incorrect joins, and semantic ambiguity. This recipe detects such inconsistencies across the catalog and flags the affected columns as candidates for standardization.
Step 1 — Read column metadata
Load table, column, and datatype information from oecolumn.
Step 2 — Compute name similarity
Normalize names and apply fuzzy similarity checks to group columns that appear to represent the same concept.
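The recipe does not say which normalization or similarity measure it uses, so the following is only a minimal sketch of the idea. It assumes that normalization means lowercasing and dropping non-alphanumeric characters, uses the standard library's `difflib.SequenceMatcher` as a stand-in fuzzy matcher, and picks an arbitrary threshold of 0.85. The function names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase the name and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def group_similar(names, threshold=0.85):
    """Greedy grouping: a name joins the first group whose first member
    it resembles closely enough; otherwise it starts a new group.
    The threshold is an assumed value, not one taken from the recipe."""
    groups = []
    for name in names:
        for group in groups:
            if similarity(name, group[0]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


groups = group_similar(["customer_id", "CustomerID", "order_date"])
```

Comparing each name only against the first member of a group keeps the sketch short; a production implementation would likely need a more careful clustering strategy for large catalogs.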
Step 3 — Detect data-type mismatches
Identify groups where the same name pattern exists with different datatypes across tables.
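As a rough illustration of this step, the sketch below groups column records by a normalized name and keeps only the groups that carry more than one distinct datatype. The `(table, column, datatype)` tuple layout and the function name are assumptions made for the example; the actual structure of the metadata is not described here.

```python
from collections import defaultdict


def find_mismatches(columns):
    """Return {normalized_name: [(table, column, datatype), ...]} for every
    name that appears with more than one distinct datatype.

    `columns` is an iterable of (table, column, datatype) tuples, which is
    an assumed layout for this sketch.
    """
    by_name = defaultdict(list)
    for table, column, datatype in columns:
        key = column.lower().replace("_", "")
        by_name[key].append((table, column, datatype.upper()))
    return {
        key: rows
        for key, rows in by_name.items()
        if len({datatype for _, _, datatype in rows}) > 1
    }


sample = [
    ("crm.customers", "customer_id", "INT"),
    ("sales.orders", "Customer_ID", "string"),
    ("crm.orders", "order_id", "INT"),
    ("sales.orders", "order_id", "INT"),
]
mismatches = find_mismatches(sample)
```

Here `customer_id` is reported because it appears as both INT and STRING, while `order_id` is skipped because its type is consistent.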
Step 4 — Compute severity classification
Examples: INT vs STRING (High), TIMESTAMP vs DATE (Medium), DECIMAL(10,4) vs DECIMAL(10,0) (Low).
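One way to generalize the three examples into a rule is sketched below. It is an assumption, not the recipe's documented logic: types from different families (for example numeric vs text) are rated High, different base types within one family are rated Medium, and the same base type with different precision or scale is rated Low. The `FAMILIES` table is a hypothetical, deliberately small mapping.

```python
import re

# Hypothetical mapping of base types to broad families.
FAMILIES = {
    "INT": "numeric", "BIGINT": "numeric", "DECIMAL": "numeric",
    "STRING": "text", "VARCHAR": "text",
    "DATE": "temporal", "TIMESTAMP": "temporal",
}


def canonical(datatype: str) -> str:
    """Uppercase and strip whitespace so 'decimal(10, 4)' == 'DECIMAL(10,4)'."""
    return re.sub(r"\s+", "", datatype).upper()


def base_type(datatype: str) -> str:
    """Drop any parameter list: 'DECIMAL(10,4)' -> 'DECIMAL'."""
    return re.sub(r"\(.*\)", "", canonical(datatype))


def classify(a: str, b: str):
    """Return 'High', 'Medium', 'Low', or None when the types are identical."""
    if canonical(a) == canonical(b):
        return None
    base_a, base_b = base_type(a), base_type(b)
    if base_a == base_b:
        return "Low"      # same type, different precision or scale
    family_a, family_b = FAMILIES.get(base_a), FAMILIES.get(base_b)
    if family_a is not None and family_a == family_b:
        return "Medium"   # related types, e.g. TIMESTAMP vs DATE
    return "High"         # unrelated or unknown types, e.g. INT vs STRING
```

Unknown types fall through to High in this sketch so that they are reviewed rather than silently ignored.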
Step 5 — Attach potential harmonized datatype suggestions
Recommend conversions (e.g., convert STRING → INT for numeric fields).
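The STRING → INT example can be sketched as a simple check on sample values: suggest INT only when every sampled string looks like an integer. This is a minimal illustration under that assumption; the function names are hypothetical and the recipe's actual suggestion logic may differ.

```python
import re


def looks_integer(values) -> bool:
    """True when every value is an optionally signed run of ASCII digits."""
    return all(re.fullmatch(r"[+-]?[0-9]+", str(v).strip()) for v in values)


def suggest_harmonized_type(datatypes, string_samples):
    """Suggest a common type for an INT/STRING conflict.

    Returns 'INT' when the sampled STRING values are all integer-like,
    'STRING' when they are not (or when no samples are available), and
    None for conflicts this sketch does not cover.
    """
    types = {d.upper() for d in datatypes}
    if types == {"INT", "STRING"}:
        if string_samples and looks_integer(string_samples):
            return "INT"
        return "STRING"
    return None
```

Falling back to STRING when the samples are missing or non-numeric is the conservative choice, since converting to STRING does not lose information while converting to INT can fail.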
Step 6 — Output grouped mismatch dataset
Generate similar_column_groups and mismatched_data_types reports for governance review.
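The report layout is not specified here, so the following sketch simply serializes mismatch rows as CSV text. The field names (`group`, `table`, `column`, `datatype`, `severity`) and the helper name are hypothetical and are chosen only for illustration.

```python
import csv
import io


def write_report(rows, fieldnames) -> str:
    """Serialize a list of dict rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical field names for a mismatched_data_types report.
fields = ["group", "table", "column", "datatype", "severity"]
rows = [
    {"group": "customerid", "table": "crm.customers",
     "column": "customer_id", "datatype": "INT", "severity": "High"},
    {"group": "customerid", "table": "sales.orders",
     "column": "customer_id", "datatype": "STRING", "severity": "High"},
]
report = write_report(rows, fields)
```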
| Insight Category | What the recipe discovered | Business Impact |
|---|---|---|
| Critical datatype conflicts | customer_id appears as INT in CRM and STRING in Sales. | Creates broken joins and inconsistent customer reporting across teams. |
| Timestamp inconsistencies | order_date appears as TIMESTAMP in one system and DATE in another. | Leads to inaccurate time-based calculations and reporting anomalies. |
| Numeric precision differences | amount is defined with different numeric precision in Billing and Finance. | Causes rounding differences in financial dashboards. |
Make sure the following ingredients are available in your workspace: