This recipe evaluates the readiness and health of datasets before they enter any governance workflow. It identifies tables that are too wide, sparsely populated, missing profiling, or containing inconsistent metadata. The recipe classifies each table into High, Medium, Low, or Profile-Required risk priority and recommends appropriate next steps such as profiling, cleanup, or governance onboarding. It provides a reliable, objective way to determine which datasets are worth curating and which require remediation, helping teams avoid wasted effort and focus on high-value assets.
Large, wide tables often contain many empty or unused columns, columns that were never profiled, stale attributes, and technical leftovers. Governing such tables wastes effort, slows profiling, and invites misleading business interpretations. This recipe surfaces sparsity, profiling gaps, and structural issues early, showing whether a table is ready for governance at all.
Step 1 — Read metadata from oecolumn, oetable, and oeschema
Load column details, table width, profiling values, and schema metadata.
Step 2 — Compute sparsity per column
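Use the formula (nulls + zeros + empty values) / rowcount. The result is the share of rows that carry no real data, so a value near 1 means the column is almost empty and a value near 0 means it is well populated.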
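The formula in this step can be sketched in Python as follows. This is a minimal illustration rather than the recipe's actual implementation: the function name `column_sparsity` and its parameters are hypothetical, and returning `None` for a missing or zero rowcount is an assumption made so that unprofiled columns are not mistaken for fully populated ones.

```python
from typing import Optional


def column_sparsity(
    nulls: int, zeros: int, empties: int, rowcount: Optional[int]
) -> Optional[float]:
    """Return (nulls + zeros + empties) / rowcount.

    Returns None when rowcount is missing or zero, because sparsity
    cannot be computed without a profiled row count (assumption).
    """
    if not rowcount:  # covers both None and 0
        return None
    return (nulls + zeros + empties) / rowcount


# Hypothetical column: 1000 rows, of which 800 carry no real data.
print(column_sparsity(nulls=600, zeros=150, empties=50, rowcount=1000))  # 0.8
```

Keeping the undefined case separate from a sparsity of 0 lets later steps route such columns to profiling instead of scoring them.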
Step 3 — Detect profiling gaps and inconsistencies
Identify columns with missing profiling, zero rowcount, or inconsistent metrics.
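One possible way to express these checks is shown below. The source does not define what counts as an inconsistent metric, so the rules here (a null count or distinct count that is negative or exceeds the rowcount) are assumptions chosen for illustration, as are the function name `profiling_status` and the status labels.

```python
from typing import Optional


def profiling_status(
    rowcount: Optional[int], nulls: Optional[int], distinct: Optional[int]
) -> str:
    """Classify one column's profiling as NO_PROFILE, INCONSISTENT, or OK.

    All rules below are illustrative assumptions, not the recipe's own.
    """
    # Missing or zero rowcount: the column was never usefully profiled.
    if rowcount is None or rowcount == 0:
        return "NO_PROFILE"
    # Rowcount exists but the other statistics are absent.
    if nulls is None or distinct is None:
        return "NO_PROFILE"
    # Counts that cannot be true at the same time as the rowcount.
    if nulls < 0 or distinct < 0 or nulls > rowcount or distinct > rowcount:
        return "INCONSISTENT"
    return "OK"


print(profiling_status(rowcount=None, nulls=None, distinct=None))  # NO_PROFILE
print(profiling_status(rowcount=100, nulls=250, distinct=10))      # INCONSISTENT
print(profiling_status(rowcount=100, nulls=5, distinct=40))        # OK
```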
Step 4 — Aggregate to table-level metrics
Compute width, sparse column count, % sparse, inconsistent profiling, and no-profile attributes.
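The aggregation could look roughly like the sketch below. It assumes each column is represented as a dictionary with a `sparsity` value (or `None`) and a `status` label; those keys, the function name `summarize_table`, and the 0.8 cutoff for calling a column sparse are all hypothetical, since the source does not state them.

```python
from typing import Dict, List


def summarize_table(columns: List[dict], sparse_threshold: float = 0.8) -> Dict[str, float]:
    """Roll per-column results up to table-level metrics.

    Each column dict is assumed to hold:
      "sparsity": float in [0, 1], or None when it could not be computed
      "status":   "OK", "INCONSISTENT", or "NO_PROFILE"
    """
    width = len(columns)
    sparse = sum(
        1
        for c in columns
        if c["sparsity"] is not None and c["sparsity"] >= sparse_threshold
    )
    inconsistent = sum(1 for c in columns if c["status"] == "INCONSISTENT")
    no_profile = sum(1 for c in columns if c["status"] == "NO_PROFILE")
    pct_sparse = round(100.0 * sparse / width, 1) if width else 0.0
    return {
        "width": width,
        "sparse_columns": sparse,
        "pct_sparse": pct_sparse,
        "inconsistent_profiling": inconsistent,
        "no_profile": no_profile,
    }


# Hypothetical four-column table.
example = [
    {"sparsity": 0.9, "status": "OK"},
    {"sparsity": 0.1, "status": "OK"},
    {"sparsity": None, "status": "NO_PROFILE"},
    {"sparsity": 0.85, "status": "INCONSISTENT"},
]
print(summarize_table(example))
```

For the example table, two of the four columns meet the assumed cutoff, so `pct_sparse` comes out at 50.0.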
Step 5 — Assign risk categories
Label each table PROFILE_REQUIRED, HIGH, MEDIUM, or LOW based on sparsity and profiling completeness.
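A rule set of this kind might be sketched as follows. Only the four category names come from the recipe. The cutoffs are placeholders: the 80% value echoes the "more than 80% sparse columns" finding reported in the insights table, while the 40% and 50% values, the check order, and the function name `assign_risk` are assumptions that should be tuned to your environment.

```python
def assign_risk(
    pct_sparse: float,
    pct_no_profile: float,
    high: float = 80.0,         # echoes the 80% figure in the insights table
    medium: float = 40.0,       # hypothetical cutoff
    profile_gap: float = 50.0,  # hypothetical cutoff
) -> str:
    """Map table-level percentages to a risk category (illustrative rules)."""
    # Without enough profiling, sparsity figures cannot be trusted,
    # so this check runs first.
    if pct_no_profile >= profile_gap:
        return "PROFILE_REQUIRED"
    if pct_sparse > high:
        return "HIGH"
    if pct_sparse > medium:
        return "MEDIUM"
    return "LOW"


print(assign_risk(pct_sparse=85.0, pct_no_profile=5.0))   # HIGH
print(assign_risk(pct_sparse=85.0, pct_no_profile=60.0))  # PROFILE_REQUIRED
print(assign_risk(pct_sparse=10.0, pct_no_profile=0.0))   # LOW
```

Checking profiling completeness before sparsity reflects the point made in the insights table that profiling must be rerun before readiness or risk can be assessed.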
Step 6 — Generate recommendations
Create governance actions such as reprofiling, cleanup, or deprioritization.
Step 7 — Output results
Produce table_sparsity_summary and sample views for UI display.
| Insight Category | What the recipe discovered | Business Impact |
|---|---|---|
| High-sparsity tables | Several tables contain more than 80% sparse columns. | Indicates poor data quality; governance should deprioritize these datasets. |
| Missing profiling | 10% of columns have NULL rowcount or missing profiling stats. | Profiling must be rerun before assessing readiness or risk. |
| Inconsistent profiling | Several numeric columns show contradictory profiling indicators. | Signals ingestion or profiling anomalies that need admin attention. |
Make sure the following ingredients are available: