This recipe identifies column and table names across the schemas in your data ecosystem that are not easily understandable. It flags columns and tables with very short names, numeric-only names, names without vowels, technical names, and numeric- or underscore-heavy patterns, and explains the reason for each flag.
Cryptic names (short tokens, hashes, numeric-only fields) reduce discoverability, cause analysts to choose wrong objects, lengthen onboarding, and increase stewardship effort. This recipe flags unreadable metadata using heuristics, entropy scoring, and AI-assisted judgement to generate governance-ready remediation outputs.
Step 1 — Read: Ingest catalog metadata
Read tables from oetable and columns from oecolumn, then join them with oeschema and connectioninfo to gather full context.
Step 2 — Compute: Rule-based flagging
Apply rules (too short, no vowels, numeric-only, high-entropy tokens, repeated hashes) to identify candidate unreadable names.
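As a rough illustration of how such rules might be expressed, the sketch below checks a single name against four of the heuristics. The function names, the minimum length of 3, the 8-character token cutoff, and the entropy threshold of 3.0 bits are all assumptions chosen for the example, not values taken from the recipe. The repeated-hash rule is not covered here.

```python
from __future__ import annotations

import math
import re
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(token: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not token:
        return 0.0
    total = len(token)
    counts = Counter(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def flag_name(name: str, min_length: int = 3, entropy_threshold: float = 3.0) -> list[str]:
    """Return the heuristic rules that a table or column name trips (hypothetical helper)."""
    reasons = []
    lowered = name.lower()
    letters = [c for c in lowered if c.isalpha()]

    if len(lowered) < min_length:
        reasons.append("too-short")

    if re.fullmatch(r"[\d_]+", lowered):
        # Only digits and underscores, no letters at all
        reasons.append("numeric-only")
    elif letters and not VOWELS.intersection(letters):
        reasons.append("no-vowels")

    # Long tokens that mix in digits and have high character entropy
    # tend to be hashes or generated identifiers rather than words.
    for token in re.split(r"_+", lowered):
        if (
            len(token) >= 8
            and any(c.isdigit() for c in token)
            and shannon_entropy(token) >= entropy_threshold
        ):
            reasons.append("high-entropy")
            break

    return reasons
```

Under these assumed thresholds, `flag_name("customer_name")` returns an empty list, while `flag_name("tbl_cstmr")` trips only the vowel rule. Scoring entropy per underscore-separated token, rather than over the whole name, keeps long but readable names from being flagged purely for their length.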
Step 3 — Compute: Whitelist & false-positive reduction
Use a curated whitelist and pattern exceptions to filter out valid short business tokens and reduce noise.
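A minimal sketch of this filtering step is shown below. The whitelist entries and the two exception patterns (fiscal quarters and fiscal years) are invented for illustration; a real deployment would maintain its own curated list.

```python
from __future__ import annotations

import re

# Hypothetical whitelist of short tokens that carry business meaning
WHITELIST = {"id", "sku", "vat", "qty", "url"}

# Hypothetical pattern exceptions, e.g. "q1".."q4" and "fy24" or "fy2024"
EXCEPTION_PATTERNS = [re.compile(r"q[1-4]"), re.compile(r"fy\d{2,4}")]


def is_exempt(name: str) -> bool:
    """True when every underscore-separated token is whitelisted or matches an exception."""
    tokens = [t for t in name.lower().split("_") if t]
    if not tokens:
        return False
    return all(
        t in WHITELIST or any(p.fullmatch(t) for p in EXCEPTION_PATTERNS)
        for t in tokens
    )


def filter_candidates(candidates: list[str]) -> list[str]:
    """Drop flagged names that the whitelist or exception patterns explain."""
    return [name for name in candidates if not is_exempt(name)]


remaining = filter_candidates(["id", "x1", "sku_id", "fy24", "zz9"])
print(remaining)  # ['x1', 'zz9']
```

Requiring every token to be exempt is a deliberately conservative choice: a name such as `id_x7` stays flagged because only part of it is explained by the whitelist.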
Step 4 — Compute: AI-assisted readability judgement
Run AI checks to confirm whether flagged names convey business meaning or are junk; attach a readability verdict and confidence score to each.
Step 5 — Compute: Classify reasons and severity
Assign explicit reason codes (auto-generated, numeric-hash, too-short, acronyms) and severity levels for remediation prioritisation.
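One way to picture this step is a lookup from reason code to severity, with each name taking the highest severity among its reasons. The reason codes below come from the description above; the three severity levels and the mapping between codes and levels are assumptions made for the sketch.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical severity rank per reason code; higher means more urgent
SEVERITY_RANK = {
    "numeric-hash": 3,
    "auto-generated": 3,
    "too-short": 2,
    "acronyms": 1,
}
SEVERITY_LABEL = {3: "high", 2: "medium", 1: "low"}
SORT_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass
class Finding:
    name: str
    reasons: list[str]
    severity: str


def classify(name: str, reasons: list[str]) -> Finding:
    """Attach the highest severity implied by any of the reason codes."""
    rank = max((SEVERITY_RANK.get(r, 1) for r in reasons), default=1)
    return Finding(name=name, reasons=sorted(reasons), severity=SEVERITY_LABEL[rank])


def prioritise(findings: list[Finding]) -> list[Finding]:
    """Order findings from most to least severe, then alphabetically by name."""
    return sorted(findings, key=lambda f: (SORT_ORDER[f.severity], f.name))


findings = prioritise([
    classify("kpi", ["acronyms"]),
    classify("c1", ["too-short"]),
    classify("a3f9c2e1", ["numeric-hash", "too-short"]),
])
print([(f.name, f.severity) for f in findings])
# [('a3f9c2e1', 'high'), ('c1', 'medium'), ('kpi', 'low')]
```

Unknown reason codes fall back to the lowest rank here, so a new code added upstream would not crash the classification but would need an explicit entry to be prioritised correctly.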
Step 6 — Output: Produce governance-ready exports
Generate humanly_not_understandable_columns, humanly_not_understandable_tables, and a prioritised remediation dashboard for stewards.
| Insight Category | What the recipe discovered | Business Impact |
|---|---|---|
| Unreadable column share | 12% of catalog columns flagged as too short or vowel-less. | Prioritise cleanup to reduce analyst selection errors and improve search results. |
| Top violating schemas | Three schemas contain the majority of underscore-heavy and numeric names—likely ingestion pipelines to review. | Focus engineering fixes on those ingestion sources to reduce future junk creation. |
| AI correction suggestion rate | AI judged ~30% of short names to be valid business acronyms; the remainder were flagged for renaming. | Reduces false-positive workload for stewards and speeds up remediation cadence. |
Make sure the following ingredients are available in your workspace: