Loading...
This recipe builds an intelligent glossary population engine that automatically identifies, normalizes, and classifies potential business terms from existing report metadata. It streamlines data stewardship by mining report column names, scoring their frequency and relevance, and recommending domains, categories, and classifications. The result is a governed, business-friendly glossary foundation — accelerating standardization and reducing manual curation across the enterprise. Note: This recipe would take 2 to 3 hrs to execute depending on the cataloged data.
Most organizations lack complete business glossaries because manual term creation is slow, inconsistent, and dependent on subject-matter experts. This recipe analyzes metadata signals, naming conventions, and column patterns to auto-generate draft glossary terms that stewards can refine. This accelerates coverage and improves business understanding across domains.
Step 1 — Read metadata across tables, columns, schemas, and usage logs
Gather naming patterns, business prefixes, and frequently accessed fields.
Step 2 — Identify candidate business concepts
Cluster columns by shared naming tokens, semantic similarity, and domain context.
Step 3 — Generate term names
Use linguistic models and naming heuristics to propose readable business terms.
Step 4 — Generate draft term definitions
Summarize metadata, domain context, and usage to create definition suggestions.
Step 5 — Recommend term attributes
Suggest synonyms, category, domain, stewardship group, and related terms.
Step 6 — Map associated objects
Link proposed terms to tables, columns, and reports based on usage patterns.
Step 7 — Score and prioritize terms
Rank by business importance, data criticality, and usage frequency.
Step 8 — Output governance-ready term proposals
Produce a term_suggestions dataset for steward review and glossary onboarding.
| Insight Category | What the recipe discovered | Business Impact |
|---|---|---|
| Term gaps | 57 frequently used columns map to unnamed business concepts. | Accelerates glossary expansion and reduces ambiguity in reporting. |
| High-value terms | AI identified “Customer Lifecycle Value” as a top-value term candidate. | Improves analytics alignment and strategic reporting. |
| Synonym clusters | Multiple variations of “revenue” found across domains (rev, revenue_amt, net_rev). | Supports standardization and consistent metric definitions. |
Make sure the following ingredients are available: