
Data Catalog and Lineage

data engineering

query

Data Catalog

Definition

A data catalog is an organized inventory of data assets across your organization.

It provides context, meaning, and trust so people can easily find the right data and use it with confidence.

It acts as a single source of truth about the organization's data.

Advantages

  • Centralize
  • Increase trust
  • Save time
  • Improve collaboration
  • Drive value

Components

  • Business context
    • Descriptions
    • Terms
    • Classifications
  • Ownership and stewardship
    • Ensure accountability and quality over time
  • Trust and Quality
    • Quality scores
    • Certification
    • Policies
  • Lineage
    • Where data comes from
    • How it changes
    • Where it is used
  • Usage and Popularity
    • How data is used
    • Who uses it, so people can make better decisions
  • Related Assets
    • Link to dashboards, reports, notebooks, APIs, and documents
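
The components above can be pictured as the fields of a single catalog record. The following is a minimal sketch, not the schema of any particular catalog tool: the class name `CatalogEntry`, every field name, and the 0.8 threshold in `is_trusted` are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    """One data asset in the catalog (all names here are hypothetical)."""
    name: str
    # Business context
    description: str = ""
    terms: List[str] = field(default_factory=list)
    classifications: List[str] = field(default_factory=list)
    # Ownership and stewardship
    owner: Optional[str] = None
    # Trust and quality
    quality_score: Optional[float] = None  # assumed scale: 0.0 to 1.0
    certified: bool = False
    # Lineage
    upstream: List[str] = field(default_factory=list)
    downstream: List[str] = field(default_factory=list)
    # Usage and popularity
    query_count_30d: int = 0
    # Related assets: dashboards, reports, notebooks, APIs, documents
    related_assets: List[str] = field(default_factory=list)

    def is_trusted(self, min_score: float = 0.8) -> bool:
        """Trusted means certified, owned, and scored at or above min_score."""
        return (
            self.certified
            and self.owner is not None
            and self.quality_score is not None
            and self.quality_score >= min_score
        )


orders = CatalogEntry(
    name="sales.orders",
    description="One row per customer order",
    owner="sales-team",
    quality_score=0.93,
    certified=True,
)
```

Keeping ownership and quality on the same record as the description makes it possible to answer "can I rely on this table?" with one lookup, as `orders.is_trusted()` does here.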

Workflow

  • Connect: connect to the data sources.
  • Discover: collect metadata and lineage.
  • Enrich: add business context, classifications, and quality rules.
  • Govern: define ownership, policies, and trust levels.
  • Share and use: make data easy to find and understand.
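
The workflow steps can be sketched as a small pipeline over an in-memory dictionary. This is only an illustration under stated assumptions: the `SOURCE` tables, the "column name contains email means PII" rule, and the `certified`/`draft` trust levels are all made up for the example, and a real catalog would read metadata from actual databases.

```python
# Hypothetical source system: table name -> column names.
SOURCE = {
    "orders": ["order_id", "customer_email", "amount"],
    "customers": ["customer_id", "email"],
}


def discover(source):
    """Connect and discover: collect technical metadata per table."""
    return {table: {"columns": list(cols)} for table, cols in source.items()}


def enrich(catalog, descriptions):
    """Enrich: add business descriptions and a simple classification rule."""
    for table, meta in catalog.items():
        meta["description"] = descriptions.get(table, "")
        has_email = any("email" in col for col in meta["columns"])
        meta["classifications"] = ["PII"] if has_email else []
    return catalog


def govern(catalog, owners):
    """Govern: assign an owner and derive a trust level."""
    for table, meta in catalog.items():
        meta["owner"] = owners.get(table)
        documented = bool(meta["owner"]) and bool(meta["description"])
        meta["trust"] = "certified" if documented else "draft"
    return catalog


def search(catalog, keyword):
    """Share and use: find assets by name or description."""
    kw = keyword.lower()
    return sorted(
        table
        for table, meta in catalog.items()
        if kw in table.lower() or kw in meta["description"].lower()
    )


catalog = discover(SOURCE)
catalog = enrich(catalog, {"orders": "Customer purchases"})
catalog = govern(catalog, {"orders": "sales-team"})
```

Each step only adds to the record produced by the previous one, which is what lets the discovery step be automated while enrichment and governance stay with the people who know the data.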

Tips

  • Start Small, Think Big: begin with high-value domains and expand gradually.
  • Define ownership early
  • Standardize business terms
  • Automate whenever possible
  • Measure and improve: track usage, quality, and business impact continuously.

Data Lineage

  • It shows where a number started, what happened to it, and where it ended up.
  • It helps you answer questions without guessing, so you can:
    • debug issues faster
    • change things more safely
    • build trust in the numbers
  • Define the zoom level: from what people see, to where the data is stored, down to the source data.
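
Lineage is naturally a directed graph, so "where did this number come from?" and "what breaks if I change this?" become graph walks. The sketch below assumes a hand-written edge map; the asset names in `EDGES` are hypothetical, and real lineage would be collected from query logs or pipeline definitions.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed as its values.
EDGES = {
    "crm.orders_raw": ["warehouse.orders_clean"],
    "warehouse.orders_clean": ["warehouse.daily_revenue"],
    "finance.fx_rates": ["warehouse.daily_revenue"],
    "warehouse.daily_revenue": ["dashboard.revenue"],
}


def downstream(edges, start):
    """Everything affected if `start` changes (breadth-first walk)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(edges, start):
    """Everything `start` was derived from: walk the reversed graph."""
    reversed_edges = {}
    for parent, children in edges.items():
        for child in children:
            reversed_edges.setdefault(child, []).append(parent)
    return downstream(reversed_edges, start)


sources_of_dashboard = upstream(EDGES, "dashboard.revenue")
impact_of_fx_change = downstream(EDGES, "finance.fx_rates")
```

`upstream` supports debugging (trace a wrong dashboard number back to its inputs), while `downstream` supports safer changes (list what depends on a table before altering it).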