Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.sqlbuild.com/llms.txt

Use this file to discover all available pages before exploring further.

Why column lineage matters

Impact analysis - Before changing a source column, see exactly which downstream models and columns are affected. A rename or type change in raw_orders.id can be traced through every model that consumes it, even indirectly. Debugging data issues - When a column has unexpected values, trace it upstream to find where the data originates and what transformations it passes through. Instead of reading SQL files and mentally joining dependencies, ask SQLBuild to show the path. Documentation - Column lineage provides machine-readable metadata about your pipeline. The JSON output can feed data catalogs, governance tools, or custom dashboards.

How it works

SQLBuild analyzes column lineage statically at compile time using SQLGlot. No warehouse connection is needed. The analyzer parses each model’s SQL, resolves ref() and source() calls, and traces columns through SELECT lists, CTEs, JOINs, subqueries, and expressions. Column lineage requires SQLGlot to be enabled in project settings (it is by default).

Transform types

Each lineage edge is classified by how the column is transformed:
TransformDescriptionExample
directColumn passes through unchangedSELECT order_id FROM ...
castColumn is explicitly cast to a different typeSELECT CAST(id AS BIGINT)
expressionColumn is used in a computed expressionSELECT amount * 100 AS amount_cents
aggregationColumn is used inside an aggregate functionSELECT SUM(amount) AS total
starColumn is included via SELECT *SELECT * FROM ...
constantOutput column is a literal value with no upstream dependencySELECT 'active' AS status
Transform classification helps you understand the nature of each dependency. A direct edge means the column is a simple passthrough - safe to rename if you rename the source. An expression or aggregation edge means the column is computed - the upstream value is an input to a calculation, not a 1:1 mapping.

Confidence levels

Each edge also carries a confidence level indicating how certain the analyzer is about the traced dependency:
ConfidenceMeaning
highThe lineage path is fully resolved through known SQL constructs
mediumThe path is likely correct but involves constructs the analyzer handles with heuristics
lowThe path is best-effort - complex SQL patterns or unsupported constructs may reduce accuracy

Analysis modes

Column lineage supports two analysis modes that trade off speed against depth of analysis. Rich mode uses SQLGlot’s lineage module and optimizer to resolve columns through CTEs, subqueries, and multi-level nesting with full transform classification. Thorough, but slower because the optimizer runs per column per model. Fast mode parses the SQL AST directly to extract column mappings, resolve CTE references, and classify transforms. It handles the same SQL patterns that most column lineage tools support and is fast enough to run on every compile. sqb compile defaults to fast mode because it runs frequently and analyzes the entire project. sqb lineage defaults to rich mode because it targets a specific column in a scoped slice of the DAG, where the deeper analysis is worth the cost. Both are overridable:
sqb compile --lineage-mode rich
sqb lineage fact_orders.total_cents --mode fast

Using column lineage

Interactive tracing with sqb lineage

Trace a specific column upstream or downstream:
# Where does this column come from?
sqb lineage fact_orders.total_cents

# What consumes this column?
sqb lineage fact_orders.order_id --direction downstream

# Limit to 1 hop
sqb lineage fact_orders.total_cents --depth 1

# JSON output
sqb lineage fact_orders.total_cents --format json
The target syntax is model_name.column_name. Column lineage supports upstream and downstream directions (not both - model lineage supports both). See the lineage CLI reference for full flag documentation and output format examples.

Batch analysis with sqb compile

Every compile run includes column lineage in the output:
# Default: fast column lineage
sqb compile

# Rich column lineage in compile
sqb compile --lineage-mode rich

# Skip column lineage
sqb compile --lineage-mode none

# JSON report includes per-model lineage summary
sqb compile --json
In the JSON compile report, each model includes a lineage field with column_count, edge_count, and has_star metadata. See the compile CLI reference for details on the compile report format.

Integration with contract validation

Column lineage feeds into compile-time contract validation. When a model declares columns in its MODEL() header, the compiler uses inferred column information to check that:
  • Every declared column exists in the query output
  • Column types match the declared types (when type_enforcement is enabled)
These checks run automatically during sqb compile and report diagnostics with source-annotated error messages.

Limitations

  • Column lineage requires SQLGlot to be enabled (sqlglot = true in settings, which is the default)
  • Complex SQL patterns (deeply nested correlated subqueries, dynamic SQL, adapter-specific functions) may reduce accuracy or confidence
  • SELECT * is tracked as a star transform - the analyzer knows the column passes through but the mapping is less precise than explicit column references
  • Column lineage is computed statically from SQL text. Runtime-only column additions (e.g. from dynamic UDFs) are not tracked