Column Lineage - SQLBuild

Why column lineage matters

Impact analysis - Before changing a source column, see exactly which downstream models and columns are affected. A rename or type change in raw__orders.id can be traced through every model that consumes it, even indirectly. Debugging data issues - When a column has unexpected values, trace it upstream to find where the data originates and what transformations it passes through. Instead of reading SQL files and mentally joining dependencies, ask SQLBuild to show the path. Documentation - Column lineage provides machine-readable metadata about your pipeline. The JSON output can feed data catalogs, governance tools, or custom dashboards.

How it works

SQLBuild analyzes column lineage statically at compile time. No warehouse connection is needed. The analyzer parses each model’s SQL, resolves ref() and source() calls, and traces columns through SELECT lists, CTEs, JOINs, subqueries, and expressions. Column lineage requires SQL analysis to be enabled in project settings (it is by default).

Transform types

Each lineage edge is classified by how the column is transformed:

Transform	Description	Example
`direct`	Column passes through unchanged	`SELECT order_id FROM ...`
`cast`	Column is explicitly cast to a different type	`SELECT CAST(id AS BIGINT)`
`expression`	Column is used in a computed expression	`SELECT amount * 100 AS amount_cents`
`aggregation`	Column is used inside an aggregate function	`SELECT SUM(amount) AS total`
`star`	Column is included via `SELECT *`	`SELECT * FROM ...`
`constant`	Output column is a literal value with no upstream dependency	`SELECT 'active' AS status`

Transform classification helps you understand the nature of each dependency. A direct edge means the column is a simple passthrough - safe to rename if you rename the source. An expression or aggregation edge means the column is computed - the upstream value is an input to a calculation, not a 1:1 mapping.

Confidence levels

Each edge also carries a confidence level indicating how certain the analyzer is about the traced dependency:

Confidence	Meaning
`high`	The lineage path is fully resolved through known SQL constructs
`medium`	The path is likely correct but involves constructs the analyzer handles with heuristics
`low`	The path is best-effort - complex SQL patterns or unsupported constructs may reduce accuracy

Analysis modes

Column lineage supports two analysis modes that trade off speed against depth of analysis. Rich mode uses the SQL analysis optimizer to resolve columns through CTEs, subqueries, and multi-level nesting with full transform classification. Thorough, but slower because the optimizer runs per column per model. Fast mode parses the SQL AST directly to extract column mappings, resolve CTE references, and classify transforms. It handles the same SQL patterns that most column lineage tools support and is fast enough to run on every compile. sqb compile defaults to fast mode because it runs frequently and analyzes the entire project. sqb lineage defaults to rich mode because it targets a specific column in a scoped slice of the DAG, where the deeper analysis is worth the cost. Both are overridable:

sqb compile --lineage-mode rich
sqb lineage fact_orders.total_cents --mode fast

Using column lineage

Interactive tracing with `sqb lineage`

Trace a specific column upstream or downstream:

# Where does this column come from?
sqb lineage fact_orders.total_cents

# What consumes this column?
sqb lineage fact_orders.order_id --direction downstream

# Limit to 1 hop
sqb lineage fact_orders.total_cents --depth 1

# JSON output
sqb lineage fact_orders.total_cents --format json

The target syntax is model_name.column_name. Column lineage supports upstream and downstream directions (not both - model lineage supports both). See the lineage CLI reference for full flag documentation and output format examples.

Batch analysis with `sqb compile`

Every compile run includes column lineage in the output:

# Default: fast column lineage
sqb compile

# Rich column lineage in compile
sqb compile --lineage-mode rich

# Skip column lineage
sqb compile --lineage-mode none

# JSON report includes per-model lineage summary
sqb compile --json

In the JSON compile report, each model includes a lineage field with column_count, edge_count, and has_star metadata. See the compile CLI reference for details on the compile report format.

Integration with contract validation

Column lineage feeds into compile-time contract validation. When a model declares columns in its MODEL() header, the compiler uses inferred column information to check that:

Every declared column exists in the query output
Column types match the declared types (when type_enforcement is enabled)

These checks run automatically during sqb compile and report diagnostics with source-annotated error messages.

Limitations

Column lineage requires SQL analysis to be enabled (sql_analysis = true in settings, which is the default)
Complex SQL patterns (deeply nested correlated subqueries, dynamic SQL, adapter-specific functions) may reduce accuracy or confidence
SELECT * is tracked as a star transform - the analyzer knows the column passes through but the mapping is less precise than explicit column references
Column lineage is computed statically from SQL text. Runtime-only column additions (e.g. from dynamic UDFs) are not tracked

​Why column lineage matters

​How it works

​Transform types

​Confidence levels

​Analysis modes

​Using column lineage

​Interactive tracing with sqb lineage

​Batch analysis with sqb compile

​Integration with contract validation

​Limitations