Documentation Index
Fetch the complete documentation index at: https://docs.sqlbuild.com/llms.txt
Use this file to discover all available pages before exploring further.
Why column lineage matters
Impact analysis - Before changing a source column, see exactly which downstream models and columns are affected. A rename or type change inraw_orders.id can be traced through every model that consumes it, even indirectly.
Debugging data issues - When a column has unexpected values, trace it upstream to find where the data originates and what transformations it passes through. Instead of reading SQL files and mentally joining dependencies, ask SQLBuild to show the path.
Documentation - Column lineage provides machine-readable metadata about your pipeline. The JSON output can feed data catalogs, governance tools, or custom dashboards.
How it works
SQLBuild analyzes column lineage statically at compile time using SQLGlot. No warehouse connection is needed. The analyzer parses each model’s SQL, resolvesref() and source() calls, and traces columns through SELECT lists, CTEs, JOINs, subqueries, and expressions.
Column lineage requires SQLGlot to be enabled in project settings (it is by default).
Transform types
Each lineage edge is classified by how the column is transformed:| Transform | Description | Example |
|---|---|---|
direct | Column passes through unchanged | SELECT order_id FROM ... |
cast | Column is explicitly cast to a different type | SELECT CAST(id AS BIGINT) |
expression | Column is used in a computed expression | SELECT amount * 100 AS amount_cents |
aggregation | Column is used inside an aggregate function | SELECT SUM(amount) AS total |
star | Column is included via SELECT * | SELECT * FROM ... |
constant | Output column is a literal value with no upstream dependency | SELECT 'active' AS status |
direct edge means the column is a simple passthrough - safe to rename if you rename the source. An expression or aggregation edge means the column is computed - the upstream value is an input to a calculation, not a 1:1 mapping.
Confidence levels
Each edge also carries a confidence level indicating how certain the analyzer is about the traced dependency:| Confidence | Meaning |
|---|---|
high | The lineage path is fully resolved through known SQL constructs |
medium | The path is likely correct but involves constructs the analyzer handles with heuristics |
low | The path is best-effort - complex SQL patterns or unsupported constructs may reduce accuracy |
Analysis modes
Column lineage supports two analysis modes that trade off speed against depth of analysis. Rich mode uses SQLGlot’s lineage module and optimizer to resolve columns through CTEs, subqueries, and multi-level nesting with full transform classification. Thorough, but slower because the optimizer runs per column per model. Fast mode parses the SQL AST directly to extract column mappings, resolve CTE references, and classify transforms. It handles the same SQL patterns that most column lineage tools support and is fast enough to run on every compile.sqb compile defaults to fast mode because it runs frequently and analyzes the entire project. sqb lineage defaults to rich mode because it targets a specific column in a scoped slice of the DAG, where the deeper analysis is worth the cost. Both are overridable:
Using column lineage
Interactive tracing with sqb lineage
Trace a specific column upstream or downstream:
model_name.column_name. Column lineage supports upstream and downstream directions (not both - model lineage supports both).
See the lineage CLI reference for full flag documentation and output format examples.
Batch analysis with sqb compile
Every compile run includes column lineage in the output:
lineage field with column_count, edge_count, and has_star metadata.
See the compile CLI reference for details on the compile report format.
Integration with contract validation
Column lineage feeds into compile-time contract validation. When a model declares columns in itsMODEL() header, the compiler uses inferred column information to check that:
- Every declared column exists in the query output
- Column types match the declared types (when
type_enforcementis enabled)
sqb compile and report diagnostics with source-annotated error messages.
Limitations
- Column lineage requires SQLGlot to be enabled (
sqlglot = truein settings, which is the default) - Complex SQL patterns (deeply nested correlated subqueries, dynamic SQL, adapter-specific functions) may reduce accuracy or confidence
SELECT *is tracked as astartransform - the analyzer knows the column passes through but the mapping is less precise than explicit column references- Column lineage is computed statically from SQL text. Runtime-only column additions (e.g. from dynamic UDFs) are not tracked

