Skip to main content
Every build fingerprints models, seeds, functions, and Python nodes, tracks source freshness, and skips anything that hasn’t changed, including audits that already passed for the same version. The same change-aware planning extends to an existing dbt project: SQLBuild prunes unchanged dbt models from the dbt run and can reuse already-built tables from production. It also adds ingestion, Python nodes, and opt-in virtual environments for more advanced use cases.

Supported adapters

SQLBuild works with DuckDB, MotherDuck, Snowflake, BigQuery, Databricks, PostgreSQL, and SQL Server today. Support for ClickHouse, Redshift, Trino, Spark, and Athena is on the way. DuckDB runs entirely locally, so you can try SQLBuild and run full E2E tests without any warehouse credentials. Head to the Quickstart to get a project running in minutes, or see Adapters for connection setup.

How it works

  1. Define your models as SQL files with MODEL() headers that declare configuration, schema, and audits inline
  2. Compile to resolve references, validate SQL, infer column types, check contracts, and compute column lineage - all offline
  3. Plan what needs to change by comparing fingerprints, source freshness, seed content, and Python node identities against the warehouse state. Unchanged models, seeds, audits, and Python nodes are skipped. Production relations can optionally be reused when version identities match.
  4. Build by executing the plan: materializing only what changed, validating data before promotion, and ensuring bad data never reaches production
  5. Test with chained unit tests, E2E scenario tests, and local replay through DuckDB - no warehouse required

Why SQLBuild?

Change-aware builds

Every node in the graph has a versioned identity. SQLBuild compares identities against warehouse state on every build and only runs what has actually changed.
  • Models and functions: Fingerprinted by query hash, config, and upstream function hashes. Unchanged models are skipped entirely.
  • Seeds: Content and load-affecting config are hashed. Unchanged seeds are not reloaded.
  • Python nodes: Loaders, tasks, assets, checks, and hooks are fingerprinted by source code hash and dependency hashes (scoped to the git root). The plan shows source and dependency diffs when a node’s identity changes. Skip/run decisions are user-controlled via ctx.skip(), since only the node’s own logic knows whether its external inputs have changed.
  • Audits: Audits that already passed for the same model version are not re-run. A version identity change triggers re-validation.
  • Source freshness: External source data versions are tracked automatically (via adapter metadata, column queries, or custom SQL). Models downstream of unchanged sources are skipped. Lag tolerance prevents jitter from triggering unnecessary rebuilds.
  • Cascade propagation: When a model does change, the signal propagates downstream through the DAG. Upstream changes cascade with configurable replay windows (replay_on_change).
  • Reuse from production: Dev targets can opt into reuse_from to clone or copy relations from another target (e.g. prod) when version identities match, instead of rebuilding from scratch.

Warehouse-native state

  • Standard mode: all state lives in the warehouse as append-only tables (_sqlbuild_fingerprints, _sqlbuild_source_freshness, _sqlbuild_node_results) in the same schemas as your data. No external state database, no manifest files, no state machine that can corrupt. The planner reads the latest row per identity, compares it against the compiled project, and writes new rows after successful builds. Old rows are retained as history and can be pruned by the janitor.
  • Virtual environments: the same identity and freshness data is stored in the VDE state backend (PostgreSQL or DuckDB), scoped per environment.

Familiar SQL models

Models are SQL files with a MODEL() header and a SELECT. References to other models and sources use __ref() and __source(), and configuration, schema, and audits are declared inline in the header. If you know dbt or SQLMesh, you already know the shape.
-- models/marts/fact_orders.sql
MODEL (
  materialized table,
  columns (
    order_id (audits [not_null, unique]),
  ),
);

SELECT
  o.order_id,
  o.customer_id,
  p.amount_cents AS total_cents,
  p.method AS payment_method
FROM __ref("stg_orders") o
JOIN __ref("stg_payments") p USING (order_id)

Python macros, not Jinja

  • Real Python functions: Macros are plain Python, called with @macro() in SQL. Testable, debuggable, and composable with standard tooling - no Jinja, no string-templating language to wrangle.
# macros/grant_target.py
def grant_target(target):
    return f"GRANT SELECT ON {target} TO analyst_role"

Audits that block bad data

  • Full table builds: SQLBuild materializes into a staging table and runs error-severity audits before promotion. If any fail, the swap is blocked and the production table is untouched.
  • Incremental models: Delta-phase audits validate each batch before DML is applied. Bad data is caught before it reaches the target.

Incremental processing

  • Cursor-based replay: SQLBuild resumes by reading the highest timestamp or integer value already in the target table. If a model fails for several runs, the next successful build picks up from the last data it actually wrote, with no manual backfilling.
  • Microbatch mode: Split large replay windows into configurable batches, each with its own audit cycle. Or process the full range in one pass, the choice is per-model.
  • Replay on change: When a model’s version identity changes, replay_on_change controls how much data to reprocess: forward (default, just run the next delta), bounded-14d (replay a window), or full (rebuild the table).

Python you can read, Rust where it counts

  • The framework is Python. Adapters, macros, hooks, providers, custom materializations, and Python nodes are all plain Python you can read and extend.
  • SQL analysis is Rust. For SQL parsing, validation, column inference, lineage, and transpilation, SQLBuild uses Polyglot, a Rust reimplementation of SQLGlot’s SQL analysis capabilities (MIT, 32+ dialects). This keeps compile fast even on large projects while the code you actually work with stays Python.

SQL unit tests that scale

  • Chain across models: Mock your sources, assert on the model you care about, and SQLBuild automatically resolves every intermediate model using its real SQL. One test file can be a full integration test across your entire pipeline.
  • Macro-powered mocks: Because unit tests are written in SQL, they support macro calls, so you can write reusable mock generators and fixture builders instead of copy-pasting mock inputs across test files.
-- Mock two sources, assert on the final mart.
-- stg_orders and stg_payments resolve automatically from their real SQL.
TEST();

WITH
__source__raw__orders AS (
  @mock_orders()
),
__source__raw__payments AS (
  SELECT 1 AS payment_id, 1 AS order_id, 1500 AS amount_cents, 'credit_card' AS method
),
__expected__fact_orders AS (
  SELECT 1 AS order_id, 100 AS customer_id, 1500 AS total_cents,
         'credit_card' AS payment_method
)
SELECT 1

E2E scenario tests with local replay

  • Real graph execution: Define coherent fixture inputs, and SQLBuild builds the real model graph against them in isolated warehouse relations. Test end-to-end business logic across your entire pipeline, not just one model at a time.
  • Local replay without a warehouse: Capture scenario fixtures as JSONL snapshots, then replay them locally through DuckDB. CI pipelines run full E2E tests with zero warehouse credentials or compute cost.
  • Zero-row assertions: Write property checks that pass when no rows violate a condition. Useful for invariants like “no negative revenue” or “no duplicate customer IDs” alongside full expected-output comparisons.
SCENARIO (
  description: "Daily revenue includes only successful payments",
  tags: ["revenue"]
);

WITH
__ref__stg_orders AS (
  SELECT 1 AS order_id, 10 AS customer_id, 'completed' AS status
),
__ref__stg_payments AS (
  SELECT 1 AS payment_id, 1 AS order_id, 1700 AS amount_cents, 'success' AS payment_status
),
__expected__daily_revenue AS (
  SELECT CAST('2026-04-01' AS DATE) AS revenue_date, 1700 AS total_revenue_cents
),
__assert__all_orders_have_payments AS (
  SELECT * FROM __ref("fact_orders") WHERE payment_amount_cents IS NULL
)
SELECT 1

Python nodes

Grow beyond warehouse-only SQL without leaving the graph. Python nodes are ordinary functions, decorated to become first-class nodes in the same DAG as your SQL models, and they run as part of sqb build. There are four kinds:
  • Loaders (@loader) load external data into managed sources, with incremental write strategies (table, append, delete_insert, merge), cursor-based loading, and concurrent execution.
  • Tasks (@task) run Python computation or side effects as graph nodes.
  • Assets (@asset) produce or observe external artifacts, with optional columns and lineage.
  • Checks (@check) validate tasks, assets, and loaders, and run during sqb build or on their own with sqb check.
from sqlbuild.loaders import loader
from sqlbuild.executor.load.models import LoaderContext

@loader
def raw_orders(ctx: LoaderContext):
    if ctx.current_cursor_value is None:
        return fetch_all_orders()
    return fetch_orders_since(ctx.current_cursor_value)
SQL models never depend on Python nodes - the only path from Python into SQL is a loader populating a source - so the SQL graph stays fully analyzable on its own. Nodes can also be generated programmatically with @factory. See Python Nodes.

Multi-target workflows

  • Data diffs: Compare schemas and row-level data between targets (or virtual environments) with sqb diff prod:dev.
SQLBuild diff output showing schema comparison, row-level differences, and changed column values between prod and dev targets
  • Zero-copy cloning: Branch targets instantly with sqb clone without duplicating data.
  • Deferred references: Compile and plan against a production target with --defer-to while building in dev.
  • No manifest required: Clone, diff, and defer work directly against live targets. No manifest.json generation, no artifact management, no stale state.

Virtual environments

  • Opt-in, not forced: Virtual environments give you instant low-copy branching, promotion, and rollback - but they are an opt-in mode, not a tax you pay upfront. Standard mode stays the default, so the floor stays low and you reach for virtual environments only when a workflow actually needs them. See Virtual Environments.

Extensibility

  • User-defined functions: SQL and Python UDFs managed as part of your project. Functions participate in the DAG - definition changes trigger rebuilds of dependent models. Table functions provide predicate-pushdown-friendly alternatives to final-layer views.
  • Custom materializations: Write materialization logic in Python with full framework integration - including audit hooks, schema change signals, and query change detection.
def materialize(ctx: MaterializationContext) -> MaterializationResult:
    stale = find_untracked_partitions(ctx)

    for partition in stale:
        ctx.adapter.create_table_as(ctx.connection, target=staging, sql=partition_sql)
        ctx.run_audits(staging)  # same audit guarantees as built-in types
        ctx.execute_sql(f"INSERT INTO {ctx.destination} SELECT * FROM {staging}")

    return MaterializationResult(relation=ctx.destination)
  • Path-between selectors: --select fact_orders~daily_activity_rollup selects every model on the shortest path between two nodes, with optional upstream/downstream expansion.

What’s next

  • Broader adapter support - ClickHouse, Redshift, Trino, Spark, Athena

Quickstart

Get a project running locally in minutes.

CLI Reference

Full reference for every SQLBuild command.

Concepts

Understand models, incremental strategies, audits, and selectors.

Scenarios

E2E tests with local DuckDB replay.