Supported adapters
SQLBuild works with DuckDB, MotherDuck, Snowflake, BigQuery, Databricks, PostgreSQL, and SQL Server today. Support for ClickHouse, Redshift, Trino, Spark, and Athena is on the way. DuckDB runs entirely locally, so you can try SQLBuild and run full E2E tests without any warehouse credentials. Head to the Quickstart to get a project running in minutes, or see Adapters for connection setup.How it works
- Define your models as SQL files with
MODEL()headers that declare configuration, schema, and audits inline - Compile to resolve references, validate SQL, infer column types, check contracts, and compute column lineage - all offline
- Plan what needs to change by comparing fingerprints, source freshness, seed content, and Python node identities against the warehouse state. Unchanged models, seeds, audits, and Python nodes are skipped. Production relations can optionally be reused when version identities match.
- Build by executing the plan: materializing only what changed, validating data before promotion, and ensuring bad data never reaches production
- Test with chained unit tests, E2E scenario tests, and local replay through DuckDB - no warehouse required
Why SQLBuild?
Change-aware builds
Every node in the graph has a versioned identity. SQLBuild compares identities against warehouse state on every build and only runs what has actually changed.- Models and functions: Fingerprinted by query hash, config, and upstream function hashes. Unchanged models are skipped entirely.
- Seeds: Content and load-affecting config are hashed. Unchanged seeds are not reloaded.
- Python nodes: Loaders, tasks, assets, checks, and hooks are fingerprinted by source code hash and dependency hashes (scoped to the git root). The plan shows source and dependency diffs when a node’s identity changes. Skip/run decisions are user-controlled via
ctx.skip(), since only the node’s own logic knows whether its external inputs have changed. - Audits: Audits that already passed for the same model version are not re-run. A version identity change triggers re-validation.
- Source freshness: External source data versions are tracked automatically (via adapter metadata, column queries, or custom SQL). Models downstream of unchanged sources are skipped. Lag tolerance prevents jitter from triggering unnecessary rebuilds.
- Cascade propagation: When a model does change, the signal propagates downstream through the DAG. Upstream changes cascade with configurable replay windows (
replay_on_change). - Reuse from production: Dev targets can opt into
reuse_fromto clone or copy relations from another target (e.g. prod) when version identities match, instead of rebuilding from scratch.
Warehouse-native state
- Standard mode: all state lives in the warehouse as append-only tables (
_sqlbuild_fingerprints,_sqlbuild_source_freshness,_sqlbuild_node_results) in the same schemas as your data. No external state database, no manifest files, no state machine that can corrupt. The planner reads the latest row per identity, compares it against the compiled project, and writes new rows after successful builds. Old rows are retained as history and can be pruned by the janitor. - Virtual environments: the same identity and freshness data is stored in the VDE state backend (PostgreSQL or DuckDB), scoped per environment.
Familiar SQL models
Models are SQL files with aMODEL() header and a SELECT. References to other models and sources use __ref() and __source(), and configuration, schema, and audits are declared inline in the header. If you know dbt or SQLMesh, you already know the shape.
Python macros, not Jinja
- Real Python functions: Macros are plain Python, called with
@macro()in SQL. Testable, debuggable, and composable with standard tooling - no Jinja, no string-templating language to wrangle.
Audits that block bad data
- Full table builds: SQLBuild materializes into a staging table and runs
error-severity audits before promotion. If any fail, the swap is blocked and the production table is untouched. - Incremental models: Delta-phase audits validate each batch before DML is applied. Bad data is caught before it reaches the target.
Incremental processing
- Cursor-based replay: SQLBuild resumes by reading the highest timestamp or integer value already in the target table. If a model fails for several runs, the next successful build picks up from the last data it actually wrote, with no manual backfilling.
- Microbatch mode: Split large replay windows into configurable batches, each with its own audit cycle. Or process the full range in one pass, the choice is per-model.
- Replay on change: When a model’s version identity changes,
replay_on_changecontrols how much data to reprocess:forward(default, just run the next delta),bounded-14d(replay a window), orfull(rebuild the table).
Python you can read, Rust where it counts
- The framework is Python. Adapters, macros, hooks, providers, custom materializations, and Python nodes are all plain Python you can read and extend.
- SQL analysis is Rust. For SQL parsing, validation, column inference, lineage, and transpilation, SQLBuild uses Polyglot, a Rust reimplementation of SQLGlot’s SQL analysis capabilities (MIT, 32+ dialects). This keeps compile fast even on large projects while the code you actually work with stays Python.
SQL unit tests that scale
- Chain across models: Mock your sources, assert on the model you care about, and SQLBuild automatically resolves every intermediate model using its real SQL. One test file can be a full integration test across your entire pipeline.
- Macro-powered mocks: Because unit tests are written in SQL, they support macro calls, so you can write reusable mock generators and fixture builders instead of copy-pasting mock inputs across test files.
E2E scenario tests with local replay
- Real graph execution: Define coherent fixture inputs, and SQLBuild builds the real model graph against them in isolated warehouse relations. Test end-to-end business logic across your entire pipeline, not just one model at a time.
- Local replay without a warehouse: Capture scenario fixtures as JSONL snapshots, then replay them locally through DuckDB. CI pipelines run full E2E tests with zero warehouse credentials or compute cost.
- Zero-row assertions: Write property checks that pass when no rows violate a condition. Useful for invariants like “no negative revenue” or “no duplicate customer IDs” alongside full expected-output comparisons.
Python nodes
Grow beyond warehouse-only SQL without leaving the graph. Python nodes are ordinary functions, decorated to become first-class nodes in the same DAG as your SQL models, and they run as part ofsqb build. There are four kinds:
- Loaders (
@loader) load external data into managed sources, with incremental write strategies (table, append, delete_insert, merge), cursor-based loading, and concurrent execution. - Tasks (
@task) run Python computation or side effects as graph nodes. - Assets (
@asset) produce or observe external artifacts, with optional columns and lineage. - Checks (
@check) validate tasks, assets, and loaders, and run duringsqb buildor on their own withsqb check.
@factory. See Python Nodes.
Multi-target workflows
- Data diffs: Compare schemas and row-level data between targets (or virtual environments) with
sqb diff prod:dev.

- Zero-copy cloning: Branch targets instantly with
sqb clonewithout duplicating data. - Deferred references: Compile and plan against a production target with
--defer-towhile building in dev. - No manifest required: Clone, diff, and defer work directly against live targets. No
manifest.jsongeneration, no artifact management, no stale state.
Virtual environments
- Opt-in, not forced: Virtual environments give you instant low-copy branching, promotion, and rollback - but they are an opt-in mode, not a tax you pay upfront. Standard mode stays the default, so the floor stays low and you reach for virtual environments only when a workflow actually needs them. See Virtual Environments.
Extensibility
- User-defined functions: SQL and Python UDFs managed as part of your project. Functions participate in the DAG - definition changes trigger rebuilds of dependent models. Table functions provide predicate-pushdown-friendly alternatives to final-layer views.
- Custom materializations: Write materialization logic in Python with full framework integration - including audit hooks, schema change signals, and query change detection.
- Path-between selectors:
--select fact_orders~daily_activity_rollupselects every model on the shortest path between two nodes, with optional upstream/downstream expansion.
What’s next
- Broader adapter support - ClickHouse, Redshift, Trino, Spark, Athena
Quick links
Quickstart
Get a project running locally in minutes.
CLI Reference
Full reference for every SQLBuild command.
Concepts
Understand models, incremental strategies, audits, and selectors.
Scenarios
E2E tests with local DuckDB replay.

