Skip to main content
When you run sqb plan or sqb build, SQLBuild compiles your project, compares it against the current warehouse state, and produces a plan. By default, only stale work runs. Unchanged models, seeds, audits, and Python nodes are skipped automatically. Use --force to override change detection and run everything selected, regardless of state.

What is tracked

Every node in the graph has a versioned identity stored in _sqlbuild_fingerprints in the target schema. The planner reads these on every run and compares them against the compiled project.

Models and functions

Each model and function has a fingerprint derived from:
  • Query hash - the normalized SQL after macro expansion and reference resolution.
  • Config hash - version-identity config values (materialization settings, contracts, hooks, custom config/placeholders).
  • Function hashes - for models that depend on user-defined functions, the function’s own fingerprint is included. A function change cascades to all dependent models.

Seeds

Seeds are fingerprinted by content hash and load-affecting config. Unchanged seeds are not reloaded.

Python nodes

Loaders, tasks, assets, checks, and hooks are fingerprinted by:
  • Source code hash - the normalized source of the decorated function.
  • Dependency hashes - transitive hashes of imported project modules, scoped to the git root. Changes to third-party packages (in .venv, site-packages, etc.) do not affect the identity; changes to your project’s own helper modules do.
  • Decorator config hash - the arguments passed to the decorator.
Python identity tracking is primarily a visual indicator in the plan output. When a node’s identity changes, the plan shows source and dependency diffs so you can see exactly what changed. However, unlike SQL models where SQLBuild can observe inputs (fingerprints, source freshness) and skip unchanged work automatically, Python nodes may depend on external inputs that the framework cannot observe: an API, a file on disk, a third-party service. For this reason, skip/run decisions for Python nodes are user-controlled via ctx.skip(). The node’s own logic decides whether it needs to run based on whatever semantics are appropriate: checking an API timestamp, comparing a file hash, querying a metadata table. If the condition is met, ctx.skip() short-circuits the node (and optionally its downstream dependents via soft or hard skip mode). This keeps the mechanism flexible and powerful without requiring the framework to infer things it cannot know.

Audits

Audits that already passed for the same model version identity are not re-run. When a model’s version changes, its audits are re-validated. This avoids redundant audit work on unchanged models while ensuring changed models are always fully validated.

Change reasons

The plan assigns a reason to each node that needs work:
ReasonMeaning
First runNo fingerprint exists in the target schema
Query changedThe model’s query SQL differs from the stored fingerprint
Config changedVersion-identity config values differ
Schema changedUpstream schema changes detected (column additions, removals, type changes)
Upstream changedAn upstream model’s change cascades downstream
Run despite unchangedThe model is configured to run periodically even without changes (see Run despite unchanged)
Unchanged nodes are skipped and show as Normal in the plan output.

Source freshness

Source freshness lets SQLBuild observe whether external source data has actually changed between runs. Models downstream of unchanged sources are skipped automatically.

Configuration

Source freshness is configured per source in sources/*.yml with a freshness: block. See Sources: Source freshness for the full configuration reference.

How observations work

During planning, SQLBuild observes the current data version of each source that has freshness configured (or that the adapter can observe automatically):
  1. Observe - query the source’s current data version using the configured strategy.
  2. Compare - compare the observed version against the last recorded observation from _sqlbuild_source_freshness in the target schema.
  3. Propagate - walk the DAG downstream from changed or unknown sources to identify which models are affected.
Sources without explicit freshness: config are auto-observed using the adapter strategy if the adapter supports table metadata and the source has a physical table (not an expression source, not a managed source).

Lag tolerance

For timestamp-based freshness, lag_tolerance controls how much the observed value can drift before being considered a real change. If the current timestamp is within the tolerance of the previous observation, the source is treated as unchanged. This is useful for sources where the freshness timestamp moves by seconds or minutes on every query but the underlying data hasn’t meaningfully changed.

State storage

Source freshness observations are stored in _sqlbuild_source_freshness in each target schema. Records are appended only after the affected downstream models build successfully. If a build fails, the previous observation is preserved so the next run still sees the source as changed. Observations are resolved across all target schemas in the project, so a source referenced by models in different schemas is tracked consistently. Use sqb freshness to observe source freshness on demand without triggering a build.

Cascade propagation

When a model or function changes, the change signal propagates downstream through the DAG. Every model downstream of a changed node is marked Upstream changed in the plan, even if its own SQL and config are identical. The cascade walk is topological: it processes models in dependency order, so each model sees the resolved state of all its upstreams before deciding its own effective action. What cascades:
  • A query, config, or schema change on any model cascades to all its downstream dependents.
  • A function change cascades to every model that calls it (directly or transitively).
  • Source freshness changes propagate downstream the same way.
How materialization types respond:
  • Views are recreated on every build regardless, so a cascade has no extra cost.
  • Tables are fully rebuilt, same as if they had changed themselves.
  • Incremental models receive a replay window from the cascade. A full replay always cascades. A bounded replay only cascades when the upstream and downstream models share the same cursor_type (e.g. both use timestamp), so unrelated cursor types don’t inherit bounded windows that don’t apply. The downstream model’s own replay_on_change policy takes precedence over any cascaded signal.
Resolution when multiple upstreams are stale: If a model has multiple stale upstreams with different replay actions, the most aggressive action wins. full beats bounded, and among bounded actions, the longer duration wins. Ties are broken alphabetically by model name for determinism. Overriding cascaded replay: Downstream incremental models can set their own replay_on_change policy to override the cascaded signal. If a downstream model declares its own policy, that policy applies instead of the upstream’s. If it has no policy, it inherits the upstream’s replay scope. See Incremental Models: Replay on change for configuration details.

Replay on change

When a change is detected on an incremental model, replay_on_change controls how much data to reprocess:
  • forward (default) - run the normal incremental delta from the cursor. No reprocessing.
  • bounded-<duration> (e.g. bounded-14d) - replay the specified window of data.
  • full - drop and rebuild the entire table.
See Incremental Models: Replay on change for configuration details.

Reuse from production

Dev targets can opt into reusing relations from another target (e.g. prod) instead of rebuilding from scratch. When a model’s version identity in dev matches the version already built in prod, SQLBuild clones or copies the relation rather than re-executing the query. Configure this on the target:
[targets.dev]
schema = "dev"
reuse_from = "prod"
The planner checks each model against the reuse_from target’s fingerprints and relation state. Models are reused when:
  • The expected version identity matches the reuse_from target’s built version.
  • The relation exists in the reuse_from target.
  • Source freshness in the reuse_from target is current.
For incremental models, reuse clones or copies the prod relation as a baseline, then runs the incremental delta on top. Use reuse_hard_copy = true on the target to force a full data copy instead of a zero-copy clone (useful when the adapter doesn’t support cloning or when you need an independent copy). reuse_from cannot be used together with defer_sources_to on the same target.

Run despite unchanged

Some models depend on external data that isn’t tracked by source freshness, for example a table model that reads from an API-populated staging area. run_despite_unchanged forces a model to run periodically even when its version identity hasn’t changed.
MODEL (
  materialized table,
  run_despite_unchanged "always",
);
MODEL (
  materialized table,
  run_despite_unchanged "24h",
);
  • always - run on every build regardless of state.
  • Duration (e.g. 24h, 30d, 90m) - run if at least the specified time has passed since the model’s upstream source freshness was last observed. Requires at least one upstream source with timestamp freshness tracking.
Only table materializations support run_despite_unchanged. When triggered, downstream models are also marked as stale.

Python node pruning

When unchanged SQL models are skipped, read-side Python nodes (tasks, assets, checks) that depend on those models are also skipped. Loaders always run regardless of pruning, since they populate sources that the SQL graph depends on. Python nodes also have their own identity fingerprints. If a node’s source code or dependencies change, it runs even if its SQL dependencies haven’t changed.

Warehouse-native state (standard mode)

In standard mode, all change-tracking state lives in the warehouse as append-only tables in the same schemas as your data:
  • _sqlbuild_fingerprints - stores version identities for models, functions, seeds, and Python nodes. One row per successful build per identity.
  • _sqlbuild_source_freshness - stores source freshness observations. One row per successful build per source identity.
  • _sqlbuild_node_results - stores Python node runtime results (payload, metadata, status, errors). One row per execution per node. Results persist across runs for observability and downstream consumption.
There is no external state database, no manifest files, and no state machine with transitions that can corrupt. The planner reads the latest row per identity, compares it against the compiled project, and writes new rows after successful builds. Old rows are retained as immutable history. State tables are read across all target schemas in the project, so fingerprints and freshness observations are resolved consistently regardless of which schema a model targets. Use sqb janitor to prune old state history rows while retaining the latest per identity.

Virtual environments

Virtual environments store identities and change-tracking state in the VDE state backend (PostgreSQL or DuckDB) rather than in warehouse fingerprint tables. Identities are scoped per virtual environment, so each environment tracks its own version hashes, source freshness observations, and Python node identities independently. Change detection uses version hash comparison and VDE state refs. See Virtual Environments: Building for details.