Model Selection For dbt CLI
When working on dbt projects, you need to ensure that the CLI commands used to run or test models, seeds and snapshots cover only the resources of interest. In other words, you need to be able to target specific models, tests, seeds or snapshots in order to avoid wasting compute and money. This is even more important when you work with large models that process large volumes of data.
By default, `dbt run`, `dbt test`, `dbt seed` and `dbt snapshot` execute all corresponding nodes in the dependency graph (i.e. `dbt run` will run all models, `dbt test` will run all tests, and so on). In this article, we will present the model selection shorthands you can take advantage of when running or testing models, seeds or snapshots via the dbt Command Line Interface (CLI).
If you would like to experiment with the commands presented in the next few sections, feel free to create an example dbt project locally. You can do so (in probably less than two minutes) by following this step-by-step guide, where you can also find a containerised dbt environment.
Run all resources in a dbt project
In order to select all resources within a dbt project, all you need to do is pass the project name to the `--select` option:
# Runs all models in project my_dbt_project
dbt run --select my_dbt_project
# Runs all tests in project my_dbt_project
dbt test --select my_dbt_project
# Runs all snapshots in project my_dbt_project
dbt snapshot --select my_dbt_project
# Runs all seeds in project my_dbt_project
dbt seed --select my_dbt_project
Select a specific resource
In order to execute the `run`, `test`, `snapshot` or `seed` command for a specific resource, all you need to do is specify its name in the `--select` option:
# Run model with name `my_model`
dbt run --select my_model
# Run test with id `not_null_orders_order_id`
dbt test --select not_null_orders_order_id
# Run snapshot with name `my_snapshot`
dbt snapshot --select my_snapshot
# Run seed with name `my_seed`
dbt seed --select my_seed
You can also select a specific model, seed or snapshot by the path to the file that defines it (typically a SQL file for models and snapshots, and a CSV file for seeds):
# Run model my_model
dbt run --select path/to/my_model.sql
# Run snapshot my_snapshot
dbt snapshot --select path/to/my_snapshot.sql
# Run seed my_seed
dbt seed --select path/to/my_seed.csv
Select multiple models
`--select` accepts multiple arguments, which means that a single command can run multiple models (or tests, snapshots and seeds). To do so, simply provide all the model, test, snapshot or seed names, separated by spaces, when running the command:
# Run multiple models
dbt run --select my_model another_model
# Run multiple tests
dbt test --select not_null_orders_order_id unique_orders_order_id
# Run multiple snapshots
dbt snapshot --select my_snapshot another_snapshot
# Run multiple seeds
dbt seed --select my_seed another_seed
Select node and downstream dependencies
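In order to run a dbt node as well as its downstream dependencies, you need to append the `+` operator to the resource name. Keep in mind that each command only executes its own resource type, so `dbt run --select my_model+` builds the downstream models but not, say, downstream snapshots.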
# Run the model with name `my_model` as well as its downstream dependencies
dbt run --select my_model+
# Run my_model tests and the tests of its downstream dependencies
dbt test --select my_model+
# Run seed my_seed and its downstream dependencies
dbt seed --select my_seed+
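To make the trailing `+` more concrete, here is a minimal Python sketch of the idea on a toy dependency graph. It is not dbt's implementation: the `CHILDREN` mapping, the node names and the `select_with_descendants` helper are all made up for illustration.

```python
from collections import deque

# Hypothetical DAG: each node maps to the nodes that depend on it directly.
CHILDREN = {
    "raw_orders": ["stg_orders"],
    "stg_orders": ["orders"],
    "orders": ["revenue"],
    "revenue": [],
}


def select_with_descendants(node, children=CHILDREN):
    """Toy equivalent of `node+`: the node plus everything downstream of it."""
    selected = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Roughly what `--select stg_orders+` would pick up in this toy graph.
print(sorted(select_with_descendants("stg_orders")))
```

In this toy graph, `raw_orders` is left out because it sits upstream of `stg_orders`, not downstream.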
Select model and upstream dependencies
Likewise, to select a node and its upstream dependencies, the `+` operator needs to be placed before the node name:
# Run the upstream dependencies of model `my_model` and the model itself
dbt run --select +my_model
# Run the tests of my_model and the tests of its upstream dependencies
dbt test --select +my_model
# Run the upstream dependencies of snapshot my_snapshot and the snapshot itself
dbt snapshot --select +my_snapshot
# Run the upstream dependencies of seed my_seed and the seed itself
dbt seed --select +my_seed
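The leading `+` can be pictured as the same walk in the opposite direction. The sketch below is only an illustration under assumed names (the `PARENTS` mapping and `select_with_ancestors` are hypothetical, not part of dbt): it collects a node together with everything it depends on, directly or indirectly.

```python
# Hypothetical DAG: each node maps to the nodes it reads from directly.
PARENTS = {
    "raw_orders": [],
    "raw_customers": [],
    "stg_orders": ["raw_orders"],
    "stg_customers": ["raw_customers"],
    "orders": ["stg_orders", "stg_customers"],
}


def select_with_ancestors(node, parents=PARENTS):
    """Toy equivalent of `+node`: the node plus everything upstream of it."""
    selected = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current not in selected:
            selected.add(current)
            stack.extend(parents.get(current, []))
    return selected


# Roughly what `--select +orders` would pick up in this toy graph.
print(sorted(select_with_ancestors("orders")))
```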
Select model with downstream and upstream dependencies
Now, in order to run a model as well as all of its downstream and upstream dependencies, you just need to place the model name between two `+` operators:
# Run the model `my_model` including its parents and children nodes
dbt run --select +my_model+
# Run the tests for model `my_model` including the tests of its parents and children
dbt test --select +my_model+
# Run the snapshot `my_snapshot` and all downstream and upstream dependencies
dbt snapshot --select +my_snapshot+
# Run the seed `my_seed` and all of its downstream and upstream dependencies
dbt seed --select +my_seed+
Select model and N downstream dependencies
Instead of running all the downstream (children) dependencies of a model, you may want to limit how many edges of the graph dbt steps through. This can be achieved once again using the `+` operator, but this time followed by the degree/level of child models to include.
# Run model my_model and its first-degree children
dbt run --select my_model+1
# Run tests for `my_model` model and the tests of its first-degree children
dbt test --select my_model+1
# Run `my_snapshot` snapshot and its first-degree children
dbt snapshot --select my_snapshot+1
# Run seed `my_seed` and its first-degree children
dbt seed --select my_seed+1
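One way to think about `my_model+N` is as a walk that stops after N steps. The following sketch assumes a made-up `CHILDREN` graph and a hypothetical `select_children` helper; it only illustrates the semantics and makes no claim about how dbt implements them.

```python
# Hypothetical DAG: each node maps to the nodes that depend on it directly.
CHILDREN = {
    "stg_orders": ["orders"],
    "orders": ["revenue", "orders_daily"],
    "revenue": ["finance_dashboard"],
    "orders_daily": [],
    "finance_dashboard": [],
}


def select_children(node, depth, children=CHILDREN):
    """Toy equivalent of `node+N`: the node plus children up to `depth` edges away."""
    selected = {node}
    frontier = {node}
    for _ in range(depth):
        # Move one edge further downstream, skipping nodes already collected.
        frontier = {c for n in frontier for c in children.get(n, [])} - selected
        selected |= frontier
    return selected


print(sorted(select_children("stg_orders", 1)))  # first-degree children only
print(sorted(select_children("stg_orders", 2)))  # first- and second-degree children
```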
Select model and N upstream dependencies
In the same way, you can specify the number of edges to step through when it comes to upstream (or parent) dependencies, this time by placing the number before the `+` operator:
# Run my_model and its first and second degree parent nodes
dbt run --select 2+my_model
# Run tests of my_model and the tests of its first and second degree parents
dbt test --select 2+my_model
# Run snapshot my_snapshot and its first and second degree parent nodes
dbt snapshot --select 2+my_snapshot
# Run seed my_seed and its first and second degree parent nodes
dbt seed --select 2+my_seed
Select model and N upstream and M downstream dependencies
Finally, to select a model as well as N parent and M children nodes, you can specify the model in between the number of edges to step through for both upstream and downstream dependencies:
# Run model `my_model`, its parents up to the 4th level and its downstreams up to the 5th level
dbt run --select 4+my_model+5
# Run tests of model `my_model` and the tests of its parents up to the 4th level and its downstreams up to the 5th level
dbt test --select 4+my_model+5
# Run snapshot `my_snapshot`, its parents up to the 4th level and its downstreams up to the 5th level
dbt snapshot --select 4+my_snapshot+5
# Run seed `my_seed`, its parents up to the 4th level and its downstreams up to the 5th level
dbt seed --select 4+my_seed+5
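The combined `N+my_model+M` form can be sketched as two bounded walks, one upstream and one downstream, whose results are merged with the node itself. The `EDGES` list, `walk` and `select` below are hypothetical names used purely for illustration; `None` stands for "no limit" (a bare `+`) and `0` for "no `+` on that side".

```python
# Hypothetical DAG expressed as (parent, child) edges.
EDGES = [
    ("raw_orders", "stg_orders"),
    ("stg_orders", "orders"),
    ("orders", "revenue"),
    ("revenue", "finance_dashboard"),
]


def walk(node, depth, neighbours):
    """Collect nodes reachable from `node` in at most `depth` steps (None = no limit)."""
    selected, frontier, steps = set(), {node}, 0
    while frontier and (depth is None or steps < depth):
        frontier = {m for n in frontier for m in neighbours(n)} - selected - {node}
        selected |= frontier
        steps += 1
    return selected


def select(node, up=None, down=None, edges=EDGES):
    """Toy equivalent of `N+node+M` with N=`up` and M=`down`."""

    def parents(n):
        return [src for src, dst in edges if dst == n]

    def children(n):
        return [dst for src, dst in edges if src == n]

    return {node} | walk(node, up, parents) | walk(node, down, children)


print(sorted(select("orders", up=1, down=1)))  # like 1+orders+1
print(sorted(select("orders")))                # like +orders+
```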
Exclude a model
Apart from `--select`, the dbt CLI also offers the `--exclude` flag, which accepts the same selector syntax as `--select`. Any model specified in the `--exclude` argument will be removed from the set of models selected with `--select`.
The following command will run all models except the one called `my_model`:
dbt run --exclude my_model
The `--exclude` argument is also applicable to other dbt commands:
# Run all tests except the one with id `not_null_orders_order_id`
dbt test --exclude not_null_orders_order_id
# Run all tests except the tests of customers model
dbt test --exclude customers
# Run all snapshots except `my_snapshot`
dbt snapshot --exclude my_snapshot
# Run all seeds except `my_seed`
dbt seed --exclude my_seed
Note that both the `--select` and `--exclude` arguments can be combined in a single dbt command.
For example, the following command will run all models in package `my_package` except the model `user_base_model` and its downstream dependencies.
dbt run --select my_package --exclude my_package.user_base_model+
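Conceptually, combining the two flags behaves like a set difference: whatever `--exclude` matches is removed from whatever `--select` matches. Here is a small sketch of that idea, assuming a made-up package whose models and dependencies are listed in a hypothetical `CHILDREN` mapping.

```python
# Hypothetical contents of my_package: each model maps to its direct children.
CHILDREN = {
    "user_base_model": ["user_sessions"],
    "user_sessions": ["user_retention"],
    "user_retention": [],
    "orders": ["revenue"],
    "revenue": [],
}


def descendants(node, children=CHILDREN):
    """All nodes downstream of `node` (not including `node` itself)."""
    found = set()
    for child in children.get(node, []):
        found |= {child} | descendants(child, children)
    return found


# Toy version of: --select my_package
selected = set(CHILDREN)
# Toy version of: --exclude my_package.user_base_model+
excluded = {"user_base_model"} | descendants("user_base_model")

to_run = selected - excluded
print(sorted(to_run))
```

In this toy package, only `orders` and `revenue` remain, since every other model is either `user_base_model` or downstream of it.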
Run a model in a specific package
To run a model, test, snapshot or seed that belongs to a specific dbt package, you need to follow the dot notation as illustrated in the following command:
# Runs model my_model in package mypackage
dbt run --select mypackage.my_model
# Runs tests of my_model model in package mypackage
dbt test --select mypackage.my_model
# Runs snapshot my_snapshot in package mypackage
dbt snapshot --select mypackage.my_snapshot
# Runs seed my_seed in package mypackage
dbt seed --select mypackage.my_seed
Run all models in a specific path
In order to run models, tests, snapshots or seeds placed under a specific directory, you can use the following selector notation:
# Runs all models under path.to.my.models directory
dbt run --select path.to.my.models
# Runs all tests under path.to.my.models directory
dbt test --select path.to.my.models
# Runs all snapshots under path.to.my.snapshots directory
dbt snapshot --select path.to.my.snapshots
# Runs all seeds under path.to.my.seeds directory
dbt seed --select path.to.my.seeds
In addition to the dot notation, you can also run models in a specific path as illustrated below:
# Runs all models under path/to/my/models directory
dbt run --select path/to/my/models
# Runs all tests under path/to/my/models directory
dbt test --select path/to/my/models
# Runs all snapshots under path/to/my/snapshots directory
dbt snapshot --select path/to/my/snapshots
# Runs all seeds under path/to/my/seeds directory
dbt seed --select path/to/my/seeds
Select model with a specific tag
If you have tagged resources and you would like to execute all of them, you can provide the `tag:` selector followed by the tag name, as illustrated in the following command:
# Run all models with "finance" tag
dbt run --select tag:finance
Combining multiple selectors
Note that you can combine pretty much any of the selectors described in this tutorial in a single command. For example, the following command will run every model tagged with `finance`, the individual model `my_model`, as well as all the models in the path `path.to.my.marketing.models`:
dbt run --select tag:finance my_model path.to.my.marketing.models
And as usual, this can be applied to pretty much every resource, including tests, seeds and snapshots:
# Tests
dbt test --select tag:finance not_null_orders_order_id path.to.my.marketing.models
# Seeds
dbt seed --select tag:finance my_seed path.to.my.marketing.seeds
# Snapshots
dbt snapshot --select tag:finance my_snapshot path.to.my.marketing.snapshots
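Space-separated selectors act as a union: a resource is picked if it matches any one of them. The sketch below illustrates this with a hypothetical `MODELS` catalogue and three made-up matcher functions (`by_tag`, `by_name`, `by_path`); none of these names come from dbt itself.

```python
# Hypothetical project metadata: model name -> directory and tags.
MODELS = {
    "my_model": {"path": "models/core", "tags": set()},
    "revenue": {"path": "models/finance", "tags": {"finance"}},
    "invoices": {"path": "models/finance", "tags": {"finance"}},
    "campaigns": {"path": "models/marketing", "tags": set()},
    "web_events": {"path": "models/web", "tags": set()},
}


def by_tag(tag):
    """Models carrying the given tag (toy version of `tag:<name>`)."""
    return {name for name, meta in MODELS.items() if tag in meta["tags"]}


def by_name(model_name):
    """The single model with that name, if it exists."""
    return {name for name in MODELS if name == model_name}


def by_path(prefix):
    """Models located in the given directory or below it."""
    return {
        name
        for name, meta in MODELS.items()
        if meta["path"] == prefix or meta["path"].startswith(prefix + "/")
    }


# Toy version of: --select tag:finance my_model <marketing models path>
selected = by_tag("finance") | by_name("my_model") | by_path("models/marketing")
print(sorted(selected))
```

Here `web_events` is the only model left out, because it matches none of the three selectors.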
Final Thoughts
In conclusion, when working with dbt projects, it is important to be able to target specific models, tests, seeds, or snapshots in order to avoid wasting resources and money. The dbt Command Line Interface (CLI) offers a variety of shorthands that allow you to select specific resources to run, test, seed, or snapshot. These include the ability to include or exclude such models when running any dbt command.