Continuous Integration and Automated Build Testing with dbtCloud
If you read my recent blog post on how Rittman Analytics built our operational analytics platform running on Looker, Stitch, Google BigQuery and dbt, you’ll know that we built out the underlying data infrastructure using a modular Extract, Load and Transform (ELT) design pattern, like this:
We version-control our dbt development environment using git and GitHub, and do all new development beyond simple bug fixes in git feature branches, giving us a development process for dbt that looked like this:
1. We’d start by cloning the master branch of our dbt git repo from GitHub to the developer’s workstation, which would also have dbt installed locally along with the Google Cloud SDK so that they could connect to our development BigQuery dataset.
2. Create a new, local git branch for the new feature using the git CLI or a tool such as GitHub Desktop.
3. Develop the new feature locally using the developer’s install of dbt, committing any changes to the feature branch in that developer’s local git repo after checking that all dbt tests have run successfully.
locals-imac:ra_dw markrittman$ dbt run --models harvest_time_entries harvest_invoices --target dev
Running with dbt=0.13.1
Found 50 models, 8 tests, 0 archives, 0 analyses, 109 macros, 0 operations, 0 seed files, 35 sources
20:53:11 | Concurrency: 1 threads (target='dev')
20:53:11 |
20:53:11 | 1 of 2 START table model ra_data_warehouse_dbt_dev.harvest_invoices.. [RUN]
20:53:13 | 1 of 2 OK created table model ra_data_warehouse_dbt_dev.harvest_invoices [OK in 1.80s]
20:53:13 | 2 of 2 START table model ra_data_warehouse_dbt_dev.harvest_time_entries [RUN]
20:53:15 | 2 of 2 OK created table model ra_data_warehouse_dbt_dev.harvest_time_entries [OK in 1.70s]
20:53:15 |
20:53:15 | Finished running 2 table models in 5.26s.
Completed successfully
Done. PASS=2 ERROR=0 SKIP=0 TOTAL=2
locals-imac:ra_dw markrittman$ dbt test --models harvest_time_entries harvest_invoices --target dev
Running with dbt=0.13.1
Found 50 models, 8 tests, 0 archives, 0 analyses, 109 macros, 0 operations, 0 seed files, 35 sources
20:53:37 | Concurrency: 1 threads (target='dev')
20:53:37 |
20:53:37 | 1 of 4 START test not_null_harvest_invoices_id....................... [RUN]
20:53:39 | 1 of 4 PASS not_null_harvest_invoices_id............................. [PASS in 1.50s]
20:53:39 | 2 of 4 START test not_null_harvest_time_entries_id................... [RUN]
20:53:40 | 2 of 4 PASS not_null_harvest_time_entries_id......................... [PASS in 0.89s]
20:53:40 | 3 of 4 START test unique_harvest_invoices_id......................... [RUN]
20:53:41 | 3 of 4 PASS unique_harvest_invoices_id............................... [PASS in 1.08s]
20:53:41 | 4 of 4 START test unique_harvest_time_entries_id..................... [RUN]
20:53:42 | 4 of 4 PASS unique_harvest_time_entries_id........................... [PASS in 0.83s]
20:53:42 |
20:53:42 | Finished running 4 tests in 5.27s.
Completed successfully
4. All dbt transformations at this stage are being deployed to our development BigQuery dataset.
5. When the feature was ready for deployment to our production BigQuery dataset, the developer would push the changes in their local branch to the remote git repo, creating that branch if it didn’t already exist.
6. Then they’d create a pull request using GitHub Desktop and the GitHub web UI, summarising the changes and new features added by the development branch.
7. I’d then review the PR, try to work out whether the changes were safe to merge into the master branch, and accept the pull request. Overnight, our dbtCloud service would then clone the master branch of that git repo and attempt to deploy the new set of transformations to our production BigQuery dataset.
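The developer’s side of the steps above can be sketched as a small shell script. This is only an illustration and not the exact commands we ran: the repo URL, the branch name and the commit message are hypothetical placeholders, and the script defaults to a dry run that prints each command rather than executing it. The dbt commands mirror the ones in the log output shown earlier.

```shell
#!/bin/sh
# Sketch of the local feature-branch workflow described in the steps above.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
# REPO_URL, the branch name and the commit message are hypothetical placeholders.
DRY_RUN="${DRY_RUN:-1}"
REPO_URL="${REPO_URL:-git@github.com:example-org/example-dbt-repo.git}"

run() {
    # Print the command, then execute it only when DRY_RUN is switched off.
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

feature_workflow() {
    branch="$1"   # name of the new feature branch
    shift         # remaining arguments: the dbt models to build and test

    # Each step only runs if the previous one succeeded, so a failing
    # dbt test stops the script before anything is committed or pushed.
    run git clone "$REPO_URL" ra_dw &&
    run cd ra_dw &&
    run git checkout -b "$branch" &&
    run dbt run --models "$@" --target dev &&
    run dbt test --models "$@" --target dev &&
    run git add -A &&
    run git commit -m "Add feature: $branch" &&
    run git push --set-upstream origin "$branch"
}

# Dry-run example using the two models from the log output.
feature_workflow feature/harvest-models harvest_time_entries harvest_invoices
```

Chaining the steps with `&&` is the important design choice here: it enforces the “only commit once the tests pass” rule locally, which is the same guarantee we wanted to get automatically at pull request time.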
And sometimes, if for whatever reason that feature branch hadn’t been properly build-tested, that scheduled overnight deployment would then fail.
So having noticed the new Build on Pull Request feature that comes with paid versions of dbtCloud, we’ve upgraded from the free version to the $100/month basic paid version and added an automated “continuous integration” build test to our feature branch development process so that it now looks like this:
8. Now, when I come to review the pull request for this feature branch, there’s an automated test added by dbtCloud that checks to see whether this new version of my dbt package deploys without errors.
9. Looking at the details behind this automated build test, I can see that dbtCloud has created a dataset and run through a full package deployment and test cycle, avoiding any risk of breaking our production dbt environment should that test deployment fail.
10. Checking back at the status of the test build in GitHub, I can see that this test completed with no issues, and I’m therefore safe to merge the changes in this feature branch into our development master branch, ready for dbtCloud to pick up these changes overnight and deploy them as scheduled into production.
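To make the idea of a build test concrete, here is a minimal sketch of the same kind of check written as a shell function: run a full `dbt run` followed by `dbt test` against a non-production target, stop at the first failure, and report a pass or fail status. This is not how dbtCloud implements the feature, which manages its own dataset for the test build. The `ci` target name and the `DBT_CMD` variable are assumptions made for this example, and `ci` would need to exist in your own dbt profile.

```shell
#!/bin/sh
# Sketch of a pull-request style build check: run each dbt step in order
# and stop at the first failure. Illustrative only; dbtCloud does not use
# this script. DBT_CMD and the `ci` target are assumptions for this example.
DBT_CMD="${DBT_CMD:-dbt}"

ci_build() {
    target="$1"   # a throwaway profile target, never the production one
    for step in run test; do
        echo "CI step: $DBT_CMD $step --target $target"
        if ! $DBT_CMD "$step" --target "$target"; then
            echo "CI result: FAILED at '$step'"
            return 1
        fi
    done
    echo "CI result: PASSED"
}

# Usage, assuming a `ci` target is defined in your dbt profile:
#   ci_build ci
```

The function’s exit status plays the role of the green or red check shown on the pull request: zero means the package deployed and tested cleanly, and anything else means the branch should not be merged yet.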
Docs on this recent dbtCloud feature are online here, and you’ll need at least the basic $100/month paid version of dbtCloud for this feature to become available in the web UI. It also depends on GitHub as the git hosting service, so it isn’t available as an option if you’re using GitLab, Bitbucket or CodeCommit.