Letting Claude Code write your dbt models — and why you still read every line

dbt is the ideal problem for an AI coding agent. Your project is text files — SQL queries and YAML — with a CLI that tells you immediately when something breaks. That feedback loop is what agents are good at: write, run, read the error, fix, repeat. Claude Code can do this. It can read your whole dbt project, understand the DAG, generate models that follow your conventions, run the build, parse the failure, and iterate.

But "can do this" is not the same as "hands off." The agent is still a tool. You steer it. You review what it produces. You especially review the tests and the grain.

Why dbt fits Claude Code

Most SQL work doesn't have this texture. You write a query, it runs once, done. dbt is different. You're writing reusable, tested, documented SQL artifacts that sit in a DAG and may be run by schedulers, other people, or downstream systems. The CLI gives clear errors: column not found, data quality test failed, documentation missing. That's signal. Claude Code reads signals and responds.

Text also matters. dbt models are text. So are the YAML files that define tests, columns, and descriptions. There's no UI to navigate, no hidden state. The agent can see the entire project, understand structure, and reason about it.

The workflow looks like this: you point Claude Code at your dbt repo, give it the dbt agent skills (a set of prompts about dbt conventions), and ask it to build something specific — not "refactor the whole project," but "create staging models for the raw_orders table." Claude Code reads your existing models, learns your style, runs dbt build, reads errors, fixes them, and stops when the build passes.

The workflow in practice

Start by opening Claude Code in your dbt project directory. Create or update a CLAUDE.md file that documents your conventions:

Staging vs. marts (which schemas, which naming patterns)
How you write SQL (dbt-sql style, indentation, Common Table Expressions)
Test coverage expectations
Column description standards

Then install dbt agent skills. The exact command depends on your interface, but the flow is the same: the agent gains knowledge about dbt best practices and validation patterns.

Ask for the minimum. Don't say "build out the whole analytics layer." Say "create staging models for orders, customers, and line items. Use dbt conventions. Run the build and fix errors until it passes." The agent will generate SQL + YAML, run it, and iterate on failures. You watch the process, especially the first time, to make sure it's learning your style.

A concrete example

Here's what Claude Code might generate for a simple staging model. This is illustrative — the exact output depends on your data source and schema.

models/staging/stg_orders.sql:

{{ config(
    materialized = 'view'
) }}

with source_orders as (
    select * from {{ source('raw', 'orders') }}
),

deduped as (
    select
        *,
        row_number() over (partition by order_id order by updated_at desc) as rn
    from source_orders
)

select
    order_id,
    customer_id,
    order_date,
    total_amount,
    status,
    created_at,
    updated_at
from deduped
where rn = 1

models/staging/schema.yml:

models:
  - name: stg_orders
    description: >
      Deduped raw orders. One row per order_id.
    columns:
      - name: order_id
        description: >
          Primary key. Unique order identifier from the source system.
        tests:
          - unique
          - not_null
      - name: customer_id
        description: Foreign key to customers.
        tests:
          - not_null
      - name: order_date
        description: Date the order was placed.
        tests:
          - not_null
      - name: total_amount
        description: Order value in USD.
        tests:
          - not_null
          - dbt_expectations.expect_column_values_to_be_of_type:
              column_type: numeric
      - name: status
        description: Order status. One of pending, shipped, delivered, cancelled.
        tests:
          - accepted_values:
              values: ['pending', 'shipped', 'delivered', 'cancelled']
      - name: created_at
        description: Timestamp when the record was created in the source system.
      - name: updated_at
        description: Timestamp of the latest update.

This model dedupes raw data (keeping the latest version of each order), documents each column, and adds tests. You can now ask Claude Code to build on this: "Create a mart fct_orders that joins stg_orders with stg_customers and computes metrics. Run the build and fix issues."

Where the human is essential

The agent will not know your business grain. It will happily write a many-to-one join without knowing that one customer has many orders, creating fan-out. It will not know whether a NULL status is an error or expected. It will not know what your team calls the concept of "active customers" — whether that means "has an order in the last 90 days" or "has a contract." These things live in your head, in your schema docs, or in a design doc.

Review the SQL for grain. Does one row represent one order, one customer, one transaction? Does the join logic preserve that grain?

Review the tests for independence. A test that always passes teaches you nothing. The agent might write tests: [not_null] for a column that is never actually null, giving false confidence. You'll catch this by scanning the test names and asking yourself: would this test have caught a real bug?

And naming: the agent may call a column total_amt or order_value or amount_usd. None are wrong, but you want consistency. If your marts use amount_usd, staging should too.

Honest boundaries

Claude Code produces confident, wrong SQL. It will write a subquery that references a column from the wrong table. It will misunderstand the structure of your raw data. It hallucinates test syntax or applies tests that don't fit the column type. These are not rare edge cases. They happen often enough that you don't skip reading the output.

It also doesn't know scope. The agent will cheerfully generate forty models when you needed four. It optimizes for "the user asked for a feature" rather than "the user asked for the minimum." This is not a failing — it's a feature of having a tireless junior. But you have to manage it. Ask explicitly for the smallest set of models. Review what it suggests before asking it to build.

The tests especially need reading. A test that never fails is worse than no test because it gives false confidence that you caught a bug when you didn't.

Why this matters

Data quality is upstream of analytics, ML, and reporting. A bad model—one with wrong grain or a test that doesn't catch bugs—affects downstream systems and decisions. The agent is not responsible for these questions. You are. Claude Code makes it cheap and fast to generate models. That's a win. But cheap generation means you have to be more disciplined, not less, about review.

Use the agent to handle the repetitive parts: writing the SQL templates, running the build, fixing syntax errors, re-running tests. Keep the thinking parts: what does this row represent, does this join preserve grain, does this test actually validate what matters.

If you're writing dbt at scale, check out our other essays on building data pipelines that don't page you and connecting an AI assistant to your database with MCP.

Have questions about data engineering or AI-assisted development? Get in touch at /contact-us or grab our free book on local-first analytics.