Your Snowflake bill is mostly overhead

By Arshad Ansari

Most teams I talk to who are alarmed by their Snowflake or BigQuery bill don't have a pricing problem. They have a fit problem. The warehouse is doing exactly what it's designed to do — it's just aimed at a workload that never needed it.

If you're paying thousands a month to query tens of gigabytes for a handful of analysts, this post is for you.

Where the money actually goes

Cloud warehouse bills are dominated by two things: compute that spins up for every query, and the human infrastructure around it. The headline number on the invoice is only half the cost. The other half is the engineer-time spent tuning warehouses, managing roles, watching credits, and explaining to finance why a dashboard refresh cost $40.

The uncomfortable truth: for small-to-medium data, most of that spend buys you nothing your workload requires. You're paying for elastic, multi-tenant, petabyte-scale concurrency — and running a few GB through it a few hundred times a day.

The three questions that tell you you're overpaying

  1. How big is your hot data, really? Not the raw archive — the slice people actually query. If it's under ~100GB, a warehouse is almost never the cheapest correct answer.
  2. How many concurrent queries do you genuinely have? Five analysts and a nightly job is not "warehouse-scale concurrency." It's a laptop.
  3. What fraction of your bill is dashboards re-querying the same data? If a slider move on a BI tool bills you every time, you're paying compute to answer the same question repeatedly.

If those answers are "small, few, and a lot," your bill is mostly overhead.

The cheaper path: bring the compute to the data

The alternative isn't "suffer with Postgres." It's a files-first stack that's genuinely faster for this size of problem:

  • Keep the truth in Parquet — compact, columnar, trivial to move and cache.
  • Query it with DuckDB — a vectorized, in-process engine with no server to run.
  • Move data between steps with Arrow — zero-copy, no serialization tax.

A dataset that "required" a warehouse opens instantly in a notebook, a CLI, or a job runner — on hardware you already pay for. No per-query meter. No 3am pager for a cluster that didn't autoscale.

When you should keep the warehouse

This isn't anti-warehouse zealotry. Genuinely large data, heavy write concurrency, multi-tenant governance, and a shared "single source of truth" for the whole company still want a real warehouse. The point isn't never — it's stop reaching for one by reflex. Match the weight of the architecture to the weight of the problem.

Most teams that do this audit find they can move 80% of their workload off the meter and keep the warehouse for the 20% that earns it.


Free book: the full playbook — DuckDB, Parquet, Arrow, with runnable code and real datasets — is in Local-First Analytics. Grab it free here.

If your warehouse bill feels heavier than your data, that's usually a fixable architecture problem. Let's talk — I do this for a living.

Building something data-heavy? Let's talk.