DuckDB in production: what it's actually good at (and what it isn't)

By Arshad Ansari

DuckDB is having a moment, and like every tool having a moment, the hype runs ahead of the nuance. "Just use DuckDB" is now a reflex answer to questions it doesn't actually fit. So let me give you the honest version, from someone who ships it.

What DuckDB is

DuckDB is an in-process analytical (OLAP) database: think "SQLite for analytics." It runs a columnar, vectorized query engine inside your application process. No server, no network hop, no cluster. You point it at Parquet, CSV, or its own file format and run real SQL (window functions, joins, the works), at speeds that can beat a round-trip to a cloud warehouse on the same data.

Where it shines in production

  • Transform-heavy batch jobs. Reading Parquet, joining, aggregating, and writing Parquet back out. DuckDB will saturate your cores and finish before a warehouse has authenticated your session.
  • Embedded analytics in an app or service. Ship query capability inside your service instead of calling out to a warehouse. Lower latency, no per-query bill, fewer moving parts.
  • Local and CI data work. Notebooks, ad-hoc analysis, and test fixtures that run identically on a laptop and in CI — no shared environment to provision or pollute.
  • The "interactive" layer over a lake. Parquet in object storage as the source of truth, DuckDB as the fast query engine over it. You get warehouse-ish ergonomics without the warehouse.

Where it does not belong

Being honest about the edges is what makes the tool trustworthy:

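  • High-concurrency, multi-user serving. DuckDB runs inside a single process, and only one process at a time can open a database file for writing. It will parallelize a query across your cores, but it is not a shared warehouse for fifty analysts hammering it at once. Put it behind a service, or use a warehouse.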
  • Heavy concurrent writes / OLTP. It's an analytical engine, not a transactional database. Don't make it your application's primary write store.
  • Your company's governed source of truth. Multi-tenant access control, row-level governance, and "one warehouse everyone trusts" are warehouse problems. DuckDB is a compute engine, not a governance layer.

The pattern that works

The mistake is treating DuckDB as a replacement for your warehouse. The win is treating it as the right-sized compute engine for the large fraction of your workload that never needed a warehouse in the first place:

Truth lives in Parquet. DuckDB is the fast, cheap, serverless way to ask questions of it. Reach for the warehouse only for the workloads that genuinely earn it.

Done this way, DuckDB isn't a toy or a hype cycle — it's the least infrastructure that actually solves the problem. Which, in my experience, is almost always the right amount.


Free book: Local-First Analytics is the deep version of this — DuckDB, Parquet and Arrow in practice, with runnable code and real datasets. Grab it free here.

Trying to figure out where DuckDB fits in your stack without betting the company on it? Let's talk.
