Why I wrote Local-First Analytics

I spent fifteen years building data platforms for financial-services companies, and somewhere along the way I noticed something uncomfortable: a huge fraction of the analytics infrastructure I was building — and being asked to build — existed to solve problems that fit on a laptop.

That observation became a book: Local-First Analytics. This is the short version of the argument.

The default stack is heavier than the problem

The reflexive modern data stack looks like this: ingest into a cloud warehouse, model it in the warehouse, query the warehouse from a BI tool, and pay for compute every time someone moves a slider on a dashboard. It's a fine architecture when you genuinely have warehouse-scale data and warehouse-scale concurrency.

But most teams don't. They have a few gigabytes — sometimes a few hundred megabytes — and a handful of analysts. For that workload, the cloud round-trip is pure overhead: latency on every query, a bill that scales with curiosity, and a pile of infrastructure to secure and operate. You are renting a freight train to carry a backpack.

What "local-first" means

Local-first analytics flips the default. Instead of shipping the question to where the data lives, you ship a copy of the data to where the question is asked — the analyst's machine, the application server, even the browser — and compute there.

This is now practical because of a quiet revolution in in-process analytical engines. DuckDB runs a columnar, vectorized query engine inside your process with no server to manage. Parquet gives you a compact, columnar file format that's trivial to move and cache. Between them, a dataset that used to "require" a warehouse now opens instantly in a notebook, a CLI, or a WebAssembly sandbox in a tab.

Why it's often the better engineering choice

Speed. No network hop. A query against a local Parquet file can return before a warehouse has finished authenticating your session.
Cost. Compute happens on hardware you already pay for. No per-query meter.
Simplicity. Fewer moving parts means fewer things to secure, monitor, and wake up for at 3 a.m.
Privacy. Data that never leaves the device can't leak from a service you forgot to lock down.

When it's not the answer

Local-first isn't a religion. Genuinely large data, high write concurrency, multi-tenant governance, and "single source of truth" reporting still want a real warehouse. The point of the book isn't "never use a warehouse"; it's "stop reaching for one by reflex." Match the weight of the architecture to the weight of the problem.

That's the same instinct I bring to consulting work: the best data infrastructure is the least infrastructure that actually solves the problem.

The book goes much deeper — DuckDB, Parquet and Arrow in practice, with runnable code and real datasets. Grab it free here. And if your stack feels heavier than your data, let's talk.
