Why I wrote Local-First Analytics
By Arshad Ansari
I spent fifteen years building data platforms for financial-services companies, and somewhere along the way I noticed something uncomfortable: a huge fraction of the analytics infrastructure I was building — and being asked to build — existed to solve problems that fit on a laptop.
That observation became a book: Local-First Analytics. This is the short version of the argument.
The default stack is heavier than the problem
The reflexive modern data stack looks like this: ingest into a cloud warehouse, model it in the warehouse, query the warehouse from a BI tool, and pay for compute every time someone moves a slider on a dashboard. It's a fine architecture when you genuinely have warehouse-scale data and warehouse-scale concurrency.
But most teams don't. They have a few gigabytes — sometimes a few hundred megabytes — and a handful of analysts. For that workload, the cloud round-trip is pure overhead: latency on every query, a bill that scales with curiosity, and a pile of infrastructure to secure and operate. You are renting a freight train to carry a backpack.
What "local-first" means
Local-first analytics flips the default. Instead of shipping the question to where the data lives, you ship a copy of the data to where the question is asked — the analyst's machine, the application server, even the browser — and compute there.
This is now practical because of a quiet revolution in in-process analytical engines. DuckDB runs a columnar, vectorized query engine inside your process with no server to manage. Parquet gives you a compact, columnar file format that's trivial to move and cache. Between them, a dataset that used to "require" a warehouse now opens instantly in a notebook, a CLI, or a WebAssembly sandbox in a tab.
Why it's often the better engineering choice
- Speed. No network hop. A query against a local Parquet file returns before a warehouse has finished authenticating your session.
- Cost. Compute happens on hardware you already pay for. No per-query meter.
- Simplicity. Fewer moving parts means fewer things to secure, monitor, and wake up for at 3am.
- Privacy. Data that never leaves the device can't leak from a service you forgot to lock down.
When it's not the answer
Local-first isn't a religion. Genuinely large data, high write concurrency, multi-tenant governance, and "single source of truth" reporting still want a real warehouse. The point of the book isn't never use a warehouse — it's stop reaching for one by reflex. Match the weight of the architecture to the weight of the problem.
That's the same instinct I bring to consulting work: the best data infrastructure is the least infrastructure that actually solves the problem.
The book goes much deeper — DuckDB, Parquet and Arrow in practice, with runnable code and real datasets. Grab it free here. And if your stack feels heavier than your data, let's talk.
Building something data-heavy? Let's talk.