Why I wrote Local-First Analytics

By Arshad Ansari

I spent fifteen years building data platforms for financial-services companies, and somewhere along the way I noticed something uncomfortable: a huge fraction of the analytics infrastructure I was building — and being asked to build — existed to solve problems that fit on a laptop.

That observation became a book: Local-First Analytics. This is the short version of the argument.

The default stack is heavier than the problem

The reflexive modern data stack looks like this: ingest into a cloud warehouse, model it in the warehouse, query the warehouse from a BI tool, and pay for compute every time someone moves a slider on a dashboard. It's a fine architecture when you genuinely have warehouse-scale data and warehouse-scale concurrency.

But most teams don't. They have a few gigabytes — sometimes a few hundred megabytes — and a handful of analysts. For that workload, the cloud round-trip is pure overhead: latency on every query, a bill that scales with curiosity, and a pile of infrastructure to secure and operate. You are renting a freight train to carry a backpack.

What "local-first" means

Local-first analytics flips the default. Instead of shipping the question to where the data lives, you ship a copy of the data to where the question is asked — the analyst's machine, the application server, even the browser — and compute there.

This is now practical because of a quiet revolution in in-process analytical engines. DuckDB runs a columnar, vectorized query engine inside your process with no server to manage. Parquet gives you a compact, columnar file format that's trivial to move and cache. Between them, a dataset that used to "require" a warehouse now opens instantly in a notebook, a CLI, or a WebAssembly sandbox in a tab.

Why it's often the better engineering choice

  • Speed. No network hop. A query against a local Parquet file returns before a warehouse has finished authenticating your session.
  • Cost. Compute happens on hardware you already pay for. No per-query meter.
  • Simplicity. Fewer moving parts means fewer things to secure, monitor, and wake up for at 3am.
  • Privacy. Data that never leaves the device can't leak from a service you forgot to lock down.

When it's not the answer

Local-first isn't a religion. Genuinely large data, high write concurrency, multi-tenant governance, and "single source of truth" reporting still want a real warehouse. The point of the book isn't never use a warehouse — it's stop reaching for one by reflex. Match the weight of the architecture to the weight of the problem.

That's the same instinct I bring to consulting work: the best data infrastructure is the least infrastructure that actually solves the problem.

The book goes much deeper — DuckDB, Parquet and Arrow in practice, with runnable code and real datasets. Grab it free here. And if your stack feels heavier than your data, let's talk.

Building something data-heavy? Let's talk.