Ask your database in plain English — locally
By Arshad Ansari
You have a database with data you want to ask questions about. Normally, you'd either write the SQL yourself or hand the question to a data analyst. What if your laptop could just do it—turning "how many users signed up last week" into working SQL, without sending your data anywhere?
In 2026, this is genuinely usable. A small language model (7 billion parameters) fine-tuned on SQL, running locally via Ollama, can do the job. You don't need a vector database indexed with your schema, a fine-tuned model, or an API key. Just paste your table definitions into the prompt and ask.
Why a small SQL model beats a big general one
People often reach for GPT-4 or another heavyweight for text-to-SQL. It works, but it's overkill. A 70 billion parameter model is doing general reasoning when you just need SQL generation. You're paying per token, bandwidth, and latency.
A 7B model like duckdb-nsql is trained specifically on DuckDB text-to-SQL pairs. It knows SQL better than a general model because it's seen thousands of examples of English questions paired with their SQL equivalents. It's smaller, faster (runs in seconds on a laptop), costs nothing per query, and your data never leaves your machine.
General models also tend to over-engineer the solution—adding CTEs, subqueries, and window functions when a simple query works. Smaller, specialized models tend to be more direct.
The trick: schema in the prompt
You might think you need a vector database, embedding your table schemas so the model can retrieve the relevant ones. For a handful of tables, that's premature optimization.
Instead, just put the CREATE TABLE statements directly into the prompt. If you have ten tables, that's maybe 500 tokens—nothing. The model reads them, understands the structure, and generates SQL that fits.
This is your entire retrieval strategy. It's boring, but it works.
A working example
Here's what this looks like in practice. You'll need Ollama running locally and the duckdb-nsql model pulled.
ollama pull duckdb-nsql
Then, in Python:
import duckdb
from ollama import chat
# Open or create your database
conn = duckdb.connect("analytics.db")
# Get your schema: duckdb_tables() exposes each table's CREATE statement
# in its `sql` column. User tables live in the `main` schema.
schema_query = """
SELECT sql FROM duckdb_tables()
WHERE schema_name = 'main'
ORDER BY table_name
"""
schema_rows = conn.execute(schema_query).fetchall()
schema_text = "\n".join([row[0] for row in schema_rows if row[0]])
# Build the prompt
system_prompt = f"""You are a SQL expert. Generate correct DuckDB SQL for the user's question.
Return ONLY the SQL query, no explanation.
Schema:
{schema_text}
"""
# Ask the user's question
user_question = "How many users signed up in the last 7 days?"
response = chat(
model='duckdb-nsql',
messages=[
{'role': 'system', 'content': system_prompt},
{'role': 'user', 'content': user_question}
]
)
generated_sql = response.message.content.strip()
print("Generated SQL:")
print(generated_sql)
# Run it (read-only)
try:
result = conn.execute(generated_sql).fetchall()
print(result)
except Exception as e:
print(f"Error: {e}")
finally:
conn.close()
Real setup required: pip install ollama duckdb and ollama serve running in another terminal.
The model reads your schema, understands your table names and columns, and generates working SQL. You review it (always, always review generated SQL), run it read-only, and get your answer.
Guardrails that matter
Never auto-execute generated SQL against a writable database. Always:
- Review the SQL before running it. The model can invent columns that don't exist or misunderstand your intent.
- Run read-only. Use a connection that has no INSERT, UPDATE, or DELETE permissions. If the model generates a write query by accident, it fails safely.
- Add a LIMIT. Append
LIMIT 1000if the query doesn't have one, so you don't accidentally pull the whole table into memory. - Check the result. Does it make sense? Does the count or date range feel right?
Think of this as a drafting aid for your SQL, not an autonomous analyst.
Honest boundaries
This works well for straightforward questions on a small, clean schema with well-named tables and columns. It falls apart when:
- The schema is huge. Fifty tables with vague names (table_a, table_b_final_v2) and no comments confuse the model.
- The question is fuzzy. "How are we doing?" or "Show me the trends" requires business logic and judgment. SQL alone won't cut it.
- The model invents columns. It will hallucinate a
last_purchase_datecolumn if your table haspurchase_atandlast_order_timestamp. You catch this in the review step, but it's a reminder this isn't magic. - You need joins across five tables with complex logic. The model can do it, but the SQL gets verbose and fragile. At that point, writing it yourself (or pair-programming with the model) is faster.
This is not a replacement for analytics engineering. It's a shortcut for the 80% of questions that fit a simple pattern.
Why this matters
Most teams don't try local text-to-SQL because they haven't. They jump to hosted platforms or assume it won't work. The reality: a 7B model, free, on your laptop, handles the job for most daily questions.
You don't need vector retrieval, Pinecone, or a fine-tuned model. You don't need an API key or a bill. You need a small model, your schema, and a few lines of Python.
Start with the smallest thing that works. Upgrade later if you hit a real wall.
If you're building data infrastructure and want to keep things simple, check out Local-First Analytics in Practice and DuckDB in Production. And if you're exploring what a local LLM can do in your data pipeline, read Calling an LLM from Inside SQL.
For consulting on data platforms that stay simple and don't break, reach out at /contact-us.
Building something data-heavy? Let's talk.