Search your documents without a vector database
By Arshad Ansari
The moment you want to search documents by meaning instead of keywords, someone mentions vector database. Pinecone. Weaviate. Milvus. A new service to operate, scale, and pay for.
For a personal or team-sized knowledge base, you don't need one.
The database you already run—DuckDB or SQLite—can store embeddings and find nearest neighbors in-process. Same machine. No extra service. No operator overhead.
What semantic search actually needs
Semantic search is simpler than the hype suggests. Three steps:
- Convert text to a vector (numbers in N-dimensional space) using an embedding model.
- Convert a query to a vector the same way.
- Find the vectors closest to the query vector. The closest ones are the most similar.
"Closest" means using a distance metric. Cosine similarity is common (works well for normalized vectors). Euclidean distance works too. The algorithm is straightforward: compare numbers, sort, pick the top K.
That's it. No magic.
The build: DuckDB + VSS
DuckDB has a vector similarity search extension called VSS. It lets you store embeddings in a table column and query by distance.
First, enable the extension:
INSTALL vss;
LOAD vss;
Create a table with an embedding column. Embeddings are fixed-size arrays of floats:
CREATE TABLE documents (
id INTEGER PRIMARY KEY,
text VARCHAR,
embedding FLOAT[768]
);
The 768 here is the vector dimension—depends on your embedding model. nomic-embed-text (open, good quality) produces 768-dimensional vectors.
Generate embeddings locally using Ollama:
import ollama
import duckdb
conn = duckdb.connect()
# Text you want to search
docs = [
{"id": 1, "text": "How do I optimize database queries?"},
{"id": 2, "text": "What's the difference between SQL and NoSQL?"},
]
# Generate embeddings with a local model
for doc in docs:
result = ollama.embed(model="nomic-embed-text", input=doc["text"])
embedding = result["embeddings"][0] # Gets the vector
conn.execute(
"INSERT INTO documents (id, text, embedding) VALUES (?, ?, ?)",
(doc["id"], doc["text"], embedding)
)
Query by similarity:
query = "How do I make databases faster?"
query_result = ollama.embed(model="nomic-embed-text", input=query)
query_embedding = query_result["embeddings"][0]
# Find top 5 closest documents
results = conn.execute("""
SELECT id, text, array_distance(embedding, ?::FLOAT[768]) as distance
FROM documents
ORDER BY distance
LIMIT 5
""", [query_embedding]).fetchall()
That's the core: embed, insert, query with array_distance().
The VSS extension supports three distance metrics. array_distance() is Euclidean distance (the default). array_cosine_distance() measures angle between vectors. Both work; cosine is slightly faster, Euclidean slightly more accurate in practice. Experiment with your data.
For speed, add an HNSW index when scans get slow:
CREATE INDEX embedding_index
ON documents
USING HNSW (embedding)
WITH (metric = 'cosine');
HNSW (Hierarchical Navigable Small World) is a graph-based index that makes lookups sublinear instead of full-table scans. The trade-off: HNSW indexes live in memory and need rebuilding when you add large batches of vectors. For millions of vectors or high-concurrency writes, this matters. For a team's document store, it's usually fine.
SQLite alternative
If you're embedding vectors into an edge device or a simple app, SQLite + sqlite-vec does the same thing: store embeddings as BLOB columns and search with distance functions. Smaller footprint, same principle.
Hybrid search: combining keyword and vector
A pure vector search can miss documents if the embedding model doesn't "understand" your terminology. Hybrid search combines keyword full-text search with vector search, then ranks the union.
DuckDB has full-text search built in. A simple hybrid approach:
-- Keyword matches
SELECT id, text FROM documents
WHERE text LIKE '%database%'
-- Vector matches (top 5)
UNION
SELECT id, text FROM documents
ORDER BY array_distance(embedding, ?::FLOAT[768])
LIMIT 5
Deduplicate and re-rank the combined results by score. Hybrid often finds what pure vector search alone misses.
Honest boundaries
This works well up to a point. In-process vector search handles low millions of vectors per machine. Light concurrent reads are fine. If you're writing to the embedding table constantly while querying it, or need hard guarantees on freshness across multiple services, a dedicated vector database starts to earn its complexity.
HNSW indexes use memory. Rebuilding them after big batch inserts takes time. Embedding quality depends entirely on your model—a bad embedding model produces bad search results, and no database fixes that.
Most teams hit diminishing returns (good-enough results) long before they hit the system's ceiling. Start in the database you already run. Move only when measurements show you need to.
If you're building a local-first system, this fits naturally alongside DuckDB for analytics and hybrid architectures. For more on staying minimal, see why I wrote Local-First Analytics.
Questions? We help teams ship smarter data systems. Get in touch.
Building something data-heavy? Let's talk.