Vector Databases Comparison: Pinecone, Weaviate, Milvus, and pgvector — Honest Benchmarks
Every vector database claims to be the fastest, most scalable, and easiest to use. None of them are all three. This vector databases comparison gives you representative benchmarks, real cost numbers, and practical guidance grounded in how these databases behave in production AI applications — not toy demos with 10,000 vectors. The goal is to help you pick the smallest tool that solves your actual problem, then know exactly when you have outgrown it.
Why You Need a Vector Database (And When You Don’t)
Vector databases store and search high-dimensional embeddings, the numerical representations that AI models use to capture the meaning of text, images, and code. When a user asks “show me articles about machine learning,” the database finds documents whose embeddings are mathematically close to the query embedding, regardless of the exact words used. The search is approximate by design: instead of scanning every vector, an index narrows the candidates, so latency grows far more slowly than the dataset does.
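To make "mathematically close" concrete, here is a minimal sketch of the exact, brute-force search that an approximate index is designed to avoid. The three-dimensional vectors and document ids are made-up values for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best ids."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in vectors.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings
docs = {
    "ml-intro": [0.9, 0.1, 0.0],
    "deep-learning": [0.8, 0.3, 0.1],
    "sourdough-recipe": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]
top = exact_top_k(query, docs, k=2)
print(top)
```

This scan touches every vector on every query, which is fine for small collections and is the cost that ANN indexes trade a little accuracy to avoid.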
You need one if you’re building RAG (Retrieval-Augmented Generation) applications, semantic search, recommendation systems, image similarity search, or anomaly detection. More generally, any application that must find “similar” items by meaning rather than by exact keyword match benefits from vector search.
When you DON’T need one: If you have fewer than 100K vectors and don’t need sub-10ms latency, pgvector on your existing PostgreSQL is sufficient. Don’t add infrastructure complexity for a problem a database extension solves. Additionally, if your search is keyword-based — exact matching and filtering — a traditional full-text engine like Elasticsearch is the better fit.
How ANN Indexes Actually Work
The performance differences between these databases come down almost entirely to their indexing algorithms, so a brief mental model pays off. The dominant index today is HNSW (Hierarchical Navigable Small World), a layered graph where each node links to a handful of neighbors. A query enters at the top sparse layer, greedily hops toward closer vectors, then descends into denser layers — turning a brute-force scan into a short walk.
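The greedy hop described above can be sketched on a single layer. This is a deliberately simplified illustration, not a full HNSW implementation: there is only one layer, and the 2-D points and neighbor links are hand-built, hypothetical values.

```python
import math

def greedy_walk(graph, vectors, entry, query):
    """Hop to whichever neighbor is closest to the query until none improves."""
    current = entry
    current_dist = math.dist(vectors[current], query)
    hops = [current]
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = math.dist(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:  # no neighbor is closer: stop here
            return current, hops
        current, current_dist = best, best_dist
        hops.append(current)

# Hypothetical points and neighbor links
vectors = {"a": (0, 0), "b": (2, 0), "c": (4, 1), "d": (6, 1), "e": (8, 0)}
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}

nearest, hops = greedy_walk(graph, vectors, entry="a", query=(7, 1))
print(nearest, hops)
```

In a real HNSW index the upper layers contain long-range links, so the walk covers most of the distance in a few hops before the dense bottom layer refines the result.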
HNSW exposes two groups of parameters that govern the speed-versus-accuracy trade-off. The build-time ef_construction and M parameters control how richly connected the graph is, while the query-time ef_search controls how many candidates are explored before returning. Raising ef_search improves recall at the cost of latency, which makes it the lever you tune most often in production.
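The effect of the query-time parameter can be shown with a small best-first search that keeps at most `ef` results. This is a sketch of the general technique on one layer, with a hypothetical hand-built graph; real engines add multiple layers and many optimizations.

```python
import heapq
import math

def search_layer(graph, vectors, entry, query, ef):
    """Best-first search keeping at most `ef` results; returns (results, nodes visited)."""
    def dist(node):
        return math.dist(vectors[node], query)

    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: closest unexplored node first
    results = [(-dist(entry), entry)]     # max-heap via negated distance
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:            # closest candidate is worse than worst result
            break
        for neighbor in graph[node]:
            if neighbor in visited:
                continue
            visited.add(neighbor)
            nd = dist(neighbor)
            if len(results) < ef or nd < -results[0][0]:
                heapq.heappush(candidates, (nd, neighbor))
                heapq.heappush(results, (-nd, neighbor))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg, node) for neg, node in results), len(visited)

# Hypothetical graph where "b" is a dead end and "d" is the true nearest neighbor
vectors = {"a": (0, 0), "b": (5, 0), "c": (4, 4), "d": (9, 1)}
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "d"], "d": ["c"]}
query = (10, 0)

narrow, seen_narrow = search_layer(graph, vectors, "a", query, ef=1)
wide, seen_wide = search_layer(graph, vectors, "a", query, ef=3)
print(narrow[0][1], seen_narrow)
print(wide[0][1], seen_wide)
```

With `ef=1` the search gets stuck at the dead end and returns `b`; with `ef=3` it keeps the detour through `c` alive long enough to reach `d`, at the price of evaluating more nodes.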
The alternative family, IVF (inverted file) with product quantization, clusters vectors and compresses them. IVF_PQ uses far less memory than HNSW, but the compression is lossy, which is exactly why the benchmark below shows Milvus IVF_PQ at 95.8% recall versus 99.0% for Milvus HNSW. Choose IVF_PQ when memory or cost dominates and a few points of recall are acceptable; choose HNSW when accuracy matters most.
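A rough back-of-the-envelope calculation shows where the memory saving comes from. The product-quantization settings below (96 sub-vectors, 8 bits per code) are assumed values chosen for illustration, and the estimate ignores codebooks, ids, and index overhead.

```python
def raw_bytes_per_vector(dims, bytes_per_float=4):
    """Uncompressed float32 storage for one vector."""
    return dims * bytes_per_float

def pq_bytes_per_vector(num_subvectors, bits_per_code=8):
    """Product quantization stores one small code per sub-vector."""
    return num_subvectors * bits_per_code // 8

dims = 1536
num_vectors = 1_000_000

raw_gb = raw_bytes_per_vector(dims) * num_vectors / 1e9
pq_gb = pq_bytes_per_vector(num_subvectors=96) * num_vectors / 1e9  # assumed PQ settings

print(f"raw float32: {raw_gb:.3f} GB")
print(f"PQ codes:    {pq_gb:.3f} GB")
print(f"ratio:       {raw_gb / pq_gb:.0f}x")
```

Each 16-dimensional slice of the vector is replaced by a single byte under these assumptions, which is where both the compression and the recall loss originate.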
Vector Databases Comparison: Real Benchmarks
TEST SETUP:
Vectors: 1 million, 1536 dimensions (OpenAI text-embedding-ada-002 embeddings)
Hardware: 8 vCPU, 32GB RAM (for self-hosted)
Query: Top-10 nearest neighbors, no metadata filtering
LATENCY (p99, milliseconds):
Pinecone (Serverless): 8ms
Weaviate (HNSW): 12ms
Milvus (IVF_PQ): 11ms
Milvus (HNSW): 9ms
pgvector (HNSW): 22ms
pgvector (IVF): 35ms
RECALL@10 (accuracy — how often the true top 10 are found):
Pinecone: 99.2%
Weaviate (HNSW): 98.5%
Milvus (HNSW): 99.0%
Milvus (IVF_PQ): 95.8% (lossy compression trades accuracy for memory savings)
pgvector (HNSW): 98.0%
pgvector (IVF): 93.5%
QUERIES PER SECOND (single node):
Pinecone: ~2000 (managed, auto-scales)
Weaviate: ~1500
Milvus: ~3000 (with GPU: ~15000)
pgvector: ~500
AT 10 MILLION VECTORS:
Pinecone: 12ms p99 (auto-scales, no config change)
Weaviate: 25ms p99 (needs memory tuning)
Milvus: 15ms p99 (needs cluster scaling)
pgvector: 85ms p99 (struggles — consider upgrading)
These figures are representative of independent ANN benchmarks rather than a personal lab run; your numbers will shift with hardware, embedding dimensionality, and index parameters. The pattern, however, holds consistently: at a million vectors every option is fast enough, and the gaps only widen as you scale. That is the most important takeaway from any honest benchmark: do not pay for scale you don’t have.
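Because your numbers will differ, it helps to measure recall yourself. A minimal sketch of the recall@k metric used above: compare the ids returned by the approximate index with the ids from an exact (brute-force) search over the same data. The id lists here are made up for illustration.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k ids that the approximate search also returned."""
    truth = set(exact_ids[:k])
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)

def mean_recall(approx_results, exact_results, k=10):
    """Average recall@k over a batch of queries."""
    scores = [recall_at_k(a, e, k) for a, e in zip(approx_results, exact_results)]
    return sum(scores) / len(scores)

# Hypothetical results for one query: the index missed id 10 and returned 42 instead
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]
score = recall_at_k(approx, exact, k=10)
print(score)
```

Averaging this over a few hundred real queries gives a figure comparable to the percentages in the listing.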
The Contenders: Architecture Overview
Pinecone is fully managed — you don’t deploy or configure anything. Send your vectors to their API, query their API, done. The trade-off is vendor lock-in and pricing that scales linearly with usage.
Weaviate is open-source with optional managed hosting. It has built-in vectorization modules (connect an embedding model directly), a GraphQL API, and hybrid search that fuses vector similarity with keyword (BM25) scoring. The trade-off is operational complexity if you self-host.
Milvus is open-source and designed for massive scale (billions of vectors). It separates storage and compute, supports GPU-accelerated search, and offers the most index-type options. The trade-off is significant operational complexity — it runs on Kubernetes with multiple cooperating services.
pgvector is a PostgreSQL extension. Install it, add a vector column, create an index, done. It runs on your existing instance, so transactional data and embeddings live in one place. The trade-off is performance at scale — it can’t match purpose-built databases much above ~5M vectors.
Metadata Filtering: The Detail That Decides Real Apps
Benchmarks with “no metadata filtering” flatter every database, yet real queries almost always filter — by tenant, category, price, or recency. How a database combines the filter with the vector search dramatically changes both correctness and speed, and this is where naive setups quietly return wrong results.
The failure mode is pre- versus post-filtering. With post-filtering, the engine fetches the top-K nearest neighbors first and then discards those that fail the filter, so a tight filter can leave you with two results when you asked for ten. Pinecone, Weaviate, and Milvus implement filtered search that applies predicates during the index search rather than after it, preserving result counts. With pgvector you control it directly in SQL, but an approximate index applies the WHERE clause after the index scan, which is both its strength and a trap if you forget to size the candidate set.
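-- pgvector: with an approximate index, the WHERE clause is applied after the index scan,
-- so a selective filter can return fewer than LIMIT rows.
CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);
-- A B-tree on the filter columns gives the planner an exact path for very selective filters
CREATE INDEX ON products (category, price);

-- Widen the HNSW candidate set (default 40) so enough rows survive the filter
SET hnsw.ef_search = 100;

SELECT id, name, price,
       1 - (embedding <=> $1::vector) AS similarity
FROM products
WHERE category = 'electronics' AND price < 1000
ORDER BY embedding <=> $1::vector
LIMIT 10;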
When a selective filter and a vector search collide, test recall explicitly. A query that “works” on 1,000 rows can silently degrade on 10 million if the planner chooses a sequential scan over the index, so verify the execution plan with EXPLAIN ANALYZE before trusting it in production.
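The post-filtering shortfall is easy to reproduce in a few lines. This sketch uses a hypothetical catalog, already sorted nearest-first, in which one row in five is in the electronics category.

```python
def post_filter(ranked, predicate, k):
    """Take the k nearest first, then drop rows that fail the filter."""
    return [row for row in ranked[:k] if predicate(row)]

def pre_filter(ranked, predicate, k):
    """Drop failing rows first, then take the k nearest survivors."""
    return [row for row in ranked if predicate(row)][:k]

def is_electronics(row):
    return row["category"] == "electronics"

# Hypothetical rows sorted by distance to the query (nearest first)
ranked = [
    {"id": i, "category": "electronics" if i % 5 == 0 else "books"}
    for i in range(100)
]

post = post_filter(ranked, is_electronics, k=10)
pre = pre_filter(ranked, is_electronics, k=10)
print(len(post), len(pre))
```

Post-filtering returns only two rows here even though twenty matching rows exist, which is why the candidate set has to grow as the filter gets more selective.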
Cost Comparison (Monthly)
1 MILLION VECTORS, 1536 DIMENSIONS:
Pinecone Serverless:
Storage: $0.33/GB x ~6GB = ~$2/month
Reads: 10M queries x $8/1M = $80/month
Writes: 1M upserts x $2/1M = $2/month
TOTAL: ~$84/month
Weaviate Cloud:
Sandbox (free tier): up to 1M vectors, limited throughput
Standard: ~$95/month (managed, SLA)
Self-hosted: Infrastructure cost only (~$50-100/month on cloud VMs)
Milvus (Zilliz Cloud):
Standard: ~$65/month for 1M vectors
Self-hosted: Infrastructure cost (~$80-150/month on K8s)
pgvector:
$0/month additional — runs on your existing PostgreSQL
(Assuming you already have a PostgreSQL instance)
AT 100 MILLION VECTORS:
Pinecone: ~$800/month
Weaviate Cloud: ~$500/month
Milvus/Zilliz: ~$400/month
pgvector: Not recommended at this scale
The headline price rarely tells the whole story. Self-hosted “free” software still costs engineer time for upgrades, backups, and on-call, which often exceeds the managed fee at small scale. Conversely, managed query-based pricing like Pinecone’s can surprise a chatty RAG app that fires many retrievals per user turn, so model your actual query volume — not just storage — before committing.
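One way to model query volume is to put the Pinecone Serverless rates quoted above into a small function. The rates are the ones listed in this article and should be treated as assumptions, since vendor pricing changes; the "chatty" scenario of five retrievals per turn is hypothetical.

```python
def serverless_monthly_cost(vectors, dims, queries, upserts,
                            storage_per_gb=0.33,
                            read_per_million=8.0,
                            write_per_million=2.0):
    """Estimate a monthly bill from storage, read, and write volume (float32 vectors)."""
    storage_gb = vectors * dims * 4 / 1e9
    cost = {
        "storage": storage_gb * storage_per_gb,
        "reads": queries / 1e6 * read_per_million,
        "writes": upserts / 1e6 * write_per_million,
    }
    cost["total"] = sum(cost.values())
    return cost

# The article's baseline: 1M vectors, 10M queries, 1M upserts per month
baseline = serverless_monthly_cost(1_000_000, 1536, 10_000_000, 1_000_000)

# Hypothetical chatty RAG app: same data, five retrievals per user turn
chatty = serverless_monthly_cost(1_000_000, 1536, 50_000_000, 1_000_000)

print(round(baseline["total"]), round(chatty["total"]))
```

Storage barely registers in either scenario; the read line is what moves the bill, so the number of retrievals per user turn is the figure worth estimating carefully.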
Code: Same Task in Each Database
# PINECONE: simplest API
import pinecone

pc = pinecone.Pinecone(api_key="your-key")
index = pc.Index("products")

# `embedding` and `query_embedding` are 1536-dim lists of floats from your embedding model

# Upsert
index.upsert(vectors=[{
    "id": "prod-123",
    "values": embedding,
    "metadata": {"category": "electronics", "price": 599.99},
}])

# Query with metadata filter
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$eq": "electronics"}, "price": {"$lt": 1000}},
)
# WEAVIATE: built-in hybrid search (vector + keyword)
import weaviate

client = weaviate.connect_to_local()
try:
    products = client.collections.get("Product")
    response = products.query.hybrid(
        query="wireless noise cancelling headphones",
        alpha=0.5,  # 0 = pure keyword, 1 = pure vector
        limit=10,
    )
    for obj in response.objects:
        print(obj.properties)
finally:
    client.close()  # release the connection when done
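To build intuition for what the alpha parameter does, here is a simplified score-fusion sketch: normalize each score list, then blend them with alpha as the weight on the vector side. This illustrates the general idea only and is not Weaviate's exact implementation; the document ids and scores are invented.

```python
def normalize(scores):
    """Min-max scale a {doc_id: score} map into the 0..1 range."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def hybrid_rank(vector_scores, keyword_scores, alpha=0.5):
    """Blend normalized scores: alpha=1 is pure vector, alpha=0 is pure keyword."""
    v, kw = normalize(vector_scores), normalize(keyword_scores)
    ids = set(v) | set(kw)
    fused = {i: alpha * v.get(i, 0.0) + (1 - alpha) * kw.get(i, 0.0) for i in ids}
    return sorted(fused, key=fused.get, reverse=True)

# Hypothetical scores: cosine similarity on one side, BM25 on the other
vector_scores = {"a": 0.9, "b": 0.8, "c": 0.2}
keyword_scores = {"b": 12.0, "c": 9.0, "d": 3.0}

balanced = hybrid_rank(vector_scores, keyword_scores, alpha=0.5)
print(balanced)
```

Document `b` wins the balanced ranking because it scores well on both signals, while `a` only leads when alpha is pushed toward pure vector search.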
The Decision — Simplified
pgvector if: You already use PostgreSQL, have under 5M vectors, and don’t want new infrastructure. It’s free, it’s familiar, and it’s good enough for most applications.
Pinecone if: You want zero operational overhead, rapid prototyping, and are OK with managed pricing. Ideal for startups and teams without dedicated infrastructure engineers.
Weaviate if: You want open-source with built-in vectorization (no separate embedding API calls), hybrid search, and multi-tenancy. Good for self-hosted production deployments.
Milvus if: You have massive scale (100M+ vectors), need GPU acceleration, or require the most flexible indexing options. Designed for enterprise-scale AI infrastructure.
Migration and Avoiding Lock-In
A pragmatic strategy is to start on pgvector and graduate later, but graduating is easier if you plan for it now. The portable parts of your system are the embeddings themselves and the chunking pipeline that produced them; the lock-in lives in proprietary query APIs and index configuration. Therefore, wrap your retrieval calls behind a thin interface so swapping the backend touches one module instead of your whole codebase.
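A thin interface of this kind might look like the sketch below. All names here (`Hit`, `Retriever`, `InMemoryRetriever`, `build_context`) are hypothetical, and the in-memory backend is only a stand-in for a class that would wrap pgvector, Pinecone, or another store.

```python
import math
from dataclasses import dataclass
from typing import Protocol, Sequence

@dataclass
class Hit:
    id: str
    score: float

class Retriever(Protocol):
    """The only surface the application is allowed to depend on."""
    def search(self, embedding: Sequence[float], k: int) -> list[Hit]: ...

class InMemoryRetriever:
    """Stand-in backend for tests; a real one would call a vector database."""
    def __init__(self, vectors: dict[str, Sequence[float]]):
        self._vectors = vectors

    def search(self, embedding: Sequence[float], k: int) -> list[Hit]:
        # Negated Euclidean distance so that a higher score means a closer match
        hits = [Hit(doc_id, -math.dist(vec, embedding))
                for doc_id, vec in self._vectors.items()]
        hits.sort(key=lambda h: h.score, reverse=True)
        return hits[:k]

def build_context(retriever: Retriever, query_embedding: Sequence[float], k: int = 3) -> list[str]:
    """Application code: knows nothing about which backend is behind the interface."""
    return [hit.id for hit in retriever.search(query_embedding, k)]

retriever = InMemoryRetriever({"a": (0, 0), "b": (1, 1), "c": (5, 5)})
context_ids = build_context(retriever, (1, 1), k=2)
print(context_ids)
```

Swapping backends then means writing one new class with a `search` method; `build_context` and everything above it stay unchanged.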
When you do migrate, re-embed only if you change the embedding model — otherwise you can bulk-export the existing vectors and metadata and re-ingest them, then run a recall comparison on a held-out query set to confirm the new store matches the old. Above all, keep the raw source documents, because they are the one asset you can always re-embed from if a vendor or model changes.
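The comparison step can be automated with a small harness. In this sketch, `old_search` and `new_search` are hypothetical callables that take a query and `k` and return ids; the integer "queries" and fake stores below exist only to make the example runnable.

```python
def compare_stores(old_search, new_search, queries, k=10, threshold=0.9):
    """Return mean top-k overlap and the queries whose overlap falls below threshold."""
    overlaps, regressions = [], []
    for q in queries:
        old_ids = set(old_search(q, k))
        new_ids = set(new_search(q, k))
        overlap = len(old_ids & new_ids) / max(len(old_ids), 1)
        overlaps.append(overlap)
        if overlap < threshold:
            regressions.append(q)
    return sum(overlaps) / len(overlaps), regressions

# Fake stores: the new one disagrees with the old one on query 3 only
def old_store(q, k):
    return list(range(q, q + k))

def new_store(q, k):
    start = q + 5 if q == 3 else q
    return list(range(start, start + k))

mean_overlap, regressions = compare_stores(old_store, new_store, queries=[1, 2, 3])
print(round(mean_overlap, 3), regressions)
```

Listing the regressing queries, rather than reporting only the mean, points you at the specific filters or document types where the new store behaves differently.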
Related Reading:
- RAG Architecture Patterns for Production
- RAG vs Fine-Tuning Decision Guide
- PostgreSQL 17 Features Guide
The right choice depends on your scale, operational capacity, and existing infrastructure. Don’t over-engineer: start with pgvector on your existing PostgreSQL, verify recall under realistic filters, and only move to a dedicated vector database when performance requirements genuinely demand it.