Vector Databases Explained: pgvector, Pinecone and Embeddings

Vector databases power semantic search and RAG. Instead of matching keywords, they find items by *meaning* — represented as high-dimensional vectors. Here's the concept and how to choose one.

Embeddings: Meaning as Numbers

An embedding model turns text (or images) into a vector — a list of numbers where similar meanings sit close together in space. 'Dog' and 'puppy' land near each other; 'dog' and 'invoice' do not.
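
To make "close together in space" concrete, here is a toy sketch in Python. The three-dimensional vectors are hand-written for illustration and are not output from a real embedding model (real models typically produce hundreds or thousands of dimensions). `cosine_similarity` is a helper defined here, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption: NOT real model output).
toy_embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "puppy": [0.85, 0.9, 0.15],
    "invoice": [0.1, 0.05, 0.95],
}

dog_puppy = cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"])
dog_invoice = cosine_similarity(toy_embeddings["dog"], toy_embeddings["invoice"])
print(f"dog vs puppy:   {dog_puppy:.3f}")
print(f"dog vs invoice: {dog_invoice:.3f}")
```

With these made-up values, the dog/puppy score comes out much higher than the dog/invoice score, which is the property a real embedding model is trained to produce.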

Similarity Search

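To answer a query, you embed it with the same model used for the stored items, then find the nearest stored vectors under a distance metric, most commonly cosine similarity (Euclidean distance and inner product are also used). Comparing the query against every vector is too slow at scale, so vector databases use Approximate Nearest Neighbor (ANN) indexes such as HNSW and IVF, which give up a small amount of recall to search millions of vectors quickly.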
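
The exact version of this search is a linear scan: score every stored vector against the query and keep the top k. The sketch below does that in plain Python. The function name `top_k` and the sample data are made up for illustration. An ANN index exists to avoid this full scan.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query, items, k=2):
    """Exact nearest neighbors: score every item, keep the k most similar.

    items is a list of (label, vector) pairs. Cost grows linearly with
    len(items), which is the cost ANN indexes are designed to avoid.
    """
    scored = ((cosine_similarity(query, vec), label) for label, vec in items)
    return [label for _, label in heapq.nlargest(k, scored)]

# Toy 2-d vectors (assumption: illustrative only).
items = [
    ("dog", [1.0, 0.1]),
    ("puppy", [0.9, 0.2]),
    ("invoice", [0.0, 1.0]),
]
nearest = top_k([1.0, 0.15], items, k=2)
print(nearest)
```

A brute-force scan like this is still useful as a baseline: it returns the true nearest neighbors, so it can be used to measure how much recall an approximate index gives up.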

pgvector vs Dedicated Stores

pgvector: a Postgres extension. A good fit if your data already lives in Postgres: one database, transactional guarantees, no new infrastructure.
Pinecone / Weaviate / Qdrant: purpose-built for vectors at very large scale, with managed hosting options and vector-specific filtering features.
Start with pgvector for most apps; move to a dedicated store when you reach tens of millions of vectors or need specialized features.

sql

-- pgvector: store and query embeddings in Postgres
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE docs (
  id serial PRIMARY KEY,
  content text,
  embedding vector(1536)  -- must match the embedding model's output size
);

-- Nearest neighbors by cosine distance (<=>); $1 is the query embedding
SELECT content FROM docs ORDER BY embedding <=> $1 LIMIT 5;
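
One way to supply the `$1` parameter from application code is to send the query embedding as text in pgvector's bracketed literal form, for example `[0.1,0.2,0.3]`, which Postgres can cast to `vector`. The helper below is a hypothetical sketch: the name `to_pgvector_literal` is not part of any library, and your database driver may offer its own adapter instead.

```python
def to_pgvector_literal(embedding):
    """Format a sequence of numbers as a pgvector text literal, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"

# Assumption: a tiny 3-d vector for readability; a real one would match the column size.
literal = to_pgvector_literal([0.1, 0.2, 0.3])
print(literal)
```

The resulting string can then be bound as the query parameter, typically with an explicit cast such as `$1::vector`.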

Match Dimensions

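Your column dimension must match your embedding model's output (e.g. 1536 for a vector(1536) column); pgvector rejects vectors of any other length. Don't mix models in one index either: even when the dimensions happen to match, vectors from different models are not comparable, so nearest-neighbor results become meaningless.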
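
A cheap guard is to check the vector length in application code before writing to the database. This is a minimal sketch: `EXPECTED_DIM` and `check_dimension` are hypothetical names, and 1536 is assumed only because it matches the example column.

```python
EXPECTED_DIM = 1536  # assumption: matches the vector(1536) column in the example

def check_dimension(embedding, expected=EXPECTED_DIM):
    """Raise ValueError if the embedding length differs from the column size."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected}"
        )
    return embedding

check_dimension([0.0] * 1536)  # correct length: returns the vector unchanged
try:
    check_dimension([0.0] * 768)  # e.g. output of a different, smaller model
except ValueError as err:
    print(err)
```

Failing early like this gives a clearer error than a rejected insert, and it catches an accidental model swap before any bad rows are written.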
