AI Fundamentals

What Are Vector Embeddings? Meaning, Similarity, Search, and Cost

Direct answer

A vector embedding is a fixed-length numeric representation produced by a model so that items with related learned features can be compared mathematically. Embeddings are commonly used for semantic search, recommendation, clustering, classification, and retrieval-augmented generation. The vector space is model-specific, so changing embedding models normally requires re-embedding the indexed collection.


Summary

An embedding turns an item into coordinates

An embedding model accepts an item such as text and returns an ordered list of numbers. The length of that list is the embedding dimension. A simplified three-dimensional vector is just three numbers, for example [0.12, -0.48, 0.91] (illustrative values).

Real embedding vectors often contain many more dimensions. Individual coordinates normally do not have a simple human-readable label. Meaning is distributed across the representation learned during model training.

The useful property is relational. Items that the model considers similar can be placed near one another under a chosen distance or similarity measure. That makes it possible to search a collection by meaning rather than only by exact word overlap.
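To make the relational property concrete, here is a minimal Python sketch with invented three-dimensional vectors. The values and item names are assumptions, not output from any model. It uses Euclidean distance to find which stored item lies nearest to a query vector.

```python
import math

# Toy 3-dimensional vectors with invented values; real models return
# hundreds or thousands of learned coordinates.
items = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "invoice": [0.0, 0.2, 0.95],
}

def nearest(query, vectors):
    """Return the key whose vector has the smallest Euclidean distance to query."""
    return min(vectors, key=lambda name: math.dist(query, vectors[name]))

query = [0.88, 0.12, 0.02]
print(len(query))             # embedding dimension: 3
print(nearest(query, items))  # cat
```

Note that "kitten" is also close and "invoice" is far away: the useful information is in the relative distances, not in any single coordinate.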

Embeddings are model-specific

An embedding has meaning inside the representation space created by its model. Vectors produced by different models should not be compared as though they share one coordinate system. Changing the model generally requires re-embedding the indexed collection.

Model version also matters. Even when two versions expose the same number of dimensions, their vectors may not be compatible. Store the embedding model and version with the index metadata.
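One way to act on that advice is to record the embedding space alongside the index and check it before every comparison. The sketch below uses a hypothetical `EmbeddingSpace` record and a placeholder model name, `example-embed`; it is not any vector database's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EmbeddingSpace:
    """Hypothetical metadata describing which model produced a set of vectors."""
    model: str
    version: str
    dimensions: int

def check_compatible(index_space: EmbeddingSpace, query_space: EmbeddingSpace) -> None:
    """Refuse to compare vectors unless model, version, and dimension all match."""
    if index_space != query_space:
        raise ValueError(
            f"query embedded with {query_space}, but index was built with {index_space}; "
            "re-embed the collection or query with the original model"
        )

index_space = EmbeddingSpace("example-embed", "v1", 768)
check_compatible(index_space, EmbeddingSpace("example-embed", "v1", 768))  # passes

try:
    # Same dimension count, different version: still rejected.
    check_compatible(index_space, EmbeddingSpace("example-embed", "v2", 768))
except ValueError as err:
    print("rejected:", err)
```

Matching dimension counts is deliberately not enough to pass the check, because two versions can share a dimension while producing incompatible vectors.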

Dimension count alone does not establish quality. A higher-dimensional vector can require more storage and computation without being better for a particular task. Evaluate the model on the queries, documents, languages, and relevance criteria of the application.

How semantic search uses embeddings

  1. Split source documents into retrievable chunks.
  2. Generate one embedding for each chunk.
  3. Store each vector with its source text and metadata.
  4. Generate an embedding for the user's query.
  5. Find nearby document vectors.
  6. Return the associated chunks.
  7. Optionally rerank, filter, or supply them to a language model.
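The steps above can be sketched end to end in Python. This is a toy, not a production design: `toy_embed` is a hypothetical stand-in (a hashed bag of words) that only rewards shared words, so a real embedding model would replace it. The documents and chunk size are also invented.

```python
import math
import re
import zlib

DIM = 256  # toy dimension; real embedding models define their own

def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for an embedding model: a hashed bag of words,
    L2-normalized. It captures shared words, not learned meaning."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def split_into_chunks(doc_id: str, text: str, max_words: int = 8) -> list[dict]:
    """Step 1: fixed-size word windows (a deliberately simple chunking rule)."""
    words = text.split()
    return [
        {"doc_id": doc_id, "text": " ".join(words[i:i + max_words])}
        for i in range(0, len(words), max_words)
    ]

documents = {
    "billing": "Refunds are issued within 14 days of a cancelled order.",
    "shipping": "Orders ship from the warehouse within two business days.",
}

# Steps 2-3: embed each chunk and keep the vector next to its text and metadata.
index = []
for doc_id, text in documents.items():
    for chunk in split_into_chunks(doc_id, text):
        index.append({**chunk, "vector": toy_embed(chunk["text"])})

def search(query: str, k: int = 2) -> list[tuple[str, str]]:
    q = toy_embed(query)  # Step 4: embed the query with the same model.
    # Step 5: vectors are unit length, so the dot product equals cosine similarity.
    ranked = sorted(
        index,
        key=lambda row: sum(a * b for a, b in zip(q, row["vector"])),
        reverse=True,
    )
    # Step 6: return the associated chunks (step 7 is omitted here).
    return [(row["doc_id"], row["text"]) for row in ranked[:k]]

print(search("when are refunds issued"))
```

Because `toy_embed` only counts shared words, it would miss paraphrases that a learned model can catch. The surrounding structure (chunk, embed, store, embed the query, rank, return) is the part meant to carry over.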

The RAG definition explains how retrieved chunks become context for generation.

Semantic search can match related concepts that use different wording. It can also miss exact identifiers or overemphasize broad topical similarity. Production systems often combine vector search with keyword search, metadata filters, or reranking.
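One common way to combine a vector ranking with a keyword ranking is reciprocal rank fusion (RRF). It merges lists by rank position instead of raw score, so the two systems' scores never need to share a scale. The sketch below assumes two already-computed result lists with hypothetical document IDs; `k = 60` is a frequently used default that is best treated as tunable.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs: each list adds 1 / (k + rank) per document."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc-7", "doc-2", "doc-9"]   # hypothetical semantic results
keyword_hits = ["doc-4", "doc-7", "doc-1"]  # hypothetical exact-match results, e.g. an ID lookup
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

A document found by both systems ("doc-7") rises to the top, while an exact identifier match that vector search missed ("doc-4") still ranks highly.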

Similarity is a scoring rule

Common comparison functions include cosine similarity, dot product, and Euclidean distance. The correct choice depends on how the embedding model was trained and whether vectors are normalized.

Cosine similarity compares vector direction. Dot product incorporates direction and magnitude unless vectors are normalized. Euclidean distance measures straight-line distance in the vector space.
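The differences between the three scoring rules show up clearly with two vectors that point the same way but differ in length. This plain-Python sketch (no library assumed) also checks that normalization makes the dot product equal to cosine similarity.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def euclidean(a, b):
    return math.dist(a, b)

def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

a = [3.0, 4.0]
b = [6.0, 8.0]  # same direction as a, twice the magnitude

print(cosine(a, b))     # 1.0: identical direction
print(dot(a, b))        # 50.0: grows with magnitude
print(euclidean(a, b))  # 5.0: the points are still apart

# After normalization, dot product equals cosine similarity.
na, nb = normalize(a), normalize(b)
print(dot(na, nb), euclidean(na, nb))

# For unit vectors, squared Euclidean distance = 2 - 2 * cosine,
# so both rules rank neighbors in the same order.
c, d = [1.0, 2.0, 0.5], [0.3, -1.0, 2.0]
print(euclidean(normalize(c), normalize(d)) ** 2, 2 - 2 * cosine(c, d))
```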

Do not choose a universal score threshold from an unrelated example. Score distributions change with the model, corpus, chunking strategy, language, and query type. Calibrate thresholds using labeled examples from the target workload.
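A minimal calibration sketch, assuming a small set of hypothetical labeled pairs of (similarity score, judged relevant). It picks the cut-off that maximizes F1; that objective is an assumption, and an application that values recall or precision more would optimize for that instead.

```python
def best_threshold(labeled: list[tuple[float, bool]]) -> tuple[float, float]:
    """Try each observed score as a cut-off and keep the one with the highest F1."""
    best_t, best_f1 = 0.0, -1.0
    for t in sorted({score for score, _ in labeled}):
        tp = sum(1 for s, rel in labeled if s >= t and rel)
        fp = sum(1 for s, rel in labeled if s >= t and not rel)
        fn = sum(1 for s, rel in labeled if s < t and rel)
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# Hypothetical judgments from the target workload: (similarity score, relevant?)
labeled = [(0.91, True), (0.84, True), (0.80, False), (0.77, True), (0.62, False), (0.55, False)]
threshold, f1 = best_threshold(labeled)
print(threshold, round(f1, 3))  # 0.77 0.857
```

The resulting number is only meaningful for the model, corpus, and query mix that produced the labels; changing any of them calls for recalibration.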

Embeddings are not a database by themselves

The embedding model produces vectors. A vector index or database stores them and performs nearest-neighbor search. The broader system also needs source text, metadata, access controls, update logic, deletion handling, and observability.

Approximate nearest-neighbor indexes trade some exactness for speed and scale. Their configuration can affect recall, latency, memory, and build time. Retrieval quality therefore depends on more than the embedding model.
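Recall for an approximate index is usually measured against exact, brute-force search on the same queries. In the sketch below, `approx_ids` is a hypothetical stand-in for whatever an ANN index returns; no specific library is used, and the vectors are invented.

```python
import math

def exact_top_k(query, vectors, k):
    """Brute-force nearest neighbors by Euclidean distance (the ground truth)."""
    return sorted(vectors, key=lambda vid: math.dist(query, vectors[vid]))[:k]

def ann_recall(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the approximate search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

vectors = {"a": [0, 0], "b": [1, 0], "c": [0, 2], "d": [3, 3]}
query = [0, 0]

exact_ids = exact_top_k(query, vectors, k=2)  # ['a', 'b']
approx_ids = ["a", "c"]                       # hypothetical ANN result
print(ann_recall(approx_ids, exact_ids))      # 0.5
```

Averaging this ratio over a set of realistic queries gives a recall figure that can be traded against latency and memory when tuning index parameters.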

A vector database also does not solve authorization automatically. Tenant and permission filters must be applied before unauthorized content can enter a result set or model prompt.
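A minimal sketch of filtering before ranking, using a hypothetical `Chunk` record with a tenant ID and permitted groups. It scans a Python list for clarity; a real vector database would typically apply such filters inside the query, and the exact mechanism depends on the product.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    """Hypothetical stored chunk with access-control metadata."""
    chunk_id: str
    tenant_id: str
    allowed_groups: frozenset
    vector: list

def authorized_search(query_vec, chunks, tenant_id, user_groups, k=3):
    """Drop chunks from other tenants or without a shared group BEFORE scoring,
    so unauthorized content can never enter the result set or a prompt."""
    visible = [
        c for c in chunks
        if c.tenant_id == tenant_id and c.allowed_groups & user_groups
    ]
    ranked = sorted(visible, key=lambda c: sum(q * v for q, v in zip(query_vec, c.vector)), reverse=True)
    return [c.chunk_id for c in ranked[:k]]

chunks = [
    Chunk("a1", "acme", frozenset({"sales"}), [0.9, 0.1]),
    Chunk("a2", "acme", frozenset({"finance"}), [1.0, 0.0]),
    Chunk("b1", "globex", frozenset({"sales"}), [1.0, 0.0]),
]
print(authorized_search([1.0, 0.0], chunks, "acme", {"sales"}))  # ['a1']
```

The two chunks with the best raw scores are excluded here, which is the intended behavior: relevance never overrides permission.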

Chunking changes the representation

Embedding an entire document produces one representation for all of its contents. That can be too coarse for long files containing several topics. Splitting documents into chunks provides more precise retrieval, but creates design choices:

  • Chunk size
  • Overlap
  • Structural boundaries
  • Metadata inheritance
  • Table and code handling
  • Duplicate content
  • Parent-child relationships

Smaller chunks can improve local matching while losing context. Larger chunks preserve context while adding irrelevant text and increasing generation tokens. Test retrieval and answer quality together.
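A sliding-window chunker shows how size and overlap interact. This sketch counts whitespace-separated words as a rough stand-in for model tokens (an assumption) and ignores structural boundaries, tables, and code.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):  # this window already reached the end
            break
    return chunks

text = " ".join(f"w{i}" for i in range(10))
print(chunk_words(text, size=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

Each overlapping word is embedded (and paid for) more than once, so larger overlaps raise indexing cost as well as preserving context across boundaries.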

The cost model for embeddings

Embedding-related costs can include:

  • Initial document embedding
  • Re-embedding changed documents
  • Query embeddings
  • Vector storage
  • Index build and maintenance
  • Search operations
  • Replication and backups
  • Metadata storage
  • Reranking after retrieval

embedding API cost = embedded input tokens ÷ 1,000,000 × price per million input tokens

The initial indexing bill is usually based on corpus size. Ongoing cost depends on document churn and query volume. Storage grows with vector count, dimensions, numeric representation, index overhead, metadata, and replicas.
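The formula above, applied separately to the initial corpus and to monthly churn plus queries, might look like this in Python. Every number here (price, corpus size, churn, query volume, query length) is a hypothetical placeholder; substitute your provider's published rate and your own volumes.

```python
def embedding_cost(tokens: int, price_per_million: float) -> float:
    """embedding API cost = embedded input tokens / 1,000,000 x price per million input tokens"""
    return tokens / 1_000_000 * price_per_million

PRICE = 0.02  # hypothetical USD per million input tokens

corpus_tokens = 500_000_000          # assumed size of the initial corpus
monthly_changed_tokens = 25_000_000  # assumed churn: content re-embedded each month
monthly_queries = 2_000_000          # assumed query volume
tokens_per_query = 20                # assumed average query length

initial = embedding_cost(corpus_tokens, PRICE)
monthly = embedding_cost(monthly_changed_tokens + monthly_queries * tokens_per_query, PRICE)
print(f"initial indexing: ${initial:,.2f}")   # initial indexing: $10.00
print(f"ongoing per month: ${monthly:,.2f}")  # ongoing per month: $1.30
```

Storage, index, search, and reranking costs are separate lines and are not included in these figures.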

Use the RAG cost calculator to keep indexing, query, storage, reranking, and generation costs visible as separate lines.

A storage approximation

For N uncompressed vectors stored with D dimensions and B bytes per component, the raw vector payload is approximately:

raw vector bytes ≈ N × D × B

For example, float32 components use four bytes each. This formula estimates only raw vector payload. A deployable index needs additional space for graph or partition structures, IDs, metadata, allocator overhead, replicas, and backups.
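As a worked example of the formula, assume 1 million vectors with 1,536 float32 dimensions (both numbers are hypothetical). The sketch computes only the raw payload, so index structures, IDs, metadata, replicas, and backups come on top.

```python
def raw_vector_bytes(num_vectors: int, dimensions: int, bytes_per_component: int) -> int:
    """Raw payload only: N x D x B. Excludes index, ID, metadata, and replica overhead."""
    return num_vectors * dimensions * bytes_per_component

raw = raw_vector_bytes(1_000_000, 1_536, 4)  # float32 = 4 bytes per component
print(raw)                       # 6144000000
print(f"{raw / 1e9:.2f} GB")     # 6.14 GB
print(f"{raw / 2**30:.2f} GiB")  # 5.72 GiB
```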

Some systems support reduced precision, product quantization, binary representations, or dimension reduction. These can lower storage and improve search speed, but may change recall. Benchmark against labeled retrieval tasks before adopting them.
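To illustrate the trade-off, here is a sketch of one simple reduced-precision scheme: symmetric per-vector scalar quantization to 8-bit integers. It is not product quantization or a binary code, real systems implement quantization differently, and the sample vector is invented.

```python
from array import array

def quantize_int8(vector: list[float]) -> tuple[array, float]:
    """Map floats to integers in [-127, 127] with one scale per vector.
    Stores 1 byte per component instead of 4 for float32, at the cost of rounding error."""
    scale = max(abs(x) for x in vector) / 127 or 1.0  # avoid dividing by zero
    codes = array("b", (round(x / scale) for x in vector))  # "b" = signed 1-byte ints
    return codes, scale

def dequantize(codes: array, scale: float) -> list[float]:
    return [c * scale for c in codes]

v = [0.12, -0.5, 0.33, 0.9]
codes, scale = quantize_int8(v)
approx = dequantize(codes, scale)
max_error = max(abs(a - b) for a, b in zip(v, approx))
print(list(codes), codes.itemsize, max_error)
```

Each component's error is at most half the scale, but those small errors can still reorder close neighbors, which is why recall should be re-measured after quantizing.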

Embeddings are not encrypted meaning

A vector is not the original text, but it should not be treated as harmless or anonymous by default. Embeddings can preserve information about their inputs and can be sensitive when derived from private content.

Minimize what is embedded. Restrict access to vectors and source text. Separate tenants. Honor document deletion and retention rules. Review provider data handling. Avoid logging raw private inputs unnecessarily. Test whether metadata exposes sensitive information.

Security should cover the source documents, embedding service, vector index, backups, and generated answers.

How to evaluate an embedding model

Build a representative set of queries with relevance judgments. Then measure whether the expected items appear near the top. Useful retrieval metrics can include recall at k, precision at k, mean reciprocal rank, and normalized discounted cumulative gain.
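The four metrics named above can be computed from a ranked list and a set of relevant IDs. This sketch assumes binary relevance (graded judgments would change the nDCG gains) and uses hypothetical document IDs.

```python
import math

def recall_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / len(relevant)

def precision_at_k(ranked, relevant, k):
    return len(set(ranked[:k]) & relevant) / k

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant item, or 0 if none was retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) pairs, one per query."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance nDCG: gain 1 per relevant item, discounted by log2(position + 1)."""
    dcg = sum(1.0 / math.log2(i + 1) for i, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 1) for i in range(1, min(len(relevant), k) + 1))
    return dcg / ideal if ideal else 0.0

ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(ranked, relevant, 3), precision_at_k(ranked, relevant, 3))
print(mean_reciprocal_rank([(ranked, relevant), (["d5", "d2"], {"d2"})]))
print(ndcg_at_k(ranked, relevant, 3))
```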

Ranking metrics are not the whole evaluation. Also ask: Does the retrieved evidence support the correct answer? Are exact identifiers preserved? Does quality hold across all supported languages? Are access filters applied correctly? What is the latency at expected scale? What does each indexed document and each query cost?

A public benchmark can narrow candidates, but the application test set should decide the final model.

What this article covers

  • An embedding turns an item into coordinates
  • Embeddings are model-specific
  • How semantic search uses embeddings
  • Similarity is a scoring rule
  • Embeddings are not a database by themselves

Use it with ByteCosts calculators

After reading the research note, open the related calculator and replace the example assumptions with your own users, requests, tokens, seats, or platform usage.

The goal is to convert the article's cost pattern into a concrete monthly run-rate, per-user margin, or break-even point your team can discuss.

Frequently asked questions

Is an embedding the same as a token?

No. Tokens are discrete IDs produced by tokenization. An embedding is a numeric vector representation. Models can create token-level embeddings internally, while embedding APIs often return one vector for an input sequence.
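The step from token-level vectors to one vector per input can be illustrated with mean pooling, one common way to combine token vectors. Treat it as an assumption: a given embedding API may pool differently and usually does not expose its internals. The toy token vectors are invented.

```python
def mean_pool(token_vectors: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average the token vectors whose mask value is 1 (padding positions are 0)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m]
    dim = len(token_vectors[0])
    return [sum(v[i] for v in kept) / len(kept) for i in range(dim)]

tokens = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]  # two real tokens + one padding position
print(mean_pool(tokens, [1, 1, 0]))  # [2.0, 3.0]
```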

Can embeddings store the original document?

An embedding is not a lossless copy of the document and cannot normally reproduce it directly. The system still stores source text or a reference so retrieved results can be shown or supplied to a model.

Do I need to re-embed data when changing models?

Generally yes. Embeddings from different models occupy different learned spaces and should not be mixed unless the model documentation explicitly establishes compatibility.

Are larger embedding dimensions always better?

No. More dimensions can add representational capacity, but they also increase storage and computation, and task quality depends on training, data, language, and retrieval design. Evaluate accuracy, latency, and cost together.

Cite this page

What Are Vector Embeddings? Meaning, Similarity, Search, and Cost. ByteCosts. Updated 2026-06-21. https://bytecosts.com/blog/what-are-vector-embeddings/
