Knowledge Base AI

1. Input Document

2692 chars

Understanding Vector Search and Embeddings in AI Applications

In the era of modern artificial intelligence, large language models (LLMs) have revolutionized how we interact with information. However, these models have a fundamental limitation: they can only reason over the data they were trained on, or the context provided to them in a single prompt. This is where Retrieval-Augmented Generation (RAG) comes in. RAG allows us to fetch relevant documents from an external database and feed them to the LLM to ground its response in factual, up-to-date data. But how do we find "relevant" documents in milliseconds when dealing with millions of unstructured text documents?

The answer lies in Vector Math and Embeddings. An embedding is a process that transforms unstructured data, such as a paragraph of text, an image, or an audio clip, into a list of numbers called a vector. In mathematical terms, a vector represents a coordinate in a high-dimensional space. For instance, OpenAI's text-embedding-3-small model generates vectors with 1,536 dimensions. Each dimension represents a learned semantic concept or feature of the text. Because semantically similar texts are placed close to each other in this high-dimensional space, we can use vector geometry to perform search queries.

Before we can generate embeddings, we must prepare our unstructured text using a process called chunking. Because LLMs have context limits and embedding models perform best on focused paragraphs, we split long articles into smaller chunks. The choice of chunking strategy—such as fixed-character splitting (e.g., 500 characters), word-based splitting, or token-based splitting—directly affects search quality. If chunks are too small, they lose necessary context. If they are too large, the specific details get averaged out, and the embedding becomes too general, reducing search accuracy.
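The fixed-character strategy described above can be sketched in a few lines of Python. This is a minimal illustration, not the splitter this tool uses: the function name `chunk_text` and the `max_chars` parameter are hypothetical, and the choice to back off to the nearest word boundary (rather than cutting mid-word) is an assumption.

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chars characters.

    Hypothetical helper for illustration; whitespace runs (including
    paragraph breaks) are collapsed to single spaces.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is hard-split so that
        # no chunk ever exceeds max_chars.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]

        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Adding this word would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = word

    if current:
        chunks.append(current)
    return chunks
```

Lowering `max_chars` yields more, narrower chunks; raising it yields fewer, broader ones, which is the trade-off the paragraph above describes.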

Once we have chunked our document and generated embeddings for each chunk, we can perform semantic search. When a user enters a query, we transform the query into an embedding vector using the same model. We then calculate the similarity between the query vector and all document vectors in our database. The standard formula used for this is Cosine Similarity. Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space, which determines if they are pointing in roughly the same direction. It yields a score between -1 and 1, where 1 means the vectors are identical in direction, and 0 means they are orthogonal (independent). By ranking chunks by their cosine similarity score, we can retrieve the most relevant pieces of information to construct a prompt for our LLM.
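For two vectors a and b, cosine similarity is the dot product divided by the product of the vector lengths: cos(θ) = (a · b) / (‖a‖ ‖b‖). The sketch below implements that formula and the ranking step in plain Python. The function names `cosine_similarity` and `rank_chunks` and the `top_k` parameter are hypothetical, and comparing the query against every chunk vector is only practical for small collections.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: list[float],
    chunk_vecs: list[list[float]],
    top_k: int = 3,
) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [
        (index, cosine_similarity(query_vec, vec))
        for index, vec in enumerate(chunk_vecs)
    ]
    # Highest score first: 1 is the same direction, 0 orthogonal, -1 opposite.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The chunk indices returned by `rank_chunks` can then be used to look up the original chunk text and assemble the prompt.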

2. Chunking Customizer

Define character boundaries per split

500 chars

100 Chars (Granular) to 1000 Chars (Broad context)

Resulting Splits: 6 chunks
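As a rough check on that count, assuming the splitter never produces a chunk longer than the selected size, the number of chunks cannot be smaller than the document length divided by the chunk size, rounded up. With the 2692-character input and the 500-character setting:

```latex
\left\lceil \frac{2692}{500} \right\rceil = \lceil 5.384 \rceil = 6
```

A splitter that backs off to a word boundary can produce more chunks than this lower bound; here the two figures agree.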

3. Generate Coordinates

Send all 6 chunks to OpenAI's text-embedding-3-small model.
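A request like this could look roughly as follows with the official `openai` Python package. This is a sketch under assumptions, not this tool's actual code: `embed_chunks` is a hypothetical helper, and `client` is assumed to be an `openai.OpenAI()` instance configured with a valid API key.

```python
def embed_chunks(client, chunks, model="text-embedding-3-small"):
    """Return one embedding vector per chunk, in the same order as `chunks`.

    Hypothetical helper. `client` is assumed to expose
    `client.embeddings.create(...)`, as the `openai` package's client does.
    """
    # All chunks go out in one batched request.
    response = client.embeddings.create(model=model, input=chunks)
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps vectors aligned with their chunks.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [item.embedding for item in ordered]
```

Passing the client in as a parameter keeps the helper easy to exercise with a stand-in object, without network access.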

Text Splitting Preview

Here is how the article looks when split into chunks.

CHUNK #1 (498 characters)

Understanding Vector Search and Embeddings in AI Applications In the era of modern artificial intelligence, large language models (LLMs) have revolutionized how we interact with information. However, these models have a fundamental limitation: they can only reason over the data they were trained on, or the context provided to them in a single prompt. This is where Retrieval-Augmented Generation (RAG) comes in. RAG allows us to fetch relevant documents from an external database and feed them to

CHUNK #2 (487 characters)

the LLM to ground its response in factual, up-to-date data. But how do we find "relevant" documents in milliseconds when dealing with millions of unstructured text documents? The answer lies in Vector Math and Embeddings. An embedding is a process that transforms unstructured data, such as a paragraph of text, an image, or an audio clip, into a list of numbers called a vector. In mathematical terms, a vector represents a coordinate in a high-dimensional space. For instance, OpenAI's

CHUNK #3 (497 characters)

text-embedding-3-small model generates vectors with 1,536 dimensions. Each dimension represents a learned semantic concept or feature of the text. Because semantically similar texts are placed close to each other in this high-dimensional space, we can use vector geometry to perform search queries. Before we can generate embeddings, we must prepare our unstructured text using a process called chunking. Because LLMs have context limits and embedding models perform best on focused paragraphs, we

CHUNK #4 (498 characters)

split long articles into smaller chunks. The choice of chunking strategy—such as fixed-character splitting (e.g., 500 characters), word-based splitting, or token-based splitting—directly affects search quality. If chunks are too small, they lose necessary context. If they are too large, the specific details get averaged out, and the embedding becomes too general, reducing search accuracy. Once we have chunked our document and generated embeddings for each chunk, we can perform semantic search.

CHUNK #5 (490 characters)

When a user enters a query, we transform the query into an embedding vector using the same model. We then calculate the similarity between the query vector and all document vectors in our database. The standard formula used for this is Cosine Similarity. Cosine similarity measures the cosine of the angle between two vectors in a multi-dimensional space, which determines if they are pointing in roughly the same direction. It yields a score between -1 and 1, where 1 means the vectors are

CHUNK #6 (213 characters)

identical in direction, and 0 means they are orthogonal (independent). By ranking chunks by their cosine similarity score, we can retrieve the most relevant pieces of information to construct a prompt for our LLM.