1. Input Document
2692 chars
2. Chunking Customizer
Define character boundaries per split
3. Generate Coordinates
Send all 6 chunks to OpenAI's text-embedding-3-small model.
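As a rough sketch of this step, the chunks could be sent to the embeddings endpoint in a single batched request. This assumes the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable. The helper name `embed_chunks` and its optional `client` parameter are illustrative and not taken from the tool itself.

```python
from typing import Any, Optional, Sequence


def embed_chunks(
    chunks: Sequence[str],
    client: Optional[Any] = None,
    model: str = "text-embedding-3-small",
) -> list[list[float]]:
    """Return one embedding vector per chunk, in the same order as `chunks`."""
    if not chunks:
        return []
    if client is None:
        # Imported lazily so the function can also be exercised with a stub client.
        from openai import OpenAI

        client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model=model, input=list(chunks))
    # Each returned item carries the index of the input it belongs to;
    # sorting by it keeps vectors aligned with their chunks.
    ordered = sorted(response.data, key=lambda item: item.index)
    return [list(item.embedding) for item in ordered]
```

With a real client, calling `embed_chunks(chunks)` on the six chunks should return six vectors, each with 1,536 floats at the model's default dimensionality.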
Text Splitting Preview
Here is how the article looks when split into chunks. Click "Generate Vector Arrays" to query OpenAI.
Understanding Vector Search and Embeddings in AI Applications
In the era of modern artificial intelligence, large language models (LLMs) have revolutionized how we interact with information. However, these models have a fundamental limitation: they can only reason over the data they were trained on, or the context provided to them in a single prompt. This is where Retrieval-Augmented Generation (RAG) comes in. RAG allows us to fetch relevant documents from an external database and feed them to
the LLM to ground its response in factual, up-to-date data. But how do we find "relevant" documents in milliseconds when dealing with millions of unstructured text documents? The answer lies in vector math and embeddings. Embedding is the process of transforming unstructured data, such as a paragraph of text, an image, or an audio clip, into a list of numbers called a vector; the resulting vector is itself called an embedding. In mathematical terms, a vector represents a point in a high-dimensional space. For instance, OpenAI's
text-embedding-3-small model generates vectors with 1,536 dimensions by default. Individual dimensions rarely correspond to a single human-readable concept; the learned semantic features are spread across the vector as a whole. Because semantically similar texts are placed close to each other in this high-dimensional space, we can use vector geometry to perform search queries. Before we can generate embeddings, we must prepare our unstructured text using a process called chunking. Because LLMs have context limits and embedding models perform best on focused paragraphs, we
split long articles into smaller chunks. The choice of chunking strategy—such as fixed-character splitting (e.g., 500 characters), word-based splitting, or token-based splitting—directly affects search quality. If chunks are too small, they lose necessary context. If they are too large, the specific details get averaged out, and the embedding becomes too general, reducing search accuracy. Once we have chunked our document and generated embeddings for each chunk, we can perform semantic search.
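The fixed-character strategy described above can be sketched in a few lines. This is a minimal illustration only: the function name `chunk_by_chars` is hypothetical, and it cuts at exact character offsets, so chunks may begin or end mid-sentence.

```python
def chunk_by_chars(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in chunk_size strides; the last piece may be shorter.
    return [text[start:start + chunk_size] for start in range(0, len(text), chunk_size)]


if __name__ == "__main__":
    document = "x" * 2692  # stand-in for a 2,692-character article
    pieces = chunk_by_chars(document, chunk_size=500)
    print(len(pieces), [len(p) for p in pieces])
```

A 2,692-character document split at 500 characters produces six chunks: five full ones and a final chunk of 192 characters. Word-based or token-based splitting would follow the same pattern but step over words or tokens instead of characters.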
When a user enters a query, we transform the query into an embedding vector using the same model. We then calculate the similarity between the query vector and every document vector in our database. A standard measure for this is cosine similarity, which is the cosine of the angle between two vectors in a multi-dimensional space and indicates whether they point in roughly the same direction. It yields a score between -1 and 1, where 1 means the vectors are
identical in direction, 0 means they are orthogonal (no directional similarity), and -1 means they point in opposite directions. By ranking chunks by their cosine similarity score, we can retrieve the most relevant pieces of information to construct a prompt for our LLM.
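The scoring and ranking step can be sketched as follows, using cos(a, b) = (a · b) / (|a| |b|). This is a plain-Python illustration that compares the query against every chunk vector in turn. The names `cosine_similarity` and `rank_chunks` are hypothetical, and a production system would more likely use a vectorized library or a vector database.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b, in the range [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def rank_chunks(
    query_vec: Sequence[float],
    chunk_vecs: Sequence[Sequence[float]],
    top_k: int = 3,
) -> list[tuple[int, float]]:
    """Return (chunk_index, score) pairs for the top_k most similar chunks."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    # Highest score first, so the best-matching chunks lead the list.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

The returned indices can then be used to look up the original chunk texts and place them into the LLM prompt.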