Modular RAG & Semantic Context Injection
To reduce hallucinations, we ground responses in retrieved documents rather than the LLM’s parametric memory, using a Modular Retrieval-Augmented Generation (RAG) pipeline. Retrieval uses a hybrid strategy that combines dense vector embeddings (OpenAI’s text-embedding-3-small) with sparse keyword search (BM25).
- HNSW Indexing: Hosted on Pinecone for sub-100 ms vector lookups across 10M+ document chunks.
- Recursive Character Splitting: Chunking logic that preserves semantic integrity across tables and nested lists.
- Cross-Encoder Re-ranking: A secondary scoring pass that fills the limited context window with the most relevant chunks rather than the raw first-stage retrieval order.
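The hybrid strategy produces two ranked lists (dense and BM25) that must be merged into one. The pipeline above doesn't specify a fusion method, so as an illustrative sketch, here is reciprocal rank fusion (RRF), a common score-free way to combine them:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into a single ordering.

    Each document scores sum(1 / (k + rank)) over every list it appears
    in; k=60 is the constant suggested in the original RRF paper.
    """
    scores = {}
    for ranked in rankings:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense = ["d3", "d1", "d7"]   # hypothetical hits from the vector index
sparse = ["d1", "d9", "d3"]  # hypothetical hits from BM25
fused = reciprocal_rank_fusion([dense, sparse])
# → ["d1", "d3", "d9", "d7"]: d1 ranks high in both lists, so it wins
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.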
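The recursive character splitting described above can be sketched in a few lines: try the coarsest separator first (paragraph break, then line break, then space) and only fall back to finer ones when a piece still exceeds the chunk budget. This is a minimal stand-in for the production chunker, not its actual code, and the `max_len` budget is illustrative:

```python
def recursive_split(text, max_len=200, separators=("\n\n", "\n", " ")):
    """Split text into chunks of at most max_len characters, preferring
    to break at the coarsest available separator."""
    if len(text) <= max_len:
        return [text]
    for sep in separators:
        parts = text.split(sep)
        if len(parts) > 1:
            chunks, current = [], ""
            for part in parts:
                candidate = part if not current else current + sep + part
                if len(candidate) <= max_len:
                    current = candidate
                    continue
                if current:
                    chunks.append(current)
                if len(part) > max_len:
                    # Piece is still too big: recurse with finer separators.
                    chunks.extend(recursive_split(part, max_len, separators))
                    current = ""
                else:
                    current = part
            if current:
                chunks.append(current)
            return chunks
    # No separator applies: hard character split as a last resort.
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

doc = "para one.\n\npara two is a bit longer.\n\npara three."
chunks = recursive_split(doc, max_len=25)
# → ["para one.", "para two is a bit longer.", "para three."]
```

Preferring paragraph boundaries over arbitrary character offsets is what keeps a table row or nested list item from being sliced mid-structure.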