Beyond RAG: Precision Filtering in a Semantic World


Early on we all realized that LLMs only knew what was in their training data. Playing around with them was fun, sure, but they were, and still are, prone to hallucinations. Using such a product in its "raw" form commercially is, to put it nicely, dumb as rocks (the LLM, not you… possibly). To alleviate both the hallucinations and the lack of knowledge of unseen/private data, two main avenues can be taken: train a custom LLM on your private data (aka the hard way), or use retrieval-augmented generation (aka the one we all basically took).

RAG is an acronym now widely used in NLP and generative AI. It has evolved and spawned many new forms and approaches, such as GraphRAG, that pivot away from the naive setup most of us started with. The me of two years ago would just parse raw documents into a simple RAG and then, on retrieval, hand this possible (most likely) junk context to the LLM, hoping it could make sense of it and use it to better answer the user's question. Ah, ignorance is bliss; also, don't judge: we all did this. We soon learned the meaning of "garbage in, garbage out" as our first proofs of concept performed… well… not so great. From there, the open-source community put a lot of effort into giving us ways to build more sensible, commercially viable applications. These included, for example: reranking, semantic routing, guardrails, better document parsing, realigning the user's question to retrieve more relevant documents, context compression, and the list goes on. On top of all this, we 1-upped our classical NLP skills and drafted guidelines for the teams curating knowledge, so that the parsed documents stored in our databases were now all pretty and legible.

While working on a retrieval system with about 16 (possible exaggeration) steps, one question kept coming up: can my stored context really answer this question? Or, to put it another way (and the one I prefer): does this question really belong to the stored context? The two questions seem similar, but the distinction is that the first is localized (e.g. the 10 retrieved docs) while the second is globalized (with respect to the entire subject/topic space of the document database). You can think of one as a fine-grained filter and the other as a more general one. I'm sure you're wondering what the point of all this is: "I do cosine similarity thresholding on my retrieved docs, and everything works fine. Why are you trying to complicate things?" OK, I made up that last thought-sentence; I know you aren't that mean.
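For reference, that localized check is usually nothing fancier than the sketch below: score each retrieved chunk against the query embedding and drop anything under a cutoff. The function names and the 0.75 threshold are placeholders I'm assuming for illustration, not values from any particular stack.

```python
import numpy as np

THRESHOLD = 0.75  # hypothetical cutoff; tune on your own data


def cosine_sim(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def filter_retrieved(query_vec: np.ndarray,
                     doc_vecs: list[np.ndarray],
                     docs: list[str]) -> list[str]:
    """Keep only the retrieved docs whose similarity to the query clears the threshold."""
    return [doc for doc, vec in zip(docs, doc_vecs)
            if cosine_sim(query_vec, vec) >= THRESHOLD]
```

This is the fine-grained filter: it only ever sees the handful of chunks the retriever already pulled, which is exactly why it can't tell you whether the question belonged in the database in the first place.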

To drive home my over-complication, here is an example. Say the user asks, "Who was the first man on the moon?" Let's forget that the LLM could straight up answer this one and assume we expect our RAG to provide context for the question… except all our docs are about products for a fashion brand! Silly example, agreed, but in production many of us have seen users constantly ask questions that don't align with any of the docs we have. "Yeah, but my system prompt tells the LLM to ignore questions that don't fall within a topic category, and cosine similarity will filter out weak context for these kinds of questions anyway," or "I have catered for this using guardrails or semantic routing." Sure, again, agreed. All these methods work, but they either kick in too late downstream (the first two) or aren't completely tailored for the job (the last two). What we really need is a fast classification method that can rapidly tell you whether the question is a "yea" or a "nay" for the docs to provide context for… even before retrieving them. If you've guessed where this is going, you're part of the classical ML crew.
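One way to read that hint (and it is only one possible reading, not a prescription): treat "does this query belong to my corpus at all?" as a novelty/outlier-detection problem over the document embeddings, answered before any retrieval happens. Below is a minimal sketch along those lines; the embedding model, the LocalOutlierFactor detector, the placeholder corpus, and every parameter are assumptions chosen for illustration.

```python
from sentence_transformers import SentenceTransformer
from sklearn.neighbors import LocalOutlierFactor

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Embed every chunk already sitting in the vector store (done once, offline).
doc_texts = ["...chunk 1...", "...chunk 2...", "...chunk 3..."]  # placeholder corpus
doc_vecs = embedder.encode(doc_texts, normalize_embeddings=True)

# novelty=True lets us call predict() on unseen points later.
detector = LocalOutlierFactor(
    n_neighbors=min(20, len(doc_vecs) - 1),  # tune to your corpus size
    novelty=True,
)
detector.fit(doc_vecs)


def question_belongs(question: str) -> bool:
    """Return True if the question plausibly falls inside the corpus's topic space."""
    q_vec = embedder.encode([question], normalize_embeddings=True)
    return detector.predict(q_vec)[0] == 1  # +1 = inlier, -1 = outlier
```

The appeal is that the detector is fit once, offline, and the per-question check is just one embedding call plus a nearest-neighbour lookup, so it can sit right at the front of the pipeline instead of far downstream.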

Tags: GraphRAG, LLM, Machine Learning, Outlier Detection, Retrieval-Augmented Generation
