Advanced Recursive and Follow-Up Retrieval Techniques For Better RAGs

Author:Murphy | View: 24739 | Time: 2025-03-23 11:52:06

There isn't a better way than query modification to improve LLMs.

In one of my recent posts, I discussed five query translation techniques and how they improve the retrieval process in RAG apps. One technique was Query decomposition.

5 Proven Query Translation Techniques To Boost Your RAG Performance

This fantastic technique creates sub-questions to construct a more detailed answer to our initial query. These sub-questions will then be used in the retrieval process. The final LLM takes each question and answer pair as context to generate a comprehensive answer to our initial query.

This post discusses two other prompting techniques we often combine with query decomposition for better results.

The first technique is recursive answering, which involves generating subquestions in bulk but answering them recursively. The second technique is followup questioning. As you might have guessed, we answer the question and generate a followup question at every stage. In my experience, the second one gives more complete answers and can drill down into a user's broad question.

Let's start with the basic query decomposition workflow and build on it.

How does Query decomposition work in RAG?

Let's expand our knowledge of query decomposition.

For instance, if you ask an LLM, "Can SSL certification prevent SQL injection attacks?" the LLM might give a generic answer. Then, you decide to retrieve information from a reliable source, like Django's documentation page.

However, during the retrieval, rather than directly searching for documents with "Can SSL certification prevent SQL injection attacks?", it makes sense to split the question and search separately.

In this case, our example query can be split into "What is SSL certification?" and "What is SQL injection?" We perform RAG for these individual questions and generate answers. We then pass these question-and-answer pairs to the final LLM to answer the user's initial question.

The diagram below summarizes the process we just discussed.

Query Decomposition in RAG – image by the author.

This approach ensures that the LLM has all the required details to answer the question on a very granular level. The result is a richer answer than what you get from a basic RAG.
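Stripped of any framework, the flow described above is just three steps chained together. The sketch below is a minimal, library-free illustration of that idea. The names `decomposition_rag`, `decompose`, `answer_with_rag`, and `final_answer` are hypothetical stand-ins for the LLM and retriever calls, not LangChain APIs, and the toy functions exist only so the example runs without an API key.

```python
from typing import Callable


def decomposition_rag(
    question: str,
    decompose: Callable[[str], list[str]],
    answer_with_rag: Callable[[str], str],
    final_answer: Callable[[str, str], str],
) -> str:
    """Answer `question` by answering its sub-questions first."""
    # Step 1: split the user question into sub-questions.
    sub_questions = decompose(question)

    # Step 2: run an independent RAG call for every sub-question.
    qa_pairs = "\n".join(
        f"Q: {q}\nA: {answer_with_rag(q)}" for q in sub_questions
    )

    # Step 3: hand the Q&A pairs to the final LLM call.
    return final_answer(question, qa_pairs)


# Toy stand-ins (assumptions) so the sketch runs offline.
def toy_decompose(question: str) -> list[str]:
    return ["What is SSL certification?", "What is SQL injection?"]


def toy_answer(sub_question: str) -> str:
    return f"<answer to: {sub_question}>"


def toy_final(question: str, qa_pairs: str) -> str:
    return f"Final answer to '{question}' using:\n{qa_pairs}"


result = decomposition_rag(
    "Can SSL certification prevent SQL injection attacks?",
    toy_decompose,
    toy_answer,
    toy_final,
)
print(result)
```

In a real app, each toy function would be replaced by an LLM or retriever call. The structure of the three steps stays the same.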

Here's a code implementation of query decomposition using LangChain and OpenAI models.

import os
from dotenv import load_dotenv

load_dotenv()
import bs4
from operator import itemgetter
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

# 1. Index the documents
# 1.1 Load the documents
web_loader = WebBaseLoader(
    web_paths=("https://docs.djangoproject.com/en/5.1/topics/security/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer(id=("docs-content"))),
)
docs = web_loader.load()

# 1.2 Split the documents
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=1000, chunk_overlap=100
)
splits = text_splitter.split_documents(docs)

# 1.3 Index the documents
# Index the chunks produced in step 1.2, not the unsplit documents
vector_store = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vector_store.as_retriever()

# 2. Helper functions
def output_parser(x):
    "This helper function parses the LLM output, prints it, and returns it."
    parsed_output = StrOutputParser().invoke(x)
    print("\n" + parsed_output + "\n")

    return parsed_output

def qa_constructor(questions: list[str]) -> str:
    qa_pairs = []
    for q in questions:
        r = chain.invoke(q)
        qa_pairs.append((q, r))

    qa_pairs_str = "\n".join([f"Q: {q}\nA: {a}" for q, a in qa_pairs]).strip()

    print("\n" + "Generated QA pairs:")
    print(qa_pairs_str)

    return qa_pairs_str

# 3. Create a basic RAG chain
# 3.1 Define the prompt template
rag_template = """
    Answer the question in the following context:
    {context}

    Question: {question}
    """

prompt_template = ChatPromptTemplate.from_template(rag_template)

# 3.2 Define the model
llm = ChatOpenAI(temperature=0.5)

# 3.3 Define the chain
chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt_template
    | llm
    | output_parser
)

# 4. Create a decomposition chain
decomposition_template = """
Break the following user question into smaller, more specific questions.
Provide these subquestions separated by newlines. 
Do not rephrase if you see unknown terms.
Question: {question}
subquestions:
"""

decomposition_prompt_template = ChatPromptTemplate.from_template(decomposition_template)

decomposition_chain = (
    {"question": RunnablePassthrough()}
    | decomposition_prompt_template
    | llm
    | output_parser
    | (lambda x: x.split("\n"))
    | qa_constructor
)

# 5. Create the final RAG chain
decomposition_rag_template = """
    Answer the question in the following context:
    {context}

    Here are some background questions and answers that will help you answer the question:
    {qa_pairs}

    Question: {question}
    """

decomposition_rag_prompt_template = ChatPromptTemplate.from_template(
    decomposition_rag_template
)

decomposition_rag_chain = (
    {
        "context": itemgetter("question") | retriever,
        "qa_pairs": itemgetter("question") | decomposition_chain,
        # Pass only the question string, not the whole input dict
        "question": itemgetter("question"),
    }
    | decomposition_rag_prompt_template
    | llm
    | output_parser
)

# 6. Invoke the chain
decomposition_rag_chain.invoke(
    {"question": "Can SSL certification prevent SQL injection attacks?"}
)

When I run the above code, here are the subquestions and answers the LLM has generated:

Q: - What is SSL certification?
A: SSL certification refers to the process of obtaining a Secure Sockets Layer (SSL) certificate for a website. This certificate is used to encrypt data transmitted between a user's browser and the website, ensuring that sensitive information such as login credentials, payment details, and personal information is secure and protected from unauthorized access. SSL certification is essential for establishing a secure connection and building trust with website visitors.
Q: - How does SSL certification work in terms of security?
A: SSL certification works by encrypting the data exchanged between a client and a server using secure sockets layer (SSL) or its successor, transport layer security (TLS). When a user connects to a website with an SSL certificate, their browser and the server establish a secure connection by exchanging encryption keys. This ensures that any data transmitted between the user and the website is encrypted and secure from eavesdroppers or malicious actors. Additionally, SSL certificates also verify the identity of the website, providing assurance to users that they are interacting with the legitimate website and not a fraudulent one. This helps prevent man-in-the-middle attacks and protects sensitive information such as login credentials, payment details, and personal data.
Q: - What is a SQL injection attack?
A: Answer: - A SQL injection attack is a type of attack where a malicious user is able to execute arbitrary SQL code on a database. This can result in records being deleted or data leakage.
Q: - How does a SQL injection attack work?
A: SQL injection is a type of attack where a malicious user is able to execute arbitrary SQL code on a database. This can result in records being deleted or data leakage. In the context of Django, querysets are protected from SQL injection since their queries are constructed using query parameterization. This means that a query's SQL code is defined separately from the query's parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver. However, developers should exercise caution when writing raw queries or executing custom SQL in Django, as these capabilities should be used sparingly and parameters should always be properly escaped to prevent SQL injection attacks.
Q: - Can SSL certification specifically prevent SQL injection attacks?
A: No, SSL certification cannot specifically prevent SQL injection attacks. SSL certification encrypts the data transmitted between the client and server, providing secure communication. However, SQL injection attacks occur when malicious SQL code is injected into a database query, taking advantage of vulnerabilities in the application's input validation. To prevent SQL injection attacks, developers should use parameterized queries, escape user input, and validate input data properly. Django's querysets are protected from SQL injection through query parameterization, which is a best practice for preventing such attacks.

And here's our final answer:

"No, SSL certification cannot prevent SQL injection attacks. SSL certification encrypts the data transmitted between the client and server, providing secure communication. However, SQL injection attacks occur when malicious SQL code is injected into a database query, taking advantage of vulnerabilities in the application's input validation. To prevent SQL injection attacks, developers should use parameterized queries, escape user input, and validate input data properly. Django's querysets are protected from SQL injection through query parameterization, which is a best practice for preventing such attacks."

This answer is complete and very helpful. This is why we often use query decomposition in RAG designs.
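One small detail in the output above: the subquestions still carry the leading "- " bullets from the LLM's response, and those bullets are sent to the retriever as part of the query. A minimal sketch of a stricter parser follows. `parse_subquestions` is a hypothetical helper, not part of LangChain, and the set of list markers it strips is an assumption about how the model formats its output.

```python
import re


def parse_subquestions(llm_output: str) -> list[str]:
    """Split newline-separated subquestions and drop list markers."""
    questions = []
    for line in llm_output.split("\n"):
        # Remove leading bullets ("-", "*", "•") or numbering ("1.", "2)").
        cleaned = re.sub(r"^\s*(?:[-*•]|\d+[.)])\s*", "", line).strip()
        if cleaned:  # skip blank lines
            questions.append(cleaned)
    return questions


sample = "- What is SSL certification?\n\n- What is a SQL injection attack?\n"
print(parse_subquestions(sample))
```

A helper like this could take the place of the newline-splitting step in the decomposition chain. Whether cleaner queries change retrieval quality depends on your embedding model, so treat it as something to test rather than a guaranteed gain.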

But we could make it even better with a few tweaks.

Recursively answering Q&A pairs.

In the recursive answering technique, we feed each subquestion and its retrieved answer into the prompt for the next subquestion. The LLM therefore has access to what was already said for related subquestions, so the answers stay consistent and don't repeat themselves.

The diagram above illustrates the process more clearly. As we can see, this technique talks to the LLM at three points. First, we ask the LLM to generate subquestions. Second, we ask it to answer each subquestion using the retrieved content and the previously answered questions. Lastly, the answering LLM uses all the question-and-answer pairs we've generated to produce the final answer to the user's original query.
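To make the "feed forward" part concrete without calling an LLM, here is a minimal sketch of the second step. `answer_sequentially` and `toy_answer_fn` are hypothetical names used for illustration only. The toy answer function simply reports how much history it received, which shows that the k-th subquestion sees the k-1 pairs before it.

```python
def answer_sequentially(sub_questions, answer_fn):
    """Answer subquestions in order, feeding earlier Q&A pairs forward."""
    qa_pairs = []
    for question in sub_questions:
        # answer_fn receives a copy of every pair produced so far.
        answer = answer_fn(question, list(qa_pairs))
        qa_pairs.append((question, answer))
    return qa_pairs


# Toy answer function (an assumption): reports how much history it received.
def toy_answer_fn(question, prior_pairs):
    return f"answered with {len(prior_pairs)} prior pair(s)"


pairs = answer_sequentially(
    ["What is SSL certification?", "What is SQL injection?", "Can SSL stop it?"],
    toy_answer_fn,
)
for q, a in pairs:
    print(f"Q: {q}\nA: {a}")
```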

Here's the code implementation of the recursive querying technique. Notice that the code is also divided into three subsections.

# 1. Query decomposition
# ---------------------------------------------------
# Decomposition chain
decomposition_prompt = """
Break the following user question into smaller, more specific questions.
Provide these subquestions separated by newlines. 
Do not rephrase if you see unknown terms.
Question: {question}
subquestions:
"""

decomposition_prompt_template = ChatPromptTemplate.from_template(decomposition_prompt)

# Decomposition chain: returns the subquestions as a list of strings
decomposition_chain = (
    decomposition_prompt_template | llm | output_parser | (lambda x: x.split("\n"))
)

questions = decomposition_chain.invoke(
    "Can SSL certification prevent SQL injection attacks?"
)

# 2. Retrieval and Recursive answering
# ---------------------------------------------------
recursive_answering_prompt = """
You need to answer the questions below in the following context.
Question: {question}
Context: {context}

Here are any prior questions and your answers:
{qa_pairs}

Answer:
"""

recursive_answering_prompt_template = ChatPromptTemplate.from_template(
    recursive_answering_prompt
)

recursive_answering_prompt_chain = (
    {
        "question": itemgetter(
            "question",
        ),
        "context": itemgetter(
            "context",
        ),
        "qa_pairs": itemgetter(
            "qa_pairs",
        ),
    }
    | recursive_answering_prompt_template
    | llm
    | output_parser
)

def recursively_answer_questions(questions: list[str]) -> str:

    qa_pairs = ""
    for question in questions:
        docs = retriever.invoke(question)
        context = " ".join([doc.page_content for doc in docs])

        answer = recursive_answering_prompt_chain.invoke(
            {
                "question": question,
                "context": context,
                "qa_pairs": qa_pairs,
            }
        )

        qa_pairs += f"Q: {question}\nA: {answer}\n"

    return qa_pairs

# 3. Final answering
# ---------------------------------------------------
final_prompt = """
Provide a comprehensive answer to the following question based on the subquestions you answered.
Question: {question}

Here are the subquestions and answers you provided:
{qa_pairs}

Answer: 
"""

final_prompt_template = ChatPromptTemplate.from_template(final_prompt)

final_prompt_chain = (
    {
        "question": itemgetter(
            "question",
        ),
        "qa_pairs": itemgetter(
            "qa_pairs",
        ),
    }
    | final_prompt_template
    | llm
    | output_parser
)

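# Answer the subquestions in order, then produce the final answer
qa_pairs = recursively_answer_questions(questions)

final_answer = final_prompt_chain.invoke(
    {
        "question": "Can SSL certification prevent SQL injection attacks?",
        "qa_pairs": qa_pairs,
    }
)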

print(final_answer)

Running the above code, here are the questions and answers the LLM has generated:

Q: - What is SSL certification?
A: SSL certification stands for Secure Sockets Layer certification. It is a standard security technology that establishes an encrypted link between a web server and a browser. This link ensures that all data passed between the server and browser remains private and secure. SSL certification is essential for protecting sensitive information such as login credentials, payment details, and personal information exchanged on a website.
Q: - How does SSL certification work?
A: SSL certification works by establishing an encrypted link between a web server and a browser. This link ensures that all data passed between the server and browser remains private and secure. SSL certification is essential for protecting sensitive information such as login credentials, payment details, and personal information exchanged on a website.
Q: - What is a SQL injection attack?
A: A SQL injection attack is a type of attack where a malicious user is able to execute arbitrary SQL code on a database. This can result in records being deleted or data leakage. Django's querysets are protected from SQL injection since their queries are constructed using query parameterization. A query's SQL code is defined separately from the query's parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver. Django also gives developers power to write raw queries or execute custom SQL. These capabilities should be used sparingly, and you should always be careful to properly escape any parameters that the user can control. In addition, you should exercise caution when using `extra()` and `RawSQL`.
Q: - How do SQL injection attacks work?
A: SQL injection attacks work by allowing a malicious user to execute arbitrary SQL code on a database. This can result in records being deleted or data leakage. Django protects against SQL injection by constructing queries using query parameterization, where the SQL code is separate from the user-provided parameters. The parameters are escaped by the underlying database driver to prevent any unauthorized SQL execution. Developers should be cautious when using raw queries or custom SQL and ensure proper escaping of user-controlled parameters to prevent SQL injection vulnerabilities.
Q: - Can SSL certification effectively prevent SQL injection attacks?
A: SSL certification is a crucial component in securing a website by establishing an encrypted link between a web server and a browser. This encryption ensures that data exchanged between the server and browser remains private and secure. While SSL certification helps protect sensitive information like login credentials and payment details, it does not directly prevent SQL injection attacks. SQL injection attacks involve malicious users executing arbitrary SQL code on a database, potentially leading to data loss or leakage. Django, however, offers protection against SQL injection through query parameterization, where SQL code is separated from user-provided parameters and escaped by the underlying database driver. Developers should exercise caution when writing raw queries or custom SQL to prevent SQL injection vulnerabilities.
Q: - Are there other measures that can be taken to prevent SQL injection attacks?
A: Yes, there are additional measures that can be taken to prevent SQL injection attacks. One important measure is to always use parameterized queries when interacting with the database, as this helps prevent SQL injection by separating the SQL code from user-provided parameters. Additionally, developers should avoid using raw SQL queries and instead utilize Django's querysets to interact with the database safely. It is also crucial to properly escape any user-controlled input to prevent any unauthorized SQL execution. Regularly updating Django and database software to the latest versions can also help mitigate potential vulnerabilities. Conducting regular security audits and penetration testing can further enhance the security posture of the application and help identify any potential weaknesses that could be exploited by attackers.

And here's the final answer our complete chain has produced.

SSL certification alone cannot effectively prevent SQL injection attacks. While SSL certification is essential for securing the transmission of data between a web server and a browser, it does not directly address the vulnerability that SQL injection attacks exploit. SQL injection attacks involve malicious users executing arbitrary SQL code on a database, potentially leading to data loss or leakage. To prevent SQL injection attacks, developers should implement additional measures such as using parameterized queries, avoiding raw SQL queries, and properly escaping user-controlled input. Django provides protection against SQL injection through query parameterization, where SQL code is separated from user-provided parameters and escaped by the underlying database driver. Regularly updating software, conducting security audits, and penetration testing can further enhance the security posture of the application and help identify and address potential vulnerabilities that could be exploited by attackers.

As you can see, the answer generated with this method is more comprehensive. The answer also suggests how to prevent security threats in the future with software updates, security audits, etc.

Generating followup questions and answers

In the previous method, we generated the subquestions up front and answered them sequentially, using prior question-and-answer pairs as context. This improved the output with more helpful content.

The other helpful technique is to generate questions sequentially. Sometimes, if the domain is cutting-edge and the LLM has no clue what has been asked, it won't be able to decompose a user question reasonably. We can help the LLM ask more meaningful followup questions by passing prior question-and-answer pairs.

Here's a diagrammatic representation of generating followup questions.

Let's go ahead and implement this using LangChain.

from langchain_core.load import loads  # parses the model's JSON output into a dict

# 1. Chain for answering the question and generating a followup question
answer_and_followup_prompt = """
You need to answer the question below in the following context and generate a followup question.
Context: {context}
Question: {question}

Also, consider any prior questions and answers you've generated:
Prior questions and answers: {prior_qa}

Provide output as a dictionary with the following keys:
Answer,
Followup
"""

answer_and_followup_prompt_template = ChatPromptTemplate.from_template(
    answer_and_followup_prompt
)

answer_and_followup_chain = (
    {
        "context": itemgetter("context"),
        "question": itemgetter("question"),
        "prior_qa": itemgetter("prior_qa"),
    }
    | answer_and_followup_prompt_template
    | llm
    | output_parser
    | loads
)

# 2. Recursive function to dive deep into the domain
def recursively_ask(question, prior_qa="", n=3):
    context = retriever.invoke(question)

    response = answer_and_followup_chain.invoke(
        {
            "context": context,
            "question": question,
            "prior_qa": prior_qa,
        }
    )

    answer, followup = response["Answer"], response["Followup"]

    prior_qa += f"Q:{question}\nA:{answer}\n\n"
    n -= 1

    if n == 0:
        return prior_qa
    else:
        return recursively_ask(followup, prior_qa, n)

question = "Can SSL certification prevent SQL injection attacks?"
question_and_answers = recursively_ask(question, n=3)

# 3. Final RAG execution
final_prompt = """
Provide a comprehensive answer to the following question based on the subquestions you answered.
Question: {question}

Here are the subquestions and answers you provided:
{qa_pairs}

Answer: 
"""

final_prompt_template = ChatPromptTemplate.from_template(final_prompt)

final_prompt_chain = (
    {
        "question": itemgetter(
            "question",
        ),
        "qa_pairs": itemgetter(
            "qa_pairs",
        ),
    }
    | final_prompt_template
    | llm
    | output_parser
)

final_answer = final_prompt_chain.invoke(
    {
        "question": question,
        "qa_pairs": question_and_answers,
    }
)

The above script again has three subsections. The first part answers the question and generates a followup question. The structured output is a dictionary, and we use LangChain's loads function to parse it.
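Parsing is the fragile step here, because the model may wrap the dictionary in a Markdown code fence or omit a key. The following is a minimal sketch of a more defensive parser built on Python's standard `json` module. `parse_answer_and_followup` is a hypothetical helper rather than anything LangChain provides, and it assumes the model replies with valid JSON using the `Answer` and `Followup` keys from the prompt.

```python
import json


def parse_answer_and_followup(raw: str) -> dict:
    """Parse the model's reply into {"Answer": ..., "Followup": ...}."""
    text = raw.strip()
    # Keep only the outermost {...}, dropping any surrounding code fence.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("No JSON object found in model output")
    data = json.loads(text[start : end + 1])

    missing = {"Answer", "Followup"} - set(data)
    if missing:
        raise ValueError(f"Missing keys in model output: {sorted(missing)}")
    return data


# Simulated model reply wrapped in a Markdown fence (made-up sample data).
fence = "`" * 3
raw_reply = (
    fence + 'json\n{"Answer": "No.", "Followup": "What does prevent it?"}\n' + fence
)
parsed = parse_answer_and_followup(raw_reply)
print(parsed["Followup"])
```

This sketch still fails on Python-style dictionaries with single quotes, since those are not valid JSON. Asking for JSON explicitly in the prompt is one way to reduce that risk.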

The second part creates a recursive function. This function calls the chain we created in the first part with the question we provide at run time. It uses the followup question in the response to call itself once again. This process continues until the user-set parameter n reaches zero. Finally, it outputs all the questions, followup questions, and the answers it has generated.
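The same drill-down can also be written as a plain loop, which avoids recursion and returns an empty transcript when n is zero or negative. This is a sketch of an alternative, not the implementation used above. `ask_with_followups` and `toy_chain` are hypothetical names, and `toy_chain` stands in for the retriever plus answer-and-followup chain.

```python
def ask_with_followups(question, answer_and_followup, n=3):
    """Iterative version of the followup loop.

    `answer_and_followup(question, prior_qa)` is a hypothetical callable that
    returns a dict with "Answer" and "Followup" keys.
    """
    prior_qa = ""
    for _ in range(max(n, 0)):  # n <= 0 simply produces no rounds
        response = answer_and_followup(question, prior_qa)
        prior_qa += f"Q:{question}\nA:{response['Answer']}\n\n"
        # The followup becomes the next round's question.
        question = response["Followup"]
    return prior_qa


# Toy stand-in (an assumption) so the loop can run offline.
def toy_chain(question, prior_qa):
    depth = prior_qa.count("Q:")
    return {"Answer": f"answer {depth + 1}", "Followup": f"follow-up {depth + 1}"}


transcript = ask_with_followups("Can SSL prevent SQL injection?", toy_chain, n=3)
print(transcript)
```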

Here are the followup questions we've generated and their answers:

Q:Can SSL certification prevent SQL injection attacks?
A:SSL certification cannot prevent SQL injection attacks. SSL primarily encrypts data transferred between the client and server to prevent eavesdropping and tampering, but it does not protect against SQL injection attacks.

Q:What are some effective measures to prevent SQL injection attacks in a Django application?
A:Django's querysets are protected from SQL injection since their queries are constructed using query parameterization. A query's SQL code is defined separately from the query's parameters. Since parameters may be user-provided and therefore unsafe, they are escaped by the underlying database driver. Developers should also exercise caution when using extra() and RawSQL.

Q:What are some best practices for handling user input to further mitigate the risk of SQL injection attacks in a Django application?
A:To further mitigate the risk of SQL injection attacks in a Django application, it is recommended to use query parameterization for constructing queries, escape user-provided parameters, avoid writing raw queries or executing custom SQL unless necessary, and exercise caution when using extra() and RawSQL.

The third part may look very familiar to you. It's mostly unchanged from the previous technique. We pass the generated QA pairs to answer our initial question. Here's our final output:

SSL certification alone cannot prevent SQL injection attacks. SSL primarily encrypts data transferred between the client and server to prevent eavesdropping and tampering, but it does not protect against SQL injection attacks. To prevent SQL injection attacks in a Django application, developers should utilize Django's querysets, which are protected from SQL injection due to query parameterization. Query parameterization ensures that a query's SQL code is defined separately from the query's parameters, which are escaped by the underlying database driver. Additionally, best practices for handling user input in a Django application include using query parameterization for constructing queries, escaping user-provided parameters, avoiding writing raw queries or executing custom SQL unless necessary, and exercising caution when using functions like extra() and RawSQL. By following these best practices, developers can significantly mitigate the risk of SQL injection attacks in their Django applications.

This output is even more comprehensive. Note that the previous final answers didn't mention extra() and RawSQL.

The domain I've used in the example is pretty popular, and chances are high that the LLM already knows the answer without a RAG workflow. But imagine you're using this technique on your private data. Rather than reading everything and then figuring out what to ask the LLM, this workflow automatically figures out what to ask next.

Final thoughts

LLMs are powerful, and RAG makes them more powerful. There are a thousand different ways you can improve the LLM. However, one of the most fundamental gains in RAGs comes from how you retrieve data from your sources.

Query decomposition is one of my favorites, as it could bring new perspectives to our final answer.

In this post, I've shown two techniques to further improve decomposition. The first is good for general answering, and the second has excellent drill-down abilities and provides more comprehensive answers.

I'd like to know your favorite technique or how you would modify and improve it. Do let me know in the comments.
