Developing an Autonomous Dual-Chatbot System for Research Paper Digesting

Author: Murphy | View: 28291 | Time: 2025-03-23 13:03:41

As a researcher, reading and understanding scientific papers has always been a crucial part of my daily routine. I still remember the tricks I learned in grad school for digesting a paper efficiently. However, with countless research papers being published every day, I felt overwhelmed trying to keep up with the latest research trends and insights. The old tricks can only help so much.

Things started to change with the recent development of large language models (LLMs). Thanks to their remarkable contextual understanding, LLMs can fairly accurately identify relevant information in user-provided documents and generate high-quality answers to the user's questions about them. A myriad of document Q&A tools have been built on this idea, and some are designed specifically to help researchers understand complex papers in a relatively short amount of time.

Although this is definitely a step forward, I noticed some friction points when using those tools. One of the main issues I had was prompt engineering. Since the quality of LLM responses depends heavily on the quality of my questions, I often found myself spending quite some time crafting the "perfect" question. This is especially challenging when reading papers in unfamiliar research fields: oftentimes I simply don't know what questions to ask.

This experience got me thinking: is it possible to develop a system that can automate the process of Q&A about research papers? A system that can distill key points from a paper more efficiently and autonomously?

Previously, I worked on a project where I developed a dual-chatbot system for language learning. The concept there was simple yet effective: by letting two chatbots chat in a user-specified foreign language, the user could learn the practical usage of the language by simply observing the conversation. The success of this project led me to an interesting thought: could a similar dual-chatbot system be useful for understanding research papers as well?

So, in this blog post, we are going to bring this idea to life. Specifically, we will walk through the process of developing a dual-chatbot system that can digest research papers in an autonomous manner.

To make this journey a fun experience, we are going to approach it as a software project and run a Sprint. We will begin with "ideation", where we introduce the concept of leveraging a dual-chatbot system to tackle our problem. Then comes the "Sprint execution", during which we'll incrementally build the features of our design. Lastly, we will show our demo in the "Sprint Review" and reflect on the learnings and future opportunities in the "Sprint Retrospective".

Ready to run the Sprint? Let's get started!

This is the 2nd blog post in my series of LLM projects. The 1st one is Building an AI-Powered Language Learning App, and the 3rd one is Training Soft Skills in Data Science with Real-Life Simulations. Feel free to check them out!

Table of Contents

· 1. Concept: dual-chatbot system
· 2. Sprint Planning: what we want to build
· 3. Feature 1: Document Embedding Engine
· 4. Feature 2: Dual-Chatbot System
  ∘ 4.1 Abstract chatbot class
  ∘ 4.2 Journalist chatbot class
  ∘ 4.3 Author bot class
  ∘ 4.4 Quick test: the interview
· 5. Feature 3: User Interaction
  ∘ 5.1 Creating the chat environment (in Jupyter Notebook)
  ∘ 5.2 Implementing PDF highlighting functionality
  ∘ 5.3 Allowing user input for questions
  ∘ 5.4 Allowing downloading the generated script
· 6. Sprint Review: show the demo!
· 7. Sprint Retrospective

1. Concept: dual-chatbot system

The foundation of our solution lies in the concept of a dual-chatbot system. As its name implies, this system involves two chatbots (powered by large language models) engaging in an autonomous dialogue. By specifying a high-level task description and assigning relevant roles to the chatbots, users can guide the conversation toward their desired direction.

To give a concrete example: in my previous project, where a dual-chatbot system was developed to assist language learning, the learner (user) can specify a real-life scenario (e.g., dining at a restaurant) and assign roles for the chatbots to play (e.g., bot 1 as the waitstaff and bot 2 as the customer). The two bots then simulate a conversation in the user's chosen foreign language, mimicking the interaction between the assigned roles in the given scenario. This allows on-demand generation of fresh, scenario-specific language learning materials, helping users better understand language usage in real-life situations.

So, how do we adapt this concept for the autonomous digestion of research papers?

The key lies in the role assignment. More specifically, one bot could take the role of a "journalist", whose main task is to conduct an interview to understand and extract key insights from a research paper. Meanwhile, the other bot could play the role of an "author", who has full access to the research paper and is tasked with providing comprehensive answers to the "journalist" bot's queries.

When it comes to interaction, the journalist bot will initiate the dialogue and kick off the interview process. The author bot will then serve as a conventional document Q&A engine and answer the journalist's questions based on the relevant context of the research paper. The journalist bot then follows up with additional questions for further clarification. Through this iterative Q&A process, the key contributions, methodology, and findings of the research paper can be extracted automatically.

An illustration of the workflow of the dual-chatbot system. (Image by author)
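To make the turn-taking concrete, here is a minimal sketch of the interview loop in plain Python. It is only an illustration under stated assumptions: the names `Bot`, `run_interview`, `stub_journalist`, and `stub_author` are hypothetical, and the two stub functions stand in for real LLM-backed bots so the sketch runs without any model or API key. It is not the implementation built later in this post.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a bot is any callable that maps an incoming message to a reply.
Bot = Callable[[str], str]


def run_interview(journalist: Bot, author: Bot, rounds: int = 3) -> List[Tuple[str, str]]:
    """Alternate between the two bots and return the (question, answer) transcript."""
    transcript: List[Tuple[str, str]] = []
    last_answer = ""  # empty on the first turn, so the journalist opens the interview
    for _ in range(rounds):
        question = journalist(last_answer)   # journalist asks, or follows up
        last_answer = author(question)       # author answers from the paper
        transcript.append((question, last_answer))
    return transcript


# Stand-in bots (assumptions): real bots would call an LLM here.
def stub_journalist(previous_answer: str) -> str:
    if not previous_answer:
        return "What problem does the paper address?"
    return f"Can you elaborate on this point: '{previous_answer}'?"


def stub_author(question: str) -> str:
    return f"[answer grounded in the paper for: {question}]"


if __name__ == "__main__":
    for q, a in run_interview(stub_journalist, stub_author, rounds=2):
        print("Journalist:", q)
        print("Author:", a)
```

The point of the sketch is the control flow: the journalist always speaks first, and each follow-up question is conditioned on the author's previous answer. Swapping the stubs for LLM-backed callables would leave the loop itself unchanged.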

The dual-chatbot system described above introduces a shift from the traditional user-chatbot interaction: instead of users thinking about the right questions to ask the LLM, the "journalist" bot automatically comes up with suitable questions on the user's behalf. This approach could bypass the need for users to craft appropriate prompts, thus reducing their cognitive load, which is especially useful when delving into unfamiliar research fields. Overall, the dual-chatbot system may constitute a more user-friendly, efficient, and engaging method for distilling complex scientific research papers.

Next up, let's move to Sprint planning and define several user stories we would like to address in this project.

2. Sprint Planning: what we want to build

With the concept in place, it's time to plan our current Sprint. In line with the common practice of Agile development, our Sprint planning will revolve around user stories.

In Agile development, a user story is a concise, informal description of a feature or functionality from an end-user perspective. User stories are commonly used to define and communicate requirements in a way that is understandable and actionable for the development team.