Optimizing Retrieval-Augmented Generation (RAG) by Selective Knowledge Graph Conditioning
Artificial intelligence software was used to enhance the grammar, flow, and readability of this article's text.
Generative pre-trained models have shown impressive fluency and coherence when used as dialogue agents. However, they suffer from a key limitation: a lack of grounding in external knowledge. Left to their pre-trained parameters alone, these models often generate plausible-sounding but factually incorrect responses, also known as hallucinations.
Prior approaches to mitigate this have involved augmenting the dialogue context with entire knowledge graphs associated with entities mentioned in the chat. However, this indiscriminate conditioning on large knowledge graphs brings its own problems:
Limitations of Naive Knowledge Graph Augmentation:
- Much of the 1-hop context may be irrelevant to the dialogue, inserting unnecessary noise
- Encoding entire knowledge subgraphs strains sequence length limits
- No guarantee the model will use the relevant facts during generation
- Risk of hallucination still exists despite knowledge grounding
To overcome this, Kang et al. (2023) propose the SUbgraph Retrieval-augmented GEneration (SURGE) framework, with three key innovations:
- Context-Relevant Subgraph Retriever: Retrieving the most relevant knowledge graph facts to the dialogue context using a graph neural network retriever.
- Efficient Graph Encoding: Perturbing token embeddings based on relations while encoding just subgraph entities instead of all triplets. Maintains permutation and inversion invariance.
- Graph-Text Contrastive Learning: Ensuring consistency between retrieved knowledge graph and generated response via contrastive loss.
This provides the dialogue with precisely the factual context it needs, without dilution from irrelevant facts or strain on the model's input capacity. Experiments show SURGE reduces hallucination and improves grounding.
The key insight is that selective conditioning on personalized subgraphs provides focused knowledge grounding without overwhelming pre-trained models.

Plan:
- Context-Relevant Knowledge Retrieval
- Invariant Knowledge Encoding
- Enforcing Knowledge Consistency
- Results
- Conclusion
Context-Relevant Knowledge Retrieval:
- Retrieval distribution modeled using similarity of context and triplet embeddings
- Triplet embeddings obtained from Graph Neural Networks to capture relational structure
- Enables focusing on most relevant facts instead of all knowledge graph facts
The key challenge SURGE addresses is retrieving only the most relevant facts from the knowledge graph rather than overwhelming the generator with every contextually associated entity. To enable this context-specific selection, the paper proposes modeling retrieval as a distribution over knowledge graph triplets conditioned on the dialogue history.
Mathematically, this context-conditional retrieval distribution is defined as:
pφ(z|x) ∝ exp(d(z)^T s(x))
Where:
- x is the dialogue context
- z is a knowledge graph triplet
- s(x) generates dense embeddings for the dialogue context
- d(z) generates dense embeddings for the triplets
The key insight here is using the similarity between dialogue and triplet embeddings to model relevance.
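As a rough sketch of this idea (not the paper's implementation), the retrieval step can be pictured as a dot-product similarity followed by a softmax over candidate triplets; the function name and toy dimensions below are purely illustrative:

```python
import torch
import torch.nn.functional as F

def retrieval_distribution(context_emb: torch.Tensor,
                           triplet_embs: torch.Tensor) -> torch.Tensor:
    """p(z|x) ∝ exp(d(z)^T s(x)) over a set of candidate triplets.

    context_emb:  (dim,)            -- s(x), the encoded dialogue context
    triplet_embs: (n_triplets, dim) -- d(z) for each candidate triplet
    """
    scores = triplet_embs @ context_emb     # dot-product relevance scores
    return F.softmax(scores, dim=0)         # normalize into a distribution over triplets

# Toy usage: 5 candidate triplets, 16-dimensional embeddings.
p_z = retrieval_distribution(torch.randn(16), torch.randn(5, 16))
top2 = torch.topk(p_z, k=2)                 # keep only the most relevant facts
```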
Since the triplets contain both entities and relations structured as a graph, plain language model encoders are insufficient. Graph Neural Networks (GNNs), by contrast, are well suited to capturing both nodes and edges: they represent the relational dependencies between entities by propagating embeddings across neighbouring nodes.
Specifically, node embeddings are generated using Graph Convolutional Networks:
e = GNN(e0; G)
While relation embeddings use Edge Hypergraph Networks:
r = GNN(r0; G*)
Where G* denotes the dual hypergraph.
By combining node and edge embeddings, full triplet embeddings capture both semantic relations and structural proximity. The similarity of these triplet embeddings with the dialogue context vectors from the encoder then provides the foundation for a context-relevant retrieval distribution.
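To make this concrete, here is a minimal, illustrative sketch of relation-aware triplet embeddings. A single mean-aggregation layer stands in for the paper's graph convolutional and edge-hypergraph networks, and all names and shapes are assumptions for illustration:

```python
import torch

def gnn_layer(node_embs, edges, weight):
    """One mean-aggregation message-passing layer, a simplified stand-in for the
    paper's GCN (nodes) and edge-hypergraph network (relations)."""
    agg = node_embs.clone()
    counts = torch.ones(node_embs.size(0), 1)
    for head, _, tail in edges:               # propagate neighbouring embeddings
        agg[head] = agg[head] + node_embs[tail]
        agg[tail] = agg[tail] + node_embs[head]
        counts[head] += 1
        counts[tail] += 1
    return torch.relu((agg / counts) @ weight)

def triplet_embedding(node_embs, rel_embs, triplet):
    """Combine head, relation, and tail embeddings into a triplet embedding d(z)."""
    head, rel, tail = triplet
    return torch.cat([node_embs[head], rel_embs[rel], node_embs[tail]])

# Toy subgraph: 3 entities, 2 relation types, 2 triplets (head, relation, tail).
dim = 8
nodes, rels = torch.randn(3, dim), torch.randn(2, dim)
edges = [(0, 0, 1), (1, 1, 2)]
nodes = gnn_layer(nodes, edges, torch.randn(dim, dim))
d_z = torch.stack([triplet_embedding(nodes, rels, e) for e in edges])   # (2, 3*dim)
```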
Invariant Knowledge Encoding:
- Encodes retrieved subgraph into generator transformer efficiently
- Ensures encoding is invariant to order and direction of relations
- Uniquely encodes entities and perturbs embeddings based on relations
The context-relevant subgraph retrieved in the previous stage needs to be encoded into the generator transformer model that will produce the dialogue response. However, naively encoding the symbolic triplets runs into issues around stability of the representations.
Specifically, there are two desired invariance properties:
- Permutation invariance: Order of triplets should not change overall meaning
- Relation inversion invariance: Forward and backward relations equivalent
When encoding knowledge graphs into pre-trained language models for dialogue, a few practical problems come up:
- Long sequences: Encoding every single triplet fact as words results in extremely long input sequences. This strains the model's context capacity.
- Order dependence: Shuffling the order of triplets changes the meaning seen by models like GPT-3, since they rely so much on word order and positioning. But triplets are by nature unordered – shuffling facts shouldn't change overall meaning.
- Directional difference: Relations can be inverted without changing the core meaning (X is-wife-of Y == Y has-husband X). But linearized text makes these look like completely different facts.
The problems above cause unnecessary stress on the language models when encoding structured knowledge. The models get overwhelmed by huge numbers of tokens, and they struggle to grasp that jumbled or inverted triplets still convey the same concepts.
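To make the order- and direction-sensitivity concrete, here is a toy illustration (not from the paper): the linearized text of a subgraph changes under shuffling and relation inversion, while its set of unique entities does not.

```python
# Toy illustration (not from the paper): linearized triplet text is sensitive to
# order and relation direction, while the set of unique entities is not.
triplets = [("Alice", "is_wife_of", "Bob"), ("Bob", "lives_in", "Paris")]
shuffled = list(reversed(triplets))
inverted = [("Bob", "has_husband", "Alice"), ("Bob", "lives_in", "Paris")]

def linearize(ts):
    return " ".join(f"{h} {r} {t}" for h, r, t in ts)

def entity_set(ts):
    return {e for h, _, t in ts for e in (h, t)}

print(linearize(triplets) == linearize(shuffled))    # False: order changes the text
print(entity_set(triplets) == entity_set(shuffled))  # True: the entity set is unchanged
print(entity_set(triplets) == entity_set(inverted))  # True: inversion keeps the entities
```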
So ideally, we need a way to encode knowledge compactly yet stably. The encoding should be:
- Efficient: Shouldn't result in 1000s of prepended tokens blowing context space.
- Order-invariant: Shuffling subgraphs shouldn't drastically alter meaning.
- Direction-invariant: Forward and backward relations should be treated equivalently.
SURGE solves this by uniquely encoding only entities, then judiciously perturbing their embeddings based on relations detected via graph neural networks. This provides a compact, stable form for assimilation by the decoder.
A two-step embed and perturb approach is introduced:
Unique entity embedding:
- Extract set of unique entities ENT(Z) from triplets
- Embed these entities using dialogue encoder
- Treating the subgraph as a set of unique entities, rather than an ordered sequence of triplets, provides permutation invariance
Perturbation using relations:
- Use Graph Neural Network over triplets
- GNN provides relation-aware node embeddings
- Apply transformation β to entity embeddings:
β(f(a), Z) = (1 + γ) ∗ f(a) + δ
where γ, δ are learned perturbation factors based on relations.
This step uses the relational information to directly influence the entity vector spaces while still keeping the efficient unique entity based encoding.
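A minimal sketch of this embed-and-perturb step might look as follows; the module name, the linear layers producing γ and δ, and the tensor shapes are illustrative assumptions rather than the paper's exact implementation:

```python
import torch
import torch.nn as nn

class EntityPerturb(nn.Module):
    """Sketch of the embed-and-perturb idea: encode unique entities once, then
    scale and shift their embeddings with relation-aware factors produced from
    GNN node embeddings. Names and shapes are illustrative assumptions."""

    def __init__(self, dim: int):
        super().__init__()
        self.to_gamma = nn.Linear(dim, dim)   # produces γ from relation-aware embeddings
        self.to_delta = nn.Linear(dim, dim)   # produces δ from relation-aware embeddings

    def forward(self, entity_embs: torch.Tensor, gnn_embs: torch.Tensor) -> torch.Tensor:
        """entity_embs: f(a) from the text encoder, (n_entities, dim)
        gnn_embs:    relation-aware embeddings of the same entities, (n_entities, dim)
        """
        gamma = self.to_gamma(gnn_embs)
        delta = self.to_delta(gnn_embs)
        # β(f(a), Z) = (1 + γ) * f(a) + δ
        return (1 + gamma) * entity_embs + delta

# Toy usage: 5 unique entities with 64-dimensional embeddings.
perturb = EntityPerturb(64)
out = perturb(torch.randn(5, 64), torch.randn(5, 64))
```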
Benefits:
- Vector space encoding fits generator requirements
- Invariance provides stability and consistency
The insight is generating invariance through sets and perturbations rather than variable sequence encodings.
Enforcing Knowledge Consistency:
- Contrastive loss between knowledge graph and generated response
- Pulls relevant knowledge representations closer to response representations
- Improves grounding of responses in retrieved facts
Even after context-relevant retrieval and efficient encoding, there is no guarantee the generator will actually utilize the relevant knowledge provided to it. The risk of hallucination persists.
To actively incorporate the encoded subgraph, the authors propose adding a cross-modal contrastive loss between graph and response representations:
Lcont = -(1/2) * log [ exp(sim(ζ(z), ξ(h))) / Σ_h' exp(sim(ζ(z), ξ(h'))) ]
        -(1/2) * log [ exp(sim(ζ(z), ξ(h))) / Σ_z' exp(sim(ζ(z'), ξ(h))) ]
Where:
- z is the encoded knowledge subgraph
- h is the decoder hidden state
- ζ and ξ are projection functions mapping graph and decoder representations into a shared space
- h' and z' range over the other responses and subgraphs in the batch, which serve as negatives
Intuitively, this loss pulls an encoded knowledge graph closer to its corresponding response representation, while pushing it away from other random responses or knowledge graphs.
This trains the model to actively distinguish between relevant knowledge-response pairs versus irrelevant ones. This discriminative pressure incentivizes the model to ground its responses in the encoded facts.
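A simplified sketch of such a symmetric graph-text contrastive loss is shown below; it treats matching (subgraph, response) pairs within a batch as positives and everything else as negatives, and is an assumption-laden stand-in rather than the paper's code:

```python
import torch
import torch.nn.functional as F

def graph_text_contrastive_loss(z_proj: torch.Tensor, h_proj: torch.Tensor) -> torch.Tensor:
    """Symmetric contrastive loss between projected subgraph representations ζ(z)
    and projected response representations ξ(h). Row i of each tensor is assumed
    to be a matching (knowledge, response) pair; other rows act as negatives.

    z_proj, h_proj: (batch_size, dim)
    """
    z_proj = F.normalize(z_proj, dim=-1)
    h_proj = F.normalize(h_proj, dim=-1)
    logits = z_proj @ h_proj.T                       # pairwise similarities
    targets = torch.arange(z_proj.size(0))
    loss_g2t = F.cross_entropy(logits, targets)      # denominator sums over responses h'
    loss_t2g = F.cross_entropy(logits.T, targets)    # denominator sums over subgraphs z'
    return 0.5 * (loss_g2t + loss_t2g)

# Toy usage with a batch of 4 projected pairs of dimension 32.
loss = graph_text_contrastive_loss(torch.randn(4, 32), torch.randn(4, 32))
```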
Benefits:
- Improves factual consistency
- Reduces unsupported assertions
- Allows tracing hallucinations to retrieval errors
The key insight is that without an explicit alignment objective, the vector spaces of both modalities may drift apart, limiting fact grounding. The contrastive loss acts as an inductive bias towards consistency.
Training End-to-End:
Objective Function: The overall training objective is to maximize the log-likelihood of generating the correct response, marginalized over the latent knowledge subgraphs:
L = log Σ_Z pφ(Z|x) pθ(y|x,Z)
Where pφ(Z|x) is the context-based retrieval distribution and pθ(y|x,Z) is the generator distribution.
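As a toy numerical illustration (values invented), the marginal can be approximated using only the top-k retrieved subgraphs:

```python
import torch

# Toy numerical illustration (values invented): approximating
# L = log Σ_Z pφ(Z|x) pθ(y|x,Z) using only the top-k retrieved subgraphs.
p_z_given_x = torch.tensor([0.5, 0.3, 0.2])           # retrieval probabilities pφ(Z_i|x)
log_p_y_given_xz = torch.tensor([-2.1, -3.4, -5.0])   # generator log-likelihoods log pθ(y|x,Z_i)
L = torch.logsumexp(torch.log(p_z_given_x) + log_p_y_given_xz, dim=0)
```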
Training Process:
- Encode dialogue context x using encoder network
- Retrieve top-k subgraphs Z_i ~ p(Z|x) via similarity search
- Encode Z_i invariantly using GNN + perturbation
- Maximize p(y|x,Z_i) for each sample via decoder
- Additionally minimize contrastive loss between Z_i and decoder states
So jointly across batches of dialogue, retrieval distribution and generation distribution are optimized through shared parameters.
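Putting the pieces together, a hypothetical training step might look like the sketch below. Here `retriever`, `graph_encoder`, `generator`, and `contrastive_loss` are stand-in callables for the components described above, and the per-subgraph weighted negative log-likelihood is a simple surrogate for the marginal objective; none of this is the paper's actual code.

```python
import torch

def surge_training_step(batch, retriever, graph_encoder, generator,
                        contrastive_loss, optimizer, k=3, alpha=1.0):
    """One hypothetical end-to-end training step (illustrative, not the paper's API).

    retriever(x, candidates, top_k)  -> (probs, subgraphs): top-k p(Z|x) and subgraphs
    graph_encoder(Z)                 -> invariant knowledge embeddings for subgraph Z
    generator(x, knowledge, labels)  -> (log p(y|x,Z), decoder hidden states)
    contrastive_loss(knowledge, h)   -> graph-text alignment loss
    """
    x, y, candidate_triplets = batch

    # 1. Retrieve top-k subgraphs Z_i ~ p(Z|x) via similarity search.
    probs, subgraphs = retriever(x, candidate_triplets, top_k=k)

    nll = torch.tensor(0.0)
    cont = torch.tensor(0.0)
    for p_zi, z_i in zip(probs, subgraphs):
        # 2. Encode Z_i invariantly (unique entities + relation-aware perturbation).
        knowledge = graph_encoder(z_i)

        # 3. Weighted negative log-likelihood, a simple surrogate for the marginal
        #    objective L = log Σ_Z p(Z|x) p(y|x,Z).
        log_py, decoder_states = generator(x, knowledge, labels=y)
        nll = nll - p_zi * log_py

        # 4. Contrastive alignment between the encoded graph and the response.
        cont = cont + contrastive_loss(knowledge, decoder_states)

    loss = nll + alpha * cont
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```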
Model Choice:
In principle, any sequence-to-sequence language model such as T5, BART, or even GPT-3 can serve as the generator by appending the encoded knowledge to the input context. The paper uses a T5 model in its experiments, but this can be substituted.
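For instance, with Hugging Face's T5 one can prepend knowledge embeddings to the token embeddings via `inputs_embeds`; in this hedged sketch the `knowledge_embs` tensor is a random placeholder standing in for the invariantly encoded subgraph:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-small")
tokenizer = T5Tokenizer.from_pretrained("t5-small")

context = "User: Who directed Inception?"
inputs = tokenizer(context, return_tensors="pt")

# Token embeddings of the dialogue context.
token_embs = model.get_input_embeddings()(inputs.input_ids)      # (1, seq_len, d_model)

# Placeholder for the invariantly encoded subgraph: random vectors with the
# model's hidden size, purely for illustration.
knowledge_embs = torch.randn(1, 4, model.config.d_model)

# Prepend the knowledge embeddings to the token embeddings and extend the mask.
inputs_embeds = torch.cat([knowledge_embs, token_embs], dim=1)
attention_mask = torch.cat(
    [torch.ones(1, knowledge_embs.size(1), dtype=inputs.attention_mask.dtype),
     inputs.attention_mask], dim=1)

labels = tokenizer("Christopher Nolan directed it.", return_tensors="pt").input_ids
outputs = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask, labels=labels)
loss = outputs.loss
```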
Benefits:
- Unified end-to-end training tying components
- Marginal likelihood aggregates generation quality over the retrieved subgraphs
- Modular architecture allows model extensibility
Results:
- Outperforms baselines in metrics measuring knowledge relevance
- Qualitative examples show more factual responses grounded in relevant knowledge
- Ablations validate importance of each component
The authors evaluate SURGE on the OpenDialKG and KOMODIS dialogue datasets, which provide knowledge graphs paired with dialogues.
Quantitative improvements:
- SURGE outperforms all baselines in knowledge-relevance metrics like the proposed KQA (Knowledge-Verifying QA) metric which measures factual correctness through an extractor.
- Achieves new state-of-the-art results on existing automatic metrics like BLEU, ROUGE, and F1, which assess language fluency.
Qualitative impacts:
- Examples show SURGE generates more informative, factual responses grounded in relevant knowledge from selectively retrieved subgraphs.
- Baselines often omit key facts or even hallucinate irrelevant statements despite having access to the full context.
Ablation studies:
- Removing components like contrastive learning significantly drops knowledge consistency metrics, showing the necessity of each module.
SURGE substantially improves knowledge relevance through targeted augmentation while retaining language fluency. The gains over both knowledge-unaware and knowledge-intensive baselines validate the benefits of selective subgraph retrieval and grounding.