ULTRA: Foundation Models for Knowledge Graph Reasoning


What's new in Graph ML?

Training a single generic model that solves arbitrary datasets has always been a dream of ML researchers, especially in the era of foundation models. While such dreams have been realized in perception domains like images and natural language, whether they can be reproduced in reasoning domains (like graphs) remains an open challenge.

Image by Authors, edited from the output of DALL-E 3.

In this blog post, we show that such a generic reasoning model exists, at least for knowledge graphs (KGs). We introduce ULTRA, a single pre-trained reasoning model that generalizes to new KGs with arbitrary entity and relation vocabularies and can serve as a default solution for any KG reasoning problem.


_This post is based on our recent paper (preprint) and was written together with [Xinyu](https://twitter.com/XinyuYuan402) Yuan (Mila), [Zhaocheng](https://twitter.com/zhu_zhaocheng) Zhu (Mila), and [Bruno](https://twitter.com/brunofmr) Ribeiro (Purdue / Stanford). Follow Michael, Xinyu, Zhaocheng, and Bruno on Twitter for more Graph ML content._


Outline

  1. Why KG representation learning is stuck in 2018
  2. Theory: What makes a model inductive and transferable?
  3. Theory: Equivariance in multi-relational graphs
  4. ULTRA: A Foundation Model for KG Reasoning
  5. Experiments: Best even in zero-shot inference, scaling behavior
  6. Code, Data, Checkpoints

Why KG representation learning is stuck in 2018

The pretrain-finetune paradigm has been with us since 2018, when ELMo and ULMFiT showed the first promising results, later cemented by BERT and GPT.

In the era of large language models (LLMs) and more general foundation models (FMs), we often have a single model (like GPT-4 or Llama-2) pre-trained on enormous amounts of data and capable of performing a vast variety of language tasks in a zero-shot manner (or at least of being fine-tuned on a specific dataset). These days, multimodal FMs even support language, vision, audio, and other modalities in one and the same model.

Things work a little differently in Graph ML. In particular, where does representation learning on KGs stand at the end of 2023? The main tasks here are edge-level:

  • Entity prediction (or knowledge graph completion) (h,r,?): given a head node and a relation, rank all nodes in the graph that can potentially be true tails.
  • Relation prediction (h,?,t): given two nodes, predict the relation type between them (a minimal sketch of both tasks follows this list).
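
To make the two tasks concrete, here is a minimal sketch on a toy graph. The triples and the placeholder `score` function are made up for illustration; a real KG reasoning model (ULTRA or otherwise) would replace `score` with a learned plausibility function.

```python
# Toy illustration of the two edge-level KG reasoning tasks.
# The scoring function is a random placeholder, not an actual model.
import random

triples = [
    ("Einstein", "born_in", "Ulm"),
    ("Ulm", "located_in", "Germany"),
    ("Einstein", "won", "Nobel_Prize"),
]
entities = sorted({x for h, _, t in triples for x in (h, t)})
relations = sorted({r for _, r, _ in triples})

def score(h, r, t):
    # Placeholder: a trained model returns a learned plausibility score here.
    return random.random()

# Entity prediction (h, r, ?): rank all entities as candidate tails.
h, r = "Einstein", "born_in"
tail_ranking = sorted(entities, key=lambda t: score(h, r, t), reverse=True)
print(tail_ranking)

# Relation prediction (h, ?, t): rank all relation types between two nodes.
h, t = "Einstein", "Ulm"
rel_ranking = sorted(relations, key=lambda rel: score(h, rel, t), reverse=True)
print(rel_ranking)
```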

It turns out that, up until now, KG representation learning has been stuck somewhere in pre-2018. The key problem is:

Each KG has its own set of entities and relations; there is no single pre-trained model that transfers to any graph.

For example, if we look at Freebase (the KG behind the Google Knowledge Graph) and Wikidata (the largest open-source KG), they have completely different sets of entities (86M vs 100M) and relations (1,500 vs 6,000). Is there any hope for current KG representation learning methods to be trained on one graph and transferred to another?

Different vocabularies of Freebase and Wikidata. Image by Authors.

❌ Classical transductive methods like TransE, ComplEx, RotatE, and hundreds of other embedding-based approaches learn a fixed set of entity and relation-type embeddings from the training graph and cannot support even new nodes added to the same graph. Shallow embedding-based methods do not transfer (in fact, we believe there is no point in developing such methods anymore, except perhaps as student project exercises). The sketch below illustrates why.
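
Here is a minimal, untrained TransE-style sketch that makes the limitation concrete (the toy vocabularies, triples, and dimensions are invented for illustration): because the embedding tables are indexed by a vocabulary frozen at training time, an entity that was never seen during training simply has no row to look up.

```python
# Toy transductive TransE-style scorer (untrained weights, illustrative only).
import torch

entity2id = {"Einstein": 0, "Ulm": 1, "Germany": 2}   # frozen at training time
relation2id = {"born_in": 0, "located_in": 1}

dim = 32
entity_emb = torch.nn.Embedding(len(entity2id), dim)
relation_emb = torch.nn.Embedding(len(relation2id), dim)

def transe_score(h, r, t):
    # TransE plausibility: -|| h + r - t ||
    h_e = entity_emb(torch.tensor([entity2id[h]]))
    r_e = relation_emb(torch.tensor([relation2id[r]]))
    t_e = entity_emb(torch.tensor([entity2id[t]]))
    return -torch.norm(h_e + r_e - t_e).item()

print(transe_score("Einstein", "born_in", "Ulm"))  # works: seen vocabulary

try:
    transe_score("Curie", "born_in", "Warsaw")     # entities unseen at training
except KeyError as e:
    print(f"Unseen entity {e}: no embedding row exists, so it cannot be scored")
```

The failure is structural rather than a matter of training data: without a shared vocabulary, the learned embedding matrices are simply not applicable to a new graph.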
