
How to Leverage Knowledge Graphs in Question Answering?

17 min

Knowledge Graphs (KGs) have emerged as valuable repositories of world knowledge. The integration of KGs with Natural Language Processing (NLP) models has opened new avenues for applications such as question answering, text classification, text generation, and machine translation, leading to remarkable performance gains. In this blog post, we explore the methodologies for integrating KGs into Question-Answering (QA) models.

We begin with an introduction to knowledge graphs. Next, we summarize existing methods for integrating KGs into NLP models. Following that, we explain the task of question answering and introduce our novel KG-enhanced model for generative QA that utilizes Graph Neural Networks (GNNs). Finally, we conclude the article with some key insights from our study.

Knowledge Graphs

Knowledge graphs represent information in a graph structure, giving a rich, interconnected view of knowledge. A knowledge graph consists of nodes and edges. Nodes symbolize entities, which can be real-world objects, individuals, or abstract concepts. Edges denote relationships between entities.

Knowledge Graph with nodes that are connected by edges
An example of a knowledge graph. Credit:

For example, in a social network graph, nodes signify people and edges represent relationships like spouse, parent, or friend. A node can also represent a broader category of other nodes. For instance, all individuals have an “is a” relationship with the “person” node, which allows KGs to capture hierarchical relationships between entities. The flexibility of KGs allows them to represent a wide range of information, from basic statements like “Berlin is the capital city of Germany” to more intricate, qualified assertions such as “all capitals are cities.” A knowledge graph is often represented as a set of triples (h, r, t), where h, r, and t denote the head entity, relation, and tail entity, respectively. Examples are (Berlin, capital city of, Germany), (Berlin, is a, city), (Steve Jobs, former CEO of, Apple), and (Steve Jobs, is a, person).
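The triple representation maps directly onto simple data structures. The following is a minimal sketch that stores the example triples from the text and indexes them by head entity. The names `outgoing` and `tails` are hypothetical helpers chosen for illustration.

```python
from collections import defaultdict

# Triples (head, relation, tail) taken from the examples in the text.
triples = [
    ("Berlin", "capital city of", "Germany"),
    ("Berlin", "is a", "city"),
    ("Steve Jobs", "former CEO of", "Apple"),
    ("Steve Jobs", "is a", "person"),
]

# Index the outgoing edges of every head entity for simple lookups.
outgoing = defaultdict(list)
for head, relation, tail in triples:
    outgoing[head].append((relation, tail))


def tails(head, relation):
    """Return all tail entities reachable from `head` via `relation`."""
    return [t for r, t in outgoing[head] if r == relation]


print(tails("Berlin", "capital city of"))  # ['Germany']
print(tails("Steve Jobs", "is a"))         # ['person']
```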

Representing information as a knowledge graph offers several advantages. One key advantage is the graph’s inherent ability to capture complex relationships between concepts, providing a rich and structured representation of knowledge. Another benefit is flexibility and extensibility. KGs allow systematic integration of new data sources into the existing graph. This enables KGs to grow and evolve alongside expanding knowledge bases and changing requirements. Moreover, knowledge graphs can also efficiently integrate diverse types of data, including text, images, audio, videos, and ontologies, into a unified and cohesive representation. Additionally, graph-based representations facilitate efficient querying, reasoning, advanced analytics, and the derivation of deeper insights from the data.

One of the frequently used knowledge graphs in NLP research is Wikidata, a large-scale general-domain KG. Wikidata covers a wide range of knowledge, from famous people, events, music, and movies to chemical substances. Wikidata has proven beneficial for open-domain QA. Another prominent KG in NLP research is ConceptNet, which focuses on common sense knowledge. ConceptNet has been found to be useful for common sense reasoning tasks.

Methods for Integrating KG into NLP Models

In recent years, there has been a growing effort to leverage the structured and semantically rich nature of knowledge graphs to enhance the capabilities of NLP models, especially pre-trained language models (LMs) and QA systems. KGs are regarded as valuable sources of factual knowledge, which is crucial for QA tasks. Effectively integrating KG knowledge raises two challenges: first, how to identify the relevant knowledge to extract from the KG; second, how to inject this knowledge into NLP systems in a way that maximizes its potential while maintaining efficiency. Common types of knowledge extracted from KGs are entities, triples, and subgraphs. In the pre-train and fine-tune framework, KG knowledge can be integrated during the pre-training phase of a language model, the fine-tuning phase, or both. Methods for knowledge fusion include converting KG knowledge into text (KG-to-text), designing KG-aware pre-training tasks for the language model, and incorporating a knowledge module into the NLP model.

1. KG-to-Text

An illustration of the KG-to-text approach

The most straightforward strategy to leverage KG knowledge is to convert KG knowledge (e.g., subgraphs or triples) into natural language sentences and use them as supplementary contextual information. This conversion can be achieved through various techniques, including template-based methods, pre-training or fine-tuning language models on KG-to-text datasets, and using GNNs as graph encoders.

Note: The illustration is taken from Agarwal et al. 2021 and has been modified.
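Of the conversion techniques listed above, the template-based one is the easiest to illustrate. The following is a minimal sketch, assuming one hand-written sentence pattern per relation. The templates, the fallback pattern, and the function names are hypothetical and only serve as an example.

```python
# Hypothetical templates: one sentence pattern per relation.
TEMPLATES = {
    "capital city of": "{head} is the capital city of {tail}.",
    "is a": "{head} is a {tail}.",
    "former CEO of": "{head} is a former CEO of {tail}.",
}
# Fallback for relations without a template: verbalize the triple as-is.
FALLBACK = "{head} {relation} {tail}."


def triple_to_text(head, relation, tail):
    """Turn one (head, relation, tail) triple into a sentence."""
    template = TEMPLATES.get(relation, FALLBACK)
    return template.format(head=head, relation=relation, tail=tail)


def subgraph_to_context(triples):
    """Verbalize a list of triples as one supplementary context string."""
    return " ".join(triple_to_text(*triple) for triple in triples)


context = subgraph_to_context([
    ("Berlin", "capital city of", "Germany"),
    ("Berlin", "is a", "city"),
])
print(context)  # Berlin is the capital city of Germany. Berlin is a city.
```

The resulting string can then be appended to the question or passage as additional context for the NLP model.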

2. Designing KG-aware pre-training tasks

Given the widespread use of pre-trained language models in NLP, one knowledge fusion strategy is to pre-train the language models not only on text corpora but also on knowledge graphs. Typical approaches involve using entity-aware and relation-aware pre-training objectives in conjunction with the language model’s original objectives, such as Masked Language Modeling (MLM). The goal is to map knowledge from both text corpora and knowledge graphs into the same embedding space during pre-training.

A popular entity-aware objective involves predicting masked entity spans. Another objective is the entity linking task. Given a specific token span, the model is asked to predict the corresponding KG entity.

The illustration depicts an entity span prediction task: entity spans in the text are masked, and the language model aims to predict the masked entity span.
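How a training example for masked entity span prediction could be built is sketched below. This is a minimal illustration, assuming that an entity linker has already provided the token positions of the entity span. The mask token string and the function name are hypothetical.

```python
MASK = "[MASK]"  # hypothetical mask token; real tokenizers define their own


def mask_entity_span(tokens, start, end):
    """Mask tokens[start:end] and return (masked input, target span)."""
    masked = tokens[:start] + [MASK] * (end - start) + tokens[end:]
    target = tokens[start:end]
    return masked, target


tokens = "Steve Jobs was a former CEO of Apple".split()
# Assumption: the entity "Steve Jobs" was linked to token positions 0..1.
masked, target = mask_entity_span(tokens, 0, 2)
print(masked)  # ['[MASK]', '[MASK]', 'was', 'a', 'former', 'CEO', 'of', 'Apple']
print(target)  # ['Steve', 'Jobs']
```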

Another effective approach is the replacement-detection objective. Here, entity spans in the text are replaced with random entities, and the model learns to identify whether the entity spans have been replaced.

The illustration depicts the entity replacement prediction task, where the model predicts whether a token span (corresponding to an entity) has been replaced by another token span (an incorrect entity, here “Steven Spielberg”).
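The construction of a replacement-detection example can be sketched as follows. This is a minimal illustration, assuming that entity spans are given as sorted, non-overlapping (start, end) token positions and that the candidate list contains at least one entity other than the original. The function name and the `replace_prob` parameter are hypothetical.

```python
import random


def corrupt_entities(tokens, entity_spans, candidates, rng, replace_prob=0.5):
    """Randomly swap entity spans for other entities.

    Returns the corrupted token list and one label per span
    (1 = replaced, 0 = original), which the model learns to predict.
    """
    output, labels, cursor = [], [], 0
    for start, end in entity_spans:
        output.extend(tokens[cursor:start])  # copy the text before the span
        original = tokens[start:end]
        if rng.random() < replace_prob:
            # Sample a different entity so the label is always correct.
            options = [c for c in candidates if c != original]
            output.extend(rng.choice(options))
            labels.append(1)
        else:
            output.extend(original)
            labels.append(0)
        cursor = end
    output.extend(tokens[cursor:])  # copy the text after the last span
    return output, labels


tokens = "Steve Jobs was a former CEO of Apple".split()
candidates = [["Steve", "Jobs"], ["Steven", "Spielberg"]]
# replace_prob=1.0 forces a replacement so the example is deterministic.
corrupted, labels = corrupt_entities(
    tokens, [(0, 2)], candidates, random.Random(0), replace_prob=1.0
)
print(" ".join(corrupted), labels)
```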

Relation-aware objectives involve, for example, predicting the tail entity given the head entity and the relation. While KG-aware pre-training has the drawback of being computationally expensive and time-consuming, this approach enables the language model to gain a deep understanding of both textual and knowledge-based information.

Note: The illustrations are taken from He et al. 2020 and have been modified.

3. Build a knowledge module into the model

A knowledge module aims to provide KG knowledge to the primary NLP model. This module can be built as a separate component on top of the primary model, or it can be integrated into the model, for instance by inserting it between the model’s layers or by using it in the calculation of attention scores.

Building a knowledge module on top of the pre-trained LM: the language model and the knowledge graph both feed a knowledge integration module, which produces the prediction.

Inserting a knowledge module between the LM’s layers: a language model layer and a separate knowledge graph both feed a knowledge integration layer, which in turn feeds a language model layer.

Question Answering

Question answering is an NLP task in which the system is given a question in natural language and asked to provide an answer to the question. QA systems have attracted a lot of attention in both research and industry due to their ability to facilitate information search. There is a wide range of applications in real-world scenarios, from supporting virtual assistants such as Siri and Google Assistant to improving customer support in the form of QA chatbots.

QA systems can be categorized into open-book and closed-book settings. In the open-book setting, QA systems are allowed to access external information to support the response process, for example by retrieving data from databases or websites. In the closed-book setting, the QA system cannot access external sources of information. QA systems that operate in the closed-book setting tend to be very large and require a tedious training procedure because they need to store all knowledge within their parameters. A good example of such a system is ChatGPT. For the open-book setting, the “retriever-reader” architecture is nowadays considered one of the most efficient and promising approaches. The retriever’s task is to find relevant documents for a given question. The reader is responsible for deriving the final answer from the retrieved documents.
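The division of labor between retriever and reader can be sketched in a few lines. This is a toy illustration only: the retriever ranks passages by word overlap with the question, which stands in for a real sparse or dense retriever, and the reader is passed in as an arbitrary callable because a real reader would be a trained model. All function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(question, passages, k=2):
    """Toy retriever: rank passages by word overlap with the question."""
    question_words = tokenize(question)
    ranked = sorted(
        passages,
        key=lambda passage: len(question_words & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def answer(question, passages, reader, k=2):
    """Retriever-reader pipeline: retrieve first, then let the reader answer."""
    return reader(question, retrieve(question, passages, k))


passages = [
    "Berlin is the capital city of Germany.",
    "Steve Jobs is a former CEO of Apple.",
    "Paris is the capital city of France.",
]
top = retrieve("What is the capital city of Germany?", passages, k=2)
print(top)
```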

The proposed KG-enhanced QA model

Our proposed model adopts the third method of knowledge fusion: a separate knowledge module is attached to the QA model, and knowledge fusion happens in the input layer of the QA model. The selected form of knowledge is the subgraph, and the fusion step takes place during the fine-tuning stage. The model consists of two main components: the QA module and the GNN module.

The architecture of our knowledge-graph-enhanced QA model. z denotes the question node and p denotes the pooled graph vector. A token sequence consists of a question and a passage.

The QA module is built on a generative QA model called Fusion in Decoder (FiD), proposed by Izacard and Grave (2020) at Facebook AI Research. FiD is an encoder-decoder model based on the pre-trained T5 model. Its strength lies in its ability to handle a large number of supporting passages (up to 100 passages). Our GNN module is derived from the GNN-based reasoning module of another QA model, called QA-GNN. QA-GNN uses KG subgraphs as a knowledge source and encodes them using a Graph Attention Network (GAT) to improve the reasoning ability of the model. GAT is a variant of GNN that incorporates the attention mechanism. The authors of QA-GNN modified GAT so that it can incorporate information from the relationships and node scores. They also proposed a method to assign a relevance score to each node, which helps in selecting important nodes for the subgraphs. However, QA-GNN is only applicable to multiple-choice QA scenarios. Recognizing the potential benefit of the GNN module in QA-GNN, we wished to expand its applicability to generative QA.

Our model can be regarded as a reader system since we provide the model with support passages. The original FiD model encodes one passage at a time, then concatenates all encoded passages into a long vector sequence and passes it to the decoder to generate the answer. To incorporate KG knowledge, we simply combine KG embeddings with token embeddings into the same sequence at the input layer. Specifically, for each question, we first create a subgraph based on KG entities found in the question and all passages. Similar to QA-GNN, we add a special node to the subgraph that represents the question and connect it to the other nodes through a newly added relation. We also expand the subgraph by adding some relevant 1-hop neighboring nodes that form a path within the subgraph. This results in a fully connected subgraph. We initialize the nodes in the subgraph with pre-trained embeddings. After that, we apply five rounds of GNN message passing. The message-passing process of the GNN is intended to allow the entities to interact with the question and learn about the relationship between them. Then, we take the learned node embeddings, the pooled graph embedding, and the question node embedding, and concatenate them with the token embeddings of the question and a passage. The token embeddings are obtained through the embedding layer of the pre-trained T5 encoder. While FiD only uses token embeddings as input, we use both token embeddings and KG embeddings.
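The input construction described above can be sketched in plain Python. This is a simplified illustration rather than the model’s actual implementation: message passing is a plain neighborhood mean instead of the modified GAT, the pooled graph vector is assumed to be a mean over all nodes, and the order of the KG vectors in the sequence is an assumption. All names and toy vectors are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(values) / len(vectors) for values in zip(*vectors)]


def message_passing(node_vecs, edges, rounds=5):
    """Each round, replace every node vector by the mean of itself and its neighbors."""
    neighbors = {node: {node} for node in node_vecs}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    for _ in range(rounds):
        node_vecs = {
            node: mean([node_vecs[m] for m in sorted(neighbors[node])])
            for node in node_vecs
        }
    return node_vecs


def build_input_sequence(node_vecs, question_node, token_vecs):
    """Concatenate question node, pooled graph vector, entity nodes, and tokens."""
    pooled = mean(list(node_vecs.values()))
    entity_vecs = [v for n, v in node_vecs.items() if n != question_node]
    kg_vecs = [node_vecs[question_node], pooled] + entity_vecs
    return kg_vecs + token_vecs, len(kg_vecs)


# Toy subgraph: question node "z" connected to two entity nodes.
nodes = {"z": [1.0, 0.0], "Berlin": [0.0, 1.0], "Germany": [0.0, 0.0]}
edges = [("z", "Berlin"), ("z", "Germany"), ("Berlin", "Germany")]
token_vecs = [[0.5, 0.5], [0.2, 0.1]]  # stand-ins for T5 token embeddings

updated = message_passing(nodes, edges, rounds=5)
sequence, num_kg = build_input_sequence(updated, "z", token_vecs)
print(len(sequence), num_kg)  # 6 4
```

In this sketch the node vectors already have the same dimension as the token vectors. In practice the two kinds of embeddings usually differ in size and must be mapped into a common space first.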

Our idea is to allow tokens and KG entities to exchange information in the encoder through multiple layers of attention. After encoding, we obtain a sequence of embeddings (vectors) for each passage. To reduce the computing time and memory space in the decoder, we remove the embeddings associated with the subgraph and keep only the token embeddings. After doing the same for all passages, we concatenate the remaining token embeddings from each passage, resulting in a long vector sequence. Finally, we pass it as input to the decoder to generate the answer.
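The pruning and concatenation step can be sketched as follows. This is a minimal illustration, assuming that the KG embeddings occupy the first `num_kg` positions of every encoded passage. That layout, the function names, and the string placeholders standing in for vectors are hypothetical.

```python
def keep_token_positions(encoded_passage, num_kg):
    """Drop the leading KG positions and keep only the token positions."""
    return encoded_passage[num_kg:]


def decoder_input(encoded_passages, num_kg):
    """Concatenate the remaining token vectors of all passages into one sequence."""
    sequence = []
    for encoded in encoded_passages:
        sequence.extend(keep_token_positions(encoded, num_kg))
    return sequence


# Strings stand in for encoder output vectors: 2 KG positions + 3 tokens each.
passage_1 = ["kg0", "kg1", "p1_t0", "p1_t1", "p1_t2"]
passage_2 = ["kg0", "kg1", "p2_t0", "p2_t1", "p2_t2"]
fused = decoder_input([passage_1, passage_2], num_kg=2)
print(fused)  # ['p1_t0', 'p1_t1', 'p1_t2', 'p2_t0', 'p2_t1', 'p2_t2']
```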

One point to note about this architecture is that we put token and KG embeddings in the same sequence. This usually requires an alignment to transform KG embeddings into the same space as token embeddings. In our work, the message-passing in GNN serves as an alignment procedure since the subgraph consists of both KG embeddings and LM embeddings through the added question node. After applying GNN, we further align the embeddings using a linear transformation.
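The final linear alignment amounts to computing y = Wx + b for every KG vector. The following is a minimal sketch in plain Python. The dimensions and the weight values are made up for illustration; in a trained model, W and b would be learned parameters.

```python
def linear(vector, weight, bias):
    """Apply y = W x + b, where `weight` is a list of rows."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b
        for row, b in zip(weight, bias)
    ]


# Hypothetical sizes: map a 2-dimensional node vector into a 3-dimensional token space.
weight = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
]
bias = [0.0, 0.0, 0.5]

aligned = linear([2.0, 3.0], weight, bias)
print(aligned)  # [2.0, 3.0, 5.5]
```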


We evaluate the model on a small subset of TriviaQA and on the GQA (Graph Question-Answering) dataset. GQA is a private QA dataset that provides high-quality Wikidata subgraphs for each question. Both datasets cover general topics (open domain). We experiment with two knowledge graphs, Wikidata5m and ConceptNet.

An example of the TriviaQA dataset

Question: The Dodecanese Campaign of WWII that was an attempt by the Allied forces to capture islands in the Aegean Sea was the inspiration for which acclaimed 1961 commando film?

Answer: The Guns of Navarone

Passage: The Dodecanese Campaign of World War II was an attempt by Allied forces to capture the Italian-held Dodecanese islands in the Aegean Sea following the surrender of Italy in September 1943, and use them as bases against the German-controlled Balkans. The failed campaign, and in particular the Battle of Leros, inspired the 1957 novel The Guns of Navarone and the successful 1961 movie of the same name.

An example of the GQA dataset

Question: What American Mexican cinematographer did Brokeback Mountain star
Answer: Rodrigo Prieto


In all experiments, our model outperforms the baseline, which is the original FiD model. Overall, the model shows a gain of 0.06% to 1.44% in Exact Match (EM) scores. The performance gain on TriviaQA is not significant, while it is significant on GQA. We find that our model is most beneficial in a few-shot setting, when we train the model with 10 samples (+1.44% EM).
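For reference, an Exact Match score can be computed as sketched below. The normalization (lowercasing, stripping punctuation and English articles, collapsing whitespace) follows a convention that is common in QA evaluation and is an assumption here, since the text does not specify which normalization was used. The function names are hypothetical.

```python
import re
import string


def normalize(text):
    """Lowercase, strip punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold_answers):
    """1.0 if the prediction equals any gold answer after normalization, else 0.0."""
    return float(any(normalize(prediction) == normalize(g) for g in gold_answers))


def em_score(predictions, references):
    """Average Exact Match over a dataset, as a percentage."""
    total = sum(exact_match(p, refs) for p, refs in zip(predictions, references))
    return 100.0 * total / len(predictions)


predictions = ["the Guns of Navarone.", "Ang Lee"]
references = [["The Guns of Navarone"], ["Rodrigo Prieto"]]
print(em_score(predictions, references))  # 50.0
```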

Although the model can provide a certain performance gain, it comes with processing times that are nearly double those of the baseline FiD. This is possibly due to the input sequence becoming longer because of the added KG embeddings, with the computation time for self-attention in the encoder growing quadratically with the sequence length. Another factor is the need to run the GNN on the fly for each question. Its computation time scales with the subgraph’s size, the dimension of the node embeddings, and the number of GNN layers.

In addition, the construction of subgraphs is the most time-consuming and memory-intensive step. This step involves identifying KG entities in the input text and computing a relevance score for the entities. Entity linking requires careful configuration because words in the text can be ambiguous and a KG entity can have many aliases. For a large KG like Wikidata5m, the entity linking process consumes a significant amount of time. This was the reason why we could not use the whole TriviaQA dataset and decided to work with a subset instead. Similarly, the node scoring demands substantial processing time as we extend the subgraph to neighboring nodes. It requires us to calculate a score for all child nodes of the initial subgraph before we can rank and choose the relevant nodes to build the final subgraphs. In the case of Wikidata5m, some parent nodes can have more than 3,000 child nodes.
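The effect of quadratic scaling on a longer input can be made concrete with a small calculation. The sequence lengths below are purely illustrative and are not measurements from the experiments. The function name is hypothetical, and only the self-attention term is considered.

```python
def attention_cost_ratio(num_tokens, num_kg_vectors):
    """Relative self-attention cost after adding KG vectors, assuming cost ~ length**2."""
    original_length = num_tokens
    extended_length = num_tokens + num_kg_vectors
    return (extended_length / original_length) ** 2


# Made-up lengths: a 250-token input extended by 0, 50, or 100 KG vectors.
for num_kg in (0, 50, 100):
    print(num_kg, round(attention_cost_ratio(250, num_kg), 2))
```

With these made-up lengths, adding 100 KG vectors to a 250-token input increases the self-attention term by a factor of about 1.96, since (350 / 250)² = 1.96.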


The rising popularity of knowledge graphs in NLP research can be attributed to their ability to represent information in a structured way, preserving the connectivity of real-world entities and concepts. Moreover, knowledge graphs can effectively integrate diverse data modalities and sources into a unified representation and can be systematically extended and updated. As the era of big data evolves, with information constantly changing, the need for reliable, fact-aware NLP systems becomes crucial. In this context, knowledge graphs have the potential to serve as dynamic sources of factual knowledge, enhancing accuracy and contextual understanding in NLP tasks.

In this blog post, we presented different techniques for integrating knowledge graphs into NLP models with a focus on question-answering tasks. To gain a practical understanding, we developed a KG-enhanced question-answering model based on a simple approach. The model exhibited performance improvements in both fully supervised and few-shot scenarios across two KGs and two QA datasets. However, it has limitations in terms of efficiency.

While knowledge graphs are attractive sources of valuable information, integrating them into NLP models is challenging due to the need to optimize various interconnected components. For instance, within our model architecture, we had to look for effective methods for identifying KG entities in the input text, accurately estimating node relevance, finding the optimal strategy for constructing subgraphs, and designing a suitable knowledge fusion approach.
