Hallucination-resistant multimodal content generation through knowledge graph-based reinforcement learning

Liang Zeng, Xinyi Lin, Shanping Yu*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Multimodal large models exhibit remarkable capabilities in understanding and generating content by integrating diverse types of data, including text and images. In practical applications, however, they face significant challenges related to hallucination, where generated content may be inaccurate or misleading. To address these concerns, this study introduces a chain-of-thought framework for trusted content generation based on knowledge graph reinforcement learning that effectively mitigates hallucinations. The framework incorporates a chain-of-thought mechanism to strengthen model reasoning, thereby improving interpretability. By leveraging an external structured knowledge graph, it optimizes the trajectory of the generated content, ensuring that outputs are informed by reliable contextual information. Furthermore, reinforcement learning techniques bolster the credibility of the generated responses. Experimental evaluations on the VQA-RAD and SLAKE datasets demonstrate that this approach achieves significant improvements in medical visual question answering tasks. The framework not only elevates the quality of content generation but also enhances the interpretability of the model.

Original language: English
Article number: 103783
Journal: Information Fusion
Volume: 127
DOIs
Publication status: Published - Mar 2026

Keywords

  • Chain of thought
  • Generative artificial intelligence
  • Hallucination
  • Knowledge graph
  • Reinforcement learning
