Physics and Data Collaborative Root Cause Analysis: Integrating
Pretrained Large Language Models and Data-Driven AI for
Trustworthy Asset Health Management
Hao Huang, Tapan Shah, John Karigiannis, Scott Evans
GE Vernova Advanced Research, Niskayuna, NY, USA
hao.huang1@ge.com, tapan.shah@ge.com, John.Karigiannis@ge.com, evans@ge.com
ABSTRACT
Data-driven tools for asset health management face significant
challenges: they lack an understanding of physical principles
and struggle to incorporate domain experts’ experience, which
lowers detection accuracy and undermines trust in their results.
Automatically integrating data-driven
analysis with human knowledge and experience, as found in
literature and maintenance logs, is critically needed. Recent
progress in large language models (LLMs) offers opportuni-
ties to achieve this goal. However, there is still a lack of work
that effectively combines pretrained LLMs with data-driven
models for asset health management using industrial time se-
ries data as input. This paper presents a framework that inte-
grates our recently proposed data-driven AI with pretrained
LLMs to address root cause detection in industrial failure
analysis. The framework employs LLMs to analyze outputs
from our data-driven root cause analysis models, filtering out
less relevant results and prioritizing those that align closely
with physical principles and domain expertise. Our innova-
tive approach leverages advanced data-driven analytics and a
multi-LLM debate for collaborative decision-making, seam-
lessly merging data-driven insights with domain knowledge.
Specifically, through our proposed self-exclusionary debates
among multiple LLMs, biases inherent in single-LLM sys-
tems are effectively mitigated, enhancing reliability and sta-
bility. Crucially, the framework bridges the gap between data-
driven models and physics-informed LLMs, accelerating the
interaction between data and knowledge for more informed
and realistic decision-making processes.
1. INTRODUCTION
In asset health management, fault detection and root cause
analysis (RCA) together identify and diagnose anomalies or
malfunctions in equipment or processes in order to prevent
failures and maintain efficiency. Specifically, RCA complements
fault detection by uncovering fundamental causes, enabling
targeted corrective actions and facilitating predictive
maintenance strategies (Ellefsen et al., 2019; Liao & Ahn, 2016).
In recent years, the surge in sensor technologies has led to an
unprecedented volume of time series data across diverse sectors,
presenting both opportunities and challenges. Artificial
Intelligence (AI) models, especially those designed to operate
on time series data, have become crucial for autonomously
identifying the underlying root causes of failures.
Hao Huang et al. This is an open-access article distributed under the terms of
the Creative Commons Attribution 3.0 United States License, which permits
unrestricted use, distribution, and reproduction in any medium, provided the
original author and source are credited.
Figure 1. A malfunctioning cooling system results in high
pressure and temperature in cooling rods. However, identify-
ing the most deviated features may only reveal downstream
effects rather than direct causes.
Conventional data-driven root cause analysis often relies on
identifying deviations through reconstruction or prediction,
leveraging AI techniques such as autoencoders or LSTM networks
(Pang & Aggarwal, 2021; Park et al., 2019; Xiao et al., 2023).
However, pinpointing the most deviated channels might emphasize
downstream effects rather than direct causes. For example,
consider the scenario depicted in Figure 1: a malfunction in the
cooling system leads to spikes in pressure, temperature, and
flow rate within the control rods. Conventional approaches might
flag these elevated readings as the primary cause because of
their large prediction residuals. In reality, they are symptoms
of the underlying cooling system problem rather than the
immediate cause of the malfunction.
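The limitation described above can be illustrated with a minimal sketch.
It is not the authors' model: the channel names, the numeric values, and
the use of a healthy-window mean as the "predictor" (a stand-in for an
autoencoder or LSTM) are all assumptions made for illustration. A small
drop in coolant flow is simulated as the true cause, with much larger
downstream shifts in rod temperature and pressure.

```python
import numpy as np

def rank_by_residual(observed, predicted, scale, names):
    """Rank channels by mean absolute residual, normalized per channel."""
    score = (np.abs(observed - predicted) / scale).mean(axis=0)
    order = np.argsort(score)[::-1]  # largest deviation first
    return [(names[i], float(score[i])) for i in order]

rng = np.random.default_rng(0)
n = 200
# Hypothetical channel names; the values below are synthetic.
names = ["coolant_flow", "rod_temperature", "rod_pressure"]

# Healthy reference window used to fit the stand-in predictor.
healthy = np.column_stack([
    1.00 + 0.05 * rng.standard_normal(n),
    300.0 + 1.0 * rng.standard_normal(n),
    15.0 + 0.1 * rng.standard_normal(n),
])
predicted = healthy.mean(axis=0)  # assumption: constant-mean predictor
scale = healthy.std(axis=0)       # per-channel normalization

# Fault window: the cause is a modest flow drop (about 2 sigma);
# the downstream effects are far larger (about 20 and 15 sigma).
faulty = np.column_stack([
    0.90 + 0.05 * rng.standard_normal(n),
    320.0 + 1.0 * rng.standard_normal(n),
    16.5 + 0.1 * rng.standard_normal(n),
])

ranking = rank_by_residual(faulty, predicted, scale, names)
for name, score in ranking:
    print(f"{name}: {score:.1f}")
```

In this toy setup the residual ranking places the two downstream channels
ahead of coolant flow, even though the flow drop is the simulated cause,
which is the failure mode that motivates bringing physical knowledge into
the analysis.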