AI+AR based Framework for Guided Visual Equipment Diagnosis
Teresa Gonzalez Diaz¹, Xian Yeow Lee¹, Huimin Zhuge¹, Lasitha Vidyaratne¹, Gregory Sin¹, Tsubasa Watanabe², Ahmed Farahat¹, Chetan Gupta¹

¹ Hitachi R&D America, Santa Clara, CA 95054, USA
teresa.gonzalezdiaz@hal.hitachi.com
xian.lee@hal.hitachi.com
joy.zhuge@hal.hitachi.com
lasitha.vidyaratne@hal.hitachi.com
gregory.sin@hal.hitachi.com
ahmed.faharat@hal.hitachi.com
chetan.gupta@hal.hitachi.com

² Hitachi R&D America, Holland, MI 49424, USA
tsubata.watanabe@hal.hitachi.com
ABSTRACT
Automated solutions for effective support services, such as
failure diagnosis and repair, are crucial to maintaining customer
satisfaction and loyalty. However, providing consistent, high-quality,
and timely support is a difficult task. In practice,
customer support usually requires technicians to perform
onsite diagnosis, but service quality is often adversely
affected by a shortage of expert technicians, high turnover, and
minimal automated tools. To address these challenges, we
present a novel solution framework for aiding technicians in
performing visual equipment diagnosis. We envision a
workflow where the technician reports a failure and prompts
the system to automatically generate a diagnostic plan that
includes parts, areas of interest, and necessary tasks. The plan
is used to guide the technician with augmented reality (AR),
while a perception module analyzes and tracks the
technician’s actions to recommend next steps. Our
framework consists of three components: planning, tracking,
and guiding. The planning component automates the creation
of a diagnostic plan by querying a knowledge graph (KG).
We propose to leverage Large Language Models (LLMs) for
the construction of the KG, accelerating the extraction of
parts, tasks, and relations from manuals. The
tracking component enhances 3D detections by combining
perception sensors with a 2D nested object detection model.
Finally, the guiding component reduces process complexity
for technicians by combining 2D models and AR
interactions. To validate the framework, we performed
multiple studies to: 1) determine an effective prompting method
for the LLM to construct the KG; and 2) demonstrate the benefits of
our 2D nested object detection model combined with AR interactions.
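To make the planning component concrete, the Python sketch below illustrates the idea under simplifying assumptions: `llm` stands in for any LLM call, the prompt wording, triple schema, and relation names are hypothetical, and `networkx` serves as a lightweight stand-in for the knowledge graph store. It is an illustration of the workflow, not the framework's actual implementation.

```python
# Minimal sketch (not the paper's implementation) of the planning idea:
# an LLM prompt extracts (subject, relation, object) triples from manual
# text, the triples populate a small knowledge graph, and a traversal of
# the graph assembles a diagnostic plan for a reported failure.
import json
import networkx as nx


def extract_triples(manual_text, llm):
    """Ask an LLM (any text-in/text-out callable) for triples as JSON."""
    prompt = (
        "Extract (subject, relation, object) triples describing parts, "
        "areas of interest, and diagnostic tasks from the manual excerpt "
        "below. Reply with a JSON list of objects having the keys "
        "'subject', 'relation', and 'object'.\n\n" + manual_text
    )
    return json.loads(llm(prompt))


def build_kg(triples):
    """Store the triples as a directed, relation-labeled graph."""
    kg = nx.DiGraph()
    for t in triples:
        kg.add_edge(t["subject"], t["object"], relation=t["relation"])
    return kg


def diagnostic_plan(kg, failure):
    """List the parts and tasks reachable from the reported failure."""
    if failure not in kg:
        return []
    return [(kg.edges[u, v]["relation"], v) for u, v in nx.edge_bfs(kg, failure)]


# Toy usage with a stubbed LLM response (relation names are made up):
stub_llm = lambda _prompt: json.dumps([
    {"subject": "paper jam", "relation": "involves_part", "object": "feed roller"},
    {"subject": "feed roller", "relation": "requires_task", "object": "inspect for wear"},
])
kg = build_kg(extract_triples("<manual excerpt>", stub_llm))
print(diagnostic_plan(kg, "paper jam"))
# [('involves_part', 'feed roller'), ('requires_task', 'inspect for wear')]
```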
1. INTRODUCTION
Offering support services has become a key differentiator for
customer satisfaction and retention in multiple industries. For
example, manufacturers provide products along with support
services and warranties to ensure that machines’ downtime is
minimized. However, operational complexities, such as a shortage
of experienced technicians, high turnover, steep learning curves
for manuals, and few automated tools, hinder the overall quality
of service. Therefore, it is essential to develop automated
methods and systems that assist technicians and uphold high
standards of support service.
Building intelligent assistant systems presents important
technical challenges. First, knowledge bases are required to
provide reasoning and extensibility, but traditional methods
require extensive data and labels. Second, scene
understanding is critical to guarantee the quality of visual
guidance, but existing methods do not handle the environmental
variations found at customer sites. Third, advanced
user interfaces must be intuitive and useful, but
Augmented Reality (AR) approaches based on 3D methods, though they
enable rich human interaction, are slow and generalize poorly.
To tackle these challenges, we propose a novel general
framework for guided visual diagnosis. In our approach, the
system assists technicians in their tasks, irrespective of their
experience level and the complexity of the issues they
encounter. The framework integrates methods that facilitate
an automated, interactive and user-friendly approach.
In summary, our approach comprises the following
contributions:
1. A novel general framework designed to automate the
visual diagnosis process, enabled by methods for
diagnostic plan generation, tracking, and AR guidance.
Teresa Gonzalez Diaz et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution 3.0 United States
License, which permits unrestricted use, distribution, and reproduction in
any medium, provided the original author and source are credited.