
Data Augmentation of Multivariate Sensor Time Series using
Autoregressive Models and Application to Failure Prognostics
Douglas Baptista de Souza¹ and Bruno Paes Leao²
¹ Siemens Advanta, 3850 Quadrangle Blvd, Orlando, FL, 32817, United States
douglas.de-souza@siemens.com
² Siemens Technology, Princeton, NJ, 08540, United States
bruno.leao@siemens.com
ABSTRACT
This work presents a novel data augmentation solution for
non-stationary multivariate time series and its application to
failure prognostics. The method extends previous work by
the authors based on time-varying autoregressive processes.
It can be used to extract key information from a limited
number of samples and to generate new synthetic samples in
a way that may improve the performance of PHM solutions.
This is especially valuable under data scarcity, which is very
common in PHM and particularly in failure prognostics. The
proposed approach is evaluated on the CMAPSS dataset,
which is commonly used for prognostics experiments and
benchmarks. An AutoML approach from the PHM literature
is employed to automate the design of the prognostics
solution. The empirical evaluation provides evidence that the
proposed method can substantially improve the performance
of PHM solutions.
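The abstract describes time-varying autoregressive (TVAR) augmentation only at a high level. As a rough illustration of the general idea, and not the authors' method, the Python/NumPy sketch below fits a TVAR model with an intercept to each channel of a short series using sliding-window least squares, then generates synthetic series by re-running the fitted recursion driven by bootstrapped residuals. The sliding-window estimator, the residual bootstrap, the per-channel treatment (which ignores cross-channel dependence), the default `order` and `window` values, and the names `fit_tvar`, `simulate_tvar` and `augment` are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_tvar(x, order=2, window=40):
    """Hypothetical sketch: fit a time-varying AR(order) model with intercept.

    The coefficients at time t are ordinary least-squares estimates over a
    sliding window of about `window` samples centred at t. Returns per-step
    coefficients [c, a_1, ..., a_p] and the one-step residuals.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    half = window // 2
    coefs = np.zeros((n, order + 1))
    resid = np.zeros(n)
    for t in range(order, n):
        # Local window, clipped so every row has `order` past values available.
        idx = np.arange(max(order, t - half), min(n, t + half + 1))
        # Design matrix: intercept column followed by lags 1..order.
        X = np.column_stack([np.ones(len(idx))] + [x[idx - k] for k in range(1, order + 1)])
        sol, *_ = np.linalg.lstsq(X, x[idx], rcond=None)
        coefs[t] = sol
        lags = x[t - order:t][::-1]  # [x_{t-1}, ..., x_{t-order}]
        resid[t] = x[t] - coefs[t, 0] - lags @ coefs[t, 1:]
    return coefs, resid[order:]


def simulate_tvar(x, coefs, resid, rng):
    """Generate one synthetic series by re-running the fitted TVAR recursion,
    driven by residuals resampled with replacement (a residual bootstrap)."""
    order = coefs.shape[1] - 1
    y = np.empty(len(x))
    y[:order] = x[:order]  # seed the recursion with the original initial values
    for t in range(order, len(x)):
        lags = y[t - order:t][::-1]
        y[t] = coefs[t, 0] + lags @ coefs[t, 1:] + rng.choice(resid)
    return y


def augment(series, n_new, order=2, window=40, seed=0):
    """Create n_new synthetic copies of a (T, C) multivariate series.

    Simplifying assumption: each channel gets its own TVAR model, so
    cross-channel dependence is not modelled.
    """
    series = np.asarray(series, dtype=float)
    rng = np.random.default_rng(seed)
    out = np.empty((n_new,) + series.shape)
    for c in range(series.shape[1]):
        coefs, resid = fit_tvar(series[:, c], order, window)
        for i in range(n_new):
            out[i, :, c] = simulate_tvar(series[:, c], coefs, resid, rng)
    return out


# Usage on toy data (illustrative only, not CMAPSS): a drifting channel
# and a slowly oscillating channel, both with additive noise.
rng = np.random.default_rng(1)
t = np.arange(200)
demo = np.column_stack([
    0.01 * t + rng.normal(0, 0.1, 200),
    np.sin(0.05 * t) + rng.normal(0, 0.1, 200),
])
synthetic = augment(demo, n_new=5)
print(synthetic.shape)  # (5, 200, 2)
```

Estimating coefficients locally over a window lets the model follow slow changes in the signal dynamics, which a single stationary AR fit cannot do. The window length trades responsiveness against estimation variance.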
1. INTRODUCTION
PHM has been an active topic of research and solution devel-
opment in recent decades. The motivation lies in benefits such
as reduced downtime and improved yield and safety, which
can be enabled by failure diagnostics and prognostics. Many
related methods have been proposed over the years, moving
from traditional reliability practices based on population sta-
tistics to advanced data-driven and physics-based solutions
that can estimate the current and future health states of spe-
cific assets and failure modes. Data-driven solutions benefit
from recent advances in machine learning to produce accu-
rate diagnostics and prognostics estimates. However, despite
these advances in PHM methods, applying them in the real
world requires sufficient historical data associated with the
failure modes of interest, and such data is often unavailable
(Biggio & Kastanis, 2020; Kim, Choi, & Kim, 2021). The
underlying reason may simply be that failures are rare, while
developing PHM solutions for them is justified by the high
impact of a potential occurrence. Even when related failure
events have occurred a reasonable number of times in the
past, the relevant sensor data may not have been collected or,
if collected, may have quality issues or lack labels identifying
the actual failure modes.
Douglas Baptista de Souza et al. This is an open-access article distributed
under the terms of the Creative Commons Attribution 3.0 United States
License, which permits unrestricted use, distribution, and reproduction in any
medium, provided the original author and source are credited.
Such data challenges most directly affect data-driven meth-
ods, but physics-based methods are often impacted as well,
since failure mechanism models may be very complex and
depend on historical data for parameter tuning. Whenever
sufficient data is unavailable, developing a failure diagnostics
or prognostics solution may yield poor performance or may
not be feasible at all, regardless of advances in PHM methods.
In such cases, PHM solution development is generally limited
to anomaly detection, which is less prescriptive and action-
able.
Given the potential limitations imposed by a lack of sufficient
good-quality data, data-centric approaches can be valuable for
enabling the application of PHM solutions in the real world
(Leao, Fradkin, Lan, & Wang, 2021; Garan, Tidriri, & Ko-
valenko, 2022). In this context, data-centric may refer either
to improving data collection and labeling or to making the
most of the available data. Data augmentation is one of the
most promising mechanisms for achieving the latter. It has
gained increasing attention in recent years for both failure
diagnostics (Matei, Zhenirovskyy, de Kleer, & Feldman, 2018;
Kwak & Lee, 2023) and prognostics (X. Y. Li, Cheng, Fang,
Zhang, & Wang, 2024; Kim, Kim, & Choi, 2020), resulting
in improved performance in various PHM use cases (A. Yang
et al., 2023; Wang,