
UNCLASSIFIED
Abstract
Researchers with the Engineered Resilient Systems (ERS) program are engaged in multiple efforts to effectively
utilize large data sets collected from DoD platforms to apprise agencies of system performance, improve reliability
and availability, and inform future requirements. The foundational technology for this work is a High Performance
Computing (HPC)-based infrastructure that supports large data management – a data lake ecosystem. A data lake
is a repository of related data that is maintained in its original format. Any transformations performed on this data
result in a new pool of data on which analytics can be executed. The original and derived forms of data, together
with the supporting tools and technologies, comprise a data lake ecosystem. This ecosystem supports high-performance
parallel analysis of large data sets and facilitates data provenance and access controls. Large-scale
data analytics projects include maintenance data analysis for reliability assessment and model development to
inform future design. For example, researchers are currently investigating the ability to infer the output of a
“virtual sensor” from an actual sensor that is in close proximity. This capability has two primary use cases. First, in
existing vehicles with standard sensor packages, one sensor could detect when another sensor is malfunctioning,
increasing safety and facilitating improved maintenance. Second, test data from prototypes could be used to
determine the minimum number and optimal placement of sensors, decreasing cost and operational weight.
Other efforts in this field include demonstrating the cross-service applicability of machine learning models that
apply natural language processing and prediction to maintenance data, and using large data sets to create
surrogate models that replace computationally intensive, long-running codes. The ability to effectively analyze
complete historical data sets also enables more accurate verification of algorithms that were previously developed
using much smaller samples of data.
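The abstract does not say how a virtual sensor's output is inferred. As a minimal sketch, assuming a simple linear relationship between two neighboring sensors (an illustrative assumption, not the ERS program's method), one could fit ordinary least squares on paired readings from a period when both sensors are known to be healthy. That fit then serves as the virtual sensor. A malfunction is flagged when the observed reading departs from the prediction by more than k standard deviations of the training residuals. The function names `fit_virtual_sensor`, `predict`, and `flag_faults` and the default threshold k = 3 are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical sketch: a "virtual sensor" modeled as a straight-line fit
# (ordinary least squares) of one sensor's readings on a neighboring sensor's.


def fit_virtual_sensor(ref, target):
    """Fit target ~= intercept + slope * ref on paired readings from a healthy period.

    Returns (intercept, slope, sigma), where sigma is the population standard
    deviation of the training residuals.
    """
    if len(ref) != len(target) or len(ref) < 2:
        raise ValueError("need at least two paired readings of equal length")
    mx, my = mean(ref), mean(target)
    sxx = sum((x - mx) ** 2 for x in ref)
    if sxx == 0:
        raise ValueError("reference readings are constant; slope is undefined")
    sxy = sum((x - mx) * (y - my) for x, y in zip(ref, target))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(ref, target)]
    return intercept, slope, pstdev(residuals)


def predict(model, x):
    """Virtual-sensor output inferred from a single reference reading x."""
    intercept, slope, _ = model
    return intercept + slope * x


def flag_faults(model, ref, observed, k=3.0):
    """Indices where the observed sensor departs from the virtual sensor by more
    than k residual standard deviations (a possible malfunction)."""
    _, _, sigma = model
    tol = max(k * sigma, 1e-9)  # small floor so a perfect fit does not flag rounding noise
    return [
        i
        for i, (x, y) in enumerate(zip(ref, observed))
        if abs(y - predict(model, x)) > tol
    ]
```

For example, with illustrative (not program) training readings ref = [10, 20, 30, 40, 50] and target = [21.2, 40.8, 61.1, 80.9, 101.0], `flag_faults(model, [15, 25, 35], [31.0, 51.2, 90.0])` returns [2]. Only the third reading departs from its predicted value of about 71. Real vehicle data would likely call for a richer model (several inputs, time lags, nonlinearity) than this single-variable fit.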