NON-PEER REVIEW
20th Australian International Aerospace Congress
ISBN number: 978-1-925627-66-4
Natural Language Processing for Identification of Ground
Truth Events in Data Curation
Nathaniel C Rigoni ¹
¹ Data Analytics Innovations, Engineering Technologies Systems of Systems, Rotorcraft Mission Systems, Lockheed Martin
Abstract
Identifying ground truth events surrounding the failure and removal of parts in the field is a
time-consuming process: analysts must read through thousands of maintenance entries to find the
entry that contains the part and failure mode targeted for research and modelling in condition
based maintenance or usage based lifing. This paper identifies and explores a method of reducing
the effort needed to find ground truth events in maintenance data. These events are used to
identify flights relevant to the wear and use of the part being modelled. The method uses a neural
network that models the free-text and categorical language of the maintenance entries. The model
creates an n-dimensional embedding of each entry; embeddings can be compared to measure the
similarity between entries or between an entry and a set of search terms. Creating document
embeddings of maintenance entries enables users to intelligently search their data and vastly
reduces the time involved in labelling ground truth. The embeddings are robust and compensate
for misspellings, acronyms, synonyms, and entries that omit particular words. This method
completely replaces the use of exact-match text searching for data curation. Natural language
processing reduces the overall cost of modelling usage based lifing and sensor events by reducing
the time from raw data to curated dataset in ground truth investigation.
Keywords: Machine learning, Condition Based Maintenance, Usage Based Lifing, Natural
Language Processing, Part Failure, Reliability Centered Maintenance
Introduction
Developing machine learning models from sensor data, e.g. aircraft flight data, can be challenging.
When the analyst cannot recognise the activity of interest from the sensor readings alone, event
data is typically curated to identify the sensor data relevant to the problem being modelled.
Curating this data can also be a challenge, and the time needed to curate it by hand can drive up
the cost of model production. Reading through documentation of events to find
relevant records requires time and subject matter expertise. To overcome this obstacle, we propose
developing an enhanced search mechanism through the implementation of an unsupervised
Natural Language Processing (NLP) toolset.
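
As a rough illustration of the kind of similarity search proposed here, the sketch below ranks maintenance entries against a set of search terms by cosine similarity. It is not the neural network model described in this paper: as a stand-in for a learned document embedding, it assumes a simple bag of character trigrams, which is enough to show how a vector comparison can surface an entry that an exact-match search would miss. The function names (char_ngrams, cosine, rank_entries) and the sample entries are hypothetical and were written for this example only.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Stand-in 'embedding': a sparse bag of character n-grams.

    Assumption for illustration only; the paper's method learns a dense
    n-dimensional embedding with a neural network instead.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_entries(query, entries, top_k=3):
    """Return the top_k (score, entry) pairs most similar to the query."""
    query_vec = char_ngrams(query)
    scored = [(cosine(query_vec, char_ngrams(entry)), entry) for entry in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


# Hypothetical maintenance entries, including one with misspellings.
entries = [
    "Replaced hydraulic pump due to leak at fitting",
    "Removed hydrualic pmp, found leaking, installed new unit",
    "Tail rotor blade inspected, no defects noted",
    "Cleaned cabin windows and restocked first aid kit",
]

results = rank_entries("hydraulic pump leak", entries, top_k=2)
for score, entry in results:
    print(f"{score:.2f}  {entry}")
```

In this toy example the misspelled entry does not contain the string "hydraulic", so an exact-match search on that word would skip it, while the vector comparison still ranks it above the unrelated entries. A learned embedding would additionally be expected to handle acronyms and synonyms, which character trigrams cannot.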