Article
Time Series Segmentation Based on Stationarity Analysis to Improve New Samples Prediction
Ricardo Petri Silva 1,*, Bruno Bogaz Zarpelão 2, Alberto Cano 3 and Sylvio Barbon Junior 2
Citation: Silva, R.P.; Zarpelão, B.B.; Cano, A.; Junior, S.B. Time Series Segmentation Based on Stationarity Analysis to Improve New Samples Prediction. Sensors 2021, 21, 7333. https://doi.org/10.3390/s21217333
Academic Editors: YangQuan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais
Received: 9 August 2021
Accepted: 2 November 2021
Published: 4 November 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Electrical Engineering, State University of Londrina, Londrina 86057-970, Brazil
2 Department of Computer Science, State University of Londrina, Londrina 86057-970, Brazil; brunozarpelao@uel.br (B.B.Z.); barbon@uel.br (S.B.J.)
3 Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA; acano@vcu.edu
* Correspondence: petri@uel.br
Abstract:
A wide range of applications based on sequential data, known as time series, have become increasingly popular in recent years, mainly those based on the Internet of Things (IoT). Several machine learning algorithms exploit the patterns extracted from sequential data to support multiple tasks. However, these data can suffer from unreliable readings, and the resulting low-quality training sets can lead to low-accuracy models. Detecting the change points between highly representative segments is an important ally in finding and treating biased subsequences. By constructing a framework based on the Augmented Dickey-Fuller (ADF) test for data stationarity, we developed two proposals to automatically segment subsequences in a time series. The former, called Change Detector segmentation, relies on change detection methods from data stream mining. The latter, called ADF-based segmentation, is built on a new change detector derived solely from the ADF test. Experiments over real-life IoT databases and benchmarks showed the improvement provided by our proposals for prediction tasks with the traditional Autoregressive Integrated Moving Average (ARIMA) and Deep Learning (Long Short-Term Memory and Temporal Convolutional Networks) methods. With the Long Short-Term Memory predictive model, segmentation reduced the relative prediction error from 1 to 0.67 compared to time series without segmentation.
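The stationarity idea behind the framework can be illustrated with a minimal sketch, assuming a simplified Dickey-Fuller test: the regression Δy_t = α + γ·y_{t-1} + ε_t with no lagged differences (an ADF test whose lag order is fixed at 0), and the asymptotic 5% critical value of -2.86 for the constant-only case. A window is labelled stationary when the t-statistic of γ falls below that value. The fixed-window labelling in `segment_by_stationarity`, the window size, and all function names are hypothetical illustrations, not the ADF-based segmentation proposed in this article.

```python
import math
import random

# Assumed threshold: asymptotic 5% Dickey-Fuller critical value, constant, no trend.
DF_CRITICAL_5PCT = -2.86


def df_statistic(y):
    """t-statistic of gamma in  dy_t = alpha + gamma * y_{t-1} + e_t.

    Simplification: zero lagged differences (ADF with lag order 0).
    """
    x = y[:-1]                                   # lagged level y_{t-1}
    z = [b - a for a, b in zip(y[:-1], y[1:])]   # first difference dy_t
    m = len(x)
    if m < 3:
        raise ValueError("need at least 4 observations")
    mx, mz = sum(x) / m, sum(z) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        return 0.0  # constant series: no evidence against a unit root
    sxz = sum((xi - mx) * (zi - mz) for xi, zi in zip(x, z))
    gamma = sxz / sxx                            # OLS slope
    alpha = mz - gamma * mx                      # OLS intercept
    sse = sum((zi - alpha - gamma * xi) ** 2 for xi, zi in zip(x, z))
    se = math.sqrt(sse / (m - 2) / sxx)          # standard error of gamma
    if se == 0:
        return -math.inf if gamma < 0 else math.inf
    return gamma / se


def is_stationary(y, critical=DF_CRITICAL_5PCT):
    """Reject the unit-root null (treat as stationary) when t < critical value."""
    return df_statistic(y) < critical


def segment_by_stationarity(series, window):
    """Hypothetical: label fixed windows, then merge runs with the same label.

    Returns (start, end, stationary) tuples with `end` exclusive.
    """
    segments = []
    for start in range(0, len(series), window):
        end = min(start + window, len(series))
        if end - start < 4:
            break  # trailing window too short to test; left unlabeled here
        label = is_stationary(series[start:end])
        if segments and segments[-1][2] == label:
            segments[-1] = (segments[-1][0], end, label)  # extend current segment
        else:
            segments.append((start, end, label))          # open a new segment
    return segments


rng = random.Random(42)
noise = [rng.gauss(0.0, 1.0) for _ in range(200)]  # stationary regime
trend = [float(i * i) for i in range(200)]          # non-stationary regime
print(segment_by_stationarity(noise + trend, window=100))
# [(0, 200, True), (200, 400, False)]
```

In practice a library ADF implementation with automatic lag selection would replace `df_statistic`; the fixed lag of 0 keeps the sketch dependency-free.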
Keywords:
time series segmentation; stationarity analysis; time series prediction improvement; size reduction in time series
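The abstract states that Change Detector segmentation relies on change detection methods from data stream mining, without naming a specific detector here. As an assumed example, the sketch below uses the classic Page-Hinkley test, which accumulates each sample's deviation from the running mean and raises an alarm when that sum rises more than a threshold λ above its historical minimum. The one-sided form (upward mean shifts only), the `delta` and `threshold` defaults, and the helpers `change_points` and `to_segments` are hypothetical choices for illustration.

```python
import math


class PageHinkley:
    """One-sided Page-Hinkley test that flags upward shifts in the mean.

    delta: tolerated magnitude of change; threshold: alarm level (lambda).
    Both defaults are illustrative assumptions, not values from the article.
    """

    def __init__(self, delta=0.005, threshold=50.0):
        self.delta = delta
        self.threshold = threshold
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0          # running mean of the samples seen so far
        self.cum = 0.0           # m_t = sum(x_i - mean_i - delta)
        self.cum_min = math.inf  # M_t = min(m_1, ..., m_t)

    def update(self, x):
        """Consume one sample; return True when a change is detected."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        if self.cum - self.cum_min > self.threshold:
            self.reset()  # restart statistics so the next segment is tested afresh
            return True
        return False


def change_points(series, detector=None):
    """Hypothetical helper: indices at which the detector raises an alarm.

    Alarms arrive some samples after the true change (detection delay),
    so each index is only an approximate segment boundary.
    """
    detector = detector or PageHinkley()
    return [i for i, x in enumerate(series) if detector.update(x)]


def to_segments(points, n):
    """Turn change-point indices into (start, end) pairs, `end` exclusive."""
    bounds = [0] + list(points) + [n]
    return [(a, b) for a, b in zip(bounds[:-1], bounds[1:]) if b > a]


signal = [0.0] * 100 + [5.0] * 100  # mean jumps from 0 to 5 at index 100
cps = change_points(signal)
print(cps, to_segments(cps, len(signal)))
# [110] [(0, 110), (110, 200)]
```

The alarm fires at index 110 rather than 100 because the cumulative sum needs several post-change samples to exceed the threshold; a smaller `threshold` shortens this delay at the cost of more false alarms.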
1. Introduction
Data generation grows daily with the advancement of technology [1].
With the advent of sensors capable of capturing valuable data, there is also a need to transform these data into information. The most common data structure in the era of automatic sensor data processing is the time series, which can be defined as a set of sequential data ordered in time [2]. Traditionally, stochastic processes have been used to model time series behavior with great success [3,4]. In addition, machine learning-based approaches are employed to identify complex nonlinear patterns, optimize unconventional functions, and even capture long-term dependencies through recurrent neural networks [5,6]. These patterns can be found in different areas, such as climatic data [7], sales [8], medical diagnosis [9–11], security [1,12], and even changes in share values on the stock exchange [13].
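To make the stochastic-process view concrete, here is a minimal sketch assuming a first-order autoregressive model, AR(1), y_t = c + φ·y_{t-1} + ε_t, which is stationary when |φ| < 1. The model is fitted by ordinary least squares on consecutive pairs and then used for iterated one-step-ahead forecasts. This only illustrates the general idea; the article's own experiments use ARIMA, LSTM, and TCN models, and the function names here are hypothetical.

```python
import random


def simulate_ar1(phi, c, n, sigma=1.0, seed=0):
    """Generate y_t = c + phi * y_{t-1} + e_t with Gaussian noise e_t."""
    if abs(phi) >= 1:
        raise ValueError("|phi| must be < 1 for a stationary AR(1)")
    rng = random.Random(seed)
    y = [c / (1 - phi)]  # start at the stationary mean
    for _ in range(n - 1):
        y.append(c + phi * y[-1] + rng.gauss(0.0, sigma))
    return y


def fit_ar1(y):
    """Ordinary least squares estimate of (c, phi) from pairs (y_{t-1}, y_t)."""
    x, z = y[:-1], y[1:]
    m = len(x)
    mx, mz = sum(x) / m, sum(z) / m
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    phi = sxz / sxx
    c = mz - phi * mx
    return c, phi


def forecast(y, c, phi, steps=1):
    """Iterated point forecasts: y_{T+h} = c + phi * y_{T+h-1}."""
    out, last = [], y[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


y = simulate_ar1(phi=0.7, c=1.0, n=2000, seed=1)
c_hat, phi_hat = fit_ar1(y)
print(round(phi_hat, 2), forecast(y, c_hat, phi_hat, steps=3))
```

Because |φ| < 1, the iterated forecasts decay geometrically toward the process mean c/(1 - φ), which is one reason stationarity matters for prediction.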
From time series analyses, it is possible to examine these patterns and predict future samples, as discussed by Mahalakshmi et al. [14]. Models based on machine learning, e.g., Long Short-Term Memory (LSTM) and Temporal Convolutional Network (TCN), have shown promising results [15,16] as an alternative to statistical models. Approaches that apply machine learning concepts can adapt their settings to improve predictive ability [17]. This can be done by adjusting their hyperparameters so that the time series modeling is better suited
Sensors 2021, 21, 7333. https://doi.org/10.3390/s21217333 https://www.mdpi.com/journal/sensors