Machine learning for internet of things data analysis: a survey
Mohammad Saeid Mahdavinejad (a,b,*), Mohammadreza Rezvan (a,b), Mohammadamin Barekatain (c), Peyman Adibi (a), Payam Barnaghi (d), Amit P. Sheth (b)
a University of Isfahan, Iran
b Kno.e.sis - Wright State University, USA
c Technische Universität München, Germany
d University of Surrey, UK
ARTICLE INFO
Keywords:
Machine learning
Internet of Things
Smart data
Smart City
ABSTRACT
Rapid developments in hardware, software, and communication technologies have facilitated the emergence of
Internet-connected sensory devices that provide observations and data measurements from the physical world. By
2020, it is estimated that the total number of Internet-connected devices being used will be between 25 and 50
billion. As these numbers grow and technologies become more mature, the volume of data being published will
increase. The technology of Internet-connected devices, referred to as Internet of Things (IoT), continues to extend
the current Internet by providing connectivity and interactions between the physical and cyber worlds. In addition
to an increased volume, the IoT generates big data characterized by its velocity in terms of time and location
dependency, with a variety of multiple modalities and varying data quality. Intelligent processing and analysis of
this big data are the key to developing smart IoT applications. This article assesses the various machine learning
methods that deal with the challenges presented by IoT data by considering smart cities as the main use case. The
key contribution of this study is a taxonomy of machine learning algorithms that explains how
different techniques are applied to the data to extract higher-level information. The potential and
challenges of machine learning for IoT data analytics are also discussed. A use case applying a Support
Vector Machine (SVM) to Aarhus smart city traffic data is presented for a more detailed exploration.
1. Introduction
Emerging technologies in recent years, together with major enhancements to
Internet protocols and computing systems, have made communication
between different devices easier than ever before. According to various
forecasts, around 25–50 billion devices are expected to be connected to
the Internet by 2020. This has given rise to the concept
of the Internet of Things (IoT). IoT is a combination of embedded technologies
including wired and wireless communications, sensor and actuator
devices, and the physical objects connected to the Internet [1,2]. One of
the long-standing objectives of computing is to simplify and enrich
human activities and experiences (e.g., see the visions associated with
“The Computer for the 21st Century” [3] or “Computing for Human
Experience” [4]). To accomplish this intelligently, IoT requires data, either to provide better services to
users or to enhance the performance of the IoT framework. To that end, systems should be able to access raw data
from different resources over the network and analyze it in
order to extract knowledge.
Since IoT will be among the most significant sources of new data, data
science will contribute considerably to making IoT applications
more intelligent. Data science is a combination of different scientific
fields that uses data mining, machine learning, and other
techniques to find patterns and new insights in data. These techniques
include a broad range of algorithms applicable in different domains. The
process of applying data analytics methods to a particular area involves
defining data characteristics such as volume, variety, and velocity; selecting data models
such as neural networks, classification, and clustering methods; and
applying efficient algorithms that match the data characteristics.
From our review, we deduce the following. First, because data is
generated from different sources with specific data types, it is important
to adopt or develop algorithms that can handle the data characteristics.
Second, the large number of resources that generate data in real time
raises problems of scale and velocity. Finally, finding the
data model that best fits the data is one of the most important issues for
pattern recognition and for better analysis of IoT data. These issues have
opened up a vast number of opportunities for new developments.
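As a small illustration of matching an algorithm to the velocity of the data, the following Python sketch summarizes a stream of sensor readings one value at a time, using Welford's online update for the mean and variance. It is not taken from this survey: the class name RunningStats, the helper summarize, and the example readings are hypothetical and chosen only for illustration. It shows that a stream can be summarized in constant memory, without storing every observation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RunningStats:
    """Welford's online mean/variance, using O(1) memory per stream."""

    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        """Fold one new reading into the summary."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance; reported as 0.0 for fewer than two readings."""
        return self.m2 / (self.count - 1) if self.count > 1 else 0.0


def summarize(readings: Iterable[float]) -> RunningStats:
    """Consume readings one at a time, as they would arrive from a sensor."""
    stats = RunningStats()
    for value in readings:
        stats.update(value)
    return stats


if __name__ == "__main__":
    # Hypothetical vehicle counts per interval; not data from the paper.
    example = [12.0, 15.0, 11.0, 14.0, 13.0]
    result = summarize(example)
    print(f"n={result.count} mean={result.mean:.2f} var={result.variance:.2f}")
```

Because each update touches only three numbers, the same summary could be kept per sensor even when many devices report in real time. A batch computation over stored data would give the same statistics but would need memory proportional to the number of readings.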
* Corresponding author.
E-mail addresses: saeid@knoesis.org (M.S. Mahdavinejad), p.barnaghi@surrey.ac.uk (P. Barnaghi).
Digital Communications and Networks
journal homepage: www.keaipublishing.com/en/journals/digital-communications-and-networks/
https://doi.org/10.1016/j.dcan.2017.10.002
Received 26 July 2017; Received in revised form 4 October 2017; Accepted 9 October 2017
Available online 12 October 2017
2352-8648/© 2018 Chongqing University of Posts and Telecommunications. Production and hosting by Elsevier B.V. on behalf of KeAi. This is an open access article
under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Digital Communications and Networks 4 (2018) 161–175