Article
A Hybrid Deep Learning Model for Recognizing Actions of Distracted Drivers
Shuang-Jian Jiao, Lin-Yao Liu and Qian Liu *
Citation: Jiao, S.-J.; Liu, L.-Y.; Liu, Q. A Hybrid Deep Learning Model for Recognizing Actions of Distracted Drivers. Sensors 2021, 21, 7424. https://doi.org/10.3390/s21217424
Academic Editors: Nunzio Cennamo, YangQuan Chen, Subhas Mukhopadhyay, M. Jamal Deen, Junseop Lee, Simone Morais and Biswanath Samanta
Received: 9 September 2021
Accepted: 4 November 2021
Published: 8 November 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Department of Civil Engineering, College of Engineering, Ocean University of China, Qingdao 266100, China;
jsj6039@ouc.edu.cn (S.-J.J.); lly6409@stu.ouc.edu.cn (L.-Y.L.)
* Correspondence: liuqian6428@stu.ouc.edu.cn
Abstract:
With the rapid spread of in-vehicle information systems such as smartphones, navigation systems,
and radios, the number of traffic accidents caused by driver distraction has been increasing. Timely
identification of and warning against distracted driving are therefore crucial, and driver assistance
systems that provide them are of great value. However, almost all research on recognizing drivers'
distracted actions with computer vision methods has neglected the importance of temporal information
for action recognition. This paper proposes a hybrid deep learning model for recognizing the actions
of distracted drivers. Specifically, we used OpenPose to obtain skeleton information of the human body
and then constructed the vector angles and modulus ratios of the human body structure as features
describing the driver's actions, thereby fusing deep network features with hand-crafted features and
increasing the information density of the spatial features. The K-means clustering algorithm was used
to preselect the original frames, and an inter-frame comparison was then used to obtain the final
keyframe sequence by comparing the Euclidean distance between the manually constructed vector
representing each frame and the vector representing the cluster center. Finally, we constructed a
two-layer long short-term memory (LSTM) neural network to extract more effective spatiotemporal
features, followed by a softmax layer to identify the distracted driver's action. Experimental results
on the collected dataset demonstrate the effectiveness of this framework, which can provide a
theoretical basis for building in-vehicle distraction warning systems.
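The abstract compresses two hand-crafted steps (skeleton features and keyframe selection) into a few sentences. The sketch below illustrates them under loudly stated assumptions; it is not the authors' implementation. Keypoints are assumed to be 2-D (x, y) pairs in OpenPose's BODY_25/COCO ordering (1 = neck, 2-4 = right shoulder/elbow/wrist, 5-7 = left shoulder/elbow/wrist). The `TRIPLES` list and all function names are hypothetical stand-ins, since the paper's actual joint choices are not given here. K-means is a plain deterministic version, and the "inter-frame comparison" is simplified to keeping, per cluster, the frame whose feature vector is nearest (in Euclidean distance) to the cluster center.

```python
import math

def vector_angle(a, b, c):
    """Angle in degrees at joint b between segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0  # degenerate case, e.g. an undetected keypoint at the joint
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

def modulus_ratio(a, b, c):
    """Length ratio |b->a| / |b->c|; invariant to the person's scale in the image."""
    n2 = math.dist(c, b)
    return math.dist(a, b) / n2 if n2 else 0.0

# Hypothetical (end, joint, end) triples, assuming BODY_25/COCO indices:
# right elbow, left elbow, right shoulder, left shoulder.
TRIPLES = [(2, 3, 4), (5, 6, 7), (1, 2, 3), (1, 5, 6)]

def frame_features(keypoints, triples=TRIPLES):
    """Hand-crafted feature vector for one frame: an angle and a ratio per triple."""
    feats = []
    for i, j, k in triples:
        a, b, c = keypoints[i], keypoints[j], keypoints[k]
        feats.append(vector_angle(a, b, c))
        feats.append(modulus_ratio(a, b, c))
    return feats

def kmeans(points, k, iters=20):
    """Plain K-means with deterministic init (k evenly spaced frames)."""
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    centers = [list(points[i * n // k]) for i in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign each frame to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members (empty clusters keep their center).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def select_keyframes(features, k):
    """Per cluster, keep the frame nearest to the cluster center (simplifying assumption)."""
    labels, centers = kmeans(features, k)
    keyframes = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            keyframes.append(min(members, key=lambda i: math.dist(features[i], centers[c])))
    return sorted(keyframes)  # restore temporal order before feeding a sequence model
```

For example, `select_keyframes([[0.0], [0.1], [0.2], [10.0], [10.1], [10.3]], 2)` returns `[1, 4]`: one representative frame from each of the two pose clusters. Sorting the indices keeps the keyframe sequence in temporal order, which matters because the downstream LSTM consumes it as a time series.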
Keywords: driver distraction; OpenPose; LSTM; keyframe sequences; action recognition; nested cross-validation
1. Introduction
According to data published by the World Health Organization (WHO), approximately
1.2 million people die in traffic accidents worldwide every year [1]. According to the
National Highway Traffic Safety Administration (NHTSA), approximately 20% of traffic
accidents and 80% of near-crash events are caused by driver distraction, which has
emerged as a key factor in serious and fatal accidents [2]. In 2018 alone, driver
distraction claimed the lives of 2841 people in the USA [3]. Therefore, investigating the
causes of distracted driving and reducing the number of distraction-affected traffic accidents
remains an urgent issue.
According to related research [4], there are two main reasons for driver distraction:
(i) internal reasons, such as fatigued, drunk, and drugged driving, in which the driver's
mental state is unsuitable for driving. Methods for detecting distraction due to internal
reasons are mainly divided into physiological parameter-based methods [5,6] and
naturalistic driving data-based methods [7,8]; (ii) external reasons, in which the driver
is subject to external interference, such as calling, texting, talking with passengers, and
other secondary tasks that keep the driver from driving in a proper mental condition.
Computer vision methods are used to identify driver distraction caused by external reasons,
and they have two advantages that favor practical application. First, compared
Sensors 2021, 21, 7424. https://doi.org/10.3390/s21217424 https://www.mdpi.com/journal/sensors