Citation: Alafif, T.; Hadi, A.; Allahyani, M.; Alzahrani, B.; Alhothali, A.; Alotaibi, R.; Barnawi, A. Hybrid Classifiers for Spatio-Temporal Abnormal Behavior Detection, Tracking, and Recognition in Massive Hajj Crowds. Electronics 2023, 12, 1165. https://doi.org/10.3390/electronics12051165
Academic Editor: Silvia Liberata Ullo
Received: 28 January 2023
Revised: 22 February 2023
Accepted: 25 February 2023
Published: 28 February 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Hybrid Classifiers for Spatio-Temporal Abnormal Behavior
Detection, Tracking, and Recognition in Massive Hajj Crowds
Tarik Alafif 1, Anas Hadi 2, Manal Allahyani 2, Bander Alzahrani 2, Areej Alhothali 2,*, Reem Alotaibi 2 and Ahmed Barnawi 2
1 Department of Computer Science, Jamoum University College, Umm Al-Qura University, Makkah 25375, Saudi Arabia
2 Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia
* Correspondence: aalhothali@kau.edu.sa
Abstract:
Individual abnormal behaviors vary depending on crowd sizes, contexts, and scenes.
Challenges such as partial occlusions, blurring, a large number of abnormal behaviors, and camera
viewing occur in large-scale crowds when detecting, tracking, and recognizing individuals with
abnormalities. In this paper, our contribution is two-fold. First, we introduce an annotated and
labeled large-scale crowd abnormal behavior Hajj dataset, HAJJv2. Second, we propose two hybrid
methods that combine convolutional neural networks (CNNs) and random forests (RFs) to detect and
recognize spatio-temporal abnormal behaviors in small and large-scale crowd videos. In small-scale crowd
videos, a ResNet-50 pre-trained CNN model is fine-tuned to verify whether every frame is normal or
abnormal in the spatial domain. If anomalous behaviors are observed, a motion-based individual
detection method based on the magnitudes and orientations of Horn–Schunck optical flow is proposed
to locate and track individuals with abnormal behaviors. A Kalman filter is employed in large-scale
crowd videos to predict and track the detected individuals in the subsequent frames. Then, means
and variances as statistical features are computed and fed to the RF classifier to classify individuals
with abnormal behaviors in the temporal domain. In large-scale crowds, we fine-tune the ResNet-50
model using a YOLOv2 object detection technique to detect individuals with abnormal behaviors in
the spatial domain. The small-scale crowd method achieves average areas under the curve (AUCs)
of 99.76% and 93.71% on two public benchmark small-scale crowd datasets, UMN and UCSD, respectively,
while the large-scale crowd method achieves an average AUC of 76.08% on the HAJJv2 dataset. On the
small-scale crowd datasets, our method outperforms state-of-the-art methods by margins of
1.66%, 6.06%, and 2.85% on UMN, UCSD Ped1, and UCSD Ped2, respectively. It also produces an
acceptable result in large-scale crowds.
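As an illustration of the statistical features mentioned above, the following is a minimal sketch rather than the authors' implementation. It assumes the means and variances are taken over the optical-flow magnitudes and orientations inside one tracked individual's bounding box; the function name `flow_statistics` and the inputs `u` and `v` (horizontal and vertical flow components) are hypothetical.

```python
import numpy as np


def flow_statistics(u, v):
    """Return [mean_mag, var_mag, mean_ori, var_ori] for one flow patch.

    u, v: arrays of horizontal and vertical optical-flow components
    (assumed here to cover one individual's bounding box).
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)

    # Per-pixel flow magnitude and orientation (radians in [-pi, pi]).
    magnitude = np.hypot(u, v)
    orientation = np.arctan2(v, u)

    # Plain arithmetic statistics; this ignores angle wrap-around,
    # which is a simplifying assumption of this sketch.
    return np.array([
        magnitude.mean(), magnitude.var(),
        orientation.mean(), orientation.var(),
    ])
```

A four-element vector of this kind, computed per individual and per frame window, could then be passed to a random forest classifier such as scikit-learn's `RandomForestClassifier`; the exact feature layout used in the paper is not specified in the abstract.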
Keywords:
abnormal behaviors; small-scale crowd; large-scale crowd; convolutional neural network;
random forest; detection; tracking; recognition
1. Introduction
Abnormal behavior detection in videos has received considerable attention. This
research area has been widely examined over the past two decades because of its importance
and challenging nature in the computer vision domain. Generally, an abnormal behavior
is described as an unusual act by an individual in an event, such as running, walking in
the opposite direction, or jumping. Individual abnormal behaviors can be perceived
differently in different contexts and scenes. Therefore, the definition of abnormal behaviors
may vary from one place or scenario to another. Similarly, the density and the number of
individuals in the crowd often vary significantly, which can result in small or large crowds
according to the context of the scene. A small-scale crowd typically contains
tens of individuals gathering or moving in the same location, while a large-scale crowd
contains hundreds or thousands of individuals in the same place. Therefore, a large-scale
crowd scene may raise many challenges as a result of many individuals moving to