Citation: Liu, S.; Li, G.; Zhan, Y.; Gao, P. MUSAK: A Multi-Scale Space Kinematic Method for Drone Detection. Remote Sens. 2022, 14, 1434. https://doi.org/10.3390/rs14061434
Academic Editor: Dusan Gleich
Received: 16 February 2022
Accepted: 12 March 2022
Published: 16 March 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
MUSAK: A Multi-Scale Space Kinematic Method for
Drone Detection
Sunxiangyu Liu 1,2, Guitao Li 2, Yafeng Zhan 1 and Peng Gao 3,*
1 Beijing National Research Center for Information Science and Technology (BNRist), Tsinghua University, Beijing 100084, China; lsxy14@mails.tsinghua.edu.cn (S.L.); zhanyf@tsinghua.edu.cn (Y.Z.)
2 School of Aerospace Engineering, Tsinghua University, Beijing 100084, China; ligt@tsinghua.edu.cn
3 School of Mechatronical Engineering, Beijing Institute of Technology, Beijing 100081, China
* Correspondence: gaopeng1982@pku.edu.cn
Abstract: Accurate and robust drone detection is an important yet challenging task. Previous research, whether based on appearance or motion features, has not yet provided a satisfactory solution, especially against complex backgrounds. To this end, the present work proposes a motion-based method termed the Multi-Scale Space Kinematic detection method (MUSAK). It fully leverages motion patterns by extracting 3D, pseudo-3D, and 2D kinematic parameters in three scale spaces according to keypoint quality, and builds three Gated Recurrent Unit (GRU)-based detection branches for drone recognition. MUSAK is evaluated on a hybrid dataset named the multiscale UAV dataset (MUD), consisting of public datasets and self-collected data with motion labels. The experimental results show that MUSAK improves performance by a large margin, with a 95% increase in average precision (AP) over the previous state-of-the-art (SOTA) motion-based methods, and that the hybrid MUSAK method, which integrates the appearance-based Faster Region-based Convolutional Neural Network (Faster R-CNN), achieves new SOTA performance on the AP metrics (AP, AP_M, and AP_S).
Keywords: drone detection; motion-based; kinematic; multi-scale space
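The GRU-based detection branches mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the single-branch structure, weight shapes, and names (`gru_step`, `classify_track`) are hypothetical, assuming only that each branch consumes a sequence of kinematic-parameter vectors for one tracked object and outputs a drone probability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One standard GRU step: x is the input vector, h the previous hidden state."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate hidden state
    return (1 - z) * h + z * h_tilde

def classify_track(params_seq, weights, w_out):
    """Run a sequence of kinematic-parameter vectors through a GRU branch
    and map the final hidden state to a drone probability."""
    h = np.zeros(weights[0].shape[0])
    for x in params_seq:
        h = gru_step(x, h, *weights)
    return sigmoid(w_out @ h)  # probability in (0, 1)
```

In MUSAK's setting, three such branches (for the 3D, pseudo-3D, and 2D scale spaces) would each receive the kinematic parameters extracted at their scale, with the branch selected according to keypoint quality.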
1. Introduction
UAVs (Unmanned Aerial Vehicles), or drones, are currently widely utilized in civilian and military applications, such as surveillance, rescue, surveying, and delivery, owing to their “LSS” characteristics (low altitude, slow speed, and small size). However, these same characteristics make UAVs hard to detect, so they may pose a serious threat to military and social security, especially to airplanes during landing or takeoff. For example, Frankfurt Airport temporarily closed in March 2019 due to two drones hovering nearby, causing approximately 60 flight cancellations [1]. Hence, an accurate, long- and large-range UAV detection method is urgently required now and for the future.
Recent approaches for detecting UAVs in images are mostly based on computer vision (CV) methods [2–5], which can be roughly classified into three categories: those based on appearance, those based on motion information across frames, and hybrids of the two. Appearance-based methods rely on specially designed neural network (NN) frameworks, such as Faster R-CNN [6], You Only Look Once (YOLO)v3 [7], the Single Shot MultiBox Detector (SSD) [8], and Cascade R-CNN [5]. They have proven powerful under complex lighting or backgrounds for some tasks. However, they require the targets to be relatively large and clearly visible [9,10], which is often not the case in real-world drone detection scenes. Motion-based methods mainly rely on optical flow [11–16] or motion modeling of the foreground [17–19]. These methods are more robust when the target objects are tiny or blurry in images, but they are more often employed for region proposal or for distinguishing moving objects from static backgrounds, rather than for detection. Hybrid methods, combining both appearance and motion information, may add extra structures