Citation: Zhang, D.; Xuan, Z.; Zhang, Y.; Yao, J.; Li, X.; Li, X. Path Planning of Unmanned Aerial Vehicle in Complex Environments Based on State-Detection Twin Delayed Deep Deterministic Policy Gradient. Machines 2023, 11, 108. https://doi.org/10.3390/machines11010108
Academic Editors: Wojciech Giernacki, Andrzej Łukaszewicz, Zbigniew Kulesza, Jaroslaw Pytka and Andriy Holovatyy
Received: 12 December 2022
Revised: 6 January 2023
Accepted: 10 January 2023
Published: 13 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Path Planning of Unmanned Aerial Vehicle in Complex
Environments Based on State-Detection Twin Delayed Deep
Deterministic Policy Gradient
Danyang Zhang, Zhaolong Xuan *, Yang Zhang, Jiangyi Yao, Xi Li and Xiongwei Li
Equipment Simulation Training Center, Shijiazhuang Campus, Army Engineering University,
Shijiazhuang 050003, China
* Correspondence: youngzhxm@aeu.edu.cn
Abstract:
This paper investigates the path planning problem of an unmanned aerial vehicle (UAV) completing a raid mission through ultra-low-altitude flight in complex environments. The UAV must avoid radar detection areas, low-altitude static obstacles, and low-altitude dynamic obstacles during flight. The uncertain movement of low-altitude dynamic obstacles slows the convergence of existing algorithm models and reduces the mission success rate of UAVs. To solve this problem, this paper designs a state-detection method that encodes the environmental state along the UAV's direction of travel and compresses the environmental state space. Considering the continuity of the state space and action space, the state-detection method is combined with the twin delayed deep deterministic policy gradient (TD3) algorithm to form the proposed SD-TD3 algorithm, which accelerates training convergence and improves the obstacle avoidance capability of the model. Further, to address the sparse-reward problem of traditional reinforcement learning, a heuristic dynamic reward function is designed to give real-time rewards and guide the UAV to complete its task. Simulation results show that the SD-TD3 algorithm converges faster in training than the TD3 algorithm, and the converged model performs better in practice.
Keywords: unmanned aerial vehicle; deep reinforcement learning; TD3; dynamic reward function; state detection
1. Introduction
In recent years, UAVs have been widely used in the military by virtue of their stealth
and high maneuverability. Small and medium-sized UAVs, in particular, are deployed on the battlefield to strike important enemy targets because their small size allows them to evade radar detection by flying at low or ultra-low altitudes [1–4]. In earlier systems, UAVs relied on remote operators for every maneuver and were therefore not truly unmanned. With the advancement of artificial intelligence, UAV autonomous piloting technology has developed rapidly, and autonomous control can now be realized for many functions. However, in
order to further enhance the UAV’s autonomous control capability, research on UAV path
planning, real-time communication, and information processing needs to be strengthened.
Among them, UAV autonomous path planning is a hot issue attracting current researchers’
attention [5–8].
The path planning problem can be described as finding an optimal path from the
current point to the target point under certain constraints, and many algorithms have been
used so far to solve UAV path planning problems in complex unknown environments.
Nowadays, the common path planning algorithms are the A* algorithm, the artificial potential field algorithm, the genetic algorithm, and reinforcement learning methods [9,10].
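As a concrete illustration of the classical methods listed above, the following is a minimal sketch of grid-based A* path planning. The occupancy grid, start, and goal below are invented for illustration and are not from this paper, which instead addresses continuous state and action spaces with SD-TD3.

```python
import heapq
import itertools

def astar(grid, start, goal):
    """Shortest 4-connected path on a 0/1 occupancy grid (1 = obstacle), or None."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # admissible Manhattan heuristic
    tie = itertools.count()  # tiebreaker so the heap never compares node/parent tuples
    open_heap = [(h(start), 0, next(tie), start, None)]      # (f, g, tie, node, parent)
    came_from, g_cost = {}, {start: 0}
    while open_heap:
        _, g, _, node, parent = heapq.heappop(open_heap)
        if node in came_from:          # already expanded via a cheaper route
            continue
        came_from[node] = parent
        if node == goal:               # walk parent links back to the start
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        r, c = node
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols and grid[nxt[0]][nxt[1]] == 0:
                ng = g + 1
                if ng < g_cost.get(nxt, float("inf")):
                    g_cost[nxt] = ng
                    heapq.heappush(open_heap, (ng + h(nxt), ng, next(tie), nxt, node))
    return None  # goal unreachable

# Illustrative 3x3 grid with a wall across most of the middle row.
grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = astar(grid, (0, 0), (2, 0))
```

Unlike SD-TD3, such a search assumes a known, discretized, static environment, which is precisely the limitation that motivates learning-based planners for dynamic obstacles.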
In recent years, deep learning (DL) and reinforcement learning (RL) have achieved notable results in many fields. DL has strong data-fitting ability, and reinforcement