Article
Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance
Ang Li 1,2, Zhenze Liu 3, Wenrui Wang 1,2,*, Mingchao Zhu 1,*, Yanhui Li 1, Qi Huo 1 and Ming Dai 1
Citation: Li, A.; Liu, Z.; Wang, W.; Zhu, M.; Li, Y.; Huo, Q.; Dai, M. Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance. Appl. Sci. 2021, 11, 11184. https://doi.org/10.3390/app112311184
Academic Editor: Dario Richiedei
Received: 20 October 2021; Accepted: 22 November 2021; Published: 25 November 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; liang@ciomp.ac.cn (A.L.); liyanhui@ciomp.ac.cn (Y.L.); huoqi@ciomp.ac.cn (Q.H.); daim@ciomp.ac.cn (M.D.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 College of Communication Engineering, Jilin University, Changchun 130025, China; zzliu@jlu.edu.cn
* Correspondence: wangwenrui16@mails.ucas.ac.cn (W.W.); zhumingchao@ciomp.ac.cn (M.Z.)
Abstract: Dynamic movement primitives (DMPs) are a robust framework for generating movements from demonstrations. The framework can be extended with a perturbing term to achieve obstacle avoidance without sacrificing stability; this additional term is usually constructed from potential functions. Although various potentials have been adopted to improve obstacle-avoidance performance, the profiles of these potentials are rarely incorporated into a reinforcement learning (RL) framework. In this contribution, we present an RL-based method that learns not only the profiles of the potentials but also the shape parameters of a motion. The algorithm employed is PI2 (Policy Improvement with Path Integrals), a model-free, sampling-based learning method. Using PI2, the profiles of the potentials and the parameters of the DMPs are learned simultaneously; therefore, obstacle avoidance can be optimized while the specified task is completed. We validate the presented method in simulation and on a redundant robot arm in experiments.
Keywords: obstacle avoidance; Dynamic Movement Primitives; reinforcement learning; PI2 (policy improvement with path integrals)
1. Introduction
As robots are applied to increasingly complex scenarios, higher adaptability and reliability are demanded at the motion-planning level. To cope with dynamic environments, robots can rely on at least two different collision-avoidance strategies. The first is the global strategy [1,2], which is usually based on search processes and is often computationally expensive and time-consuming [3], so that continuous, fast trajectory modification based on sensory feedback is hard to accomplish. The second is the local strategy, which is fast to compute but yields suboptimal trajectories. To this end, Dynamic Movement Primitives (DMPs) [4] were introduced as a versatile framework to address this problem.
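To make the framework concrete, the following is a minimal sketch of a one-dimensional discrete DMP: a spring-damper transformation system driven by a learnable forcing term that is phased out by a canonical system. The gain values (alpha_z = 25, beta_z = alpha_z/4, alpha_x = 8) and the basis-function layout are common illustrative choices, not values taken from this paper.

```python
import numpy as np

def dmp_rollout(y0, g, w, n_basis=10, tau=1.0, dt=0.001, T=1.0):
    """Integrate a 1-D discrete DMP:
        tau * z' = alpha_z * (beta_z * (g - y) - z) + f(x)
        tau * y' = z
        tau * x' = -alpha_x * x        (canonical system)
    where f(x) is a weighted sum of Gaussian basis functions scaled by x.
    """
    a_z, b_z, a_x = 25.0, 25.0 / 4.0, 8.0     # illustrative gains
    c = np.exp(-a_x * np.linspace(0, 1, n_basis))  # basis centers along phase x
    h = n_basis / c                                 # basis widths
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # forcing term, vanishes as the phase x decays to 0
        f = x * (g - y0) * (w @ psi) / (psi.sum() + 1e-10)
        z += dt / tau * (a_z * (b_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-a_x * x)
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the system reduces to a critically damped spring-damper that converges to the goal g; learning the weights w (e.g., from a demonstration, or with PI2) shapes the transient while preserving this guaranteed convergence.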
1.1. Related Work
In the DMP framework, an additional perturbing term is modified online based on feedback from the environment to achieve obstacle avoidance [5–8]. The perturbing term is usually designed as an artificial potential field, with different fields chosen to improve performance [5]. The classical choice is the static potential, whose effect depends only on the distance between the end-effector and the obstacles [9]. The dynamic potential field is the one most often combined with the DMP method [5]. With a dynamic potential field, robots accomplish smoother avoidance movements because the potential depends on both the distance and the relative velocity between the end-effector and the obstacles [6,7]. A closed-form harmonic potential function was presented to avoid convex and concave obstacles [10]; it inherits the convergence of the harmonic potential without creating a potentially infinite number of pseudo-attractors on the obstacle [11].