applied sciences | Article

Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance

Ang Li 1,2, Zhenze Liu 3, Wenrui Wang 1,2,*, Mingchao Zhu 1,*, Yanhui Li 1, Qi Huo 1 and Ming Dai 1
Citation: Li, A.; Liu, Z.; Wang, W.; Zhu, M.; Li, Y.; Huo, Q.; Dai, M. Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance. Appl. Sci. 2021, 11, 11184. https://doi.org/10.3390/app112311184

Academic Editor: Dario Richiedei

Received: 20 October 2021; Accepted: 22 November 2021; Published: 25 November 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; liang@ciomp.ac.cn (A.L.); liyanhui@ciomp.ac.cn (Y.L.); huoqi@ciomp.ac.cn (Q.H.); daim@ciomp.ac.cn (M.D.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 College of Communication Engineering, Jilin University, Changchun 130025, China; zzliu@jlu.edu.cn
* Correspondence: wangwenrui16@mails.ucas.ac.cn (W.W.); zhumingchao@ciomp.ac.cn (M.Z.)
Abstract: Dynamic movement primitives (DMPs) are a robust framework for movement generation from demonstrations. The framework can be extended with a perturbing term to achieve obstacle avoidance without sacrificing stability. This additional term is usually constructed from potential functions. Although different potentials have been adopted to improve obstacle-avoidance performance, their profiles are rarely incorporated into a reinforcement learning (RL) framework. In this contribution, we present an RL-based method that learns not only the profiles of the potentials but also the shape parameters of a motion. The algorithm employed is PI2 (Policy Improvement with Path Integrals), a model-free, sampling-based learning method. Using PI2, the profiles of the potentials and the parameters of the DMPs are learned simultaneously; we can therefore optimize obstacle avoidance while completing specified tasks. We validate the presented method in simulations and in experiments with a redundant robot arm.
Keywords: obstacle avoidance; Dynamic Movement Primitives; reinforcement learning; PI2 (policy improvement with path integrals)
1. Introduction

As robots are applied to increasingly complex scenarios, the demands on adaptability and reliability at the motion planning level rise accordingly. To deal with dynamic environments, robots can follow at least two different strategies to avoid collisions. The first is a global strategy [1,2], which is usually based on search processes and is often computationally expensive and time-consuming [3], so that continuous, fast trajectory modification based on sensory feedback is hard to accomplish. The second is a local strategy, which is fast to compute but yields suboptimal trajectories. Against this background, Dynamic Movement Primitives (DMPs) [4] were introduced as a versatile framework to address this problem.
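To make the framework concrete, the following is a minimal single-degree-of-freedom DMP integration sketch. It is an illustration, not the implementation used in this paper: the gain values, basis-function placement, and time step are common defaults chosen here for readability.

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, alpha_z=25.0, beta_z=6.25,
                alpha_x=1.0, dt=0.001, T=1.0):
    """Integrate a single-DoF discrete DMP (illustrative sketch).

    y0, g   : start and goal positions
    weights : weights of the Gaussian basis functions in the forcing term
    """
    n_basis = len(weights)
    # Basis centres spread along the decaying phase variable x
    c = np.exp(-alpha_x * np.linspace(0.0, T, n_basis))
    h = 1.0 / np.gradient(c) ** 2  # basis widths; a common heuristic

    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # Forcing term: normalized weighted basis, scaled by phase and amplitude
        f = (psi @ weights) / (psi.sum() + 1e-10) * x * (g - y0)
        # Transformation system: spring-damper toward the goal plus forcing term
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        # Canonical system: the phase decays monotonically, ensuring convergence
        x += dt / tau * (-alpha_x * x)
        traj.append(y)
    return np.array(traj)

# With zero weights the DMP reduces to a pure spring-damper that converges to g
path = dmp_rollout(y0=0.0, g=1.0, weights=np.zeros(10))
```

Because the forcing term vanishes with the phase variable, the system always converges to the goal; the weights only shape the transient, which is what makes the representation safe to perturb for obstacle avoidance.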
1.1. Related Work
In the DMP framework, the additional perturbing term is modified online, based on feedback from the environment, to achieve obstacle avoidance [5–8]. The perturbing term is usually designed as an artificial potential field, and different fields have been proposed to improve performance [5]. The classical choice is a static potential, whose effect depends only on the distance between the end-effector and the obstacles [9]. The dynamic potential field is the one most often combined with the DMP method [5]. With a dynamic potential field, robots can accomplish smoother avoidance movements because the potential depends on both the distance and the relative velocity between the end-effector and the obstacles [6,7]. A closed-form harmonic potential function was presented to avoid convex and concave obstacles [10]. It inherits the convergence of the harmonic potential while not creating a potentially infinite number of pseudo-attractors on the obstacle [11].