Article
Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance
Ang Li 1,2, Zhenze Liu 3, Wenrui Wang 1,2,*, Mingchao Zhu 1,*, Yanhui Li 1, Qi Huo 1 and Ming Dai 1
Citation: Li, A.; Liu, Z.; Wang, W.; Zhu, M.; Li, Y.; Huo, Q.; Dai, M. Reinforcement Learning with Dynamic Movement Primitives for Obstacle Avoidance. Appl. Sci. 2021, 11, 11184. https://doi.org/10.3390/app112311184
Academic Editor: Dario Richiedei
Received: 20 October 2021; Accepted: 22 November 2021; Published: 25 November 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Changchun Institute of Optics, Fine Mechanics and Physics, Chinese Academy of Sciences, Changchun 130033, China; liang@ciomp.ac.cn (A.L.); liyanhui@ciomp.ac.cn (Y.L.); huoqi@ciomp.ac.cn (Q.H.); daim@ciomp.ac.cn (M.D.)
2 University of Chinese Academy of Sciences, Beijing 100049, China
3 College of Communication Engineering, Jilin University, Changchun 130025, China; zzliu@jlu.edu.cn
* Correspondence: wangwenrui16@mails.ucas.ac.cn (W.W.); zhumingchao@ciomp.ac.cn (M.Z.)
Abstract: Dynamic movement primitives (DMPs) are a robust framework for generating movements from demonstrations. The framework can be extended with a perturbing term to achieve obstacle avoidance without sacrificing stability; this additional term is usually constructed from potential functions. Although various potentials have been adopted to improve obstacle-avoidance performance, the profiles of these potentials are rarely incorporated into a reinforcement learning (RL) framework. In this contribution, we present an RL-based method that learns not only the profiles of the potentials but also the shape parameters of a motion. The algorithm employed is PI2 (Policy Improvement with Path Integrals), a model-free, sampling-based learning method. Using PI2, the profiles of the potentials and the parameters of the DMPs are learned simultaneously; therefore, obstacle avoidance can be optimized while the specified task is completed. We validate the presented method in simulation and on a redundant robot arm in experiments.
Keywords: obstacle avoidance; Dynamic Movement Primitives; reinforcement learning; PI2 (policy improvement with path integrals)
1. Introduction
As robots are applied to increasingly complex scenarios, higher adaptability and reliability are demanded at the motion-planning level. To cope with dynamic environments, robots can rely on at least two different collision-avoidance strategies. The first is the global strategy [1,2], which is usually based on search processes and is often computationally expensive and time-consuming [3], so that continuous, fast trajectory modification based on sensory feedback is hard to accomplish. The second is the local strategy, which is fast to compute but yields suboptimal trajectories. To this end, Dynamic Movement Primitives (DMPs) [4] were introduced as a versatile framework to address this problem.
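To make the framework concrete, the following is a minimal sketch of a one-dimensional discrete DMP: a spring-damper transformation system driven by a learnable forcing term that is phased out by a canonical system. The gain values (alpha_z = 25, beta_z = alpha_z/4, alpha_x = 8) and the basis-function layout are common illustrative choices, not values taken from this paper.

```python
import numpy as np

def dmp_rollout(y0, g, w, n_basis=10, tau=1.0, dt=0.001, T=1.0):
    """Integrate a 1-D discrete DMP:
        tau * z' = alpha_z * (beta_z * (g - y) - z) + f(x)
        tau * y' = z
        tau * x' = -alpha_x * x        (canonical system)
    where f(x) is a weighted sum of Gaussian basis functions scaled by x.
    """
    a_z, b_z, a_x = 25.0, 25.0 / 4.0, 8.0     # illustrative gains
    c = np.exp(-a_x * np.linspace(0, 1, n_basis))  # basis centers along phase x
    h = n_basis / c                                 # basis widths
    y, z, x = y0, 0.0, 1.0
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # forcing term, vanishes as the phase x decays to 0
        f = x * (g - y0) * (w @ psi) / (psi.sum() + 1e-10)
        z += dt / tau * (a_z * (b_z * (g - y) - z) + f)
        y += dt / tau * z
        x += dt / tau * (-a_x * x)
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the system reduces to a critically damped spring-damper that converges to the goal g; learning the weights w (e.g., from a demonstration, or with PI2) shapes the transient while preserving this guaranteed convergence.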
1.1. Related Work
In the DMP framework, an additional perturbing term is modified online based on feedback from the environment to achieve obstacle avoidance [5–8]. The perturbing term is usually designed as an artificial potential field, with different fields chosen to improve performance [5]. The classical choice is the static potential, whose effect depends only on the distance between the end-effector and the obstacles [9]. The dynamic potential field is the one most often combined with the DMP method [5]. With a dynamic potential field, robots accomplish smoother avoidance movements because the potential depends on both the distance and the relative velocity between the end-effector and the obstacles [6,7]. A closed-form harmonic potential function was presented to avoid convex and concave obstacles [10]; it inherits the convergence of the harmonic potential without creating a potentially infinite number of pseudo-attractors on the obstacle [11].