Multi-Agent Deep Q Network to Enhance Reinforcement Learning for Delayed Reward Systems - 2022


Citation: Kim, K. Multi-Agent Deep Q Network to Enhance the Reinforcement Learning for Delayed Reward System. Appl. Sci. 2022, 12, 3520. https://doi.org/10.3390/app12073520

Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais

Received: 7 February 2022

Accepted: 16 March 2022

Published: 30 March 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

applied sciences

Article

Multi-Agent Deep Q Network to Enhance the Reinforcement Learning for Delayed Reward System

Keecheon Kim

Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea; kckim@konkuk.ac.kr; Tel.: +82-2-450-3518

Abstract:

This study examines various factors and conditions related to the performance of reinforcement learning and defines a multi-agent DQN system (N-DQN) model to improve that performance. The N-DQN model is implemented with maze finding and ping-pong as examples of delayed reward systems, in which rewards arrive late and general DQN learning is therefore difficult to apply. In the performance evaluation, the implemented N-DQN shows about 3.5 times higher learning performance than the Q-Learning algorithm in the reward-sparse environment, and it reaches the goal about 1.1 times faster than DQN. In addition, with prioritized experience replay and a reward acquisition section segmentation policy implemented, problems such as the positive bias of existing reinforcement learning models seldom or never occurred. However, because the architecture runs many actors in parallel, additional research on making the system lightweight is needed for further performance improvement. This paper describes in detail the structure of the proposed multi-agent N-DQN architecture, the algorithms it uses, and the specification for its implementation.

Keywords:

reinforcement learning; Q learning; DQN (Deep Q Networks); HDQN (Hierarchical DQN); NDQN (Multi-Agent DQN); delayed reward system; maze game; multi-agent reinforcement learning; prioritized experience replay
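The abstract names prioritized experience replay as one component of N-DQN, but this excerpt does not show the author's implementation. As a rough illustration only, the following is a minimal sketch of the generic proportional variant (Schaul et al.): each transition is replayed with probability proportional to its priority raised to the power alpha, priorities track the magnitude of the latest TD error, and importance-sampling weights correct the resulting bias. The class name `PrioritizedReplayBuffer`, its methods, and the values of `alpha`, `beta`, and `eps` are assumptions, not settings from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (hypothetical sketch).

    alpha, beta, eps and capacity are illustrative values, not the
    settings used in the N-DQN paper.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-5):
        self.capacity = capacity
        self.alpha = alpha  # how strongly priorities skew sampling (0 = uniform)
        self.beta = beta    # strength of the importance-sampling correction
        self.eps = eps      # keeps every priority strictly positive
        self.data = []
        self.priorities = []
        self.pos = 0        # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions get the current maximum priority so they are likely to be replayed soon.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i^alpha / sum_k p_k^alpha
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idxs = random.choices(range(n), weights=probs, k=batch_size)
        # w_i = (N * P(i))^(-beta), normalised so the largest weight in the batch is 1.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Priority is the magnitude of the most recent TD error plus eps.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

Sampling here is a linear scan over all priorities to keep the sketch short; larger buffers usually store priorities in a sum tree so that sampling and updates take logarithmic time.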

1. Introduction

Reinforcement learning is an approach to autonomous decision-making that maximizes the value expected in the future by defining the target environment in terms of states, actions, and expected rewards. Unlike other techniques such as supervised learning, reinforcement learning recognizes the state of its environment and makes decisions from it, which makes it more effective at choosing the optimal policy at every moment. It also has the advantage of being able to learn by assessing the environment and situation on its own, without prior knowledge of complex problems [1].
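"The value expected in the future" is commonly formalized as the discounted return, G_t = r_{t+1} + γ G_{t+1}. The minimal sketch below computes it for a single episode in which the only reward arrives at the last step, the kind of delayed reward this paper targets. The helper name `discounted_returns` and the discount factor γ = 0.9 are illustrative assumptions, not values from the paper.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_{t+1} + gamma * G_{t+1} for every step of one episode.

    rewards[t] is the reward received after the action taken at step t.
    gamma = 0.9 is an illustrative discount factor (hypothetical).
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Delayed (sparse) reward: nothing until the goal is reached at the last step.
episode = [0.0, 0.0, 0.0, 0.0, 1.0]
print([round(g, 4) for g in discounted_returns(episode)])
# prints [0.6561, 0.729, 0.81, 0.9, 1.0]
```

With the reward only at the end, the first decision of a five-step episode sees a value of 0.9^4 ≈ 0.66; the longer the delay, the weaker the signal that reaches early decisions.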

However, these strengths do not mean that reinforcement learning can be applied to every field and problem. Some fields and systems are subject to a variety of unpredictable, abrupt variables and use high-dimensional data with an unpredictable number of cases. In such environments, the limitations of reinforcement learning become clear [2].

For example, Q-Learning, the most commonly used reinforcement learning algorithm, learns in an environment modeled on Markov Decision Process (MDP) theory [ ] by evaluating the Q-value of each action and selecting the action with the highest value. It is a paradigm of learning through rewards and penalties, with a number of interesting applications, that aims to maximize numerical performance measures expressing a long-term objective. This decision-making structure is the closest to the basic philosophy of reinforcement learning, but when this structure is

Appl. Sci. 2022, 12, 3520. https://doi.org/10.3390/app12073520 https://www.mdpi.com/journal/applsci
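The Q-Learning decision rule described above (evaluate the Q-value of each action and pick the highest) can be sketched as standard tabular Q-learning with ε-greedy exploration, using the update Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]. This is not the paper's N-DQN. The one-dimensional corridor, which stands in for a maze with a delayed reward, the helper names `step`, `greedy`, and `q_learning`, and the hyperparameters (α = 0.5, γ = 0.9, ε = 0.1, 200 episodes) are all hypothetical.

```python
import random

# Hypothetical toy MDP: a 1-D corridor with cells 0..4, start in cell 0,
# goal in cell 4. Reward is 1 only on reaching the goal (a delayed reward).
N_STATES, GOAL = 5, 4
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def greedy(q, state):
    # Pick the action with the highest Q-value; break ties at random.
    best = max(q[state])
    return random.choice([a for a in ACTIONS if q[state][a] == best])


def q_learning(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the highest Q-value, sometimes explore.
            if random.random() < epsilon:
                action = random.choice(ACTIONS)
            else:
                action = greedy(q, state)
            nxt, reward, done = step(state, action)
            # The goal is terminal, so it contributes no future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
print([greedy(q, s) for s in range(GOAL)])  # learned action in each non-goal cell
# prints [1, 1, 1, 1]
```

Because the only reward sits at the goal, value has to propagate backward one cell per update. On larger maps this backward propagation becomes slow, which illustrates why sparse, delayed rewards are hard for plain Q-learning.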
