Citation: Kim, K. Multi-Agent Deep Q Network to Enhance the Reinforcement Learning for Delayed
Reward System. Appl. Sci. 2022, 12, 3520. https://doi.org/10.3390/app12073520
Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen,
Junseop Lee and Simone Morais
Received: 7 February 2022
Accepted: 16 March 2022
Published: 30 March 2022
Copyright: © 2022 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Article
Multi-Agent Deep Q Network to Enhance the Reinforcement
Learning for Delayed Reward System
Keecheon Kim
Department of Computer Science and Engineering, Konkuk University, Seoul 05029, Korea; kckim@konkuk.ac.kr;
Tel.: +82-2-450-3518
Abstract:
This study examines the factors and conditions related to the performance of reinforcement
learning and defines a multi-agent DQN (N-DQN) model to improve that performance. The N-DQN
model is implemented with maze finding and ping-pong as examples of delayed reward systems, in
which rewards arrive late and general DQN learning is therefore difficult to apply. In the
performance evaluation, the implemented N-DQN shows about 3.5 times higher learning performance
than the Q-Learning algorithm in the reward-sparse environment, and it reaches the goal about
1.1 times faster than DQN. In addition, with prioritized experience replay and a reward
acquisition section segmentation policy implemented, problems such as the positive bias of
existing reinforcement learning models seldom or never occurred. However, because the
architecture uses a large number of actors in parallel, a need has arisen for additional research
on making the system more lightweight to improve performance further. This paper describes in
detail the structure of the proposed multi-agent N-DQN architecture, the algorithms used, and the
specification for its implementation.
Keywords:
reinforcement learning; Q learning; DQN (Deep Q Networks); HDQN (Hierarchical
DQN); NDQN (Multi-Agent DQN); delayed reward system; maze game; multi-agent reinforcement
learning; prioritized experience replay
1. Introduction
Reinforcement learning is a class of algorithms that makes autonomous decisions so as to
maximize the value expected in the future, with the target environment defined in terms of
states, actions, and expected rewards. Unlike other techniques such as supervised learning,
reinforcement learning recognizes the state of the data and makes decisions accordingly, so it
is more effective at solving the problem of choosing the optimal policy at every moment. In
addition, it has the advantage of being able to learn by judging the environment and situation
on its own, without the need for prior knowledge of complex problems [1].
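As a minimal illustration of "the value expected in the future", the sketch below computes a
discounted return, the quantity that reinforcement learning agents are commonly trained to
maximize. The discount factor `gamma`, the function name `discounted_return`, and the two reward
sequences are assumptions chosen for illustration; they are not taken from this paper.

```python
def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, each weighted by gamma**t for its time step t."""
    total = 0.0
    # Walk backwards so that each step applies G_t = r_t + gamma * G_{t+1}.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical episodes: the same single reward of 1.0, arriving early or late.
immediate = [1.0, 0.0, 0.0, 0.0]
delayed = [0.0, 0.0, 0.0, 1.0]

print(discounted_return(immediate))
print(discounted_return(delayed))
```

With `gamma = 0.9`, the reward that arrives three steps later contributes about 0.729 instead of
1.0. This suggests why delayed rewards give the agent a weaker learning signal for its early
actions.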
However, these strengths do not mean that reinforcement learning can be applied to every field
and problem. In certain fields and systems, a variety of unpredictable and abrupt variables may
occur, and the data are high-dimensional with an unpredictable number of cases. In such
environments, clear limitations of reinforcement learning are revealed [2].
As an example, Q-Learning, the most commonly used algorithm in reinforcement learning, has a
decision-making structure that evaluates the Q-value of each action and selects the action with
the highest value when learning in an environment based on Markov Decision Process (MDP)
theory [3]. It is a learning paradigm driven by rewards and penalties, with some interesting
applications, that aims to maximize numerical performance measures expressing a long-term
objective. This decision-making structure is closest to the basic philosophy of reinforcement
learning, but when this structure is