Citation: Chang, J.; Yu, D.; Hu, Y.; He, W.; Yu, H. Deep Reinforcement Learning for Dynamic Flexible Job Shop Scheduling with Random Job Arrival. Processes 2022, 10, 760. https://doi.org/10.3390/pr10040760
Academic Editors: Kelvin K.L. Wong, Dhanjoo N. Ghista, Andrew W.H. Ip and Wenjun (Chris) Zhang
Received: 16 March 2022
Accepted: 11 April 2022
Published: 13 April 2022
Article
Deep Reinforcement Learning for Dynamic Flexible Job Shop Scheduling with Random Job Arrival

Jingru Chang 1,2,3, Dong Yu 2,*, Yi Hu 2,4, Wuwei He 1,2 and Haoyu Yu 1,2

1 University of Chinese Academy of Sciences, Beijing 100049, China; changjingru@neusoft.edu.cn (J.C.); wuhewei2021@163.com (W.H.); yuhaoyu2021@sina.com (H.Y.)
2 Shenyang Institute of Computing Technology, Chinese Academy of Sciences, Shenyang 110168, China; huyi@sict.ac.cn
3 Department of Software Engineering, Dalian Neusoft University of Information, Dalian 116023, China
4 Shenyang Zhongke CNC Technology Co., Ltd., Shenyang 110168, China
* Correspondence: yudong@sict.ac.cn
Abstract: The production process of a smart factory is complex and dynamic. As the core of manufacturing management, research on the flexible job shop scheduling problem (FJSP) focuses on optimizing scheduling decisions in real time as the production environment changes. In this paper, deep reinforcement learning (DRL) is proposed to solve the dynamic FJSP (DFJSP) with random job arrival, with the goal of minimizing penalties for earliness and tardiness. A double deep Q-networks (DDQN) architecture is proposed, and state features, actions and rewards are designed. A soft ε-greedy behavior policy is designed according to the scale of the problem. The experimental results show that the proposed DRL method outperforms other reinforcement learning (RL) algorithms, heuristics and metaheuristics in terms of solution quality and generalization. In addition, the soft ε-greedy strategy reasonably balances exploration and exploitation, thereby improving the learning efficiency of the scheduling agent. The DRL method adapts to dynamic changes in the production environment of a flexible job shop, which contributes to the establishment of a flexible scheduling system with self-learning, real-time optimization and intelligent decision-making.
Keywords: smart factory; flexible job shop scheduling problem; deep reinforcement learning; random job arrival; penalties for earliness and tardiness; double deep Q-networks
1. Introduction
Industry 4.0, also called the “smart factory” [1], focuses on integrating advanced technologies such as the Internet of Things, big data and artificial intelligence with enterprise resource planning, manufacturing execution management and process control management. A smart factory thus has the capabilities of autonomous perception, analysis, reasoning, decision-making and control. The flexible job shop scheduling problem (FJSP) is an extension of the traditional job shop scheduling problem (JSP). The FJSP supports diversified and differentiated manufacturing with low variation, and it is widely used in semiconductor manufacturing, automobile assembly, mechanical manufacturing systems, etc. [2]. As the core of manufacturing execution management and process control management, the real-time optimization and control of the FJSP provides increased flexibility in the management of a smart factory, aiming to improve factory productivity and the efficient utilization of resources in real time [3].
The FJSP breaks through the uniqueness restriction of production resources: each operation can be assigned to one or more available machines, and its processing time differs from machine to machine [4]. Because the FJSP relaxes the machine-assignment constraints, it expands the feasible solution search space, so it is a strongly NP-hard problem that is more complex than the JSP [5,6].
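To make this flexibility concrete, the following minimal Python sketch (illustrative only, not from this paper; the machine names, processing times and the lower-bound helper are invented for the example) represents an FJSP instance in which every operation maps each eligible machine to its processing time:

from typing import Dict, List

# jobs[j][o] maps each eligible machine to the processing time of
# operation o of job j; an operation may run on one or more machines.
jobs: List[List[Dict[str, int]]] = [
    [  # Job 0
        {"M1": 3, "M2": 5},  # operation 0: two eligible machines
        {"M2": 4},           # operation 1: only M2 can process it
    ],
    [  # Job 1
        {"M1": 6, "M3": 2},
        {"M1": 2, "M2": 3, "M3": 4},
    ],
]

def lower_bound_makespan(jobs: List[List[Dict[str, int]]]) -> int:
    """Crude lower bound: each job needs at least the sum of its
    fastest per-operation processing times."""
    return max(sum(min(op.values()) for op in job) for job in jobs)

print(lower_bound_makespan(jobs))  # 7 for this toy instance

Even in this two-job instance, choosing a machine for each flexible operation multiplies the number of feasible schedules, which illustrates why the enlarged search space makes the FJSP harder than the classical JSP.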
So far, a large number of studies on the FJSP have assumed
that the scheduling takes place in a static production environment, where the shop floor