Citation: Wang, Z.; Li, F.; Men, Y.; Fu, T.; Yang, X.; Song, R. Deep Deterministic Policy Gradient
with Reward Function Based on Fuzzy Logic for Robotic Peg-in-Hole Assembly Tasks. Appl. Sci. 2022,
12, 3181. https://doi.org/10.3390/app12063181
Academic Editors: Giovanni Boschetti and João Miguel da Costa Sousa
Received: 10 February 2022
Accepted: 18 March 2022
Published: 21 March 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
Article
Deep Deterministic Policy Gradient with Reward Function
Based on Fuzzy Logic for Robotic Peg-in-Hole Assembly Tasks
Ziyue Wang 1, Fengming Li 2,3,*,†, Yu Men 3, Tianyu Fu 3, Xuting Yang 3 and Rui Song 3
1 College of Science, Guilin University of Technology, Guilin 541006, China; wziyins27@163.com
2 School of Information and Electrical Engineering, Shandong Jianzhu University, Jinan 250101, China
3 School of Control Science and Engineering, Shandong University, Jinan 250061, China;
202114785@mail.sdu.edu.cn (Y.M.); futy0807@gmail.com (T.F.); 201914426@mail.sdu.edu.cn (X.Y.);
rsong@sdu.edu.cn (R.S.)
* Correspondence: lifengming@sucro.org or lifengming21@sdjzu.edu.cn; Tel.: +86-186-6016-8885
† Current address: School of Control Science and Engineering, Shandong University, Jingshi Road 17923,
Jinan 250061, China.
Abstract: Automatic robotic assembly of weak-stiffness parts is difficult because the parts may deform
during assembly. Robot manipulation cannot adapt to the dynamic contact changes during the
assembly process. A robot assembly skill learning system is designed by combining compliance
control with deep reinforcement learning, so that a better robot assembly strategy can be acquired.
In this paper, a robot assembly strategy learning method based on variable impedance control is
proposed for robot assembly tasks that involve contact. During the assembly process, the quality
evaluation is designed based on fuzzy logic, and the impedance parameters are learned with a deep
deterministic policy gradient. Finally, the effectiveness of the method is verified using a KUKA iiwa
robot in weak-stiffness peg-in-hole assembly. Experimental results show that the robot obtains
an assembly strategy with variable compliance in the process of weak-stiffness peg-in-hole
assembly. Compared with the previous methods, the assembly success rate of the proposed method
reaches 100%.
Keywords: robot assembly; deep reinforcement learning; fuzzy reward; compliant control
1. Introduction
The contact environment in which a robot operates is changeable and unpredictable. It is a
challenge for a robot to quickly perform new tasks and precisely control the contact
force in different environments. High-precision assembly is a typical contact operation [1,2],
and the assembly process needs to overcome errors in the environmental model and the controller.
The peg-in-hole assembly process is usually divided into a search phase and an insertion
phase [3], which is visual and tactile. In the insertion phase, the peg is inserted along the center
axis of the hole until it reaches the bottom. When the axis deviation or the force/torque is not
appropriate, jamming or wedging can occur. Because of deformation, friction between the assembly
objects, and robot positioning error, it is difficult to establish an accurate
physical model and to find the optimal assembly strategy through model analysis.
Robot assembly control strategies can be designed from the forces and torques that arise during
assembly, based on mathematical models. Compared to a position feedback controller
with high gain, impedance control ensures that the robot and environment are
fully controllable. A natural mass-spring-damper relationship is maintained between the
contact force and the position offset, and its force control characteristics depend on the inertia,
stiffness, and damping parameters [4]. Traditionally, the control parameters are tuned
manually according to the characteristics of the task. For
complex tasks such as assembly, it is difficult for impedance control with fixed
parameters to achieve the target task. If the parameters of impedance control could be