
Citation: Zhu, W.; Rosendo, A. PSTO: Learning Energy-Efficient Locomotion for Quadruped Robots. Machines 2022, 10, 185. https://doi.org/10.3390/machines10030185
Academic Editors: Dan Zhang and Marco Ceccarelli
Received: 6 January 2022; Accepted: 18 February 2022; Published: 4 March 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
PSTO: Learning Energy-Efficient Locomotion for
Quadruped Robots
Wangshu Zhu and Andre Rosendo *
Living Machines Laboratory, School of Information Science and Technology, ShanghaiTech University,
Shanghai 201210, China; zhuwsh@shanghaitech.edu.cn
* Correspondence: arosendo@shanghaitech.edu.cn
Abstract: Energy efficiency is critical for the locomotion of quadruped robots. However, energy-efficiency values found in simulations do not transfer adequately to the real world. To address this issue, we present a novel method, named Policy Search Transfer Optimization (PSTO), which combines deep reinforcement learning and optimization to create energy-efficient locomotion for quadruped robots in the real world. The deep reinforcement learning and policy search are performed by the TD3 algorithm; the learned policy is then transferred to an open-loop control trajectory, which is further optimized by numerical methods and executed on the robot in the real world. To ensure high consistency between the simulation results and the behavior of the hardware platform, we introduce and validate an accurate simulation model with consistent dimensions and fine-tuned parameters. We then validate those results with real-world experiments on the quadruped robot Ant by executing dynamic walking gaits with different leg lengths and numbers of amplifications. We analyze the results and show that our method outperforms the control methods provided by the state-of-the-art policy search algorithm TD3 and by a sinusoid function in both energy efficiency and speed.
Keywords: machine learning; robot locomotion; energy efficiency; deep reinforcement learning
1. Introduction
Legged locomotion [1] is essential for robots to traverse difficult environments with agility and grace. However, the energy efficiency of mobile robots still has room for improvement when performing dynamic locomotion. Classical approaches often require extensive knowledge of the robot's structure and massive manual tuning of parametric choices [2,3].
Recently, learning-based approaches, especially deep reinforcement learning methods, have achieved tremendous progress in controlling robots [4–7]. Policy search [8], a subfield of deep reinforcement learning, has been widely studied in recent years. A number of policy search algorithms have been proposed to improve performance and sample efficiency while reducing the entropy of the learning process, e.g., DDPG [4], TRPO [5], PPO [9], SAC [10], and TD3 [11]. These algorithms automate the training process and produce feasible locomotion for robots without much human interference.
While these methods have demonstrated promising results in simulation, the policies they learn often perform poorly when transferred to the real world, with substandard energy efficiency and low speed, which is mainly caused by the reality gap [12]. Model discrepancies between the simulated and the real physical system, unmodeled dynamics, wrong simulation parameters, and numerical errors all contribute to this gap. In three-dimensional locomotion, the gap is amplified further, because subtle differences in contact situations between the simulation and the real world can be magnified and lead to unexpected consequences. With this gap, robots may perform poorly, consume more energy, and even damage themselves. Work on narrowing the reality gap is therefore essential for machine learning on robots.
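The refinement stage sketched in the abstract, taking an open-loop sinusoidal trajectory and numerically optimizing its parameters for energy efficiency, can be illustrated with a toy example. The dynamics, torque and displacement models, cost values, and initial parameters below are invented for illustration only and are not the authors' implementation; the actual method trains with TD3 in a physics simulator.

```python
import numpy as np

def rollout(amplitude, frequency, phase, duration=5.0, dt=0.01):
    """Return (distance, energy) for a sinusoidal joint trajectory on a toy 1-DoF model."""
    t = np.arange(0.0, duration, dt)
    q = amplitude * np.sin(2.0 * np.pi * frequency * t + phase)  # joint angle [rad]
    dq = np.gradient(q, dt)                                      # joint velocity [rad/s]
    tau = 2.0 * dq + 0.5 * q                                     # invented torque model
    energy = np.sum(np.abs(tau * dq)) * dt                       # mechanical-work proxy [J]
    distance = 0.3 * amplitude * frequency * duration            # invented displacement model [m]
    return distance, energy

def cost_of_transport(params):
    """Energy per meter traveled; gaits that travel too little are heavily penalized."""
    distance, energy = rollout(*params)
    if distance < 1.0:          # require at least 1 m of travel to stay feasible
        return 1e3
    return energy / distance

# Start from a feasible initial gait (standing in for a policy-derived trajectory)
# and refine it with an accept-if-better random perturbation search.
rng = np.random.default_rng(0)
best = np.array([0.8, 1.0, 0.0])            # amplitude [rad], frequency [Hz], phase [rad]
best_cost = cost_of_transport(best)
for _ in range(200):
    candidate = best + rng.normal(scale=[0.05, 0.05, 0.1])
    c = cost_of_transport(candidate)
    if c < best_cost:                       # keep only improving perturbations
        best, best_cost = candidate, c

print(f"refined gait parameters: {np.round(best, 3)}, cost of transport: {best_cost:.3f}")
```

Any derivative-free optimizer (e.g., CMA-ES or Nelder–Mead) could replace the random search here; the point is only that the open-loop parametrization makes the energy objective directly optimizable on a fixed trajectory.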