Article
A 6DoF Pose Estimation Dataset and Network for Multiple
Parametric Shapes in Stacked Scenarios
Xinyu Zhang 1, Weijie Lv 2 and Long Zeng 1,*
Citation: Zhang, X.; Lv, W.; Zeng, L.
A 6DoF Pose Estimation Dataset and
Network for Multiple Parametric
Shapes in Stacked Scenarios.
Machines 2021, 9, 321. https://doi.org/10.3390/machines9120321
Academic Editors: Xiaochun Cheng
and Daming Shi
Received: 15 October 2021
Accepted: 23 November 2021
Published: 27 November 2021
Publisher’s Note: MDPI stays neutral
with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Department of Advanced Manufacturing, Shenzhen International Graduate School, Tsinghua University,
Shenzhen 518055, China; zhangxy20@mails.tsinghua.edu.cn
2 Department of Mechanical Engineering, Tsinghua University, Beijing 100084, China;
lwj19@mails.tsinghua.edu.cn
* Correspondence: zenglong@sz.tsinghua.edu.cn
Abstract: Most industrial parts are instantiated from different parametric templates. 6DoF
(6D) pose estimation for such parts is challenging, since some part objects instantiated from a
known template may not have been seen before. This paper releases a new, well-annotated 6D pose
estimation dataset for multiple parametric templates in stacked scenarios, denoted the
Multi-Parametric Dataset, in which a training set (50K scenes) and a test set (2K scenes) are
obtained by automatic labeling techniques. In particular, the test set is further divided into a
TEST-L dataset for learning evaluation and a TEST-G dataset for generalization evaluation. Since
the part objects from the same template are regarded as a class in the Multi-Parametric Dataset
and the number of part objects is infinite, we propose a new 6D pose estimation network as our
baseline method, the Multi-templates Parametric Pose Network (MPP-Net), which aims to have
sufficient generalization ability for parametric part objects in stacked scenarios. To the best of
our knowledge, our dataset and method are the first to jointly achieve 6D pose estimation and
parameter value prediction for multiple parametric templates. Extensive experiments are conducted
on the Multi-Parametric Dataset. The mIoU and Overall Accuracy of foreground segmentation and
template segmentation on the two test datasets exceed 99.0%. In addition, on the two test sets
respectively, MPP-Net achieves an mAP of 92.9% and 90.8% under a threshold of 0.5 cm for
translation prediction, 41.9% and 36.8% under a threshold of 5° for rotation prediction, and 51.0%
and 6.0% under a threshold of 5% for parameter value prediction. The results show that our
dataset has exploratory value for 6D pose estimation and parameter value prediction tasks.
Keywords: automation; deep learning; pose estimation; robotic grasping
1. Introduction
Parametric techniques have been widely used in the field of industrial design [1]. The
assembly of an industrial product usually requires many parametric part objects from
different parametric shapes. A parametric shape is a parametric template described by a
set of driven parameters, which can be instantiated as many parametric part objects [2,3].
For example, many common industrial products comprise a variety of screw parts and nut
parts generated from the screw template and the nut template.
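The relationship between a template and its part objects can be illustrated with a minimal sketch. It is not taken from the dataset or from MPP-Net: the class names `ParametricTemplate` and `PartObject`, the parameter names, and the numeric values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParametricTemplate:
    """A shape family described by a fixed set of named parameters (hypothetical)."""
    name: str
    parameter_names: tuple

    def instantiate(self, **values):
        """Create one part object by assigning a value to every parameter."""
        missing = set(self.parameter_names) - set(values)
        extra = set(values) - set(self.parameter_names)
        if missing or extra:
            raise ValueError(
                f"missing={sorted(missing)}, unexpected={sorted(extra)}"
            )
        return PartObject(template=self, values=dict(values))


@dataclass(frozen=True)
class PartObject:
    """One concrete part: a template plus specific parameter values."""
    template: ParametricTemplate
    values: dict


# Hypothetical screw template with two illustrative parameters (in mm).
screw = ParametricTemplate("screw", ("head_diameter", "shaft_length"))

# Two different part objects that belong to the same template (class).
short_screw = screw.instantiate(head_diameter=7.0, shaft_length=10.0)
long_screw = screw.instantiate(head_diameter=7.0, shaft_length=30.0)
```

Here `short_screw` and `long_screw` share one template but differ in parameter values, which is the sense in which a single template can yield an unbounded number of part objects.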
When recyclable part objects are disassembled from products into recycling bins,
the result is commonly a stacked scene containing parametric part objects from multiple
templates. The part objects from the same template are then sorted into their own bins
according to their parameter values. In recent years, robots guided by vision systems have
often been used to sort part objects automatically. However, due to the varied templates,
frequent changes in parameter values, heavy occlusion, sensor noise, etc., accurate 6D
pose estimation and parameter value prediction in such stacked scenes are challenging.
Accurate 6D pose estimation, i.e., 3D translation and 3D rotation, is essential
for robotic grasping tasks. Existing 6D pose estimation methods based on deep learning