Citation: Chen, B.; Lv, X.; Liu, C.; Jiao, H. SGSNet: A Lightweight Depth Completion Network Based on Secondary Guidance and Spatial Fusion. Sensors 2022, 22, 6414. https://doi.org/10.3390/s22176414
Academic Editors: M. Jamal Deen, Subhas Mukhopadhyay, Yangquan Chen, Simone Morais, Nunzio Cennamo and Junseop Lee
Received: 16 July 2022
Accepted: 20 August 2022
Published: 25 August 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
SGSNet: A Lightweight Depth Completion Network Based on
Secondary Guidance and Spatial Fusion
Baifan Chen 1, Xiaotian Lv 1,*, Chongliang Liu 2 and Hao Jiao 2
1 The School of Automation, Central South University, Changsha 410083, China
2 Beijing Institute of Automation Equipment, Beijing 100074, China
* Correspondence: lvxiaotian@csu.edu.cn
Abstract: The depth completion task aims to generate a dense depth map from a sparse depth map and the corresponding RGB image. Because depth completion is a data preprocessing step, the challenge is to obtain denser depth maps without compromising the real-time performance of downstream tasks. In this paper, we propose SGSNet, a lightweight depth completion network based on secondary guidance and spatial fusion. We design an image feature extraction module that extracts features at different scales, both between and within layers, in parallel and generates guidance features. SGSNet then performs depth completion with two stages of guidance. The first guidance uses a lightweight guidance module to quickly guide LiDAR feature extraction with the texture features of RGB images. The second guidance uses a depth information completion module to complete the sparse depth map features and feeds the result into the DA-CSPN++ module, which re-guides the dense depth map. By using a lightweight bootstrap module, the overall network runs ten times faster than the baseline. The overall network is relatively lightweight and runs at up to thirty frames per second, which is sufficient to meet the speed requirements of large-scale SLAM and three-dimensional reconstruction for sensor data extraction. At the time of submission, SGSNet ranked first in accuracy among lightweight depth completion methods in the KITTI ranking. It was 37.5% faster than the top published algorithms in that ranking and placed second in the full ranking.
Keywords: depth completion; secondary guidance; spatial fusion
1. Introduction
With the continuous development of 3D computer vision research, the demand for dense depth maps has gradually increased. As a data preprocessing step, the depth completion task has therefore received much attention in AR [1,2], VR [3], SLAM [4], and 3D reconstruction [5,6]. Depth completion mainly faces the following three challenges: (1) current depth completion methods are slow and cannot meet the real-time requirements of large projects; (2) RGB image features and LiDAR features belong to different modalities, yet because they describe the same scene they share a large amount of coupled information, such as relative position relationships and object shapes, and these factors make the features hard to fuse; (3) edge blurring leads to large errors at object edges in the depth map produced by the depth completion network.
Researchers have developed a variety of solutions, the most recent of which rely on Convolutional Neural Networks. The original and most commonly used depth completion network is the one-branch fusion network [5,7,8] shown in Figure 1a, which concatenates the features of the two modalities along the channel dimension and feeds the result into the deep learning network. This approach learns less effectively but runs fast.
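The channel-wise concatenation used by one-branch fusion can be illustrated with a minimal sketch. It is not the implementation from the cited works: the channel-first nested-list layout, the helper names `concat_channels` and `valid_mask`, the toy values, and the convention that a depth of 0 means "no LiDAR return" are all assumptions made for illustration. A real network would apply the same idea to tensors before the first convolution.

```python
# One-branch (early) fusion sketch: stack RGB and sparse depth along the channel axis.
# Images are stored channel-first as nested lists: image[c][y][x] (an assumed layout).

def concat_channels(rgb, sparse_depth):
    """Build a 4-channel input (R, G, B, D) from a 3-channel image and a 1-channel depth map."""
    height, width = len(rgb[0]), len(rgb[0][0])
    for channel in rgb + sparse_depth:
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same height and width")
    # Concatenating the channel lists is the channel-dimension concatenation.
    return rgb + sparse_depth


def valid_mask(sparse_depth):
    """Mark pixels that carry a measurement, assuming depth 0 means 'no LiDAR return'."""
    return [[1 if d > 0 else 0 for d in row] for row in sparse_depth[0]]


# Toy 2x3 example with hypothetical values.
rgb = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # R
    [[0.0, 0.1, 0.0], [0.2, 0.2, 0.2]],  # G
    [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4]],  # B
]
sparse_depth = [[[0.0, 12.5, 0.0], [0.0, 0.0, 7.3]]]  # D, mostly empty

fused = concat_channels(rgb, sparse_depth)  # 4 channels fed to a single network
mask = valid_mask(sparse_depth)             # which pixels hold real depth values
```

Because both modalities enter a single network as one stacked input, only one encoder is needed, which is consistent with the speed advantage noted above; the network receives no separate per-modality representation.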
However, joint representation alone will result in missing original features. Thus, some researchers have proposed the two-branch fusion network [9–14], which uses coordinated representations for depth completion; its structural block diagram is shown in Figure 1b. It uses separate networks to learn the features of each modality and fuses them through a