Citation: Zhao, M.; Zhou, D.; Song, X.; Chen, X.; Zhang, L. DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization. Sensors 2022, 22, 3389. https://doi.org/10.3390/s22093389
Academic Editors: Luis Payá, Oscar Reinoso García and Helder Jesus Araújo
Received: 19 March 2022
Accepted: 27 April 2022
Published: 28 April 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization
Mingle Zhao 1,2, Dingfu Zhou 2,3,*, Xibin Song 2,3, Xiuwan Chen 1 and Liangjun Zhang 2,3
1 Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871, China; zhaomingle@pku.edu.cn (M.Z.); xwchen@pku.edu.cn (X.C.)
2 Robotics and Autonomous Driving Laboratory, Baidu Research, Beijing 100085, China; song.sducg@gmail.com (X.S.); liangjunzhang@baidu.com (L.Z.)
3 National Engineering Laboratory of Deep Learning Technology and Application, Beijing 100085, China
* Correspondence: dingfuzhou@gmail.com
Abstract: Recently, generating dense maps in real time has become a hot research topic in the mobile robotics community, since dense maps can provide more informative and continuous features than sparse maps. Implicit depth representation (e.g., the depth code) derived from deep neural networks has been employed in visual-only and visual-inertial simultaneous localization and mapping (SLAM) systems, which achieve promising performance on both camera motion and local dense geometry estimation from monocular images. However, the existing visual-inertial SLAM systems combined with depth codes are either built on a filter-based SLAM framework, which can only update poses and maps in a relatively small local time window, or on a loosely-coupled framework, in which the prior geometric constraints from the depth estimation network are not used to improve the state estimation. To address these drawbacks, we propose DiT-SLAM, a novel real-time Dense visual-inertial SLAM with implicit depth representation and Tightly-coupled graph optimization. Most importantly, the poses, sparse maps, and low-dimensional depth codes are optimized with the tightly-coupled graph by considering the visual, inertial, and depth residuals simultaneously. Meanwhile, we propose a light-weight monocular depth estimation and completion network, which is combined with attention mechanisms and the conditional variational auto-encoder (CVAE) to predict uncertainty-aware dense depth maps from lower-dimensional codes. Furthermore, a robust point sampling strategy that takes into account the spatial distribution of 2D feature points is proposed to provide geometric constraints in the tightly-coupled optimization, especially for textureless or featureless cases in indoor environments. We evaluate our system on open benchmarks. The proposed methods achieve better performance on both dense depth estimation and trajectory estimation compared to the baseline and other systems.
Keywords: visual-inertial SLAM; depth estimation; implicit representation; graph optimization; dense mapping
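As a reading aid, the tightly-coupled optimization summarized in the abstract can be sketched in a generic nonlinear least-squares form. This is only an illustration under stated assumptions: the state set $\mathcal{X}$, the residual symbols $\mathbf{r}$, the covariances $\boldsymbol{\Sigma}$, and the index sets $\mathcal{V}$, $\mathcal{I}$, $\mathcal{D}$ are notation introduced here and are not taken from the paper's own formulation.

```latex
% Generic joint cost over poses T_k, sparse map points p_j, and depth codes c_k.
% All symbols are illustrative assumptions, not the paper's notation.
\begin{aligned}
\mathcal{X} &= \{\mathbf{T}_k,\ \mathbf{p}_j,\ \mathbf{c}_k\}, \qquad
\|\mathbf{r}\|^{2}_{\boldsymbol{\Sigma}} = \mathbf{r}^{\top}\boldsymbol{\Sigma}^{-1}\mathbf{r}, \\
\hat{\mathcal{X}} &= \arg\min_{\mathcal{X}}\;
\sum_{(k,j)\in\mathcal{V}} \bigl\|\mathbf{r}^{\mathrm{vis}}_{kj}(\mathcal{X})\bigr\|^{2}_{\boldsymbol{\Sigma}^{\mathrm{vis}}_{kj}}
+ \sum_{k\in\mathcal{I}} \bigl\|\mathbf{r}^{\mathrm{imu}}_{k}(\mathcal{X})\bigr\|^{2}_{\boldsymbol{\Sigma}^{\mathrm{imu}}_{k}}
+ \sum_{k\in\mathcal{D}} \bigl\|\mathbf{r}^{\mathrm{dep}}_{k}(\mathcal{X})\bigr\|^{2}_{\boldsymbol{\Sigma}^{\mathrm{dep}}_{k}}
\end{aligned}
```

Because all three sums are minimized over the same variables, a change in a depth code can shift the estimated poses and vice versa. A loosely-coupled design would instead solve for these quantities in separate stages.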
1. Introduction
Vision-based SLAM systems have been widely explored over the past 20 years, and many representative systems have been proposed, including filter-based approaches (e.g., MonoSLAM [1,2]) and optimization-based approaches (such as PTAM [3], DTAM [4], and the ORB-SLAM series [5–7]). Recently, visual-inertial odometry and SLAM methods combined with deep neural networks have achieved more accurate localization results [8–11], while in real-time applications the dominant SLAM approaches are still based on key-point or corner-point extraction and tracking for accurate pose estimation. Furthermore, to build associations between multiple frames over a long time, a sparse structure map is usually constructed and the bundle adjustment technique is utilized for optimizing