Citation: Zhao, M.; Zhou, D.; Song, X.; Chen, X.; Zhang, L. DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization. Sensors 2022, 22, 3389. https://doi.org/10.3390/s22093389

Academic Editors: Luis Payá, Oscar Reinoso García and Helder Jesus Araújo

Received: 19 March 2022

Accepted: 27 April 2022

Published: 28 April 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

sensors

Article

DiT-SLAM: Real-Time Dense Visual-Inertial SLAM with Implicit Depth Representation and Tightly-Coupled Graph Optimization

Mingle Zhao 1,2, Dingfu Zhou 2,3,*, Xibin Song 2,3, Xiuwan Chen 1 and Liangjun Zhang 2,3

1 Institute of Remote Sensing and Geographic Information System, Peking University, Beijing 100871, China; zhaomingle@pku.edu.cn (M.Z.); xwchen@pku.edu.cn (X.C.)

2 Robotics and Autonomous Driving Laboratory, Baidu Research, Beijing 100085, China; song.sducg@gmail.com (X.S.); liangjunzhang@baidu.com (L.Z.)

3 National Engineering Laboratory of Deep Learning Technology and Application, Beijing 100085, China

* Correspondence: dingfuzhou@gmail.com

Abstract: Recently, generating dense maps in real time has become a hot research topic in the mobile robotics community, since dense maps provide more informative and continuous features than sparse maps. Implicit depth representations (e.g., the depth code) derived from deep neural networks have been employed in visual-only and visual-inertial simultaneous localization and mapping (SLAM) systems, which achieve promising performance on both camera motion and local dense geometry estimation from monocular images. However, the existing visual-inertial SLAM systems combined with depth codes are either built on a filter-based SLAM framework, which can only update poses and maps in a relatively small local time window, or based on a loosely-coupled framework, in which the prior geometric constraints from the depth estimation network are not employed to improve the state estimation. To address these drawbacks, we propose DiT-SLAM, a novel real-time Dense visual-inertial SLAM with implicit depth representation and Tightly-coupled graph optimization. Most importantly, the poses, sparse maps, and low-dimensional depth codes are optimized with the tightly-coupled graph by considering the visual, inertial, and depth residuals simultaneously. Meanwhile, we propose a light-weight monocular depth estimation and completion network, which is combined with attention mechanisms and the conditional variational auto-encoder (CVAE) to predict uncertainty-aware dense depth maps from lower-dimensional codes. Furthermore, a robust point sampling strategy that takes the spatial distribution of 2D feature points into account is also proposed to provide geometric constraints in the tightly-coupled optimization, especially for textureless or featureless cases in indoor environments. We evaluate our system on open benchmarks. The proposed methods achieve better performance on both dense depth estimation and trajectory estimation compared to the baseline and other systems.
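As a rough illustration of what optimizing a low-dimensional depth code against depth residuals can look like, the following Python sketch fits a code to a few sparse depth measurements. It is not the authors' implementation: the linear decoder (`mean_depth + basis @ code`) is a stand-in for the CVAE decoder, and the names `decode_depth`, `optimize_code`, `sigma`, and `prior_weight` are hypothetical. Only the depth term is shown; in a tightly-coupled formulation the visual and inertial residuals would contribute to the same normal equations.

```python
import numpy as np


def decode_depth(mean_depth, basis, code):
    """Hypothetical linear decoder: dense depth = mean + basis @ code."""
    return mean_depth + basis @ code


def optimize_code(mean_depth, basis, sparse_idx, sparse_depth, sigma,
                  prior_weight=1e-3, iters=5):
    """Gauss-Newton on uncertainty-weighted depth residuals plus a zero-code prior.

    sparse_idx   : pixel indices that have a sparse depth measurement
    sparse_depth : measured depth at those pixels
    sigma        : per-measurement standard deviation (assumed known)
    """
    code = np.zeros(basis.shape[1])
    # Jacobian of the whitened residual w.r.t. the code (constant for a linear decoder).
    J = basis[sparse_idx] / sigma[:, None]
    for _ in range(iters):
        predicted = decode_depth(mean_depth, basis, code)[sparse_idx]
        r = (predicted - sparse_depth) / sigma
        H = J.T @ J + prior_weight * np.eye(code.size)
        g = J.T @ r + prior_weight * code
        code = code - np.linalg.solve(H, g)
    return code


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_pixels, code_dim = 200, 4
    basis = rng.normal(size=(n_pixels, code_dim))
    mean_depth = np.full(n_pixels, 2.0)
    true_code = rng.normal(size=code_dim)
    idx = rng.choice(n_pixels, size=30, replace=False)
    measured = decode_depth(mean_depth, basis, true_code)[idx]
    est = optimize_code(mean_depth, basis, idx, measured, np.ones(idx.size))
    print("max abs code error:", np.max(np.abs(est - true_code)))
```

Because the stand-in decoder is linear, the loop converges after the first step; a learned nonlinear decoder would require re-evaluating the Jacobian at each iteration.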

Keywords: visual-inertial SLAM; depth estimation; implicit representation; graph optimization; dense mapping

1. Introduction

Vision-based SLAM systems have been widely explored in the past 20 years and many representative systems have been proposed, including filter-based approaches (e.g., MonoSLAM) and optimization-based approaches (such as PTAM, DTAM, and the ORB-SLAM series). Recently, visual-inertial odometry or SLAM methods combined with deep neural networks can achieve more accurate localization results, while in real-time applications, the dominant SLAM approaches are still based on key or corner point extraction and tracking for accurate pose estimation. Furthermore, to build associations between multiple frames over a long time, a sparse structure map is usually constructed and the bundle adjustment technique is utilized for optimizing
