Citation: Zhuang, Y.; Jiang, X.; Gao, Y.; Fang, Z.; Fujita, H. Unsupervised Monocular Visual Odometry for Fast-Moving Scenes Based on Optical Flow Network with Feature Point Matching Constraint. Sensors 2022, 22, 9647. https://doi.org/10.3390/s22249647

Academic Editors: Luis Payá, Oscar Reinoso García and Helder Jesus Araújo
Received: 12 November 2022
Accepted: 2 December 2022
Published: 9 December 2022
Article
Unsupervised Monocular Visual Odometry for Fast-Moving
Scenes Based on Optical Flow Network with Feature Point
Matching Constraint
Yuji Zhuang 1, Xiaoyan Jiang 1,*, Yongbin Gao 1,*, Zhijun Fang 1 and Hamido Fujita 2,3,4

1 School of Electronic and Electrical Engineering, Shanghai University of Engineering Science, Shanghai 201600, China
2 Faculty of Information Technology, HUTECH University, Ho Chi Minh City, Vietnam
3 i-SOMET Inc., Morioka 020-0104, Japan
4 Regional Research Center, Iwate Prefectural University, Takizawa 020-0693, Japan
* Correspondence: xiaoyan.jiang@sues.edu.cn (X.J.); gaoyongbin@sues.edu.cn (Y.G.)
Abstract: Robust and accurate visual feature tracking is essential for good pose estimation in visual odometry. However, in fast-moving scenes, feature point extraction and matching are unstable because of blurred images and large image disparity. In this paper, we propose an unsupervised monocular visual odometry framework based on a fusion of features extracted from two sources: an optical flow network and a traditional point feature extractor. During training, point features are extracted from scene images and outliers among the matched point pairs are filtered by FlannMatch. Meanwhile, the optical flow network, constrained by the principle of forward–backward flow consistency, is used to select another group of corresponding point pairs. The Euclidean distance between the matching points found by FlannMatch and the corresponding point pairs found by the flow network is added to the loss function of the flow network. Compared with SURF, the trained flow network shows more robust performance in complicated fast-motion scenarios. Furthermore, we propose an AvgFlow estimation module, which selects one of the two groups of matched point pairs according to the scene motion. The camera pose is then recovered by Perspective-n-Point (PnP) or epipolar geometry. Experiments on the KITTI Odometry dataset verify the effectiveness of the trajectory estimation of our approach, especially in fast-moving scenarios.
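To make the training constraint concrete, the following PyTorch sketch illustrates a forward–backward flow consistency mask and a Euclidean-distance loss between FLANN-matched keypoints and flow-predicted correspondences. This is an illustration only, not the authors' implementation: the tensor layout, the threshold `alpha`, and the nearest-neighbour flow lookup are assumptions made here.

```python
import torch

def fb_consistency_mask(flow_fwd, flow_bwd, alpha=1.0):
    """Keep pixels where forward and backward flow roughly cancel out.

    flow_fwd, flow_bwd: (B, 2, H, W) optical flow tensors.
    Returns a boolean mask of shape (B, 1, H, W).
    """
    # A full check would sample flow_bwd at the forward-warped positions;
    # adding the two fields directly is a simplification for brevity.
    fb_error = (flow_fwd + flow_bwd).norm(dim=1, keepdim=True)
    return fb_error < alpha  # alpha is an assumed hyperparameter

def matching_loss(flow_fwd, kp_src, kp_dst):
    """Euclidean distance between FLANN-matched keypoints and the
    correspondences implied by the predicted flow.

    kp_src, kp_dst: (N, 2) pixel coordinates of matched keypoint pairs.
    """
    # Flow vectors at the source keypoint locations (nearest-neighbour lookup).
    x = kp_src[:, 0].long()
    y = kp_src[:, 1].long()
    flow_at_kp = flow_fwd[0, :, y, x].t()          # (N, 2)
    pred_dst = kp_src + flow_at_kp                  # flow-predicted matches
    return (pred_dst - kp_dst).norm(dim=1).mean()   # mean pixel error
```

In practice, a loss of this kind would be combined with the usual photometric and smoothness terms of unsupervised optical flow training.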
Keywords: visual odometry; flow network; feature point matching; depth network; trajectory drift; SLAM
1. Introduction
Simultaneous localization and mapping (SLAM) [1] is a core part of autonomous navigation systems. For example, robots can adopt SLAM to localize themselves and reconstruct scene maps in unknown environments. Compared with SLAM systems, visual odometry (VO) focuses on the egomotion estimation of the agent itself, predicting the camera trajectory frame by frame using efficient features. In most cases, VO estimates the egomotion faster and more efficiently than SLAM systems. VO estimates the pose change of the camera from adjacent frames; each new pose estimate builds on the previous results, followed by an online local optimization process. Inevitably, trajectory drift accumulates over time, which eventually leads to the failure of the VO system. Hence, robust and accurate visual feature tracking is essential for good pose estimation in visual odometry. Well-known feature point extractors, such as SIFT [2] and the faster SURF [3], are the basis for accurate feature matching.
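As a point of reference, this classical matching pipeline is straightforward to reproduce with OpenCV. The sketch below pairs SIFT with a FLANN-based matcher and Lowe's ratio test, mirroring the kind of FlannMatch outlier filtering described in the abstract; the frame paths are placeholders, and SIFT is used rather than SURF because SURF requires the opencv-contrib build.

```python
import cv2

# Load two consecutive grayscale frames (paths are placeholders).
img1 = cv2.imread("frame_0.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("frame_1.png", cv2.IMREAD_GRAYSCALE)

# Detect keypoints and compute descriptors with SIFT.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN-based matcher with a KD-tree index, as commonly used with SIFT.
index_params = dict(algorithm=1, trees=5)  # FLANN_INDEX_KDTREE = 1
flann = cv2.FlannBasedMatcher(index_params, dict(checks=50))
matches = flann.knnMatch(des1, des2, k=2)

# Lowe's ratio test rejects ambiguous matches (outlier filtering).
good = []
for pair in matches:
    if len(pair) == 2 and pair[0].distance < 0.7 * pair[1].distance:
        good.append(pair[0])
print(f"{len(good)} good matches out of {len(matches)}")
```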
To reduce the accumulated error, researchers adopt loop closure detection. The usual way to realize loop closure detection is to perform feature matching between any two images and