Citation: Li, Z.; Cao, J.; Hao, Q.; Zhao, X.; Ning, Y.; Li, D. DAN-SuperPoint: Self-Supervised Feature Point Detection Algorithm with Dual Attention Network. Sensors 2022, 22, 1940. https://doi.org/10.3390/s22051940
Academic Editor: Nunzio Cennamo
Received: 26 January 2022; Accepted: 25 February 2022; Published: 2 March 2022
Article
DAN-SuperPoint: Self-Supervised Feature Point Detection
Algorithm with Dual Attention Network
Zhaoyang Li 1, Jie Cao 2, Qun Hao 2,*, Xue Zhao 2, Yaqian Ning 2 and Dongxing Li 1
1 School of Mechanical Engineering, Shandong University of Technology, Zibo 255000, China; 20501020029@stumail.sdut.edu.cn (Z.L.); lidongxing@sdut.edu.cn (D.L.)
2 School of Optics and Photonics, Beijing Institute of Technology, Beijing 100081, China; caojie@bit.edu.cn (J.C.); 3220210544@bit.edu.cn (X.Z.); ningyq@bit.edu.cn (Y.N.)
* Correspondence: qhao@bit.edu.cn
Abstract: In view of the poor performance of traditional feature point detection methods in low-texture scenes, we design a new self-supervised feature extraction network, based on deep learning, that can be applied to the front-end feature extraction module of visual odometry (VO). First, the network uses a feature pyramid structure to perform multi-scale feature fusion and obtain a feature map containing multi-scale information. The feature map is then passed through a position attention module and a channel attention module to capture the feature dependencies of the spatial dimension and the channel dimension, respectively, and the weighted spatial feature map and channel feature map are added element-wise to enhance the feature representation. Finally, the weighted feature maps are used to train the detector and the descriptor. In addition, to improve the prediction accuracy of feature point locations and speed up network convergence, we add a confidence loss term and a tolerance loss term to the loss functions of the detector and descriptor, respectively. Experiments show that our network achieves satisfactory performance on the HPatches and KITTI datasets, indicating its reliability.
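As a rough sketch of this pipeline, the code below passes a fused multi-scale feature map through a position attention module and a channel attention module in the style of DANet, sums the two weighted maps element-wise, and feeds the result to detector and descriptor heads. This is a minimal PyTorch illustration under our own assumptions (the reduction ratio, the 65-channel SuperPoint-style detector output, and the descriptor dimension are placeholders), not the exact DAN-SuperPoint architecture or training code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PositionAttention(nn.Module):
    # Spatial (position) attention: re-weights each location by its
    # similarity to every other location (long-range spatial dependencies).
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, 1)
        self.key = nn.Conv2d(channels, channels // 8, 1)
        self.value = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)        # B x HW x C'
        k = self.key(x).flatten(2)                           # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                  # B x HW x HW
        v = self.value(x).flatten(2)                         # B x C x HW
        out = (v @ attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x

class ChannelAttention(nn.Module):
    # Channel attention: re-weights each channel by its similarity to
    # every other channel (inter-channel dependencies).
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = x.flatten(2)                                     # B x C x HW
        attn = torch.softmax(q @ q.transpose(1, 2), dim=-1)  # B x C x C
        out = (attn @ q).view(b, c, h, w)
        return self.gamma * out + x

class DualAttentionHeads(nn.Module):
    # Element-wise fusion of the two weighted maps, then separate
    # detector and descriptor heads (SuperPoint-style outputs assumed).
    def __init__(self, channels=128, desc_dim=256, cell=8):
        super().__init__()
        self.pam = PositionAttention(channels)
        self.cam = ChannelAttention()
        self.detector = nn.Conv2d(channels, cell * cell + 1, 1)  # 64 cell bins + dustbin
        self.descriptor = nn.Conv2d(channels, desc_dim, 1)

    def forward(self, fused):                    # fused: B x C x H/8 x W/8 from the pyramid
        enhanced = self.pam(fused) + self.cam(fused)          # element-wise sum
        scores = self.detector(enhanced)                      # feature point logits
        desc = F.normalize(self.descriptor(enhanced), dim=1)  # dense descriptors
        return scores, desc

For example, under these assumptions DualAttentionHeads(128)(torch.randn(1, 128, 30, 40)) returns a 65-channel score map and a 256-dimensional descriptor map at 1/8 of the input resolution.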
Keywords: feature point detection; attention module; multi-scale feature fusion; deep learning
1. Introduction
The detection of feature points and the construction of descriptors are important steps in image matching. In computer vision applications such as simultaneous localization and mapping (SLAM), structure-from-motion (SfM), and image retrieval, the processing of image feature points determines the correspondence between different images, and accurate extraction of feature points improves matching accuracy. As computer vision is applied more widely and image processing faces increasingly complex environments, a stable feature point detection method becomes particularly important.
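To make the classical detect-describe-match pipeline referred to above concrete, the short example below finds feature points, computes descriptors, and establishes correspondences between two images with OpenCV's ORB detector; it is an illustrative sketch only (image paths and parameter values are placeholders), not part of the method proposed in this paper.

import cv2

# Two views of the same scene (placeholder file names).
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

# Detect feature points and compute binary descriptors.
orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Match descriptors: Hamming distance suits ORB's binary descriptors,
# and cross-checking keeps only mutually nearest matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# The matched point pairs define the correspondence between the images.
pts1 = [kp1[m.queryIdx].pt for m in matches]
pts2 = [kp2[m.trainIdx].pt for m in matches]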
At present, methods for processing image feature points can be divided into traditional methods and deep learning-based methods. Traditional feature extraction methods struggle to achieve satisfactory performance in challenging situations. The scale-invariant feature transform (SIFT) algorithm [1] is scale invariant but not real-time. Rublee et al. [2] proposed the oriented FAST and rotated BRIEF (ORB) algorithm, which builds on the features from accelerated segment test (FAST) algorithm [3] to give the feature points rotation invariance and real-time performance. Mair et al. [4] proposed the adaptive and generic corner detection based on the accelerated segment test (AGAST) algorithm, which maintains consistent corner responses without training and offers the same reusability as the FAST algorithm. However, the above algorithms cannot extract a sufficient number of feature points in low-texture scenes and cannot keep the accuracy of feature point extraction stable. Samuele [5] proposed a feature point detection method based on the wave equation, which maintains a certain accuracy on low-texture objects with symmetry but is not suitable for irregular scenes with