
Article
Accurate Spatial Positioning of Target Based on the Fusion of
Uncalibrated Image and GNSS
Binbin Liang 1, Songchen Han 1, Wei Li 1,*, Daoyong Fu 1, Ruliang He 1 and Guoxin Huang 2

1 School of Aeronautics and Astronautics, Sichuan University, Chengdu 610065, China
2 National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu 610065, China
* Correspondence: li.wei@scu.edu.cn
Abstract: The accurate spatial positioning of a target in a fixed camera image is a critical sensing technique. Conventional visual spatial positioning methods rely on tedious camera calibration and face great challenges in selecting representative feature points to compute the position of the target, especially under occlusion or in remote scenes. To avoid these deficiencies, this paper proposes a deep learning approach for the accurate visual spatial positioning of targets with the assistance of the Global Navigation Satellite System (GNSS). It contains two stages: the first stage trains a hybrid supervised and unsupervised auto-encoder regression network offline to gain the capability of regressing geolocation (longitude and latitude) directly from the fusion of image and GNSS, and learns an error scale factor to evaluate the regression error. The second stage first predicts an accurate geolocation online from the observed image and the GNSS measurement, and then filters the predicted geolocation and the measured GNSS to output the optimal geolocation. The experimental results showed that the proposed approach increased the average positioning accuracy by 56.83%, 37.25%, and 41.62% in a simulated scenario and by 31.25%, 7.43%, and 38.28% in a real-world scenario, compared with GNSS, the Interacting Multiple Model-Unscented Kalman Filter (IMM-UKF), and the supervised deep learning approach, respectively. Improvements were also achieved in positioning stability, robustness, generalization, and performance in GNSS-denied environments.
Keywords: visual spatial positioning; uncalibrated image; global navigation satellite system; multi-sensor fusion; deep learning
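
To make the two-stage pipeline concrete, the following minimal PyTorch sketch illustrates the idea described in the abstract: offline, a hybrid auto-encoder is trained with a supervised geolocation regression loss plus an unsupervised reconstruction loss; online, the regressed geolocation is blended with the raw GNSS measurement. All module names, dimensions, and the loss weight here are illustrative assumptions, not the authors' implementation.

    # Minimal sketch of the two-stage fusion idea; all names and
    # hyperparameters are illustrative assumptions.
    import torch
    import torch.nn as nn

    class FusionAutoEncoder(nn.Module):
        def __init__(self, feat_dim=64):
            super().__init__()
            # Image branch: small CNN encoder (stand-in for the paper's backbone).
            self.img_encoder = nn.Sequential(
                nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            # GNSS branch: embeds the measured (lon, lat) pair.
            self.gnss_encoder = nn.Sequential(nn.Linear(2, feat_dim), nn.ReLU())
            # Auto-encoder over the fused features (unsupervised term).
            self.bottleneck = nn.Linear(2 * feat_dim, feat_dim)
            self.decoder = nn.Linear(feat_dim, 2 * feat_dim)
            # Regression head outputs the refined (lon, lat).
            self.regressor = nn.Linear(feat_dim, 2)

        def forward(self, image, gnss):
            feat = torch.cat([self.img_encoder(image), self.gnss_encoder(gnss)], dim=1)
            z = torch.relu(self.bottleneck(feat))
            return self.regressor(z), self.decoder(z), feat

    model = FusionAutoEncoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    # Offline stage: one hybrid training step on a dummy batch.
    image = torch.randn(8, 3, 64, 64)   # camera frames
    gnss = torch.randn(8, 2)            # measured (lon, lat), normalized
    truth = torch.randn(8, 2)           # ground-truth (lon, lat), normalized
    pred, recon, feat = model(image, gnss)
    loss = nn.functional.mse_loss(pred, truth) \
         + 0.1 * nn.functional.mse_loss(recon, feat.detach())  # 0.1 is an assumed weight
    opt.zero_grad(); loss.backward(); opt.step()

    # Online stage: blend the network's prediction with the GNSS measurement.
    alpha = 0.7  # fixed stand-in for a weight derived from the error scale factor
    fused = alpha * pred.detach() + (1.0 - alpha) * gnss

In the paper's approach, the online blending weight is governed by the learned error scale factor and a filtering step rather than the fixed constant used above; the constant only stands in for that mechanism.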
1. Introduction
Fixed cameras are widely deployed in outdoor areas to provide fine-grained information about the physical world. The accurate and reliable spatial positioning of a target in a fixed camera image is an important sensing technique in many promising applications, such as surveillance of autonomous vehicles, monitoring of mobile robots, digital twins, sea piling, airport security surveillance, and so on.
The current visual spatial positioning methods can be divided into two categories: calibrated methods [1–3] and uncalibrated methods [4–6]. The calibrated methods rely heavily on camera calibration. However, even with mature camera calibration methods, the calibration procedure can be tedious and requires a certain level of expertise [1]. The uncalibrated methods require no camera calibration but rely heavily on feature point selection and matching [5]. In many scenarios, however, for both calibrated and existing uncalibrated methods, choosing representative positioning feature points in the image to compute the position of the target is challenging, especially in the presence of occlusion or in remote scenes. For instance, for a large, complex-shaped airplane at an airport, as shown in Figure 1a, it is extremely difficult to choose the representative positioning feature point (i.e., the landing gear tyre) in the image to compute the spatial position of the airplane, due to the mutual occlusion of its components and the difficulty