Citation: Chen, H.; Li, C.; C. Research on the Correlation Filter Tracking Model Based on the Deep-Pruned Feature Network. Appl. Sci. 2022, 12, 11490. https://doi.org/10.3390/app122211490
Academic Editor: Silvia Liberata Ullo
Received: 17 October 2022
Accepted: 7 November 2022
Published: 12 November 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Research on the Correlation Filter Tracking Model Based on the Deep-Pruned Feature Network
Honglin Chen 1,*, Chunting Li 2 and Chaomurilige 1
1 School of Information Engineering, Minzu University of China, Beijing 100081, China
2 Archives, Yunnan University, Kunming 650091, China
* Correspondence: 20301814@muc.edu.cn
Abstract: Visual tracking is one of the key research fields in computer vision. Deep correlation filter tracking (DCFT), which combines the correlation filter tracking (CFT) model with deep convolutional neural networks (DCNNs), has recently become an active topic in visual tracking because of the speed of CFT and the stronger feature representation of DCNNs. However, DCNNs often have complex structures, which is likely to create a conflict between the speed and the accuracy of DCFT. To reduce this conflict, this paper proposes a model that mainly includes the following: (1) Starting from a pre-pruned network obtained from feature channel importance, an optimal global tracking pruning rate (GTPR) is determined from the contribution of filter channels to the tracking response. (2) Based on the GTPR, an alternative convolutional kernel is defined to replace the kernels of unimportant channels, which prunes the feature network further. (3) The pruned feature network is updated online using a structural similarity index so that the model adapts to changes in the tracking scene. (4) The proposed model was evaluated on OTB2013; the experimental results show that it increases tracking speed by 45% while maintaining tracking accuracy, and improves tracking accuracy by 4% when the tracking scene changes.
Keywords: object tracking; correlation filter; deep convolutional neural network; network pruning; network online updating
1. Introduction
Visual object tracking is an important branch of computer vision. It can be described as the process of estimating the motion trajectory of a target in subsequent frames, given the state information (position, size, etc.) of the target in the first frame of the tracking sequence [1–4]. Traditional visual object tracking algorithms mainly include Mean Shift [5], Lucas-Kanade [6], and the particle filter [7]. In 2010, Bolme et al. [8] introduced correlation filtering into the field of object tracking and proposed the MOSSE (Minimum Output Sum of Squared Error) algorithm. MOSSE first constructs a correlation filter from the features of the target in the first video frame. Then, starting from the second video frame, the filter is correlated with the features of each new frame using the Fast Fourier Transform (FFT), which is why tracking is so fast. The location of the maximum filter response is taken as the predicted position of the target, and the filter is then updated at that predicted position. Many variants of MOSSE have been proposed to further improve its performance, such as Circulant Structure of Tracking-by-Detection with Kernels (CSK) [9], Kernelized Correlation Filters (KCF) [10], and Scale Adaptive Multiple Feature (SAMF) [11]. However, most of these variants rely on hand-crafted target features, which limits the flexibility, online adaptability, and accuracy of feature extraction. Correlation filter tracking therefore needs a way to obtain target features online with high flexibility and accuracy.
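The MOSSE procedure described above (build a filter from the first frame, correlate in the frequency domain, take the response peak, update the filter) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class name `MosseSketch`, the helper `gaussian_response`, the learning rate `lr`, the regularizer `eps`, and the Gaussian width `sigma` are all assumptions made for illustration, and the usual preprocessing (log transform, cosine window) and feature extraction are omitted. The input is a single-channel patch.

```python
import numpy as np


def gaussian_response(shape, center, sigma=2.0):
    """Desired correlation output: a 2-D Gaussian peaked at `center` (row, col)."""
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    dist2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


class MosseSketch:
    """Single-channel MOSSE-style filter (illustrative names and defaults)."""

    def __init__(self, patch, center, lr=0.125, eps=1e-5):
        self.lr, self.eps = lr, eps
        F = np.fft.fft2(patch)
        G = np.fft.fft2(gaussian_response(patch.shape, center))
        # Numerator and denominator of the closed-form filter
        # H* = (G . conj(F)) / (F . conj(F)), kept separate for online updates.
        self.A = G * np.conj(F)
        self.B = F * np.conj(F)

    def respond(self, patch):
        """Correlate the filter with a patch via the FFT; return the response map."""
        H_conj = self.A / (self.B + self.eps)
        return np.real(np.fft.ifft2(H_conj * np.fft.fft2(patch)))

    def track(self, patch):
        """Find the response peak, then update the filter at that position."""
        response = self.respond(patch)
        peak = np.unravel_index(np.argmax(response), response.shape)
        F = np.fft.fft2(patch)
        G = np.fft.fft2(gaussian_response(patch.shape, peak))
        # Running average of numerator and denominator with learning rate lr.
        self.A = self.lr * G * np.conj(F) + (1 - self.lr) * self.A
        self.B = self.lr * F * np.conj(F) + (1 - self.lr) * self.B
        return peak


# Synthetic check: a random texture shifted circularly by (3, 5) pixels.
rng = np.random.default_rng(0)
frame1 = rng.random((32, 32))
tracker = MosseSketch(frame1, center=(16, 16))
frame2 = np.roll(frame1, shift=(3, 5), axis=(0, 1))
peak = tracker.track(frame2)
print("predicted target position:", peak)
```

Because the second frame is a circular shift of the first, the response peak moves by the same offset as the content. Each frame costs only a few FFTs and element-wise products, which is the source of the speed noted above.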
CNNs (convolutional neural networks) have been widely studied in the field of image
recognition since the introduction of a new deep structure and dropout method by AlexNet