Citation: Cheng, L.; Zheng, X.; Zhao, M.; Dou, R.; Yu, S.; Wu, N.; Liu, L. SiamMixer: A Lightweight and Hardware-Friendly Visual Object-Tracking Network. Sensors 2022, 22, 1585. https://doi.org/10.3390/s22041585
Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais
Received: 24 January 2022
Accepted: 14 February 2022
Published: 18 February 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
SiamMixer: A Lightweight and Hardware-Friendly Visual Object-Tracking Network
Li Cheng 1,2, Xuemin Zheng 1,2, Mingxin Zhao 1,2, Runjiang Dou 1,*, Shuangming Yu 1, Nanjian Wu 1,2,3 and Liyuan Liu 1
1 State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors, Chinese Academy of Sciences, Beijing 100083, China; chengli17@semi.ac.cn (L.C.); zxm16@semi.ac.cn (X.Z.); zhaomingxin17@semi.ac.cn (M.Z.); yushuangming@semi.ac.cn (S.Y.); nanjian@red.semi.ac.cn (N.W.); liuly@semi.ac.cn (L.L.)
2 Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
3 The Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Beijing 100083, China
* Correspondence: dourj@semi.ac.cn
Abstract: Siamese networks have been extensively studied in recent years. Most previous
research focuses on improving accuracy, while only a few works recognize the need to reduce
parameter redundancy and computational load. Even less work has been done to optimize runtime
memory cost during network design, which makes Siamese-network-based trackers difficult to
deploy on edge devices. In this paper, we present SiamMixer, a lightweight and hardware-friendly
visual object-tracking network. In shallow layers, it uses patch-by-patch inference, in which each
small image region is processed individually, to reduce memory use. In deep layers, it merges and
globally encodes feature maps to enhance accuracy. Benefiting from these techniques, SiamMixer
achieves accuracy comparable to that of large trackers with only 286 kB of parameters and 196 kB
of extra memory for feature maps. Additionally, we evaluate the impact of various activation functions
and replace all activation functions in SiamMixer with ReLU, which reduces the cost of deployment
on mobile devices.
Keywords: visual object-tracking; deep features; Siamese network; lightweight neural network; edge
computing devices
1. Introduction
Visual object-tracking is a fundamental problem in computer vision, whose goal is
to locate the target in subsequent video frames based on its position in the initial frame.
Visual object-tracking plays an essential role in many fields such as surveillance, machine
vision, and human–computer interaction [1].
Discriminative Correlation Filters (DCFs) and Siamese networks are currently the dominant
tracking paradigms. DCF trackers emerged much earlier than Siamese network
trackers. They use cyclically shifted training samples to achieve dense sampling and use the
fast Fourier transform to accelerate learning and applying the correlation filters, which gives
them high computational efficiency. However, the design of the feature
descriptors requires expert intervention, and the circular sampling produces artifacts at the
search boundary that can affect the tracking results. Siamese networks
provide an end-to-end solution and eliminate the tedium of manually designing
feature descriptors while exhibiting decent tracking performance.
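
To make the DCF idea above concrete, the following is a minimal sketch of a single-channel correlation filter trained in the Fourier domain, in the spirit of MOSSE-style trackers. It is not part of SiamMixer and is not taken from the paper; the function names, the Gaussian target response, and the regularization value `lam` are illustrative assumptions. Training on all cyclic shifts of one patch reduces to an element-wise division of FFTs, which is where the computational efficiency comes from.

```python
import numpy as np

def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at the patch centre (assumed choice)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((ys - h // 2) ** 2 + (xs - w // 2) ** 2) / (2.0 * sigma ** 2))

def train_filter(patch, target, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts of `patch`.

    Returns the (conjugate) filter in the Fourier domain. `lam` is a
    hypothetical regularization constant that avoids division by zero.
    """
    F = np.fft.fft2(patch)
    G = np.fft.fft2(target)
    return G * np.conj(F) / (F * np.conj(F) + lam)

def detect(filt, patch):
    """Apply the filter to a new patch and return the (row, col) of the peak."""
    response = np.real(np.fft.ifft2(filt * np.fft.fft2(patch)))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)

# Toy example: a random texture stands in for hand-crafted features.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
filt = train_filter(patch, gaussian_response(patch.shape))

# A cyclic shift of the input moves the response peak by the same amount.
shifted = np.roll(patch, shift=(3, -5), axis=(0, 1))
peak_original = detect(filt, patch)
peak_shifted = detect(filt, shifted)
```

In this toy setting the displacement between `peak_original` and `peak_shifted` recovers the applied shift. The shift here wraps around the patch border, which is exactly the circular-sampling assumption that causes the boundary artifacts mentioned above when real image content does not wrap.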
The Siamese network tracker treats visual target tracking as a similarity learning
problem. The neural network is used to learn the similarity descriptor function between
the target and the search region. The Siamese network consists of two branches. The input