Citation: Cheng, L.; Zheng, X.; Zhao, M.; Dou, R.; Yu, S.; Wu, N.; Liu, L. SiamMixer: A Lightweight and Hardware-Friendly Visual Object-Tracking Network. Sensors 2022, 22, 1585. https://doi.org/10.3390/s22041585
Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais
Received: 24 January 2022
Accepted: 14 February 2022
Published: 18 February 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
SiamMixer: A Lightweight and Hardware-Friendly Visual
Object-Tracking Network
Li Cheng 1,2, Xuemin Zheng 1,2, Mingxin Zhao 1,2, Runjiang Dou 1,*, Shuangming Yu 1, Nanjian Wu 1,2,3 and Liyuan Liu 1
1 State Key Laboratory of Superlattices and Microstructures, Institute of Semiconductors,
Chinese Academy of Sciences, Beijing 100083, China; chengli17@semi.ac.cn (L.C.); zxm16@semi.ac.cn (X.Z.);
zhaomingxin17@semi.ac.cn (M.Z.); yushuangming@semi.ac.cn (S.Y.); nanjian@red.semi.ac.cn (N.W.);
liuly@semi.ac.cn (L.L.)
2 Center of Materials Science and Optoelectronics Engineering, University of Chinese Academy of Sciences,
Beijing 100049, China
3 The Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences,
Beijing 100083, China
* Correspondence: dourj@semi.ac.cn
Abstract:
Siamese networks have been extensively studied in recent years. Most previous
research focuses on improving accuracy, while only a few studies recognize the necessity of reducing
parameter redundancy and computation load. Even less work has been done to optimize the runtime
memory cost when designing networks, which makes Siamese-network-based trackers difficult to
deploy on edge devices. In this paper, we present SiamMixer, a lightweight and hardware-friendly
visual object-tracking network. It uses patch-by-patch inference to reduce memory use in shallow
layers, where each small image region is processed individually. It merges and globally encodes
feature maps in deep layers to enhance accuracy. Benefiting from these techniques, SiamMixer
achieves accuracy comparable to that of large trackers with only 286 kB of parameters and 196 kB
of extra memory for feature maps. Additionally, we evaluate the impact of various activation functions
and replace all activation functions with ReLU in SiamMixer. This reduces the cost of deployment
on mobile devices.
Keywords: visual object-tracking; deep features; Siamese network; lightweight neural network; edge computing devices
1. Introduction
Visual object-tracking is a fundamental problem in computer vision, whose goal is
to locate the target in subsequent video frames based on its position in the initial frame.
Visual object-tracking plays an essential role in many fields such as surveillance, machine
vision, and human–computer interaction [1].
Discriminative Correlation Filters (DCFs) and Siamese networks are currently the dominant
tracking models. DCFs emerged much earlier than Siamese network
trackers. They use cyclically shifted training samples to achieve dense sampling and use the
fast Fourier transform to accelerate learning and applying the correlation filters, which
gives them high computational efficiency. However, the design of the feature
descriptors requires expert intervention, and the circular sampling produces artifacts at the
search boundary that can affect the tracking results. The emergence of Siamese networks
provides an end-to-end solution and eliminates the tedium of manually designing
feature descriptors while exhibiting decent tracking performance.
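As a rough illustration of the DCF idea described above (not code from this paper), the following NumPy sketch trains a single-sample correlation filter in the Fourier domain, in the style of a MOSSE-type filter. The function names, the Gaussian target response, the raw-pixel features, and the regularization value are all assumptions chosen for illustration. Because the FFT treats the patch as periodic, the closed-form solution implicitly fits every cyclic shift of the training patch at once, which is also where the boundary artifacts mentioned above originate.

```python
import numpy as np


def gaussian_response(shape, sigma=2.0):
    """Desired filter output: a Gaussian peak at the patch center (assumed design)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))


def train_filter(patch, response, reg=1e-2):
    """Closed-form ridge regression over all cyclic shifts of `patch`.

    Returns the complex conjugate of the filter in the Fourier domain.
    """
    X = np.fft.fft2(patch)
    Y = np.fft.fft2(response)
    # Element-wise division: each frequency is solved independently.
    return (Y * np.conj(X)) / (X * np.conj(X) + reg)


def locate(filter_conj, search):
    """Apply the filter to a search patch and return the (row, col) of the peak."""
    Z = np.fft.fft2(search)
    response = np.real(np.fft.ifft2(filter_conj * Z))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    patch = rng.standard_normal((64, 64))      # stand-in for image features
    filt = train_filter(patch, gaussian_response(patch.shape))

    # Simulate target motion with a cyclic shift of 5 rows down, 3 columns left.
    moved = np.roll(patch, shift=(5, -3), axis=(0, 1))
    print(locate(filt, patch), locate(filt, moved))
```

In this toy setting the response peak moves by the same offset as the cyclic shift applied to the patch. A practical DCF tracker would add hand-designed features, a cosine window, and online filter updates, none of which are shown here.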
The Siamese network tracker treats visual target tracking as a similarity learning
problem. A neural network is used to learn a similarity function between
the target and the search region. The Siamese network consists of two branches. The input