
electronics
Article
A Hybrid Vision Processing Unit with a Pipelined Workflow for
Convolutional Neural Network Accelerating and Image
Signal Processing
Peng Liu 1 and Yan Song 2,*

 
Citation: Liu, P.; Song, Y. A Hybrid Vision Processing Unit with a Pipelined Workflow for Convolutional Neural Network Accelerating and Image Signal Processing. Electronics 2021, 10, 2989. https://doi.org/10.3390/electronics10232989
Academic Editors: Nunzio Cennamo, YangQuan Chen, Subhas Mukhopadhyay and Simone Morais
Received: 11 November 2021
Accepted: 30 November 2021
Published: 1 December 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 School of Microelectronics, Tianjin University, Tianjin 300072, China; zationlue@tju.edu.cn
2 Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
* Correspondence: ysong@sdu.edu.cn
Abstract: Vision processing chips are widely used in image processing and recognition tasks.
They are conventionally built around image signal processing (ISP) units connected directly
to the sensors. In recent years, convolutional neural networks (CNNs) have become the dominant
tools for many state-of-the-art vision processing tasks. However, a conventional vision
processing unit (VPU) cannot process CNNs at high speed. On the other hand, CNN processing
units cannot process the RAW images from the sensors directly, so an ISP unit is still required.
This makes a vision system inefficient, with a large amount of data transmission and redundant
hardware resources. Additionally, many CNN processing units offer little flexibility across the
various CNN operations. To solve these problems, this paper proposes an efficient vision
processing unit based on a hybrid processing element array for both CNN acceleration and ISP.
Resources are highly shared in this VPU, and a pipelined workflow is introduced to accelerate
vision tasks. We implement the proposed VPU on a Field-Programmable Gate Array (FPGA) platform
and test various vision tasks on it. The results show that this VPU achieves high efficiency for
both CNN processing and ISP and significantly reduces energy consumption for vision tasks
consisting of CNNs and ISP. For various CNN tasks, it maintains an average multiply accumulator
utilization of over 94% and achieves a performance of 163.2 GOPS at a frequency of 200 MHz.
Keywords: vision processing unit; neural network processing unit; image signal processing unit; image recognition
1. Introduction
Vision processing chips have proven to be highly efficient for computer vision tasks
by integrating the image sensor and the vision processing unit (VPU) together in recent
works [1–3]. Most of them use a Single-Instruction-Multiple-Data (SIMD) array of
processing elements (PEs) connected directly to the sensor. Consequently, they can
eliminate the pixel transmission bottleneck and execute vision tasks in parallel.
The vision tasks mainly consist of image signal processing (ISP) algorithms and
recognition algorithms [1], as illustrated in Figure 1. All the algorithms are performed
on the PE array in the VPU. On conventional vision chips, recognition algorithms
including Speeded-Up Robust Features (SURF) [4], Scale-Invariant Feature Transform (SIFT) [5]
and Features from Accelerated Segment Test (FAST) [6] are usually applied. Recently,
artificial neural networks have shown great performance on computer vision
tasks [7–10]. Therefore, works [1,11] proposed VPUs that try to exploit the
conventional PE array for self-organizing map (SOM) neural networks. However, these
conventional architectures are not efficient for modern neural networks. They do not contain
multiply accumulators (MACs), which are essential for accelerating neural network
processing [12–14]. For instance, the convolutional neural networks (CNNs) are very im-
Electronics 2021, 10, 2989. https://doi.org/10.3390/electronics10232989 https://www.mdpi.com/journal/electronics