Article
A Hybrid Vision Processing Unit with a Pipelined Workflow for Convolutional Neural Network Accelerating and Image Signal Processing

Peng Liu 1 and Yan Song 2,*
Academic Editors: Nunzio Cennamo, YangQuan Chen, Subhas Mukhopadhyay and Simone Morais
Received: 11 November 2021; Accepted: 30 November 2021; Published: 1 December 2021
1 School of Microelectronics, Tianjin University, Tianjin 300072, China; zationlue@tju.edu.cn
2 Institute of Marine Science and Technology, Shandong University, Qingdao 266237, China
* Correspondence: ysong@sdu.edu.cn
Abstract: Vision processing chips have been widely used in image processing and recognition tasks. They are conventionally designed around image signal processing (ISP) units directly connected to the sensors. In recent years, convolutional neural networks (CNNs) have become the dominant tools for many state-of-the-art vision processing tasks. However, CNNs cannot be processed at high speed by a conventional vision processing unit (VPU). On the other hand, CNN processing units cannot process the RAW images from the sensors directly, so an ISP unit is still required. This makes the overall vision system inefficient, with considerable data transmission and redundant hardware resources. In addition, many CNN processing units offer low flexibility for the variety of CNN operations. To solve these problems, this paper proposes an efficient vision processing unit based on a hybrid processing element array for both CNN acceleration and ISP. Resources are highly shared in this VPU, and a pipelined workflow is introduced to accelerate the vision tasks. We implement the proposed VPU on a Field-Programmable Gate Array (FPGA) platform and test various vision tasks on it. The results show that the VPU achieves high efficiency for both CNN processing and ISP, and significantly reduces the energy consumption of vision tasks that combine CNNs and ISP. For various CNN tasks, it maintains an average multiply-accumulator utilization of over 94% and achieves a performance of 163.2 GOPS at a frequency of 200 MHz.
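As a back-of-envelope reading of the reported throughput (this check is not part of the paper; it assumes the common convention of counting one multiply-accumulate as two operations), 163.2 GOPS at 200 MHz corresponds to roughly 408 MAC operations completed per clock cycle on average:

\[
\frac{163.2\ \text{GOPS}}{200\ \text{MHz} \times 2\ \tfrac{\text{ops}}{\text{MAC}}}
  = \frac{163.2 \times 10^{9}}{0.2 \times 10^{9} \times 2}
  \approx 408\ \text{MACs per cycle.}
\]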
Keywords: vision processing unit; neural network processing unit; image signal processing unit; image recognition
1. Introduction
Vision processing chips have proven to be highly efficient for computer vision tasks by integrating the image sensor and the vision processing unit (VPU) together in recent works [1–3]. Most of them utilize a Single-Instruction-Multiple-Data (SIMD) array of processing elements (PEs) connected directly with the sensor. Consequently, they can eliminate the pixel transmission bottleneck and execute vision tasks in parallel.
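As a rough software analogy (not taken from the paper, which describes a hardware PE array), the SIMD pattern can be sketched as one instruction applied simultaneously to every pixel, as in the following vectorized per-pixel operation; the frame size and threshold are hypothetical:

```python
import numpy as np

# Hypothetical 8-bit RAW frame; in a vision chip each pixel (or pixel block)
# would be mapped to one processing element of the SIMD array.
raw_frame = np.random.randint(0, 256, size=(128, 128), dtype=np.uint8)

def simd_like_threshold(frame: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Apply the same compare-and-set instruction to all pixels at once,
    mimicking how a SIMD PE array executes one instruction over many pixels."""
    return (frame > threshold).astype(np.uint8) * 255

binary = simd_like_threshold(raw_frame)
print(binary.shape, binary.dtype)  # (128, 128) uint8
```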
The vision tasks mainly consist of image signal processing (ISP) algorithms and recognition algorithms [1], as illustrated in Figure 1. All the algorithms are performed on the PE array in the VPU. On conventional vision chips, recognition algorithms such as Speeded-Up Robust Features (SURF) [4], Scale-Invariant Feature Transform (SIFT) [5] and Features from Accelerated Segment Test (FAST) [6] are usually applied. Recently, artificial neural networks have shown great performance on computer vision tasks [7–10]. Therefore, the works in [1,11] proposed VPUs that try to exploit the conventional PE array for self-organizing map (SOM) neural networks. However, these conventional architectures are not efficient for modern neural networks: they do not contain the multiply accumulators (MACs) that are essential for accelerating neural network processing [12–14]. For instance, the convolutional neural networks (CNNs) are very im-