Citation: Junaid, M.; Arslan, S.; Lee, T.; Kim, H. Optimal Architecture of Floating-Point Arithmetic for Neural Network Training Processors. Sensors 2022, 22, 1230. https://doi.org/10.3390/s22031230

Academic Editors: Yangquan Chen, Subhas Mukhopadhyay, Nunzio Cennamo, M. Jamal Deen, Junseop Lee and Simone Morais

Received: 24 December 2021
Accepted: 3 February 2022
Published: 6 February 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Optimal Architecture of Floating-Point Arithmetic for Neural
Network Training Processors
Muhammad Junaid 1, Saad Arslan 2, TaeGeon Lee 1 and HyungWon Kim 1,*

1 Department of Electronics, College of Electrical and Computer Engineering, Chungbuk National University, Cheongju 28644, Korea; junaid@chungbuk.ac.kr (M.J.); tglee2@chungbuk.ac.kr (T.L.)
2 Department of Electrical and Computer Engineering, COMSATS University Islamabad, Park Road, Tarlai Kalan, Islamabad 45550, Pakistan; saad.arslan@comsats.edu.pk
* Correspondence: hwkim@chungbuk.ac.kr
Abstract: The convergence of artificial intelligence (AI) is one of the critical technologies of the fourth industrial revolution. The AIoT (Artificial Intelligence Internet of Things) is expected to be a solution that enables rapid and secure data processing. While the success of AIoT demands low-power neural network processors, most recent research has focused on accelerator designs for inference only. The growing interest in self-supervised and semi-supervised learning now calls for processors that offload the training process in addition to inference. Incorporating training with high accuracy goals requires the use of floating-point operators, but higher-precision floating-point arithmetic in neural networks tends to consume a large area and high energy. Consequently, an energy-efficient and compact accelerator is required. The proposed architecture incorporates training in 32-bit, 24-bit, 16-bit, and mixed precisions to find the optimal floating-point format for low-power, small-area edge devices. The proposed accelerator engines have been verified on an FPGA for both inference and training on the MNIST image dataset. The combination of a 24-bit custom floating-point format with the 16-bit Brain floating-point (bfloat16) format achieves an accuracy of more than 93%. ASIC implementation of this optimized mixed-precision accelerator in a TSMC 65 nm process reveals an active area of 1.036 × 1.036 mm² and an energy consumption of 4.445 µJ per training of one image. Compared with the 32-bit architecture, the area and energy are reduced by factors of 4.7 and 3.91, respectively. Therefore, a CNN architecture using floating-point numbers with an optimized data path will contribute significantly to the development of the AIoT field, which requires small area, low energy, and high accuracy.
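For readers who want to experiment with the precision trade-off described above, the following minimal Python sketch emulates reduced-precision formats by truncating the mantissa of an IEEE 754 single-precision value. The 1-sign/8-exponent/7-mantissa layout of the 16-bit Brain floating-point (bfloat16) format is standard; the 1-sign/8-exponent/15-mantissa split used here for the 24-bit custom format is an illustrative assumption, not necessarily the paper's exact field allocation, and the helper name truncate_fp32 is ours.

```python
import struct

def truncate_fp32(x: float, keep_bits: int) -> float:
    """Emulate a reduced-width FP format that keeps the FP32 sign and
    8-bit exponent and shortens the mantissa. keep_bits is the total
    width of the target format, e.g., 16 for bfloat16 (1s/8e/7m) or 24
    for an assumed custom format (1s/8e/15m). Inf/NaN are ignored."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw FP32 bit pattern
    drop = 32 - keep_bits                 # number of mantissa LSBs to discard
    if drop:
        bits += 1 << (drop - 1)           # round to nearest
        bits &= ~((1 << drop) - 1)        # clear the discarded mantissa bits
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))[0]

w = 0.123456789
print(truncate_fp32(w, 32))  # full FP32 precision (reference)
print(truncate_fp32(w, 24))  # 24-bit custom precision (assumed layout)
print(truncate_fp32(w, 16))  # bfloat16-like precision (coarsest)
```

Sweeping keep_bits over a trained network's weights and activations in this way is one software-level method to estimate how much mantissa precision can be given up before accuracy on a dataset such as MNIST degrades below a target like the 93% reported above.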
Keywords: floating-points; IEEE 754; convolutional neural network (CNN); MNIST dataset
1. Introduction
The Internet of Things (IoT) is a core technology leading the fourth industrial revolution through the convergence and integration of various advanced technologies. Recently, the convergence of artificial intelligence (AI) has come to be seen as a solution that helps IoT process data quickly and safely. The development of AIoT (Artificial Intelligence Internet of Things), a combination of AI and IoT, is expected to improve and broaden the performance of IoT products [1–3].

AIoT is the latest research topic among AI semiconductors [4,5]. Before the AIoT topic emerged, a wide range of research had been conducted on implementing AI, that is, neural networks that mimic human neurons. As research on AIoT advances, the challenges of resource-constrained IoT devices are also emerging. A survey of such challenges is presented in [6], which summarizes potential solutions to challenges in communication overhead, convergence guarantees, and energy reduction. Most studies on neural network accelerators have focused on the architecture and circuit structure of the forward (inference) direction, which determines the accuracy on input data such as images [7,8]. However, to mimic the neural network of humans or animals as much as