A Fine-Grained Modeling Approach for Systolic Array-Based Accelerators

Citation: Li, Y.; Wen, M.; Fei, J.; Shen, J.; Cao, Y. A Fine-Grained Modeling Approach for Systolic
Array-Based Accelerator. Electronics 2022, 11, 2928. https://doi.org/10.3390/electronics11182928
Academic Editors: Phivos Mylonas, Katia Lida Kermanidis and Manolis Maragoudakis
Received: 23 August 2022
Accepted: 13 September 2022
Published: 15 September 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution (CC BY)
license (https://creativecommons.org/licenses/by/4.0/).
electronics
Article
A Fine-Grained Modeling Approach for Systolic
Array-Based Accelerator
Yuhang Li, Mei Wen *, Jiawei Fei, Junzhong Shen and Yasong Cao
School of Computer Science, National University of Defense Technology, Changsha 410073, China
* Correspondence: meiwen@nudt.edu.cn
Abstract: The systolic array executes matrix multiplication with extremely high efficiency
and is one of the mainstream architectures of today’s deep learning accelerators. To develop
efficient accelerators, designers usually employ simulators to make design trade-offs. However,
current simulators suffer from coarse-grained modeling methods and idealized assumptions, which
limit their ability to describe the structural characteristics of systolic arrays. In addition,
they do not support the exploration of microarchitecture. This paper presents FG-SIM, a
fine-grained, event-driven modeling approach for evaluating systolic array accelerators. Because
it models the hardware at a fine granularity and rejects idealized assumptions, FG-SIM can obtain
accurate results and provide the best mapping scheme for different workloads. Experimental results
show that FG-SIM plays a significant role in design trade-offs and outperforms state-of-the-art
simulators, with an accuracy of more than 95%.
Keywords: modeling; systolic array; accelerator
1. Introduction
Deep neural networks (DNNs) have come to play an increasingly significant role
in image recognition, speech recognition, text classification and other fields [1–3]. As
application requirements have continued to increase, complex network models and
large numbers of parameters have resulted in higher computation time and declining
performance. Deploying hardware accelerators is a common way to solve such
problems. These accelerators [4–10] use different dataflows and hardware architectures
to accelerate the computation in different ways. Existing work shows that dataflow in
particular has a substantial impact on data reuse and hardware utilization [7,11,12].
With the goal of finding efficient dataflows and hardware architectures, many previous
works [7,13–18] have attempted to explore the design space with varying degrees of success.
The systolic array has also become one of the mainstream DNN accelerator architectures
due to its unique structural characteristics. However, in order to improve the generality
of their models, some of these works [16–18] model the hardware in an overly abstract way,
making it difficult to fully describe the specific implementation details of the hardware, and the
simulation results obtained are far from the real situation. In addition, these models [15–18]
are built on certain assumptions, such as a lack of correlation between data,
sufficient data supply during computations, etc. These assumptions are often difficult to
satisfy in practice, and the resulting problems are difficult to capture in the model. As
a result, the stalls caused by these problems also lead to inaccurate simulation results.
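As a rough illustration of why the "sufficient data supply" assumption matters, the toy Python sketch below compares an idealized cycle estimate for a tiled matrix multiplication with one that also accounts for limited memory bandwidth. It is not FG-SIM's model or that of any cited simulator: the cycle formulas, the function names, and the `words_per_cycle` parameter are all assumptions chosen for illustration.

```python
from math import ceil


def ideal_cycles(M, K, N, rows, cols):
    """Cycle estimate for an (M x K) by (K x N) matmul on a rows x cols array,
    assuming (hypothetically) that operands are always available on time."""
    # One tile per rows x cols block of the K x N weight matrix.
    tiles = ceil(K / rows) * ceil(N / cols)
    # Per tile: stream M input rows, plus pipeline fill/drain of rows + cols - 2.
    return tiles * (M + rows + cols - 2)


def bandwidth_limited_cycles(M, K, N, rows, cols, words_per_cycle):
    """Same toy model, but each tile also waits for its operands to arrive
    from memory at words_per_cycle (an assumed, illustrative parameter)."""
    tiles = ceil(K / rows) * ceil(N / cols)
    compute = M + rows + cols - 2
    # Words fetched per tile: rows*cols weights plus M*rows input elements.
    fetch = ceil((rows * cols + M * rows) / words_per_cycle)
    # The tile finishes only when both computation and data supply are done.
    return tiles * max(compute, fetch)


if __name__ == "__main__":
    print(ideal_cycles(64, 64, 64, 16, 16))                 # 1504
    print(bandwidth_limited_cycles(64, 64, 64, 16, 16, 4))  # 5120
```

In this toy setting the idealized estimate is more than three times too optimistic once bandwidth drops to 4 words per cycle, which is the kind of gap that a model built on the sufficient-data-supply assumption cannot expose.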
On the other hand, even when the dataflow and hardware structure have been predetermined,
the hardware can still employ different scheduling operations and data segmentation
during computations, which we refer to as different mappings. Since these different
mappings can have a significant impact on performance, it is also very important to quickly
search for and identify the best mapping for different workloads under given conditions.
Much of the previous work [17,18] has not considered this issue and thus has not been able