Citation: Li, Y.; Wen, M.; Fei, J.; Shen, J.; Cao, Y. A Fine-Grained Modeling Approach for Systolic Array-Based Accelerator. Electronics 2022, 11, 2928. https://doi.org/10.3390/electronics11182928
Academic Editors: Phivos Mylonas, Katia Lida Kermanidis and Manolis Maragoudakis
Received: 23 August 2022; Accepted: 13 September 2022; Published: 15 September 2022
Article
A Fine-Grained Modeling Approach for Systolic Array-Based Accelerator
Yuhang Li, Mei Wen *, Jiawei Fei, Junzhong Shen and Yasong Cao
School of Computer Science, National University of Defense Technology, Changsha 410073, China
* Correspondence: meiwen@nudt.edu.cn
Abstract: The systolic array provides extremely high efficiency for running matrix multiplication and is one of the mainstream architectures of today’s deep learning accelerators. To develop efficient accelerators, designers usually rely on simulators to make design trade-offs. However, current simulators suffer from coarse-grained modeling methods and idealized assumptions, which limit their ability to describe the structural characteristics of systolic arrays; in addition, they do not support microarchitecture exploration. This paper presents FG-SIM, a fine-grained, event-driven modeling approach for evaluating systolic array accelerators. Thanks to its fine-grained modeling technique and its rejection of idealized assumptions, FG-SIM obtains accurate results and provides the best mapping scheme for different workloads. Experimental results show that FG-SIM plays a significant role in design trade-offs and outperforms state-of-the-art simulators, with an accuracy of more than 95%.
Keywords: modeling; systolic array; accelerator
1. Introduction
Deep neural networks (DNNs) have come to play an increasingly significant role in image recognition, speech recognition, text classification and other fields [1–3]. As application requirements have continued to grow, complex network models and large numbers of parameters have resulted in longer computation times and declining performance. Deploying hardware accelerators is a common way to address these problems. These accelerators [4–10] use different dataflows and hardware architectures to accelerate the computation in different ways. Existing work shows that dataflow in particular has a substantial impact on data reuse and hardware utilization [7,11,12].
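To make the role of the dataflow concrete, the sketch below computes a single tile of a matrix multiplication and estimates its latency under an idealized output-stationary systolic array. The operand shapes, the input skewing scheme and the cycle-count formula are illustrative assumptions chosen for this example; they are not taken from FG-SIM or from the accelerators cited above.

```python
# Minimal sketch (not the authors' FG-SIM): one tile of A (M x K) times
# B (K x N) on an idealized output-stationary R x C systolic array.
import numpy as np

def systolic_matmul_output_stationary(A, B, rows=4, cols=4):
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M <= rows and N <= cols, "single-tile case only"

    acc = np.zeros((rows, cols))            # one accumulator per PE
    # With skewed operand injection, PE (i, j) receives its k-th operand
    # pair at cycle i + j + k, so the last MAC finishes at cycle
    # (M-1) + (N-1) + (K-1).
    for k in range(K):
        for i in range(M):
            for j in range(N):
                acc[i, j] += A[i, k] * B[k, j]   # MAC performed by PE (i, j)

    compute_cycles = (M - 1) + (N - 1) + K       # ideal, no stalls assumed
    return acc[:M, :N], compute_cycles

A = np.random.rand(4, 8)
B = np.random.rand(8, 4)
C, cycles = systolic_matmul_output_stationary(A, B)
print(np.allclose(C, A @ B), cycles)             # True 14
```

Even this idealized estimate shows that the fill and drain latency of the array (the M + N − 2 term) is amortized only when K is large; exposing such trade-offs, together with the stalls that the ideal model hides, is exactly what an accelerator simulator must do.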
With the goal of finding efficient dataflows and hardware architectures, many previous works [7,13–18] have attempted to explore the design space with varying degrees of success. The systolic array has also become one of the mainstream DNN accelerator architectures due to its unique structural characteristics. However, in order to improve the generality of their models, these works [16–18] adopt an overly abstract modeling style, making it difficult to fully describe the specific implementation details of the hardware, so the simulation results they produce are far from the real situation. In addition, these models [15–18] are built on certain assumptions, such as the absence of correlation between data, a sufficient data supply during computation, and so on. These assumptions are often difficult to satisfy in practice, and the problems that arise when they fail are difficult to capture in the model. As a result, the stalls caused by these problems also lead to inaccurate simulation results.
On the other hand, even when the dataflow and hardware structure have been predetermined, the hardware can still employ different scheduling operations and data partitioning schemes during computation, which we refer to as different mappings. Since these mappings can have a significant impact on performance, it is also very important to quickly search for and identify the best mapping for each workload under the given conditions.
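As a simple illustration of why the mapping matters, the hypothetical sketch below enumerates the output tile shapes with which a matrix multiplication can be partitioned onto a fixed-size array and ranks them with a deliberately crude cycle model (array fill and drain plus K accumulation steps per tile, assuming an ideal data supply). The function names and the cost model are assumptions made for this example only and do not reflect the search procedure used by FG-SIM.

```python
# Hypothetical mapping search: pick the output tile shape (tm x tn)
# that minimizes an estimated cycle count on a rows x cols array.
import math
from itertools import product

def tile_cycles(tm, tn, K):
    # one output tile: fill/drain skew plus K accumulation steps
    return (tm - 1) + (tn - 1) + K

def best_mapping(M, K, N, rows=16, cols=16):
    candidates = []
    for tm, tn in product(range(1, rows + 1), range(1, cols + 1)):
        n_tiles = math.ceil(M / tm) * math.ceil(N / tn)
        total = n_tiles * tile_cycles(tm, tn, K)
        candidates.append((total, tm, tn))
    return min(candidates)          # (estimated cycles, tile_M, tile_N)

print(best_mapping(64, 128, 64))    # -> (2528, 16, 16) under this toy model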
Much of the previous work [17,18] has not considered this issue and thus has not been able