Citation: Aljumah, S.; Berriche, L. Bi-LSTM-Based Neural Source Code Summarization. Appl. Sci. 2022, 12, 12587. https://doi.org/10.3390/app122412587
Academic Editors: Robertas Damaševičius, Sanjay Misra and Bharti Suri
Received: 2 November 2022
Accepted: 6 December 2022
Published: 8 December 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Applied Sciences
Article
Bi-LSTM-Based Neural Source Code Summarization
Sarah Aljumah and Lamia Berriche *
College of Computer & Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia
* Correspondence: lberriche@psu.edu.sa
Featured Application: Code comment generation.
Abstract: Software developers often rely on code summarization when fixing or reusing code. Software documentation is essential for software maintenance. Maintenance accounts for the highest cost in software development because code is difficult to modify. To help reduce the cost and time spent on software development and maintenance, we introduce an automated summarization and commenting technique built on state-of-the-art summarization methods. We use deep neural networks, specifically bidirectional long short-term memory (Bi-LSTM), combined with an attention model to enhance performance. In this study, we propose two different scenarios: one that uses the code text and the structure of the code represented in an abstract syntax tree (AST), and another that uses only the code text. For the first scenario, we propose two encoder-based models that encode the code text and the AST independently. Previous works have used different deep neural network techniques to generate comments. The proposed models scored higher than previous works based on a gated recurrent unit (GRU) encoder. We conducted our experiment on a dataset of 2.1 million pairs of Java methods and comments. Additionally, we show that the code structure is beneficial for methods whose signatures contain unclear words.
Keywords: software engineering; neural network; code summarization; software development; software maintenance; deep learning; big data
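To make the first scenario in the abstract more concrete, the following is a minimal sketch of one possible reading of it: two independent Bi-LSTM encoders (one over code tokens, one over a flattened AST node sequence), each followed by dot-product attention. Everything specific here is an assumption for illustration and is not taken from the paper: the toy dimensions, the random untrained weights, the omission of bias terms, the placeholder token and node names, the dot-product scoring function, and the choice to concatenate the two context vectors.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for the four LSTM gates; biases omitted."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_dim + hidden_dim)]
                for _ in range(hidden_dim)]
    return {gate: matrix() for gate in "ifgo"}


def lstm_step(params, x, h, c):
    """One LSTM step on input vector x with previous state (h, c)."""
    z = x + h  # concatenate input and previous hidden state
    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) for row in params[gate]]
    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell values
    o = [sigmoid(a) for a in affine("o")]    # output gate
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(params, inputs, hidden_dim):
    """Return the hidden state produced at every position of the sequence."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    states = []
    for x in inputs:
        h, c = lstm_step(params, x, h, c)
        states.append(h)
    return states


def bilstm_encode(fwd, bwd, inputs, hidden_dim):
    """Concatenate left-to-right and right-to-left states per position."""
    forward = run_lstm(fwd, inputs, hidden_dim)
    backward = run_lstm(bwd, inputs[::-1], hidden_dim)[::-1]
    return [f + b for f, b in zip(forward, backward)]


def attend(query, states):
    """Dot-product attention: softmax weights and the weighted-sum context."""
    scores = [sum(q * s for q, s in zip(query, state)) for state in states]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[k] for w, state in zip(weights, states))
               for k in range(len(query))]
    return weights, context


def embed(tokens, table, dim, rng):
    """Look up (or lazily create) a random embedding for each token."""
    for t in tokens:
        if t not in table:
            table[t] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    return [table[t] for t in tokens]


rng = random.Random(0)
EMB, HID = 4, 3  # toy sizes, chosen only for illustration

# Placeholder inputs: a Java method signature and a hypothetical AST traversal.
code_tokens = ["public", "int", "add", "(", "int", "a", ",", "int", "b", ")"]
ast_nodes = ["MethodDeclaration", "Parameter", "Parameter",
             "ReturnStatement", "BinaryExpression"]

table = {}
code_states = bilstm_encode(make_lstm(EMB, HID, rng), make_lstm(EMB, HID, rng),
                            embed(code_tokens, table, EMB, rng), HID)
ast_states = bilstm_encode(make_lstm(EMB, HID, rng), make_lstm(EMB, HID, rng),
                           embed(ast_nodes, table, EMB, rng), HID)

# Stand-in for a decoder hidden state at one output step.
query = [rng.uniform(-1.0, 1.0) for _ in range(2 * HID)]
code_w, code_ctx = attend(query, code_states)
ast_w, ast_ctx = attend(query, ast_states)
combined = code_ctx + ast_ctx  # assumed way of merging the two encoders
```

In a trained model, `combined` would feed the decoder that emits the next comment word. The second scenario in the abstract (code text only) would correspond to dropping the AST encoder and using `code_ctx` alone.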
1. Introduction
Lack of documentation increases cost and extends project schedules. Manual documentation takes extra effort and is hard to maintain, which frustrates developers when they make new changes to their code. The automatic generation of comments saves developers time and is effective at simulating human comments. Long short-term memory (LSTM) networks have proven effective in text summarization and translation.
In software development and maintenance, software details, such as dependencies, internal structures, integrations, and configurations, must be properly documented in the same source code files. Code comments are essential to software: they help reduce development time, improve understanding, ease code modification, improve bug detection, and, most importantly, enable software reuse. Code comments are among the most important artifacts for understanding software during maintenance. A study in [1] showed that artifacts such as literature and architectural models are not as important as code comments. An earlier study showed that, on average, developers spend 60% of their time on program understanding [2]. Commented code is easier to understand than uncommented code. Developers therefore save considerable time by reading the code description to find the key information needed for a code change or maintenance. The absence of code comments can greatly decrease software quality and maintainability.
However, because of the dynamic nature of software projects and project management's tight schedules, source code documentation is usually ignored. Code documentation