sensors
Article
An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction
Haoyu Luo 1, Heng Dai 2, Weiqiang Peng 3, Wenhua Hu 4,* and Fuyang Li 4,*
Citation: Luo, H.; Dai, H.; Peng, W.; Hu, W.; Li, F. An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction. Sensors 2021, 21, 7535. https://doi.org/10.3390/s21227535
Academic Editors: Kim Phuc Tran, Athanasios Rakitzis and Khanh T. P. Nguyen
Received: 26 October 2021
Accepted: 10 November 2021
Published: 12 November 2021
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 School of Computer Science, South China Normal University, Guangzhou 510631, China; hluo@m.scnu.edu.cn
2 School of Mechanical and Electrical Engineering, Wuhan Qingchuan University, Wuhan 430204, China; daiheng726@163.com
3 School of Computer Science, Wuhan University, Wuhan 430072, China; pengweiqiang@whu.edu.cn
4 School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China
* Correspondence: whu10@whut.edu.cn (W.H.); fyli@whut.edu.cn (F.L.); Tel.: +86-158-2735-4612 (W.H.); +86-27-8721-6780 (F.L.)
Abstract: Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (a.k.a. within-project) data and target project (a.k.a. cross-project) data, which evidently degrades prediction performance. To investigate the impact of training data selection methods on the performance of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performance of ROCPDP models trained on cross-project data filtered by these methods was compared with that of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and on limited within-project data. Eleven available defect datasets from industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among the nine training data selection methods in terms of FPA and Norm(Popt). The performance of ROCPDP models trained on filtered cross-project data was not comparable with that of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommend that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.
Keywords: fault prediction; machine learning; data selection
1. Introduction
Software defect prediction (SDP), also known as software fault prediction, is a research hotspot that has drawn considerable attention from both industry and academia [1,2]. Defect prediction recognizes the appearance of defects in the system or industrial software, which helps identify the category, location, and scale of defects [3–7]. It has long been recognized as one of the important aspects of improving the reliability of industrial system software [8–10]. With the development of artificial intelligence algorithms, the reliability of automatic defect prediction is ever-increasing. The general approach of software defect prediction is to learn a classification model from historical datasets via machine learning algorithms, and then predict whether new software modules contain bugs [11]. Accurate prediction results can contribute to a reasonable allocation of testing resources by focusing on the modules predicted to be defect-prone [12,13].
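To make the ranking-oriented view concrete, the following is a minimal sketch of how a ranking of modules can be scored with FPA, one of the two measures named in the abstract. This excerpt does not define FPA, so the sketch assumes the commonly used fault-percentile-average definition: the mean, over every cut-off m, of the share of all actual defects found in the m modules ranked highest by predicted defect count. The function name `fpa` and the toy numbers are hypothetical and are not taken from the article.

```python
from typing import Sequence


def fpa(predicted: Sequence[float], actual: Sequence[int]) -> float:
    """Fault-percentile-average of a ranking (assumed definition, see text above)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(actual)
    if total == 0:
        raise ValueError("FPA is undefined when there are no defects")
    # Order module indices from highest to lowest predicted defect count.
    order = sorted(range(len(actual)), key=lambda i: predicted[i], reverse=True)
    found = 0      # actual defects contained in the top-m modules so far
    score = 0.0    # running sum of the top-m defect shares
    for i in order:
        found += actual[i]
        score += found / total
    return score / len(actual)


# Hypothetical toy data: three modules with 4, 0 and 1 actual defects.
actual_defects = [4, 0, 1]
good_ranking = fpa([3.0, 1.0, 2.0], actual_defects)  # most defective module ranked first
poor_ranking = fpa([1.0, 3.0, 2.0], actual_defects)  # most defective module ranked last
print(good_ranking, poor_ranking)
```

Under this assumed definition, a ranking that places heavily defective modules near the top scores higher, which matches the idea of directing testing resources to the modules predicted to be most defect-prone.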
Sensors 2021, 21, 7535. https://doi.org/10.3390/s21227535 https://www.mdpi.com/journal/sensors