sensors

Article

An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction

Haoyu Luo, Heng Dai, Weiqiang Peng, Wenhua Hu * and Fuyang Li



 

Citation: Luo, H.; Dai, H.; Peng, W.; Hu, W.; Li, F. An Empirical Study of Training Data Selection Methods for Ranking-Oriented Cross-Project Defect Prediction. Sensors 2021, 21, 7535. https://doi.org/10.3390/s21227535

Academic Editors: Kim Phuc Tran, Athanasios Rakitzis and Khanh T. P. Nguyen

Received: 26 October 2021

Accepted: 10 November 2021

Published: 12 November 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

School of Computer Science, South China Normal University, Guangzhou 510631, China; hluo@m.scnu.edu.cn

School of Mechanical and Electrical Engineering, Wuhan Qingchuan University, Wuhan 430204, China; daiheng726@163.com

School of Computer Science, Wuhan University, Wuhan 430072, China; pengweiqiang@whu.edu.cn

School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Wuhan 430070, China

* Correspondence: whu10@whut.edu.cn (W.H.); fyli@whut.edu.cn (F.L.); Tel.: +86-158-2735-4612 (W.H.); +86-27-8721-6780 (F.L.)

Abstract: Ranking-oriented cross-project defect prediction (ROCPDP), which ranks software modules of a new target industrial project based on the predicted defect number or density, has been suggested in the literature. A major concern of ROCPDP is the distribution difference between the source project (a.k.a. within-project) data and target project (a.k.a. cross-project) data, which evidently degrades prediction performance. To investigate the impact of training data selection methods on the performance of ROCPDP models, we examined the practical effects of nine training data selection methods, including a global filter, which does not filter out any cross-project data. Additionally, the prediction performance of ROCPDP models trained on cross-project data filtered by these methods was compared with that of ranking-oriented within-project defect prediction (ROWPDP) models trained on sufficient and on limited within-project data. Eleven available defect datasets from industrial projects were considered and evaluated using two ranking performance measures, i.e., FPA and Norm(Popt). The results showed no statistically significant differences among the nine training data selection methods in terms of FPA and Norm(Popt). The performance of ROCPDP models trained on filtered cross-project data was not comparable with that of ROWPDP models trained on sufficient historical within-project data. However, ROCPDP models trained on filtered cross-project data achieved better performance values than ROWPDP models trained on limited historical within-project data. Therefore, we recommend that software quality teams exploit other project datasets to perform ROCPDP when there is no or limited within-project data.

Keywords: fault prediction; machine learning; data selection

1. Introduction

Software defect prediction (SDP), also known as software fault prediction, is an active research area that has drawn considerable attention from both industry and academia [1,2]. Defect prediction identifies the presence of defects in a system or in industrial software, which helps determine the category, location, and scale of those defects [–]. It has long been recognized as one of the important aspects of improving the reliability of industrial system software [8–10]. With the development of artificial intelligence algorithms, the reliability of automatic defect prediction is ever-increasing. The general approach of software defect prediction models is to learn a classification model from historical datasets using machine learning algorithms, and then predict whether new software modules contain bugs []. Accurate prediction results help allocate testing resources reasonably by focusing on the modules predicted to be defect-prone [12,13].
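The learn-then-predict workflow described above can be illustrated with a minimal sketch. It is not the method studied in this article: the k-nearest-neighbour classifier, the two module metrics (lines of code and cyclomatic complexity), the module names, and all data values are assumptions chosen only for illustration.

```python
import math

def knn_predict(train_X, train_y, x, k=3):
    """Label a module 1 (defect-prone) or 0 (clean) by majority vote
    among the k historical modules closest to it in metric space."""
    # Pair each historical module's distance to x with its known label.
    dists = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))
    votes = sum(label for _, label in dists[:k])
    return 1 if votes * 2 > k else 0

# Hypothetical historical dataset: (lines of code, cyclomatic complexity).
# Features are left unscaled to keep the sketch short; a real study would
# normally normalise them first.
train_X = [(120, 4), (900, 35), (60, 2), (750, 28), (200, 6), (1100, 40)]
train_y = [0, 1, 0, 1, 0, 1]  # 1 = module contained at least one defect

# Hypothetical new modules whose defect status is unknown.
new_modules = {"parser.c": (820, 30), "utils.c": (90, 3), "net.c": (640, 22)}

predictions = {name: knn_predict(train_X, train_y, feats)
               for name, feats in new_modules.items()}

# Modules predicted defect-prone are the ones to test first.
defect_prone = [name for name, label in predictions.items() if label == 1]
print(defect_prone)
```

In this toy setting the testing effort would be directed at the modules returned in `defect_prone`, which is the resource-allocation use the paragraph describes.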

