Citation: Faiz, R.b.; Shaheen, S.; Sharaf, M.; Rauf, H.T. Optimal Feature Selection through Search-Based Optimizer in Cross Project. Electronics 2023, 12, 514. https://doi.org/10.3390/electronics12030514
Academic Editor: George A. Tsihrintzis
Received: 23 December 2022
Revised: 6 January 2023
Accepted: 9 January 2023
Published: 19 January 2023
Copyright: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
electronics
Article
Optimal Feature Selection through Search-Based Optimizer in Cross Project
Rizwan bin Faiz 1, Saman Shaheen 1, Mohamed Sharaf 2 and Hafiz Tayyab Rauf 3,*
1 Faculty of Computing, Riphah International University, I-14 Campus Islamabad, Islamabad 46000, Pakistan
2 Industrial Engineering Department, College of Engineering, King Saud University, P.O. Box 800, Riyadh 11421, Saudi Arabia
3 Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent ST4 2DE, UK
* Correspondence: hafiztayyabrauf093@gmail.com
Abstract: Cross project defect prediction (CPDP) is a key method for estimating defect-prone modules of software products. CPDP is an attractive approach because it provides information about predicted defects for projects in which data are insufficient. Recent studies specifically include instructions on how to pick training data from large datasets using a feature selection (FS) process, which contributes the most to the end results. A classifier then assigns the selected data to specified classes in order to predict the defective and non-defective classes. The aim of our research is to select the optimal set of features from multi-class data through a search-based optimizer for CPDP. We used the explanatory research type and a quantitative approach for our experimentation. The F1 measure is our dependent variable, while the KNN filter, ANN filter, random forest ensemble (RFE) model, genetic algorithm (GA), and classifiers are our manipulated independent variables. Our experiment follows a 1 factor 1 treatment (1F1T) design for RQ1, whereas RQ2, RQ3, and RQ4 follow a 1 factor 2 treatments (1F2T) design. We first carried out exploratory data analysis (EDA) to understand the nature of our dataset. We then pre-processed our data by resolving the issues identified. During data preprocessing, we found that we have multi-class data; therefore, we first ranked the features and selected multiple feature sets using the info gain algorithm to get maximum variation in features for the multi-class dataset. To remove noise, we used the ANN filter and obtained significant results, 40% to 60% better than the NN filter with the base paper (all, ckloc, IG). We then applied a search-based optimizer, i.e., random forest ensemble (RFE), to get the best feature set for a software prediction model, and obtained significant results, 30% to 50% better than genetic instance selection (GIS). Finally, we used a classifier to predict defects for CPDP. We compared the results of this classifier with the base paper's classifier using the F1-measure and obtained a score almost 35% higher than the base paper. We validated the experiment using the Wilcoxon test and Cohen's d.
Keywords: search-based optimizer; cross project defect prediction; artificial neural network information-gain; ANN filter; K-nearest neighbor (KNN filter); random forest ensemble (RFE)
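The abstract names the F1 measure as the dependent variable and Cohen's d as one of the validation statistics. The following is a minimal sketch of both in plain Python, not the authors' implementation. It assumes a binary view of the labels (defective = 1, non-defective = 0) and the pooled-standard-deviation variant of Cohen's d; the function names `f1_measure` and `cohens_d` are hypothetical, and the study may have used a multi-class averaging scheme or a different effect-size variant.

```python
from math import sqrt
from statistics import mean, stdev


def f1_measure(y_true, y_pred, positive=1):
    """F1 for the positive (defective) class: harmonic mean of precision and recall.

    Assumption: binary labels; a multi-class study would average per-class scores.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cohens_d(a, b):
    """Cohen's d between two samples, using the pooled sample standard deviation.

    Assumption: this is one common variant; each sample needs at least two values.
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)
```

In use, `f1_measure` would be computed once per target project for each approach, and `cohens_d` would then compare the two resulting lists of F1 scores. The Wilcoxon test is not implemented here; SciPy's `scipy.stats.wilcoxon` is one existing implementation.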
1. Introduction
Software defect proneness (SDP) is a study area that provides effective techniques for predicting defects in software. Defect data from previous versions of the same project can be used to detect fault proneness. At early stages of software development, predicting defects in software subsystems (modules) plays a vital role in decreasing development costs and time. It eliminates the excessive effort of finding defects in software modules at later stages of development. Preceding studies in this research area consider within project defect prediction (WPDP), in which the same data are used for training and predicting defects and are cross-validated [1]. However, according to [2], the WPDP approach is only valid when there is a large dataset with less granularity. Yet such approaches do not hold when training data are lacking, specifically for inactive software projects.
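The difference between the within-project and cross-project settings comes down to where the training rows come from. The sketch below illustrates this in plain Python under stated assumptions: the function names `wpdp_folds` and `cpdp_split`, the round-robin fold assignment, and the toy project data are all hypothetical and are not taken from the paper.

```python
def wpdp_folds(rows, k=3):
    """Within-project: k-fold cross-validation over a single project's own rows.

    Assumption: rows are assigned to folds round-robin by position.
    """
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test


def cpdp_split(projects, target):
    """Cross-project: train on every other project, test on the target project."""
    train = [r for name, rows in projects.items() if name != target for r in rows]
    return train, projects[target]


# Made-up data: each project maps to a list of module records (here just ids).
projects = {"A": [1, 2, 3, 4], "B": [5, 6], "C": [7]}

# WPDP needs enough rows inside project "A" itself to form folds.
for train, test in wpdp_folds(projects["A"], k=2):
    print("WPDP train:", train, "test:", test)

# CPDP can still evaluate project "C", which has too little data of its own.
train, test = cpdp_split(projects, "C")
print("CPDP train:", train, "test:", test)
```

The cross-project split never draws training rows from the target project, which is why it remains usable when the target has little or no labelled history.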