Optimal Feature Selection through Search-Based Optimizer in Cross Project

Citation: Faiz, R.b.; Shaheen, S.; Sharaf, M.; Rauf, H.T. Optimal Feature Selection through Search-Based Optimizer in Cross Project. Electronics 2023, 12, 514. https://doi.org/10.3390/electronics12030514

Academic Editor: George A. Tsihrintzis

Received: 23 December 2022

Revised: 6 January 2023

Accepted: 9 January 2023

Published: 19 January 2023

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

electronics

Article

Optimal Feature Selection through Search-Based Optimizer in Cross Project

Rizwan bin Faiz, Saman Shaheen, Mohamed Sharaf and Hafiz Tayyab Rauf

Faculty of Computing, Riphah International University, I-14 Campus Islamabad, Islamabad 46000, Pakistan

Industrial Engineering Department, College of Engineering, King Saud University,

P.O. Box 800, Riyadh 11421, Saudi Arabia

Centre for Smart Systems, AI and Cybersecurity, Staffordshire University, Stoke-on-Trent ST4 2DE, UK

* Correspondence: hafiztayyabrauf093@gmail.com

Abstract: Cross project defect prediction (CPDP) is a key method for estimating defect-prone modules of software products. CPDP is an attractive approach because it provides defect predictions for projects whose own data are insufficient. Recent studies give specific guidance on how to pick training data from large datasets using a feature selection (FS) process, which contributes the most to the end results. A classifier then assigns the selected data to specified classes in order to predict the defective and non-defective classes. The aim of our research is to select the optimal set of features from multi-class data through a search-based optimizer for CPDP. We used the explanatory research type and a quantitative approach for our experimentation. Our dependent variable is the F1 measure; our manipulated independent variables are the KNN filter, ANN filter, random forest ensemble (RFE) model, genetic algorithm (GA), and classifiers. Our experiment follows a 1 factor 1 treatment (1F1T) design for RQ1 and a 1 factor 2 treatments (1F2T) design for RQ2, RQ3, and RQ4. We first carried out exploratory data analysis (EDA) to understand the nature of our dataset. We then pre-processed the data by resolving the issues identified. During preprocessing, we found that the data are multi-class; therefore, we first rank features and select multiple feature sets using the info gain algorithm to get maximum variation in features for the multi-class dataset. To remove noise, we use the ANN filter and obtain significant improvements of 40% to 60% compared to the NN filter with the base paper (all, ckloc, IG). We then applied a search-based optimizer, i.e., random forest ensemble (RFE), to get the best feature set for a software prediction model, and obtained significant improvements of 30% to 50% compared with genetic instance selection (GIS). Finally, we used a classifier to predict defects for CPDP. We compared the results of the classifier with the base paper's classifier using the F1-measure and obtained a score almost 35% higher than the base paper. We validated the experiment using the Wilcoxon test and Cohen's d.
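The first and last steps named in the abstract, ranking features by information gain and scoring predictions with the F1 measure, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes features have already been discretized, and the feature names (`loc_bucket`, `noise`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return feature names sorted by decreasing information gain."""
    gains = {name: information_gain(values, labels)
             for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


def f1_score(actual, predicted, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = defective module, 0 = non-defective module.
labels = [1, 1, 0, 0]
columns = {
    "noise": ["a", "b", "a", "b"],                # unrelated to the label
    "loc_bucket": ["high", "high", "low", "low"],  # separates the classes
}
ranking = rank_features(columns, labels)
score = f1_score(labels, [1, 0, 0, 1])
```

In this toy case `loc_bucket` ranks first because it separates the two classes perfectly, while `noise` carries no information about the label. Continuous software metrics would need binning, or an estimator designed for continuous values, before this calculation applies.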

Keywords: search-based optimizer; cross project defect prediction; artificial neural network information-gain; ANN filter; K-nearest neighbor (KNN filter); random forest ensemble (RFE)

1. Introduction

Software defect proneness (SDP) is a research area that provides effective techniques for predicting software defects. Defect data from previous versions of the same project can be used to detect fault proneness. Predicting defects in software subsystems (modules) at early stages of development plays a vital role in decreasing development cost and time, because it removes the excessive effort of finding defects in the software modules at later stages of development. Preceding studies in this research area consider within project defect prediction (WPDP), in which the same data are used for training and predicting defects and are cross-validated [ ]. However, according to [ ], the WPDP approach is only valid when there is a large dataset with less granularity. Yet such approaches do not hold up with respect to training data, specifically for inactive software projects.
