Sensors report: Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach (2022)


Citation: Kumar, R.; Subbiah, G. Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach. Sensors 2022, 22, 2798. https://doi.org/10.3390/s22072798

Academic Editors: Alexios Mylonas and Nikolaos Pitropakis

Received: 15 February 2022

Accepted: 28 March 2022

Published: 6 April 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

sensors

Article

Zero-Day Malware Detection and Effective Malware Analysis Using Shapley Ensemble Boosting and Bagging Approach

Rajesh Kumar * and Geetha Subbiah

School of Computer Science and Engineering, Vellore Institute of Technology, Chennai Campus, Chennai 600127, Tamil Nadu, India; geetha.s@vit.ac.in

* Correspondence: rajesh.kumar@vit.ac.in; Tel.: +91-909-295-2221

Abstract: Software products from all vendors contain vulnerabilities that can cause security concerns, and malware is the prime tool used to exploit them. Machine learning (ML) methods are efficient at detecting malware and represent the state of the art. The effectiveness of ML models can be improved by reducing false negatives and false positives. In this paper, the performance of bagging and boosting ML models is enhanced by reducing misclassification. Shapley values are a true representation of how much each feature contributes, and they help identify the top features behind any prediction made by a trained ML model. The Shapley values are transformed to the probability scale so that they can be correlated with the model's prediction value. The trend of top features derived from the false-negative and false-positive predictions of a trained ML model can be used to build inductive rules. In this work, the best-performing ML model among the bagging and boosting models is determined by accuracy and the confusion matrix on three malware datasets from three different periods. The best-performing model is then used to derive effective inductive rules from waterfall plots based on the probability scale of the features. This work helps improve cyber security by effectively detecting zero-day malware that would otherwise be classified as a false negative.

Keywords: machine learning; computer security; artificial intelligence; boosting; bagging; cyber security; zero-day vulnerability; zero-day malware detection; Shapley value

1. Introduction

Malware is designed to exploit the vulnerabilities and exposures of software products such as applications, operating systems (OS), and drivers. The popularity of an OS or application makes it an attractive target for malware attacks. Table 1 lists the top ten of the top 50 software vendors that have vulnerabilities in their products, and Table 2 lists the top ten of the top fifty software products, both taken from a common vulnerability and exposure website. New malware is now generated at a very high rate. AlienVault Open Threat Exchange, a crowd-sourced computer-security platform, shares more than 19 million potential threats daily among more than 80,000 participants from 140 countries. Malware authors use polymorphic and metamorphic engines to generate new malware at high speed, and this malware is used to convert such threats into attacks. The engines modify parts of the source code of existing malware to produce dissimilar variants for zero-day attacks. For instance, register reassignment replaces [PUSH eax] with [PUSH ebx], makes the matching changes to the POP instructions, and exchanges the register names in the code between them. The program behavior is the same as before. These methods change the hash values and signatures of the malware, so it is not detected by anti-virus software that depends on signatures or hash values.
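The effect of register reassignment on hash-based detection can be illustrated with a minimal sketch. It is not the engine described in the paper: the helper `reassign_registers` and the four-instruction byte sequence are hypothetical, and the sketch assumes a stream made only of single-byte 32-bit x86 PUSH/POP opcodes (a real engine would have to disassemble the code first, since the same byte values can also appear as operands or data).

```python
import hashlib

# 32-bit x86 single-byte opcodes: PUSH r32 is 0x50 + reg and POP r32 is
# 0x58 + reg, where eax has register number 0 and ebx has number 3.
PUSH_EAX, PUSH_EBX = 0x50, 0x53
POP_EAX, POP_EBX = 0x58, 0x5B

# Byte-level substitution table that exchanges eax and ebx.
SWAP = {
    PUSH_EAX: PUSH_EBX, PUSH_EBX: PUSH_EAX,
    POP_EAX: POP_EBX, POP_EBX: POP_EAX,
}


def reassign_registers(code: bytes) -> bytes:
    """Hypothetical helper: swap eax and ebx in a PUSH/POP-only byte stream."""
    return bytes(SWAP.get(b, b) for b in code)


# Illustrative (assumed) instruction stream: push eax; pop eax; push eax; pop eax
original = bytes([PUSH_EAX, POP_EAX, PUSH_EAX, POP_EAX])
variant = reassign_registers(original)

original_hash = hashlib.sha256(original).hexdigest()
variant_hash = hashlib.sha256(variant).hexdigest()

print("original:", original.hex(), original_hash)
print("variant: ", variant.hex(), variant_hash)
print("hashes match:", original_hash == variant_hash)
```

The two sequences have the same length and the same push/pop structure, yet their SHA-256 digests differ, so a blocklist keyed on the original digest would not match the variant.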

Sensors 2022, 22, 2798. https://doi.org/10.3390/s22072798 https://www.mdpi.com/journal/sensors
