An Analysis of Android Malware Classification Services
Mohammed Rashed * and Guillermo Suarez-Tangil
Citation: Rashed, M.;
Suárez-Tangil, G. An Analysis of
Android Malware Classification
Services. Sensors 2021, 21, 5671.
Academic Editors: Alexios Mylonas
and Nikolaos Pitropakis
Received: 9 July 2021
Accepted: 17 August 2021
Published: 23 August 2021
Publisher’s Note: MDPI stays neutral
with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors.
Licensee MDPI, Basel, Switzerland.
This article is an open access article
distributed under the terms and
conditions of the Creative Commons
Attribution (CC BY) license (https://
Computer Science and Engineering Department, Universidad Carlos III de Madrid, Avda. de la Universidad
30, 28911 Leganés, Spain
IMDEA Networks Institute, Avda. del Mar Mediterraneo, 22, 28918 Leganes, Spain;
* Correspondence: mrashed@inf.uc3m.es
† Former address: Department of Informatics, King’s College London, Bush House, 30 Aldwych,
London WC2B 4BG, UK.
Abstract: The increasing number of Android malware samples has forced antivirus (AV) companies to rely
on automated classification techniques to determine the family and class of suspicious samples.
The research community relies heavily on such labels to carry out prevalence studies of the threat
ecosystem and to build datasets that are used to validate and benchmark novel detection and
classification methods. In this work, we carry out an extensive study of the Android malware
ecosystem by surveying white papers and reports from 6 key players in the industry, as well as
81 papers from 8 top security conferences, to understand how malware datasets are used by both.
We then explore the limitations associated with the use of available malware classification services,
namely VirusTotal (VT) engines, for determining the family of an Android sample. Using a dataset
of 2.47 M Android malware samples, we find that the detection coverage of VT’s AVs is generally
very low, that the percentage of samples flagged by any 2 AV engines does not go beyond 52%,
and that the share of families common to any pair of AV engines is at best 29%. We rely on clustering to
determine the extent to which different AV engine pairs agree upon which samples belong to the
same family (regardless of the actual family name) and find that there are discrepancies that can
introduce noise in automatic label unification schemes. We also observe the usage of generic labels
and inconsistencies within the labels of top AV engines, suggesting that their efforts are directed
towards accurate detection rather than classification. Our results contribute to a better understanding
of the limitations of using Android malware family labels as supplied by common AV engines.
Keywords: Android; malware; classification; family; VirusTotal; antivirus; clustering; labels
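The pairwise measurements summarized in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the engine names, sample identifiers, and labels are hypothetical toy data, the family overlap is computed as a Jaccard ratio (the paper's exact denominator may differ), and a simple pair-counting (Rand-style) score stands in for the clustering comparison used in the study.

```python
from itertools import combinations

# Hypothetical toy data: engine -> {sample id -> family label}.
# A sample absent from an engine's dict was not flagged by that engine.
labels = {
    "EngineA": {"s1": "fakeinst", "s2": "fakeinst", "s3": "opfake", "s4": "plankton"},
    "EngineB": {"s1": "agent", "s2": "agent", "s3": "agent.b", "s5": "plankton"},
}
all_samples = {"s1", "s2", "s3", "s4", "s5"}


def co_detection(a, b, samples):
    """Fraction of all samples that both engines flag."""
    return len(a.keys() & b.keys()) / len(samples)


def common_families(a, b):
    """Jaccard overlap of the family names used by the two engines (assumed metric)."""
    fa, fb = set(a.values()), set(b.values())
    return len(fa & fb) / len(fa | fb)


def grouping_agreement(a, b):
    """Share of sample pairs that both engines treat alike (same family or
    different families), ignoring the family names themselves."""
    shared = sorted(a.keys() & b.keys())
    pairs = list(combinations(shared, 2))
    if not pairs:
        return None  # fewer than two co-detected samples: nothing to compare
    agree = sum((a[x] == a[y]) == (b[x] == b[y]) for x, y in pairs)
    return agree / len(pairs)


a, b = labels["EngineA"], labels["EngineB"]
print(co_detection(a, b, all_samples), common_families(a, b), grouping_agreement(a, b))
```

In this toy case the two engines share few family names yet group the co-detected samples identically, which is the kind of name-independent agreement the clustering analysis is meant to capture.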
1. Introduction
With more than 2.8 B active users worldwide, Android is now the most used OS
on mobile devices. In a similar manner, Android has become the top target OS for
smartphone malware. In the early days of the platform, between October 2010 and
October 2012, Kaspersky reported an increase of incoming Android malware from less than
1 K to more than 40 K. By March 2020, the influx of new malware reached 480 K.
Thus, since the beginnings of the platform, antivirus companies (AVs hereafter) developed
threat intelligence solutions to protect Android users from malware. Because of
the limited number of detected malware samples early on, human analysts were able
to study samples, identify their behavior, and label them following an internal scheme
of the AV company, most likely including the platform, type, and family of the sample
(see Section 5.2). However, such a surge made it inevitable for AVs to use automation
techniques in both detection and family classification because of the impossibility of
manually handling the influx of samples arriving at AVs. Gheorghescu, a researcher at
Microsoft’s Security unit (as indicated in the affiliation), introduced his automatic family
classification system and indicated, in 2005, that his technique was not generally adopted
Sensors 2021, 21, 5671. https://doi.org/10.3390/s21165671 https://www.mdpi.com/journal/sensors