Article
Beware the Black-Box: On the Robustness of Recent Defenses
to Adversarial Examples
Kaleel Mahmood 1,*, Deniz Gurevin 2, Marten van Dijk 3 and Phuong Ha Nguyen 4
Citation: Mahmood, K.; Gurevin, D.; van Dijk, M.; Nguyen, P.H. Beware the Black-Box: On the Robustness of Recent Defenses to Adversarial Examples. Entropy 2021, 23, 1359. https://doi.org/10.3390/e23101359
Academic Editor: Luis Hernández-Callejo
Received: 16 September 2021
Accepted: 14 October 2021
Published: 18 October 2021
1 Department of Computer Science and Engineering, University of Connecticut, Storrs, CT 06269, USA
2 Department of Electrical and Computer Engineering, University of Connecticut, Storrs, CT 06269, USA; deniz.gurevin@uconn.edu
3 CWI, 1098 XG Amsterdam, The Netherlands; Marten.van.Dijk@cwi.nl
4 eBay, San Jose, CA 95125, USA; phuongha.ntu@gmail.com
* Correspondence: kaleel.mahmood@uconn.edu
Abstract: Many defenses have recently been proposed at venues like NIPS, ICML, ICLR and CVPR. These defenses are mainly focused on mitigating white-box attacks and do not properly examine black-box attacks. In this paper, we expand upon the analyses of these defenses to include adaptive black-box adversaries. Our evaluation covers nine defenses: Barrage of Random Transforms, ComDefend, Ensemble Diversity, Feature Distillation, The Odds are Odd, Error Correcting Codes, Distribution Classifier Defense, K-Winner Take All and Buffer Zones. Our investigation uses two black-box adversarial models and six widely studied adversarial attacks on the CIFAR-10 and Fashion-MNIST datasets. Our analyses show that most recent defenses (7 out of 9) provide only marginal improvements in security (<25%) compared to undefended networks. For every defense, we also show the relationship between the amount of data the adversary has at their disposal and the effectiveness of adaptive black-box attacks. Overall, our results paint a clear picture: defenses need both thorough white-box and black-box analyses to be considered secure. We provide this large-scale study and analyses to motivate the field to move towards the development of more robust black-box defenses.
Keywords: adversarial machine learning; black-box attacks; security
1. Introduction
Convolutional Neural Networks (CNNs) are widely used for image classification [1,2] and object detection. Despite their widespread use, CNNs have been shown to be vulnerable to adversarial examples [3]. Adversarial examples are clean images to which malicious noise has been added. This noise is small enough that humans can still visually recognize the images, but CNNs misclassify them.
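As a point of reference, this can be stated with one common formalization (the notation below is ours, chosen only for illustration): given a classifier f, a clean image x with correct label y, and a noise budget ε, an untargeted adversarial example is a perturbed image

    x' = x + \eta, \qquad \|\eta\|_p \leq \epsilon, \qquad f(x') \neq y,

where the \ell_p constraint captures the requirement that the noise leaves the image visually recognizable to a human.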
Adversarial examples can be created through white-box or black-box attacks, depending on the assumed adversarial model. White-box attacks create adversarial examples by directly using information about the trained parameters of a classifier (e.g., the weights of a CNN). Black-box attacks, on the other hand, assume an adversarial model where the trained parameters of the classifier are secret or unknown. In black-box attacks, the adversary generates adversarial examples by exploiting other information, such as querying the classifier [4–6] or using the original dataset the classifier was trained on [7–10].
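To make the query-based setting concrete, the sketch below shows the skeleton of a substitute-model attack. This is a minimal PyTorch illustration, not the exact attack code used in our experiments; train_substitute, fgsm_transfer and target_query are placeholder names. The adversary labels its own images by querying the black-box classifier, trains a synthetic network on those labels, and then transfers white-box adversarial examples crafted on the synthetic network.

# Minimal sketch of a query-based substitute-model attack (illustrative only).
import torch
import torch.nn.functional as F

def train_substitute(target_query, substitute, data_loader, epochs=10, lr=1e-3):
    """Train a substitute network on labels obtained by querying the target.

    target_query: maps a batch of images to predicted class labels; this is
                  the only access the black-box adversary has to the defense.
    substitute:   any differentiable model the adversary controls.
    """
    opt = torch.optim.Adam(substitute.parameters(), lr=lr)
    for _ in range(epochs):
        for x, _ in data_loader:          # the adversary's own images
            with torch.no_grad():
                y = target_query(x)       # labels come from the black-box target
            opt.zero_grad()
            loss = F.cross_entropy(substitute(x), y)
            loss.backward()
            opt.step()
    return substitute

def fgsm_transfer(substitute, x, y, eps=8 / 255):
    """Craft FGSM examples on the substitute, to be transferred to the target."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(substitute(x), y)
    loss.backward()
    # One signed-gradient step, clipped back to the valid image range.
    return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

When target_query includes the full defense, the substitute is trained against the defended model itself; this is precisely the adaptive setting defined next.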
We can also further categorize black-box attacks based on whether the attack tailors the adversarial examples to specifically overcome the defense (adaptive black-box attacks), or whether the attack is fixed regardless of the defense (non-adaptive black-box attacks). In terms of attacks, we focus on adaptive black-box adversaries. A natural question is: why do we choose this scope?
(1) White-box robustness does not automatically mean black-box robustness. In secu-
rity communities such as cryptology, black-box attacks are considered strictly weaker than