Article
SalfMix: A Novel Single Image-Based Data Augmentation
Technique Using a Saliency Map
Jaehyeop Choi, Chaehyeon Lee, Donggyu Lee and Heechul Jung *
Citation: Choi, J.; Lee, C.; Lee, D.; Jung, H. SalfMix: A Novel Single
Image-Based Data Augmentation Technique Using a Saliency Map.
Sensors 2021, 21, 8444. https://doi.org/10.3390/s21248444
Academic Editor: Nunzio Cennamo
Received: 19 November 2021
Accepted: 14 December 2021
Published: 17 December 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and
conditions of the Creative Commons Attribution (CC BY) license
(https://creativecommons.org/licenses/by/4.0/).
Department of Artificial Intelligence, Kyungpook National University, Daegu 41566, Korea;
jaebb95@knu.ac.kr (J.C.); 123456ccdd@knu.ac.kr (C.L.); dglee@knu.ac.kr (D.L.)
* Correspondence: heechul@knu.ac.kr; Tel.: +82-53-950-4558
Abstract:
Modern data augmentation strategies such as Cutout, Mixup, and CutMix have achieved
good performance in image recognition tasks. In particular, approaches such as Mixup and
CutMix, which mix two images to generate a mixed training image, generalize convolutional
neural networks better than single image-based approaches such as Cutout. Motivated by the
fact that mixed images can improve generalization ability, we asked whether mixing would
also be effective when applied to a single image. Consequently, we propose SalfMix, a new
data augmentation method that produces a self-mixed image based on a saliency map.
Furthermore, we combine SalfMix with state-of-the-art two image-based approaches, such as
Mixup, SaliencyMix, and CutMix, to further increase performance; we call this combination
HybridMix. The proposed SalfMix achieved better accuracy than Cutout, and HybridMix
achieved state-of-the-art performance on three classification datasets: CIFAR-10,
CIFAR-100, and TinyImageNet-200. Furthermore, HybridMix achieved the best accuracy in
object detection on the VOC dataset in terms of mean average precision.
Keywords:
deep learning; data augmentation; convolutional neural network (CNN); image classification
1. Introduction
Deep learning has achieved remarkable performance in various computer vision
tasks such as image classification [1–4], segmentation [5,6], detection [7–11], and image
quality assessment [12]. Generally, deep neural networks (DNNs) require large amounts of
training data to achieve high performance. Data augmentation techniques can increase the
effective size of limited training data and are important elements in the training of DNNs
for improving generalization performance. Data augmentation was used to train AlexNet [13],
and geometric data augmentation approaches such as flip, rotation, crop, and translation
have been used to reduce Top-5 error rates in ImageNet classification [13,14]. In 2014,
VGG neural networks were proposed, and the scale jittering data augmentation technique
was introduced [15]. The Cutout method, a representative data augmentation approach,
performs regional dropout, in which the pixel values of a randomly selected region of an
input image are removed [16]. Regional dropout approaches have shown better recognition
rates than previous geometric transformation strategies [16,17]. These data augmentation
approaches are performed on a single image, as shown in Figure 1.
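As a rough illustration of the regional dropout described above, the following minimal sketch removes a randomly placed square region from a single image. It is not the reference implementation of Cutout [16]: the function name `cutout`, the `size` parameter, the square shape, the zero fill value, and the clipping of the patch at the image border are all assumptions made for this example.

```python
import numpy as np


def cutout(image, size, rng=None):
    """Return a copy of `image` with one random square region set to zero.

    Illustrative sketch only. `size` (side length of the patch in pixels),
    the zero fill value, and the border clipping are assumptions, not
    details taken from the article.
    """
    rng = np.random.default_rng() if rng is None else rng
    height, width = image.shape[:2]

    # Pick a random patch center anywhere in the image.
    center_y = int(rng.integers(0, height))
    center_x = int(rng.integers(0, width))

    # Clip the patch so that it stays inside the image bounds.
    y1, y2 = max(center_y - size // 2, 0), min(center_y + size // 2, height)
    x1, x2 = max(center_x - size // 2, 0), min(center_x + size // 2, width)

    # Work on a copy so the original training image is left untouched.
    out = image.copy()
    out[y1:y2, x1:x2] = 0
    return out
```

Because the operation needs only one image and leaves the label unchanged, it can be applied independently to each training sample, which is what distinguishes single image-based augmentation from the two-image mixing methods.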
In recent data augmentation studies, such as Mixup [18] and CutMix [19], two training
images are selected and mixed during network training, and the mixed images are used to
train a convolutional neural network (CNN). These techniques improve generalization
performance further than traditional single image-based approaches. Most recent
research works, such as SaliencyMix [20], PuzzleMix [21], ResizeMix [22], and SnapMix [23],
focus on mixing two images for data augmentation. In particular, CutMix cuts random
patches and pastes them onto other images; however, saliency-guided
approaches have recently been proposed and achieve better performance than the original