Citation: Ma, T.; Wu, L.; Zhu, S.; Zhu, H. Multiclassification Prediction of Clay Sensitivity Using Extreme Gradient Boosting Based on Imbalanced Dataset. Appl. Sci. 2022, 12, 1143. https://doi.org/10.3390/app12031143
Academic Editor: Daniel Dias
Received: 4 December 2021
Accepted: 19 January 2022
Published: 21 January 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Multiclassification Prediction of Clay Sensitivity Using Extreme Gradient Boosting Based on Imbalanced Dataset
Tao Ma 1, Lizhou Wu 2,*, Shuairun Zhu 1 and Hongzhou Zhu 2
1 College of Environment and Civil Engineering, Chengdu University of Technology, Chengdu 610059, China; opmatao@163.com (T.M.); zhushuairun@163.com (S.Z.)
2 State Key Laboratory of Mountain Bridge and Tunnel Engineering, Chongqing Jiaotong University, Chongqing 400074, China; zhuhongzhouchina@cqjtu.edu.cn
* Correspondence: lzwu@cqjtu.edu.cn
Abstract: Predicting clay sensitivity is important to geotechnical engineering design related to clay. Classification charts and field tests have been used to predict clay sensitivity. However, the imbalanced distribution of clay sensitivity classes is often neglected, and predictive accuracy can still be improved. The purpose of this study was to investigate the performance of the extreme gradient boosting (XGBoost) method in the multiclass prediction of clay sensitivity, and the ability of the synthetic minority over-sampling technique (SMOTE) to address imbalanced categories of clay sensitivity. Six clay parameters were used as the input parameters of XGBoost, and SMOTE was used to deal with imbalanced classes. Then, the dataset was divided using the cross-validation (CV) method. Finally, XGBoost, artificial neural network (ANN), and Naive Bayes (NB) were used to classify clay sensitivity. The F1 score, receiver operating characteristic (ROC) curve, and area under the ROC curve (AUC) were used as the performance indicators. The results revealed that XGBoost showed the best performance in the multiclassification prediction of clay sensitivity. The F1 score and mean AUC of XGBoost were 0.72 and 0.89, respectively. SMOTE was useful in addressing class imbalance, and XGBoost was an effective and reliable method of classifying clay sensitivity.
Keywords: clay sensitivity; imbalanced categories; SMOTE; XGBoost
1. Introduction
Soft clays are widely distributed near lakes, rivers, and coastal areas in countries such as Sweden, Norway, Canada, Thailand, and China [1–3]. In terms of grain size, clay is a fine-grained mineral (<2 µm in size), which is the main component of soil [4]. Clay minerals belong to the family of phyllosilicates and provide information on formation conditions and diagenesis [4]. Additionally, clay can be used as an additive for green processing technology and sustainable development, such as medical materials and treatment, agriculture, building materials, adsorbents of organic pollutants in soil, water, and air, etc. [5–10]. For engineers, clays are characterized by high compressibility, low shear strength, and high sensitivity. The sensitivity is defined as the ratio of the unconfined compressive strength of the undisturbed samples to the strength of the remolded samples [11–13].
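The ratio definition of sensitivity given above can be illustrated with a minimal Python sketch. The function name, the parameter names, the kPa unit, and the numerical values are hypothetical and chosen only for illustration; they are not taken from the dataset used in this study.

```python
def clay_sensitivity(undisturbed_strength_kpa: float,
                     remolded_strength_kpa: float) -> float:
    """Return sensitivity as undisturbed strength divided by remolded strength.

    Both strengths must use the same unit (kPa is assumed here), so the
    result is dimensionless.
    """
    if undisturbed_strength_kpa < 0:
        raise ValueError("undisturbed strength must be non-negative")
    if remolded_strength_kpa <= 0:
        raise ValueError("remolded strength must be positive")
    return undisturbed_strength_kpa / remolded_strength_kpa


# Hypothetical example: an undisturbed strength of 80 kPa and a
# remolded strength of 10 kPa give a sensitivity of 8.
example_sensitivity = clay_sensitivity(80.0, 10.0)
print(example_sensitivity)
```

A higher value means that the clay loses a larger share of its strength when remolded.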
Nowadays, in situ and laboratory testing and classification charts are often used to predict clay sensitivity. Cone Penetration Tests (CPTu) and Field Vane Tests (FVT) are commonly carried out to obtain the shear strength and classify clay sensitivity [14–18]. Yafrate et al. [19] employed full-flow penetrometers to evaluate the remolded soil strength and clay sensitivity. Abbaszadeh Shahri et al. [20] proposed a Unified Soil Classification System (USCS) to assess soil classification and used high-resolution files to detect potential sensitive clays. Different soil classification charts are widely used to determine clay sensitivity or types [13,21]. For example, Robertson [22] proposed a few updated charts to predict soil type based on CPTu data. Gylland et al. [23] used pore pressure ratio and modified cone resistance to build a set of diagrams identifying sensitive and quick clays.