Article
A Machine Learning Approach to Predicting Diabetes Complications
Yazan Jian *, Michel Pasquier, Assim Sagahyroon and Fadi Aloul
Citation: Jian, Y.; Pasquier, M.; Sagahyroon, A.; Aloul, F. A Machine Learning Approach to Predicting Diabetes Complications. Healthcare 2021, 9, 1712. https://doi.org/10.3390/healthcare9121712
Academic Editors: Keun Ho Ryu and Nipon Theera-Umpon
Received: 27 October 2021
Accepted: 4 December 2021
Published: 9 December 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Department of Computer Science and Engineering, American University of Sharjah,
Sharjah 26666, United Arab Emirates; mpasquier@aus.edu (M.P.); asagahyroon@aus.edu (A.S.);
faloul@aus.edu (F.A.)
* Correspondence: b00087296@aus.edu; Tel.: +971-5-52164338
Abstract:
Diabetes mellitus (DM) is a chronic disease that is considered life-threatening. Over time, it
can affect any part of the body, resulting in serious complications such as nephropathy,
neuropathy, and retinopathy. In this work, several supervised classification algorithms were applied
to build models that predict eight diabetes complications: metabolic syndrome, dyslipidemia,
neuropathy, nephropathy, diabetic foot, hypertension, obesity, and retinopathy. The study used a
dataset collected by the Rashid Center for Diabetes and Research (RCDR) in Ajman, UAE, consisting
of 884 records with 79 features. Preprocessing steps were applied to handle missing values and
class imbalance. Furthermore, feature selection was performed to select the top five and top
ten features for each complication. The final number of records used to train the binary
classifier for each complication was as follows: metabolic syndrome, 428; dyslipidemia, 836;
neuropathy, 223; nephropathy, 233; diabetic foot, 240; hypertension, 586; obesity, 498;
retinopathy, 228. Repeated stratified k-fold cross-validation (with k = 10 and 10 repetitions) was
employed to obtain a more reliable estimate of performance. Accuracy and F1-score were used to
evaluate the models, reaching maxima of 97.8% and 97.7%, respectively. Moreover, comparing the
performance achieved with different attribute sets showed that adequate classifiers can still be
built using only a selected subset of features.
Keywords: diabetes prediction; diabetes complications; supervised learning
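The evaluation protocol named in the abstract (repeated stratified k-fold cross-validation with k = 10 and 10 repetitions, scored by accuracy and F1-score) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the function names, the synthetic data, and the threshold "classifier" are all hypothetical stand-ins chosen for illustration, and the RCDR dataset is not used. In practice a library such as scikit-learn offers an equivalent splitter.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, k=10, repeats=10, seed=0):
    """Yield (train_idx, test_idx) pairs: k folds for each repetition.

    Each class's indices are shuffled and dealt round-robin into k folds,
    so every fold keeps roughly the overall class proportions.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        offset = 0  # continue dealing where the previous class stopped
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)
            for pos, idx in enumerate(members):
                folds[(offset + pos) % k].append(idx)
            offset += len(members)
        for i in range(k):
            test_idx = sorted(folds[i])
            train_idx = sorted(
                j for f, fold in enumerate(folds) if f != i for j in fold
            )
            yield train_idx, test_idx


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def evaluate(features, labels, k=10, repeats=10, seed=0):
    """Average accuracy and F1 over all k * repeats test folds."""
    scores = []
    for train_idx, test_idx in repeated_stratified_kfold(labels, k, repeats, seed):
        # Toy stand-in classifier (an assumption, not from the paper):
        # threshold halfway between the two class means of the training fold.
        pos = [features[i] for i in train_idx if labels[i] == 1]
        neg = [features[i] for i in train_idx if labels[i] == 0]
        threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        y_true = [labels[i] for i in test_idx]
        y_pred = [1 if features[i] > threshold else 0 for i in test_idx]
        scores.append(accuracy_and_f1(y_true, y_pred))
    mean_acc = sum(a for a, _ in scores) / len(scores)
    mean_f1 = sum(f for _, f in scores) / len(scores)
    return mean_acc, mean_f1, len(scores)


if __name__ == "__main__":
    rng = random.Random(42)
    labels = [1] * 30 + [0] * 70  # synthetic, imbalanced binary labels
    features = [label + rng.gauss(0, 0.5) for label in labels]
    acc, f1, n = evaluate(features, labels)
    print(f"{n} folds: accuracy={acc:.3f}, F1={f1:.3f}")
```

Stratifying each fold matters for the smaller, imbalanced complication subsets, because a purely random split could leave a test fold with very few positive cases. Repeating the procedure with different shuffles and averaging reduces the variance of the estimate.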
1. Introduction
Diabetes mellitus, or diabetes for short, is a chronic disease that occurs either when
the pancreas does not produce enough insulin or when the body cannot effectively use the
insulin it produces [1]. Diabetes has two main types, type 1 and type 2. In type 1
diabetes (also known as insulin-dependent or childhood-onset diabetes), the body does not
produce enough insulin, so daily administration of insulin is required, whereas in type 2
diabetes (formerly known as non-insulin-dependent or adult-onset diabetes), the body cannot use
insulin effectively. According to the World Health Organization (WHO), 422 million
people had diabetes in 2014. Moreover, in 2016, diabetes was the direct
cause of 1.6 million deaths [1].
Diabetes has different causes. For instance, type 1 diabetes mellitus (T1DM)
can develop due to an autoimmune reaction that destroys the beta cells, the cells in the
pancreas that make insulin [2], whereas type 2 diabetes is mainly linked to risk factors such as
age, family history of diabetes, high blood pressure, high levels of triglycerides, heart disease
or stroke [3]. Early detection of diabetes can be of great benefit, especially because the rate of
progression from prediabetes to type 2 diabetes is quite high. According to the CDC [4], diabetes
can affect any part of the body over time, leading to different types of complications. The most
common complications are divided into micro- and macrovascular disorders. The former are long-term
complications that affect small blood vessels, including retinopathy, nephropathy, and
Healthcare 2021, 9, 1712. https://doi.org/10.3390/healthcare9121712 https://www.mdpi.com/journal/healthcare