Article
Predicting Prolonged Length of ICU Stay through Machine Learning
Jingyi Wu 1,2, Yu Lin 3, Pengfei Li 2, Yonghua Hu 4,5, Luxia Zhang 1,2,6 and Guilan Kong 1,2,*
Citation: Wu, J.; Lin, Y.; Li, P.; Hu, Y.; Zhang, L.; Kong, G. Predicting Prolonged Length of ICU Stay through Machine Learning. Diagnostics 2021, 11, 2242. https://doi.org/10.3390/diagnostics11122242
Academic Editor: Yorgos Goletsis
Received: 15 November 2021
Accepted: 24 November 2021
Published: 30 November 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 National Institute of Health Data Science, Peking University, Beijing 100191, China; joywu@pku.edu.cn (J.W.); zhanglx@bjmu.edu.cn (L.Z.)
2 Advanced Institute of Information Technology, Peking University, Hangzhou 311215, China; pfli@aiit.org.cn
3 Department of Medicine and Therapeutics, LKS Institute of Health Science, The Chinese University of Hong Kong, Hong Kong, China; linyu@link.cuhk.edu.hk
4 Department of Epidemiology and Biostatistics, School of Public Health, Peking University, Beijing 100191, China; yhhu@bjmu.edu.cn
5 Medical Informatics Center, Peking University, Beijing 100191, China
6 Renal Division, Department of Medicine, Peking University First Hospital, Peking University Institute of Nephrology, Beijing 100034, China
* Correspondence: guilan.kong@hsc.pku.edu.cn; Tel.: +86-18710098511
Abstract: This study aimed to construct machine learning (ML) models for predicting prolonged length of stay (pLOS) in intensive care units (ICU) among general ICU patients. A multicenter database called eICU (Collaborative Research Database) was used for model derivation and internal validation, and the Medical Information Mart for Intensive Care (MIMIC) III database was used for external validation. We used four different ML methods (random forest, support vector machine, deep learning, and gradient boosting decision tree (GBDT)) to develop prediction models. The prediction performance of the four models was compared with that of the customized simplified acute physiology score (SAPS) II. The area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), estimated calibration index (ECI), and Brier score were used to measure performance. In internal validation, the GBDT model achieved the best overall performance (Brier score, 0.164), discrimination (AUROC, 0.742; AUPRC, 0.537), and calibration (ECI, 8.224). In external validation, the GBDT model also achieved the best overall performance (Brier score, 0.166), discrimination (AUROC, 0.747; AUPRC, 0.536), and calibration (ECI, 8.294). External validation showed that the calibration curve of the GBDT model was an optimal fit, and that all four ML models outperformed the customized SAPS II model. The GBDT-based pLOS-ICU prediction model had the best prediction performance among the five models on both internal and external datasets. Furthermore, it has the potential to assist ICU physicians in identifying patients at risk of pLOS-ICU and in providing appropriate clinical interventions to improve patient outcomes.
Keywords: prolonged length of ICU stay; machine learning; clinical decision rules; medical informatics
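Two of the performance measures named in the abstract, the Brier score and the AUROC, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names and the toy labels and probabilities are assumptions made up for illustration, and none of the data come from eICU or MIMIC-III. The Brier score is computed as the mean squared difference between predicted probability and observed outcome, and the AUROC as the proportion of positive-negative pairs in which the positive case receives the higher predicted probability, with ties counted as one half.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Share of positive-negative pairs ranked correctly; ties count as 0.5."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = prolonged stay, 0 = no prolonged stay.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]

print(f"Brier score: {brier_score(labels, probs):.4f}")
print(f"AUROC: {auroc(labels, probs):.2f}")
```

A lower Brier score indicates better overall performance, whereas a higher AUROC indicates better discrimination. The pairwise AUROC computation is quadratic in the number of cases, so it suits small illustrations rather than full ICU cohorts.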
1. Introduction
Intensive care units (ICUs) provide complex and resource-intensive treatment for the sickest hospitalized patients. The need for critical care medicine has grown substantially over the past decade [1] and has consumed a huge portion of the income in many countries worldwide [2]. In the US, critical care medicine costs account for approximately 13% of hospital costs and 4% of national health expenditures [3]. Despite the huge investment in critical care medicine, medical resources in ICUs are usually insufficient to meet the demands of ICU patients, especially in developing countries. Hospitals are under pressure to improve the efficiency and reduce the costs of critical care. Length of stay in ICU (LOS-ICU) is a key indicator of medical efficiency [4] and critical care quality in hospitals [5]; a prolonged LOS-ICU (pLOS-ICU) generally leads to additional use of resources and