Citation: Sobreiro, P.; Alonso, J.G.; Domingos Martinho; Berrocal, J. Hybrid Random Forest Survival Model to Predict Customer Membership Dropout. Electronics 2022, 11, 3328. https://doi.org/10.3390/electronics11203328
Academic Editors: Sławomir Nowaczyk, Rita P. Ribeiro and Grzegorz Nalepa
Received: 18 August 2022
Accepted: 9 October 2022
Published: 15 October 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Hybrid Random Forest Survival Model to Predict Customer Membership Dropout
Pedro Sobreiro 1,2,*,†, José Garcia-Alonso 2, Domingos Martinho 3 and Javier Berrocal 2,†
1 Sport Sciences School of Rio Maior (ESDRM), Polytechnic Institute of Santarém, 2001-904 Santarém, Portugal
2 Quercus Software Engineering Group, University of Extremadura, 06006 Badajoz, Spain
3 ISLA Santarém, 2000-241 Santarém, Portugal
* Correspondence: sobreiro@esdrm.ipsantarem.pt; Tel.: +351-935585561
† These authors contributed equally to this work.
Abstract: Dropout prediction is a problem that must be addressed in various organizations, as
retaining customers is generally more profitable than attracting new ones. Existing approaches address the
problem using a dependent variable representing dropout or non-dropout, without considering
the dynamic perspective that the dropout risk changes over time. To solve this problem, we explore the
use of random survival forests combined with clusters, in order to evaluate whether the prediction
performance improves. The model performance was determined using the concordance probability,
the Brier score and the prediction error, considering 5200 customers of a health club. Our results
show that the prediction performance of the survival models increased substantially in the models
using clusters compared with the model without clusters, with a statistically significant difference between the
models. The hybrid approach improved the accuracy of the survival model, providing
support to develop countermeasures considering the period in which dropout is likely to occur.
Keywords: customer dropout; machine learning; survival analysis
1. Introduction
Customer retention is a problem that many organizations have to deal with, and
dropout prediction provides insights to identify customers who could
churn. Dropout represents the decision of a customer to end their relationship with
an organization [1], which creates two outcomes: dropout or non-dropout. Dropout
occurs in two main scenarios [2,3]: (1) contractual settings, where
customers pay a monthly fee and the customer informs the organization of the end of the relationship; and
(2) non-contractual settings, where the organization has to infer whether the customer
is still active or not. In the contractual setting, the customer must choose whether they
will drop out or not; for example, whether they renew a contract or not [4]. This means that, in
contractual settings, customer dropout represents an explicit ending of a relationship
that is more penalizing than that in non-contractual settings [5], which has implications for
the profitability of organizations, increasing marketing costs and reducing sales [6].
The advantages of developing retention strategies are supported by the concept
that the costs of customer retention are lower than those associated with customer
acquisition [7,8], where a reduction of dropout by 5% could almost double
profits [9]. To address this problem, organizations could explore their customer databases,
which are considered the most valuable asset that most organizations possess [10]. The
development of a customer retention strategy could be supported through the identification
of customers who may drop out [11]; for example, using churn prediction models to detect
customers with a high propensity to drop out [12].
The anticipation of dropout allows for the development of countermeasures to
reduce customer churn. Several studies have addressed the problem of customer
retention in trying to improve profitability [13–15]; in particular, organizations have
Electronics 2022, 11, 3328. https://doi.org/10.3390/electronics11203328 https://www.mdpi.com/journal/electronics