Article
Telecom Churn Prediction System Based on Ensemble Learning Using Feature Grouping
Tianpei Xu, Ying Ma and Kangchul Kim *
Citation: Xu, T.; Ma, Y.; Kim, K. Telecom Churn Prediction System Based on Ensemble Learning Using Feature Grouping. Appl. Sci. 2021, 11, 4742. https://doi.org/10.3390/app11114742
Academic Editor: João Carlos de Oliveira Matias
Received: 20 April 2021
Accepted: 19 May 2021
Published: 21 May 2021
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Department of Computer Engineering, Chonnam National University, Yeosu 59626, Korea;
197525@jnu.ac.kr (T.X.); 207939@jnu.ac.kr (Y.M.)
* Correspondence: kkc@jnu.ac.kr
Abstract:
In recent years, the telecom market has been highly competitive. Retaining an existing
telecom customer costs less than attracting a new one, so a telecom company needs to
understand customer churn through customer relationship management (CRM). CRM
analyzers are therefore required to predict which customers will churn. This study proposes a
customer-churn prediction system that uses an ensemble-learning technique consisting of
stacking models and soft voting. XGBoost, logistic regression, decision tree, and naïve Bayes
machine-learning algorithms are selected to build a stacking model with two levels, and the
three outputs of the second level are used for soft voting. Feature construction of the churn
dataset includes equidistant grouping of customer behavior features to expand the feature
space and discover latent information in the churn dataset. The original and new churn
datasets are analyzed in the stacking ensemble model with four evaluation metrics. The
experimental results show that the proposed customer-churn predictions achieve accuracies
of 96.12% and 98.09% for the original and new churn datasets, respectively. These results
are better than those of state-of-the-art churn recognition systems.
Keywords: customer churn; CRM; machine learning; ensemble learning; feature grouping
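The two ideas named in the abstract, equidistant grouping of a behavior feature and soft voting over several model outputs, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `equal_width_group` and `soft_vote`, the number of groups, the sample values, and the 0.5 decision threshold are all assumptions chosen for illustration, and the sketch assumes that "equidistant grouping" means equal-width binning and that soft voting averages the churn probabilities with equal weights.

```python
from typing import List, Sequence


def equal_width_group(values: Sequence[float], n_groups: int) -> List[int]:
    """Map each value to the 0-based index of one of n_groups equal-width intervals."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: everything falls into a single group.
        return [0 for _ in values]
    width = (hi - lo) / n_groups
    # The maximum value would compute to index n_groups, so clamp it into the last group.
    return [min(int((v - lo) / width), n_groups - 1) for v in values]


def soft_vote(prob_sets: Sequence[Sequence[float]], threshold: float = 0.5) -> List[int]:
    """Average the churn probabilities of several models and threshold the mean."""
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    if any(len(p) != n_samples for p in prob_sets):
        raise ValueError("every model must score the same number of samples")
    labels = []
    for i in range(n_samples):
        mean_p = sum(p[i] for p in prob_sets) / n_models
        labels.append(1 if mean_p >= threshold else 0)
    return labels


# Hypothetical usage minutes for five customers, grouped into four equal-width bins.
minutes = [0.0, 90.0, 180.0, 270.0, 360.0]
groups = equal_width_group(minutes, n_groups=4)

# Hypothetical churn probabilities from three second-level models for three customers.
model_probs = [
    [0.9, 0.2, 0.6],
    [0.8, 0.4, 0.3],
    [0.7, 0.3, 0.5],
]
votes = soft_vote(model_probs)

print(groups)  # [0, 1, 2, 3, 3]
print(votes)   # [1, 0, 0]
```

In this sketch the group index would be appended to the dataset as an additional categorical feature, which is one way to read "expand the feature space"; the weighting and threshold used for soft voting in the proposed system may differ.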
1. Introduction
Owing to fierce competition among telecom companies, customer churn is inevitable.
Customer churn is the act of a customer ending a subscription to a service provider and
choosing the services of another company.
Companies must reduce customer churn because it weakens the company. A survey
showed that the annual churn rate in the telecom industry ranges from 20% to 40%, and
the cost of retaining existing customers is 5–10 times lower than the cost of obtaining
new customers [1]. The cost of predicting which customers will churn is 16 times lower
than that of obtaining new customers [2]. Decreasing the churn rate by 5% increases profit
by 25% to 85% [3]. This shows that customer-churn prediction is important for the
telecom sector. Telecom companies consider customer relationship management (CRM) an
important factor in retaining existing customers and preventing customer churn.
To retain existing customers, CRM analyzers must predict which customers will churn
and analyze the reasons for customer churn. Once the at-risk customers are identified,
the company must run marketing campaigns aimed at them to maximize their
retention. Therefore, customer-churn prediction is an important part of
CRM [4].
The accuracy of the prediction systems used by CRM analyzers is important. If analyzers
predict customer churn inaccurately, retention campaigns cannot be performed effectively.
Owing to recent advancements in data science, data mining and machine learning
technologies provide solutions to customer churn. However, existing models have several
limitations. For example, logistic regression, a common churn-prediction model based on
older data-mining methods, is relatively inaccurate. Furthermore, feature construction [5]