Article
A KPI-Based Probabilistic Soft Sensor Development Approach that Maximizes the Coefficient of Determination
Yue Zhang 1, Xu Yang 1,*, Yuri A. W. Shardt 2, Jiarui Cui 1 and Chaonan Tong 1
1 Key Laboratory of Knowledge Automation for Industrial Processes of Ministry of Education, School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China; s20160638@xs.ustb.edu.cn (Y.Z.); cuijiarui@ustb.edu.cn (J.C.); tcn@ies.ustb.edu.cn (C.T.)
2 Department of Automation Engineering, Technical University of Ilmenau, 98684 Ilmenau, Thuringia, Germany; yuri.shardt@tu-ilmenau.de
* Correspondence: yangxu@ustb.edu.cn
Received: 28 July 2018; Accepted: 7 September 2018; Published: 12 September 2018
Abstract:
Advanced technologies for process monitoring and fault diagnosis are widely used in complex industrial processes. An important consideration is the ability to monitor key performance indicators (KPIs), which often cannot be measured quickly or accurately enough. This paper proposes a data-driven approach for developing probabilistic soft sensors from incomplete data, based on maximizing the coefficient of determination. First, missing data in the training set are handled using the expectation maximization (EM) algorithm. Then, a probabilistic model relating the secondary variables to the KPIs is developed by maximizing the coefficient of determination. Finally, a Gaussian mixture model (GMM), whose parameters are also estimated with the EM algorithm, is used to estimate the joint probability distribution in the probabilistic soft sensor model. An experimental case study on alumina concentration in the aluminum electrolysis industry demonstrates the performance and advantages of the proposed approach.
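The abstract combines three generic ingredients: EM estimation, a GMM for the joint distribution of secondary variables and KPI, and the coefficient of determination R^2 = 1 - SS_res / SS_tot as the figure of merit. The Python sketch below illustrates only these generic pieces under simplifying assumptions: a single secondary variable x, a complete training set (no missing values), a GMM fitted by plain EM, and the KPI predicted as the conditional mean E[y | x] (standard Gaussian mixture regression), scored with R^2. It does not reproduce the authors' missing-data treatment or their R^2-maximization step; the function names (`fit_gmm`, `predict`, `r_squared`, `demo_data`) and the synthetic two-regime data are hypothetical.

```python
# Illustrative sketch only (hypothetical names), not the paper's algorithm:
# assumes one secondary variable x, one KPI y, and no missing data.
import math
import random
from statistics import fmean, pvariance

REG = 1e-6  # small ridge on variances to keep each covariance invertible


def gauss2(x, y, mx, my, sxx, sxy, syy):
    """Bivariate normal density with mean (mx, my) and covariance [[sxx, sxy], [sxy, syy]]."""
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    # Quadratic form d^T Sigma^-1 d using the closed-form 2x2 inverse.
    q = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))


def fit_gmm(data, k=2, iters=100):
    """Fit a k-component GMM to (x, y) pairs by EM; each component is [w, mx, my, sxx, sxy, syy]."""
    # Deterministic initialization (an assumption): sort by x, split into k chunks.
    pts = sorted(data)
    size = len(pts) // k
    comps = []
    for j in range(k):
        chunk = pts[j * size:(j + 1) * size] if j < k - 1 else pts[j * size:]
        xs, ys = [p[0] for p in chunk], [p[1] for p in chunk]
        comps.append([1.0 / k, fmean(xs), fmean(ys), pvariance(xs) + REG, 0.0, pvariance(ys) + REG])
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x, y in data:
            w = [c[0] * gauss2(x, y, *c[1:]) for c in comps]
            total = sum(w) or 1e-300
            resp.append([wj / total for wj in w])
        # M-step: responsibility-weighted mixing weights, means and covariances.
        for j in range(k):
            r = [ri[j] for ri in resp]
            nj = sum(r) + 1e-12
            mx = sum(rj * x for rj, (x, _) in zip(r, data)) / nj
            my = sum(rj * y for rj, (_, y) in zip(r, data)) / nj
            sxx = sum(rj * (x - mx) ** 2 for rj, (x, _) in zip(r, data)) / nj + REG
            syy = sum(rj * (y - my) ** 2 for rj, (_, y) in zip(r, data)) / nj + REG
            sxy = sum(rj * (x - mx) * (y - my) for rj, (x, y) in zip(r, data)) / nj
            comps[j] = [nj / len(data), mx, my, sxx, sxy, syy]
    return comps


def predict(comps, x):
    """KPI estimate E[y | x] under the fitted GMM (Gaussian mixture regression)."""
    # Gate of component j is proportional to w_j * N(x; mx_j, sxx_j), the x-marginal.
    gates = [w * math.exp(-0.5 * (x - mx) ** 2 / sxx) / math.sqrt(2.0 * math.pi * sxx)
             for w, mx, _, sxx, _, _ in comps]
    total = sum(gates) or 1e-300
    # Each component contributes its conditional mean my + (sxy / sxx) * (x - mx).
    return sum(g * (my + sxy / sxx * (x - mx))
               for g, (_, mx, my, sxx, sxy, _) in zip(gates, comps)) / total


def r_squared(y_true, y_pred):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    mean = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def demo_data(seed=1, n=50):
    """Hypothetical two-regime process: y = 2x on [0, 1] and y = 10 - x on [3, 4], plus noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 1.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 0.05)))
    for _ in range(n):
        x = rng.uniform(3.0, 4.0)
        data.append((x, 10.0 - x + rng.gauss(0.0, 0.05)))
    return data


if __name__ == "__main__":
    data = demo_data()
    model = fit_gmm(data, k=2)
    preds = [predict(model, x) for x, _ in data]
    print("R^2 on training data:", round(r_squared([y for _, y in data], preds), 4))
```

The sort-and-split initialization was chosen only to keep the example reproducible; with real process data, the number of components, the initialization, and the R^2 evaluation on held-out samples would all need more care.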
Keywords:
soft sensor; coefficient of determination maximization strategy; expectation maximization (EM) algorithm; Gaussian mixture model (GMM); alumina concentration
1. Introduction
As industry faces increasing demands to lower product defect rates, improve economic efficiency, and enhance safety, there is a growing need to develop and implement approaches that improve the overall control strategy [1]. The first problem to be solved is the accurate, real-time estimation of key performance indicators (KPIs) [2].
The difficulty is that these KPIs are often hard to measure, or their measurements are subject to significant time delays. Even when some KPIs can be measured, the complexity and nonlinearity of modern industrial systems and their complex operating conditions can make these measurements extremely unreliable [3].
One way to solve these problems is to develop a soft sensor, which selects a group of easier-to-measure secondary variables that are correlated with the required primary variables (i.e., the KPIs in this paper), so that the system can provide process information as often as control requires [4,5]. A successful soft sensor requires a good process model.
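As a rough illustration of choosing secondary variables that are correlated with a KPI, the Python sketch below ranks candidate variables by the absolute Pearson correlation with the KPI and keeps those above a threshold. This is a simplifying assumption for illustration only: correlation captures linear dependence only, the 0.5 threshold and the variable names `x1` to `x3` are hypothetical, and practical soft sensor design usually also draws on process knowledge.

```python
# Hypothetical correlation-based screening of candidate secondary variables.
import math
from statistics import fmean


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def select_secondary_variables(candidates, kpi, threshold=0.5):
    """Keep candidates whose |r| with the KPI is at least `threshold`, strongest first.

    `candidates` maps a (hypothetical) variable name to its measured samples.
    """
    scores = {name: abs(pearson(values, kpi)) for name, values in candidates.items()}
    keep = [name for name, score in scores.items() if score >= threshold]
    return sorted(keep, key=scores.get, reverse=True)


if __name__ == "__main__":
    kpi = [1.0, 2.0, 3.0, 4.0, 5.0]
    candidates = {
        "x1": [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly positively correlated
        "x2": [1.0, -1.0, 1.0, -1.0, 1.0],  # uncorrelated with the KPI
        "x3": [5.0, 4.1, 3.0, 2.2, 1.0],    # strongly negatively correlated
    }
    print(select_secondary_variables(candidates, kpi))  # prints ['x1', 'x3']
```

Using the absolute value keeps strongly negatively correlated variables such as `x3`, since they are just as informative for a regression-based soft sensor as positively correlated ones.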
Process models can be divided into two major categories: first-principles models and data-driven models [6,7]. Although it is desirable to build a complete first-principles model from mass and energy balances, lack of process knowledge, plant–model mismatch, and nonlinear characteristics restrict such an approach to the simplest processes. As an alternative, data-driven
Sensors 2018, 18, 3058; doi:10.3390/s18093058 www.mdpi.com/journal/sensors