Article
Pseudo-Labeling Optimization Based Ensemble Semi-Supervised Soft Sensor in the Process Industry
Youwei Li 1,2, Huaiping Jin 1,2,*, Shoulong Dong 3, Biao Yang 1,2 and Xiangguang Chen 3
Citation: Li, Y.; Jin, H.; Dong, S.; Yang, B.; Chen, X. Pseudo-Labeling Optimization Based Ensemble Semi-Supervised Soft Sensor in the Process Industry. Sensors 2021, 21, 8471. https://doi.org/10.3390/s21248471
Academic Editor: János Abonyi
Received: 16 November 2021
Accepted: 16 December 2021
Published: 19 December 2021
Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
1 Yunnan Key Laboratory of Computer Technologies Application, Kunming 650500, China; 20192104045@stu.kust.edu.cn (Y.L.); biaoykmust@kust.edu.cn (B.Y.)
2 Department of Automation, Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China
3 Department of Chemical Engineering, School of Chemistry and Chemical Engineering, Beijing Institute of Technology, Beijing 100081, China; sldong@bit.edu.cn (S.D.); xgc1@bit.edu.cn (X.C.)
* Correspondence: jinhuaiping@kust.edu.cn; Tel.: +86-15877986943
Abstract:
Nowadays, soft sensor techniques have become promising solutions for enabling real-time
estimation of difficult-to-measure quality variables in industrial processes. However, labeled data
are often scarce in many real-world applications, which poses a significant challenge when building
accurate soft sensor models. Therefore, this paper proposes a novel semi-supervised soft sensor
method, referred to as ensemble semi-supervised negative correlation learning extreme learning
machine (EnSSNCLELM), for industrial processes with limited labeled data. First, an improved
supervised regression algorithm called NCLELM is developed, by integrating the philosophy of
negative correlation learning into extreme learning machine (ELM). Then, with NCLELM as the
base learning technique, a multi-learner pseudo-labeling optimization approach is proposed that
casts the estimation of pseudo labels as an explicit optimization problem, in order to obtain
high-confidence pseudo-labeled data. Furthermore, a set of diverse semi-supervised NCLELM
models (SSNCLELM) is developed from different enlarged labeled sets, which are obtained by
combining the labeled and pseudo-labeled training data. Finally, those SSNCLELM models whose
prediction accuracies are no worse than those of their supervised counterparts are combined using a
stacking strategy. The proposed method can not only exploit both labeled and unlabeled data, but also
combine the merits of semi-supervised and ensemble learning paradigms, thereby providing superior
predictions over traditional supervised and semi-supervised soft sensor methods. The effectiveness
and superiority of the proposed method were demonstrated through two chemical applications.
Keywords:
soft sensor; unlabeled data; label scarcity; semi-supervised learning; ensemble learning;
pseudo labeling; evolutionary optimization; negative correlation learning; extreme learning machine
1. Introduction
Modern industrial processes are equipped with a large number of measurement
devices, in order to allow the implementation of advanced monitoring, optimization, and
control of the production process. However, many crucial quality variables in industrial
processes are difficult to measure online, due to the lack of reliable hardware sensors or the
high cost of purchasing and maintaining the required apparatus. To tackle this problem,
soft sensor technology, a promising indirect measurement tool, has been proposed
to enable real-time estimation of difficult-to-measure process variables [1,2]. The basis
of a soft sensor is to build a mathematical model describing the relationship between
the difficult-to-measure target variable and the easy-to-measure secondary variables, and
then perform online estimation for the query data, based on the built predictive model.
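As a minimal sketch of this two-step idea (build a model from labeled samples, then estimate the target for query data), the following Python example fits a plain extreme learning machine to synthetic data. The data, the function names `fit_elm` and `predict_elm`, and all hyperparameters are illustrative assumptions. The model is a basic ELM with a ridge-regularized least-squares output layer, not the NCLELM or EnSSNCLELM method proposed in this paper.

```python
import numpy as np

def fit_elm(X, y, n_hidden=50, ridge=1e-3, seed=0):
    """Fit a basic ELM: random hidden layer, least-squares output weights.

    n_hidden, ridge and seed are illustrative choices, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.normal(size=n_hidden)                # random hidden biases (never trained)
    H = np.tanh(X @ W + b)                       # hidden-layer outputs
    # Output weights from ridge-regularized normal equations.
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ y)
    return W, b, beta

def predict_elm(model, X_query):
    """Estimate the target variable for query samples."""
    W, b, beta = model
    return np.tanh(X_query @ W + b) @ beta

# Synthetic stand-in for process data (assumption): three easy-to-measure
# secondary variables and one difficult-to-measure target variable.
rng = np.random.default_rng(1)
X_train = rng.uniform(-1.0, 1.0, size=(200, 3))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1] ** 2

model = fit_elm(X_train, y_train)                # offline model building
X_query = rng.uniform(-1.0, 1.0, size=(20, 3))
y_hat = predict_elm(model, X_query)              # online estimation for query data
```

Only the output weights are learned here, which is what makes ELM training a single linear solve. In practice the secondary variables would come from process measurements rather than a random generator.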
Generally, soft sensors can be divided into two categories: first-principle and data-driven
methods. The former require deep physical and chemical knowledge of the process, which
is often unavailable in real-world applications. Alternatively, data-driven methods,