
Citation: Zhou, Y.; Wang, J.; Wang, Z. Bearing Faulty Prediction Method Based on Federated Transfer Learning and Knowledge Distillation. Machines 2022, 10, 376. https://doi.org/10.3390/machines10050376
Academic Editors: Wenjun (Chris) Zhang, Kelvin K.L. Wong, Dhanjoo N. Ghista and Andrew W.H. Ip
Received: 18 April 2022
Accepted: 12 May 2022
Published: 16 May 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Bearing Faulty Prediction Method Based on Federated Transfer
Learning and Knowledge Distillation
Yiqing Zhou 1,*, Jian Wang 1 and Zeru Wang 2
1 Computer Integrated Manufacturing System (CIMS) Research Center, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; jwang@tongji.edu.cn
2 Computer Aided Design (CAD) Research Center, College of Electronics and Information Engineering, Tongji University, Shanghai 201804, China; 2033019@tongji.edu.cn
* Correspondence: 1710334@tongji.edu.cn
Abstract: In this paper, a novel bearing fault prediction method based on federated transfer learning
and knowledge distillation is proposed. It has three stages: (1) a “signal to image” conversion
based on the continuous wavelet transform is used as the data pre-processing step so that the data
match the input format of the proposed fault prediction model; (2) a novel multi-source federated
transfer learning method is introduced to acquire knowledge from multiple different but related
domains, enhancing the generalization ability of the proposed model; and (3) a novel multi-teacher
knowledge distillation scheme is introduced to transfer the multi-source knowledge with dynamic
importance weighting, reducing both the amount of target data required and the parameter size of
the target model, which makes edge-computing-based deployment possible. The effectiveness of the
proposed approach is evaluated in two case studies on two public datasets provided by Case Western
Reserve University and Paderborn University, respectively. The evaluation results show that the
proposed approach outperforms other state-of-the-art fault prediction approaches, achieving higher
accuracy with a smaller parameter size when labeled target data are limited.
Keywords: knowledge distillation; federated transfer learning; parameter size; knowledge transference; edge-computing deployment
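As an illustration of stage (1), the following is a minimal sketch of how a one-dimensional vibration signal could be turned into a two-dimensional image with a continuous wavelet transform. The abstract does not specify the mother wavelet, the scale grid or the image size, so the complex Morlet wavelet, the value w0 = 6, the geometric scale grid, the 64-column output and the helper names `morlet`, `cwt_scalogram` and `to_image` are all assumptions made here for illustration, not the authors' implementation.

```python
import numpy as np

def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (small admissibility correction omitted)."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)

def cwt_scalogram(signal, scales, fs=1.0, w0=6.0):
    """Return |CWT| as an array of shape (len(scales), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    n = signal.size
    dt = 1.0 / fs
    out = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Sample the scaled wavelet on +/- 4 scale units, capped at the signal length.
        half = int(min(np.ceil(4 * s / dt), (n - 1) // 2))
        t = np.arange(-half, half + 1) * dt
        psi = morlet(t / s, w0) / np.sqrt(s)
        # Correlating with conj(psi) equals convolving with its time-reversed copy.
        coef = np.convolve(signal, np.conj(psi)[::-1], mode="same") * dt
        out[i] = np.abs(coef)
    return out

def to_image(scalogram, width=64):
    """Min-max normalise to uint8 and subsample the time axis to `width` columns."""
    lo, hi = scalogram.min(), scalogram.max()
    norm = (scalogram - lo) / (hi - lo + 1e-12)
    cols = np.linspace(0, scalogram.shape[1] - 1, width).round().astype(int)
    return (255 * norm[:, cols]).astype(np.uint8)

# Hypothetical test signal: a 50 Hz tone sampled at 1 kHz (not taken from the paper's datasets).
fs = 1000.0
t = np.arange(1024) / fs
signal = np.sin(2 * np.pi * 50.0 * t)
scales = np.geomspace(0.002, 0.1, 64)  # scales in seconds, assumed grid
image = to_image(cwt_scalogram(signal, scales, fs=fs))
print(image.shape, image.dtype)  # (64, 64) uint8
```

Each row of the resulting image corresponds to one wavelet scale and each column to a time position, so a fixed-size array of this kind can be passed to an image-based classifier. In practice a library routine such as a dedicated wavelet package would likely replace the hand-written loop.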
1. Introduction
Intelligent fault diagnosis is important in the modern manufacturing industry, as it can
greatly reduce machine maintenance costs and prevent catastrophic failures at an early
stage of production. Current fault prediction approaches can be divided into three
categories: model-based, knowledge-based and data-driven approaches [1]. With the growth
of modern computing power and storage capacity, the data-driven approach has become the
most widely used; it relies entirely on acquired historical operating data [2].
Deep learning, as a branch of the data-driven approach, has shown promising
application prospects in contemporary industrial systems due to its powerful ability
to automatically extract representative and discriminative features from raw
signal data. Therefore, deep learning-based fault prediction has become a key
research topic in both academia and industry. Current deep learning models, including
the DBN (deep belief network), DAE (deep auto-encoder), RNN (recurrent neural network)
and CNN (convolutional neural network), have already achieved great success in machine
fault prediction. To further improve machine fault prediction accuracy in complex
modern industrial applications, some researchers have designed different
variants and combinations of deep learning models. Shao et al. [3] combined the CNN
with the DBN for capturing both the two-dimensional structure and the periodic characteristics