Article
Effective Transfer Learning with Label-Based Discriminative
Feature Learning
Gyunyeop Kim and Sangwoo Kang *
School of Computing, Gachon University, Seongnam 13120, Korea; gyop0817@gachon.ac.kr
* Correspondence: swkang@gachon.ac.kr
Abstract:
The performance of natural language processing with a transfer learning methodology has improved by applying language models pre-trained on large amounts of general data to downstream tasks. However, because the data used in pre-training are unrelated to the downstream tasks, the model learns general features rather than features specific to those tasks. In this paper, a novel learning method is proposed that induces a pre-trained embedding model to learn the specific features of a downstream task. The proposed method learns the label features of the downstream task through contrastive learning using label embeddings and sampled data pairs. To demonstrate the performance of the proposed method, we conducted experiments on sentence-classification datasets and evaluated whether the features of the downstream tasks were learned, using PCA and clustering of the embeddings.
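As a concrete, hedged illustration of the idea summarized above, the following minimal PyTorch sketch shows one possible form of a label-embedding contrastive objective; the embedding dimension, temperature, and cross-entropy formulation are assumptions made for illustration and not the exact loss defined later in this paper.

import torch
import torch.nn.functional as F

class LabelContrastiveLoss(torch.nn.Module):
    # Illustrative label-embedding contrastive loss (assumed form, not the paper's exact loss).
    def __init__(self, num_labels: int, dim: int, temperature: float = 0.1):
        super().__init__()
        # One learnable embedding per downstream-task label (assumption).
        self.label_emb = torch.nn.Embedding(num_labels, dim)
        self.temperature = temperature

    def forward(self, sent_emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # sent_emb: (batch, dim) sentence embeddings from the pre-trained encoder.
        # labels:   (batch,) gold label indices of the sampled data pairs.
        sent = F.normalize(sent_emb, dim=-1)
        lab = F.normalize(self.label_emb.weight, dim=-1)   # (num_labels, dim)
        logits = sent @ lab.t() / self.temperature         # similarity to every label embedding
        # Pull each sentence toward its own label embedding and away from the others.
        return F.cross_entropy(logits, labels)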
Keywords: natural language processing; transfer learning; pre-training; word embedding
1. Introduction
Artificial intelligence has achieved good performance through deep learning on large amounts of data. According to [1], transfer learning conducted through pre-training on large amounts of data can improve the performance of downstream tasks. Transfer learning refers to pre-training on unsupervised data, which are easy to collect; the downstream task is then learned using the pre-trained model. This process benefits from the ease of collecting unsupervised datasets and can improve the performance of the downstream task. Therefore, many current artificial intelligence methods use transfer learning to achieve high performance.
In natural language processing (NLP), transfer learning has shown significant performance improvements when applied to language models. Transfer-learning-based language models such as BERT [2] and ELECTRA [3] are pre-trained on large amounts of crawled natural-language data, such as Wikipedia. Because data collected through crawling form an unsupervised dataset, learning proceeds through self-supervised objectives such as masked token prediction. The pre-trained language model is then used to generate word embeddings during fine-tuning. During the fine-tuning process, the downstream task is learned by constructing a model that includes the pre-trained model.
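As a concrete illustration of this setup, the following minimal sketch fine-tunes a pre-trained encoder on a toy sentence-classification task using the Hugging Face Transformers library; the model name, example sentences, and hyperparameters are illustrative assumptions rather than the configuration used in this paper.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Pre-trained language model reused as the embedding backbone of a downstream classifier.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

sentences = ["the movie was great", "the plot made no sense"]  # toy downstream examples
labels = torch.tensor([1, 0])
batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # forward pass through the encoder and task head
outputs.loss.backward()                  # gradients also reach the pre-trained weights
optimizer.step()

Note that the gradient step updates both the task-specific head and the pre-trained encoder weights, which is the sense in which the pre-trained model is included in the downstream model.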
However, the pre-trained model used in transfer learning is trained on a dataset that is independent of the downstream task. Thus, during the pre-training process, the model learns general features rather than features specific to the downstream task. Word embeddings derived from the pre-trained model may therefore contain a higher proportion of common features than of the information required for the downstream task. As a result, such word embeddings can include features that are unnecessary for the downstream task. Furthermore, fine-tuning with word embeddings obtained from a pre-trained model can be compromised by the unnecessary features present in those embeddings.
In this study, further learning is applied to induce pre-trained models to derive
word embeddings optimized for downstream tasks. Using the proposed method, word