Citation: Kim, G.; Kang, S. Effective Transfer Learning with Label-Based Discriminative Feature Learning. Sensors 2022, 22, 2025. https://doi.org/10.3390/s22052025

Academic Editor: Andrea Cataldo

Received: 8 February 2022
Accepted: 3 March 2022
Published: 4 March 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
Effective Transfer Learning with Label-Based Discriminative
Feature Learning
Gyunyeop Kim and Sangwoo Kang *
School of Computing, Gachon University, Seongnam 13120, Korea; gyop0817@gachon.ac.kr
* Correspondence: swkang@gachon.ac.kr
Abstract: The performance of natural language processing with a transfer learning methodology has improved by applying language models pre-trained on large amounts of general data to downstream tasks. However, because the data used in pre-training are irrelevant to the downstream tasks, the model learns general features rather than features specific to those tasks. In this paper, a novel learning method is proposed that induces the embeddings of pre-trained models to learn features specific to such tasks. The proposed method learns the label features of downstream tasks through contrastive learning using label embeddings and sampled data pairs. To demonstrate the performance of the proposed method, we conducted experiments on sentence classification datasets and evaluated, through PCA and clustering of the embeddings, whether the features of the downstream tasks had been learned.
Keywords: natural language processing; transfer learning; pre-training; word embedding
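The abstract describes the method only at a high level. As an informal illustration of what contrastive learning with label embeddings can look like, the following is a minimal sketch under our own assumptions, not the authors' exact formulation; the class LabelContrastiveLoss, the temperature parameter, and all other names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelContrastiveLoss(nn.Module):
    """Illustrative sketch only (not the paper's loss): each class label gets a
    learnable embedding, and each sentence embedding is pulled toward the
    embedding of its own label and pushed away from the other labels."""

    def __init__(self, num_labels: int, hidden_size: int, temperature: float = 0.1):
        super().__init__()
        self.label_embeddings = nn.Embedding(num_labels, hidden_size)
        self.temperature = temperature

    def forward(self, sentence_emb: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # sentence_emb: (batch, hidden_size); labels: (batch,)
        sent = F.normalize(sentence_emb, dim=-1)
        lab = F.normalize(self.label_embeddings.weight, dim=-1)  # (num_labels, hidden_size)
        logits = sent @ lab.t() / self.temperature                # similarity to every label
        # Treating the matching label embedding as the positive pair turns
        # cross-entropy over these similarities into a contrastive objective.
        return F.cross_entropy(logits, labels)
```

In such a setup, a loss of this kind could be added to the ordinary classification loss so that sentence embeddings sharing a label are drawn toward a common label vector.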
1. Introduction
Artificial intelligence has shown good performance through deep learning on large amounts of data. According to [1], transfer learning conducted through pre-training on large amounts of data can improve the performance of downstream tasks. Transfer learning refers to pre-training on unsupervised data, which are easy to collect, and then learning the downstream task with the pre-trained model. These processes have the advantage that unsupervised datasets are easy to collect and can improve the performance of a downstream task. Therefore, many current artificial intelligence methods use transfer-learning models to achieve high performance.
In natural language processing (NLP), transfer learning has shown significant performance improvements when applied to language models. In NLP, transfer-learning-based language models such as BERT [2] and ELECTRA [3] are pre-trained on large amounts of crawled natural language data, such as Wiki datasets. Because the data built through crawling constitute an unsupervised dataset, learning proceeds through semi-supervised objectives such as masked token prediction. The pre-trained language model is then used to generate word embeddings during fine-tuning. During the fine-tuning process, downstream task learning is conducted by constructing a model that includes the pre-trained model.
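For readers less familiar with this pipeline, the following is a minimal sketch of the usual fine-tuning setup, assuming the Hugging Face transformers library and a BERT encoder; it illustrates the general pattern described above and is not code from the paper.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer  # assumes Hugging Face transformers

class SentenceClassifier(nn.Module):
    """A pre-trained language model supplies the word embeddings; only a small
    classification head is newly added for the downstream task."""

    def __init__(self, model_name: str = "bert-base-uncased", num_labels: int = 2):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)  # pre-trained weights
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_emb = outputs.last_hidden_state[:, 0]  # [CLS] token as the sentence embedding
        return self.classifier(cls_emb)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = SentenceClassifier()
batch = tokenizer(["an example sentence"], return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])  # fine-tuned end to end
```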
However, the pre-trained model used in transfer learning is trained on a dataset that is independent of the downstream task. Thus, during the pre-training process, the model learns general features rather than features specific to the downstream task. Word embeddings derived from the pre-trained model may therefore contain a higher proportion of common features than of the information required for the downstream task. As a result, word embeddings derived from pre-trained models can carry features that are unnecessary for downstream tasks. Furthermore, fine-tuning that uses word embeddings from a pre-trained model can be compromised by the unnecessary features present in those embeddings.
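One way to make this concern concrete, in the spirit of the PCA and clustering analysis mentioned in the abstract, is to project sentence embeddings and check how well unsupervised clusters agree with the task labels. The snippet below is a generic scikit-learn sketch with placeholder data, not the authors' evaluation code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Placeholder data for illustration: in practice, `embeddings` would hold sentence
# embeddings from the pre-trained model and `labels` the downstream task labels.
embeddings = np.random.randn(200, 768)
labels = np.random.randint(0, 2, size=200)

reduced = PCA(n_components=2).fit_transform(embeddings)         # project for inspection
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)

# If the embeddings mostly encode general features, the clusters will align
# poorly with the task labels and the score will stay near zero.
print("agreement with task labels:", adjusted_rand_score(labels, clusters))
```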
In this study, further learning is applied to induce pre-trained models to derive
word embeddings optimized for downstream tasks. Using the proposed method, word