Citation: Wang, S.; Huang, H.; Shi, S. Improving Non-Autoregressive Machine Translation Using Sentence-Level Semantic Agreement. Appl. Sci. 2022, 12, 5003. https://doi.org/10.3390/app12105003

Academic Editors: Phivos Mylonas, Katia Lida Kermanidis and Manolis Maragoudakis

Received: 19 April 2022; Accepted: 13 May 2022; Published: 16 May 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Improving Non-Autoregressive Machine Translation Using
Sentence-Level Semantic Agreement
Shuheng Wang ¹, Heyan Huang ² and Shumin Shi ²,*

¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; wsh@njust.edu.cn
² School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China; hhy63@bit.edu.cn
* Correspondence: bjssm@bit.edu.cn
Abstract: The inference stage can be accelerated significantly using a Non-Autoregressive Transformer (NAT). However, the training objective used in the NAT model still aims to minimize the loss between the generated words and the golden words in the reference. Since the dependencies between the target words are lacking, this word-level training objective can easily cause semantic inconsistency between the generated and source sentences. To alleviate this issue, we propose a new method, Sentence-Level Semantic Agreement (SLSA), to obtain consistency between the source and generated sentences. Specifically, we utilize contrastive learning to pull the sentence representations of the source and generated sentences closer together. In addition, to strengthen the capability of the encoder, we also integrate an agreement module into the encoder to obtain a better representation of the source sentence. The experiments are conducted on three translation datasets: the WMT 2014 EN→DE task, the WMT 2016 EN→RO task, and the IWSLT 2014 DE→EN task, and the improvement in the NAT model's performance shows the effect of our proposed method.
Keywords: machine translation; non-autoregressive; contrastive learning; semantic agreement
1. Introduction
Over the years, tremendous success has been achieved in encoder–decoder based neu-
ral machine translation (NMT) [
1
3
]. The encoder maps the source sentence into a hidden
representation, and the target sentence is generated by the decoder from the hidden repre-
sentation in an autoregressive method. This autoregressive method has assisted the NMT
model in obtaining high accuracy [
3
]. However, because it needs the previously predicted
words as inputs, this also limits the speed of the inference stage. Recently,
Gu et al. [4]
proposed a non-autoregressive transformer (NAT) to break the limitation and reduce the in-
ference latency. In general, the NAT model also utilizes the encoder–decoder framework.
However, by removing the autoregressive method in the decoder, the NAT model can
significantly expedite the decoding stage. Yet, the performance of the NAT model still lags
behind the NMT model.
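To make the distinction concrete, the two paradigms are commonly factorized as follows (a standard formulation from the NAT literature, not quoted from this paper), where x is the source sentence and y = (y_1, ..., y_T) the target sentence:

% Autoregressive NMT: each target word conditions on all previously
% generated target words, so decoding must proceed left to right.
p_{\mathrm{AT}}(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{<t}, x)

% Non-autoregressive NAT: a conditional independence assumption drops
% y_{<t}, so all target positions can be predicted in parallel.
p_{\mathrm{NAT}}(y \mid x) = \prod_{t=1}^{T} p(y_t \mid x)

The NAT factorization removes the left-to-right dependency, which is what enables parallel decoding but also removes the model's view of its own previously generated words.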
During training, the NAT model, like the NMT model, uses a word-level cross-entropy loss to optimize the whole model. Nevertheless, in the non-autoregressive setting, the dependencies between the target words cannot be learned properly with word-level cross-entropy [5]. Although it encourages the NAT model to generate the correct token at each position, due to the lack of target dependency, the NAT model cannot consider global correctness. The NAT model cannot efficiently model the target dependency, and the cross-entropy loss further weakens this capability, causing undertranslation or overtranslation [5]. Recently, some research has proposed ways to alleviate this issue. For example, Sun et al. [6] utilized a CRF module to model the global path in the decoder, and Shao et al. [5] used a bag-of-words loss to encourage the NAT model to capture the target dependency. However, this previous research only considered global or partial modeling of the target dependencies.
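As an illustration of why the word-level objective is blind to sentence-level semantics (again a standard formulation, not quoted from this paper), the NAT cross-entropy loss decomposes over positions, with y*_t denoting the t-th reference word:

% Each summand scores one position in isolation; no term depends on
% more than one generated word at a time.
\mathcal{L}_{\mathrm{CE}} = -\sum_{t=1}^{T} \log p(y_t^{*} \mid x)

Each term rewards the correct word at its own position, but no term measures whether the generated words jointly form a sentence whose meaning agrees with the source, which is the gap that sentence-level objectives such as SLSA aim to close.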
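The abstract describes the core of SLSA as contrastive learning that pulls the sentence representations of the source and generated sentences closer together. A minimal sketch of that idea in PyTorch follows, assuming mean-pooled encoder and decoder states and an InfoNCE-style objective with in-batch negatives; the function and tensor names are illustrative, not the authors' code:

import torch
import torch.nn.functional as F

def sentence_agreement_loss(src_hidden, tgt_hidden, src_mask, tgt_mask, temperature=0.1):
    # src_hidden: (batch, src_len, dim) encoder states
    # tgt_hidden: (batch, tgt_len, dim) decoder states of the generated sentence
    # src_mask / tgt_mask: (batch, len), 1 for real tokens, 0 for padding
    src_mask = src_mask.float()
    tgt_mask = tgt_mask.float()

    # Mean-pool over non-padding positions to get one vector per sentence.
    src_repr = (src_hidden * src_mask.unsqueeze(-1)).sum(1) / src_mask.sum(1, keepdim=True)
    tgt_repr = (tgt_hidden * tgt_mask.unsqueeze(-1)).sum(1) / tgt_mask.sum(1, keepdim=True)

    # Cosine similarities between every source/target pair in the batch.
    src_repr = F.normalize(src_repr, dim=-1)
    tgt_repr = F.normalize(tgt_repr, dim=-1)
    logits = src_repr @ tgt_repr.t() / temperature  # (batch, batch)

    # The matching pair (the diagonal) is the positive; the other sentences
    # in the batch act as negatives, pushing non-matching pairs apart.
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)

Such a term would be added to the usual cross-entropy loss with a weighting coefficient; the paper's actual formulation, including its encoder-side agreement module, may differ from this sketch.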