Citation: Wang, S.; Huang, H.; Shi, S. Improving Non-Autoregressive Machine Translation Using Sentence-Level Semantic Agreement. Appl. Sci. 2022, 12, 5003. https://doi.org/10.3390/app12105003
Academic Editors: Phivos Mylonas, Katia Lida Kermanidis and Manolis Maragoudakis
Received: 19 April 2022; Accepted: 13 May 2022; Published: 16 May 2022
Article
Improving Non-Autoregressive Machine Translation Using
Sentence-Level Semantic Agreement
Shuheng Wang ¹, Heyan Huang ² and Shumin Shi ²,*
¹ School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China; wsh@njust.edu.cn
² School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100811, China; hhy63@bit.edu.cn
* Correspondence: bjssm@bit.edu.cn
Abstract: The inference stage can be accelerated significantly using a Non-Autoregressive Transformer (NAT). However, the training objective of the NAT model aims to minimize the loss between the generated words and the golden words in the reference. Since the dependencies between the target words are lacking, this word-level training objective can easily cause semantic inconsistency between the generated and source sentences. To alleviate this issue, we propose a new method, Sentence-Level Semantic Agreement (SLSA), to obtain consistency between the source and generated sentences. Specifically, we utilize contrastive learning to pull the sentence representations of the source and generated sentences closer together. In addition, to strengthen the capability of the encoder, we integrate an agreement module into the encoder to obtain a better representation of the source sentence. The experiments are conducted on three translation datasets: the WMT 2014 EN→DE task, the WMT 2016 EN→RO task, and the IWSLT 2014 DE→EN task, and the improvements in the NAT model's performance demonstrate the effectiveness of our proposed method.
Keywords: machine translation; non-autoregressive; contrastive learning; semantic agreement
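To make the contrastive objective concrete, the following is a minimal sketch of a sentence-level agreement loss in PyTorch. It is our illustration rather than the paper's exact implementation: the mean pooling, the temperature value, and the use of in-batch negatives are all assumptions.

```python
import torch
import torch.nn.functional as F

def slsa_loss(src_hidden, tgt_hidden, temperature=0.1):
    """Contrastive sentence-level agreement loss (illustrative sketch).

    src_hidden: (batch, src_len, dim) encoder states of the source sentences
    tgt_hidden: (batch, tgt_len, dim) decoder states of the generated sentences
    Each sequence is pooled into a single sentence vector; the matching
    source/target pair in the batch is the positive, all other pairs
    serve as negatives (InfoNCE-style).
    """
    # Mean-pool to one sentence representation per sequence (an assumption;
    # max pooling or a special token could serve the same purpose).
    src_repr = F.normalize(src_hidden.mean(dim=1), dim=-1)  # (batch, dim)
    tgt_repr = F.normalize(tgt_hidden.mean(dim=1), dim=-1)  # (batch, dim)

    # Cosine-similarity matrix between all source/target pairs in the batch.
    logits = src_repr @ tgt_repr.t() / temperature          # (batch, batch)

    # The diagonal holds the true (source, generated) pairs.
    labels = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```

Minimizing this loss pulls each generated sentence's pooled representation toward that of its own source sentence while pushing it away from the other sources in the batch, which is one standard way to realize the "pull closer together" behavior described above.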
1. Introduction
Over the years, tremendous success has been achieved in encoder–decoder-based neural machine translation (NMT) [1–3]. The encoder maps the source sentence into a hidden representation, from which the decoder generates the target sentence in an autoregressive manner. This autoregressive approach has helped NMT models attain high accuracy [3]. However, because each step requires the previously predicted words as inputs, it also limits the speed of the inference stage. Recently, Gu et al. [4] proposed the non-autoregressive transformer (NAT) to break this limitation and reduce inference latency. The NAT model also adopts the encoder–decoder framework, but by removing the autoregressive dependency in the decoder, it can significantly expedite the decoding stage. Yet, the performance of the NAT model still lags behind that of the NMT model.
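As a rough illustration of where the latency difference comes from (our own sketch; `decoder_step` and `decoder_parallel` are hypothetical method names, not from the paper), autoregressive decoding is a sequential loop, while NAT decoding is a single parallel pass:

```python
# Autoregressive decoding: each step feeds back the previous prediction,
# so the loop runs once per target token and cannot be parallelized.
def decode_autoregressive(model, src, max_len, bos_id):
    ys = [bos_id]
    for _ in range(max_len):
        next_word = model.decoder_step(src, ys).argmax(-1)  # one token per step
        ys.append(next_word)
    return ys

# Non-autoregressive decoding: all target positions are predicted
# in one forward pass, independently of each other.
def decode_non_autoregressive(model, src, tgt_len):
    return model.decoder_parallel(src, tgt_len).argmax(-1)  # all tokens at once
```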
During training, the NAT model, like the NMT model, uses a word-level cross-entropy loss to optimize the whole model. In the non-autoregressive setting, however, the dependencies among the target words cannot be learned properly with this word-level cross-entropy [5]. Although it encourages the NAT model to generate the correct token at each position, the lack of target dependency means the model cannot account for global correctness. The NAT model cannot efficiently model target dependencies, and the cross-entropy loss further weakens this ability, causing undertranslation or overtranslation [5].
Recently, some research has proposed ways to alleviate this issue. For example, Sun et al. [6] utilized a CRF module to model the global path in the decoder, and Shao et al. [5] used a bag-of-words loss to encourage the NAT model to capture target dependencies. However, this previous research only considered global or partial modeling