Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy

Citation: Wang, D.; Wei, Y.; Zhang, K.; Ji, D.; Wang, Y. Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy. Sensors 2022, 22, 3027. https://doi.org/10.3390/s22083027
Academic Editors: Enrico Vezzetti, Gabriele Baronio, Domenico Speranza, Luca Ulrich and Andrea Luigi Guerra
Received: 24 March 2022
Accepted: 12 April 2022
Published: 15 April 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
sensors
Article
Automatic Speech Recognition Performance Improvement for
Mandarin Based on Optimizing Gain Control Strategy
Desheng Wang, Yangjie Wei *, Ke Zhang, Dong Ji and Yi Wang
Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science
and Engineering, Northeastern University, Shenyang 110169, China; deshengwang001@gmail.com (D.W.);
1910621@stu.neu.edu.cn (K.Z.); jidong@cse.neu.edu.cn (D.J.); wangyi@cse.neu.edu.cn (Y.W.)
* Correspondence: weiyangjie@cse.neu.edu.cn
Abstract:
Automatic speech recognition (ASR) is an essential technique for human–computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. Because the relationship between gain control and WER currently lacks sufficient theoretical analysis and proof, various unconstrained gain control strategies have been adopted in realistic ASR systems, and the optimal gain control with respect to the lowest WER is rarely achieved. A gain control strategy named maximized original signal transmission (MOST) is proposed in this study to minimize the adverse impact of gain control on ASR systems. First, by modeling the gain control strategy, the quantitative relationship between the gain control strategy and ASR performance was established using the noise figure index. Second, through an analysis of this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation was theoretically deduced. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST gain control strategy can significantly reduce the WER of the experimental ASR system, with a 10% mean absolute WER reduction at 9 dB gain.
Keywords:
human–computer interaction; automatic speech recognition (ASR); word error rate (WER); gain control; noise figure; maximized original signal transmission (MOST)
1. Introduction
Automatic speech recognition (ASR) has been widely integrated into human–robot interactions in the form of voice user interfaces (VUIs) [1–3]. Virtual assistants [4], vehicle systems [5], and home automation all make daily life more convenient [6–9], and the application scope of ASR keeps growing as more people come to regard VUIs as more natural than graphical user interfaces (GUIs) [10,11].
Currently, the performance of ASR systems in many human–robot interaction scenarios is unsatisfactory because of limited robustness. One critical factor is that various practical noises make it harder to extract features such as Mel-frequency cepstral coefficients (MFCC) [12–14], log-channel energies [15], and pitch-based features [12,16]. Some common noises have been widely researched in ASR, such as background noise [9,17], reverberation [18–21], and squeal noise, as well as noises tightly related to hardware, such as thermal noise from amplifiers [22], quantization noise from analog-to-digital converters (ADCs) [23], and signal quality loss caused by coding [24], compression, and transmission [25]. However, noises related to gain control have received less attention. Gain control is the amplitude adjustment of signals, and it is one of the most frequently used operations in ASR systems. An excessively large gain may prevent the ASR system from working properly, for example through data overflow on the software side or clipping on the hardware side. Therefore, gain control in this paper refers to original gain control under the premise that no clipping occurs.
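The no-clipping premise can be illustrated with a minimal sketch. It is not the MOST strategy proposed in the paper; it only shows how a gain in dB maps to a linear amplitude factor and how a gain that would leave the sample range can be detected. The 16-bit PCM sample format and all function names here are assumptions made for illustration.

```python
import math

# Assumption: samples are signed 16-bit PCM integers.
INT16_MAX = 32767
INT16_MIN = -32768


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in dB to a linear factor (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def max_gain_db_without_clipping(samples: list[int]) -> float:
    """Return the largest gain in dB that keeps the peak sample within int16."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return math.inf  # a silent signal can never clip
    return 20.0 * math.log10(INT16_MAX / peak)


def apply_gain(samples: list[int], gain_db: float) -> list[int]:
    """Scale samples by gain_db, raising an error instead of clipping."""
    factor = db_to_linear(gain_db)
    scaled = [round(s * factor) for s in samples]
    if any(v > INT16_MAX or v < INT16_MIN for v in scaled):
        raise OverflowError(f"{gain_db} dB gain would exceed the 16-bit range")
    return scaled


if __name__ == "__main__":
    signal = [1000, -2000, 1500]
    print(db_to_linear(9.0))                     # linear factor for a 9 dB gain
    print(max_gain_db_without_clipping(signal))  # headroom of this signal in dB
    print(apply_gain(signal, 9.0))
```

A 9 dB gain, the operating point quoted in the abstract, corresponds to a linear factor of about 2.82, so under these assumptions any sample with magnitude above roughly 11,600 would leave the 16-bit range.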