Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy


Citation: Wang, D.; Wei, Y.; Zhang, K.; Ji, D.; Wang, Y. Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy. Sensors 2022, 22, 3027. https://doi.org/10.3390/s22083027

Academic Editors: Enrico Vezzetti, Gabriele Baronio, Domenico Speranza, Luca Ulrich and Andrea Luigi Guerra

Received: 24 March 2022

Accepted: 12 April 2022

Published: 15 April 2022

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

sensors

Article

Automatic Speech Recognition Performance Improvement for Mandarin Based on Optimizing Gain Control Strategy

Desheng Wang, Yangjie Wei *, Ke Zhang, Dong Ji and Yi Wang

Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China; deshengwang001@gmail.com (D.W.); 1910621@stu.neu.edu.cn (K.Z.); jidong@cse.neu.edu.cn (D.J.); wangyi@cse.neu.edu.cn (Y.W.)

* Correspondence: weiyangjie@cse.neu.edu.cn

Abstract: Automatic speech recognition (ASR) is an essential technique for human–computer interaction, and gain control is a commonly used operation in ASR. However, inappropriate gain control strategies can increase the word error rate (WER) of ASR. Because the relationship between gain control and WER currently lacks sufficient theoretical analysis and proof, various unconstrained gain control strategies have been adopted in realistic ASR systems, and the optimal gain control with respect to the lowest WER is rarely achieved. A gain control strategy named maximized original signal transmission (MOST) is proposed in this study to minimize the adverse impact of gain control on ASR systems. First, by modeling the gain control strategy, the quantitative relationship between the gain control strategy and the ASR performance was established using the noise figure index. Second, through an analysis of this quantitative relationship, an optimal MOST gain control strategy with minimal performance degradation was theoretically deduced. Finally, comprehensive comparative experiments on a Mandarin dataset show that the proposed MOST gain control strategy can significantly reduce the WER of the experimental ASR system, with a 10% mean absolute WER reduction at −9 dB gain.

Keywords: human–computer interaction; automatic speech recognition (ASR); word error rate (WER); gain control; noise figure; maximized original signal transmission (MOST)

1. Introduction

Automatic speech recognition (ASR) has been widely integrated into human–robot interactions in the form of voice user interfaces (VUIs). Virtual assistants, vehicle systems, and home automation all make daily life more convenient, and the application scope of ASR keeps growing as more people come to regard VUIs as more natural than graphical user interfaces (GUIs) [10,11].

Currently, the performance of ASR systems in many human–robot interaction scenarios is unsatisfactory because of limited robustness. One critical factor is that various practical noises make it harder to extract features such as Mel-frequency cepstral coefficients (MFCC), log-channel energies, and pitch-based features. Some common noises have been widely studied in ASR research, such as background noise, reverberation, and squeal noise, as have noises closely tied to hardware, such as thermal noise from amplifiers, quantization noise from analog-to-digital converters (ADCs), and signal quality loss caused by coding, compression, and transmission. However, noise related to gain control has received less attention. Gain control is the adjustment of signal amplitude, and it is one of the most frequently used operations in ASR systems. An excessively large gain can prevent the ASR system from working properly, for example through data overflow in software or clipping in hardware. Therefore, gain control in this paper refers to original gain controls under the premise that no clipping occurs.
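To make the overflow and clipping constraint concrete, the following is a minimal sketch of applying a gain expressed in decibels to speech samples. It is an illustration only and not the MOST strategy proposed in this paper. It assumes 16-bit signed PCM samples and the standard amplitude conversion (linear factor = 10^(dB/20)); the function names `db_to_linear`, `apply_gain`, and `max_safe_gain_db` are hypothetical.

```python
import math

# Assumption: samples are 16-bit signed PCM integers.
INT16_MIN, INT16_MAX = -32768, 32767


def db_to_linear(gain_db):
    """Convert an amplitude gain in decibels to a linear scale factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Scale samples by gain_db and saturate at the int16 limits.

    Returns (scaled_samples, clipped_count), where clipped_count is the
    number of samples that fell outside the int16 range before saturation.
    """
    factor = db_to_linear(gain_db)
    scaled, clipped = [], 0
    for s in samples:
        v = round(s * factor)
        if v > INT16_MAX or v < INT16_MIN:
            clipped += 1
            v = max(INT16_MIN, min(INT16_MAX, v))
        scaled.append(v)
    return scaled, clipped


def max_safe_gain_db(samples):
    """Largest gain in dB that keeps the peak sample within int16 range."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return math.inf  # silence can take any gain without clipping
    return 20.0 * math.log10(INT16_MAX / peak)


if __name__ == "__main__":
    frame = [0, 1000, -20000, 16000]
    for gain_db in (-9.0, 6.0):
        scaled, clipped = apply_gain(frame, gain_db)
        print(gain_db, scaled, "clipped:", clipped)
    print("max safe gain (dB):", max_safe_gain_db(frame))
```

In this sketch a negative gain such as −9 dB never clips but reduces every sample to roughly a third of its amplitude, whereas a positive gain above the value returned by `max_safe_gain_db` saturates the loudest samples. That upper bound corresponds to the "no clipping" premise stated above.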

