A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems


Citation: Islam, R.; Abdel-Raheem, E.; Tarique, M. A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems. Appl. Sci. 2022, 12, 2398. https://doi.org/10.3390/app12052398

Academic Editors: Keun Ho Ryu and Nipon Theera-Umpon

Received: 14 December 2021

Accepted: 21 February 2022

Published: 25 February 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Applied Sciences

Article

A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems

Rumana Islam *, Esam Abdel-Raheem and Mohammed Tarique

Department of ECE, University of Windsor, Windsor, ON N9B 3P4, Canada; eraheem@uwindsor.ca

Department of ECE, University of Science and Technology of Fujairah (USTF), Fujairah P.O. Box 2202, United Arab Emirates; m.tarique@ustf.ac.ae

* Correspondence: islamq@uwindsor.ca; Tel.: +1-(519)-903-8834

Abstract: This paper presents a pathological voice identification system that applies signal processing techniques through cochlear implant models. The fundamentals of the biological process of speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other uses a bank of optimized gammatone filters. The critical center frequencies of those filters are selected to mimic the human cochlear vibration patterns caused by audio signals. The proposed system processes the speech samples and applies a CNN for the final pathological voice identification. The results show that the two proposed models, adopting bandpass and gammatone filterbanks, can discriminate pathological voices from healthy ones, with F1 scores of 77.6% and 78.7%, respectively, on speech samples. The results of this work are also compared with those of other related published works.
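The gammatone model mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook fourth-order gammatone impulse response with the Glasberg-Moore ERB bandwidth. The paper's optimized filter parameters and critical center frequencies are not given in this excerpt, so the five center frequencies, the 16 kHz sampling rate, and all function names below are hypothetical placeholders rather than the authors' implementation.

```python
import numpy as np

def erb(fc_hz):
    """Equivalent rectangular bandwidth (Glasberg-Moore approximation), in Hz."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def gammatone_ir(fc_hz, fs_hz, order=4, duration_s=0.025):
    """Peak-normalised impulse response of a gammatone filter centred at fc_hz."""
    t = np.arange(int(duration_s * fs_hz)) / fs_hz
    b = 1.019 * erb(fc_hz)  # bandwidth parameter of the standard gammatone model
    ir = t ** (order - 1) * np.exp(-2.0 * np.pi * b * t) * np.cos(2.0 * np.pi * fc_hz * t)
    return ir / np.max(np.abs(ir))

def filterbank(signal, fs_hz, centre_freqs_hz):
    """Filter `signal` through every channel; returns shape (channels, samples)."""
    return np.stack([
        np.convolve(signal, gammatone_ir(fc, fs_hz), mode="same")
        for fc in centre_freqs_hz
    ])

fs = 16000                                              # assumed sampling rate
centre_freqs = [250.0, 500.0, 1000.0, 2000.0, 4000.0]   # hypothetical channels
t = np.arange(fs // 4) / fs
tone = np.sin(2.0 * np.pi * 1000.0 * t)                 # 1 kHz test tone, 0.25 s

bands = filterbank(tone, fs, centre_freqs)
energy = (bands ** 2).sum(axis=1)                       # per-channel energy
strongest = centre_freqs[int(np.argmax(energy))]
print("channel with most energy:", strongest, "Hz")
```

Each row of `bands` is one channel of the simulated filterbank. In a pipeline like the one the abstract describes, such channel outputs would be processed further before classification, but the steps between the filterbank and the CNN are not specified in this excerpt.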

Keywords: bandpass; cochlear implants; classifier; deep learning; filterbank; gammatone; voice pathology

1. Introduction

Humans use speech to convey information in their daily life. A human speaker encodes information into a continuously time-varying waveform that can be stored, manipulated, and transmitted during speech production. Finally, the message is decoded by a listener. The whole human communication process can be broadly divided into four main parts: speech production, auditory feedback, sound wave transmission, and speech perception.

As illustrated in Figure 1, the human voice generation system consists of the lungs, larynx, and vocal tract. The speech production process originates in the lungs: during speech production, humans inhale air and then expel it. The most critical components of the human voice generation system are the vocal folds. The larynx controls the vocal folds by using its ligaments, cartilages, and muscles. The vocal folds ultimately open the glottis (a slit between the vocal folds) depending on three conditions, namely breathing, unvoiced, and voiced. The lips, tongue, palate, and cheeks form the articulators. The primary function of the articulators is to filter the sound emanating from the larynx to produce a highly intricate sound.
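The filtering role of the articulators described above is often idealised as a source-filter model. The sketch below illustrates that idea only and is not taken from the paper: a pulse train stands in for the voiced excitation from the larynx, and a single two-pole resonator stands in for one vocal-tract resonance. The 125 Hz pitch, the 750 Hz resonance with 100 Hz bandwidth, and the function names are all hypothetical choices.

```python
import numpy as np

def pulse_train(f0_hz, fs_hz, duration_s):
    """Idealised voiced excitation: one unit impulse per glottal cycle."""
    x = np.zeros(int(duration_s * fs_hz))
    period = int(round(fs_hz / f0_hz))
    x[::period] = 1.0
    return x

def resonator(x, fc_hz, bw_hz, fs_hz):
    """Two-pole resonator: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    r = np.exp(-np.pi * bw_hz / fs_hz)       # pole radius from the bandwidth
    theta = 2.0 * np.pi * fc_hz / fs_hz      # pole angle from the centre frequency
    a1, a2 = -2.0 * r * np.cos(theta), r * r
    y = np.zeros_like(x)
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] -= a1 * y[n - 1]
        if n >= 2:
            y[n] -= a2 * y[n - 2]
    return y

fs = 16000                                      # assumed sampling rate
source = pulse_train(125.0, fs, 0.5)            # hypothetical 125 Hz pitch
voiced = resonator(source, 750.0, 100.0, fs)    # hypothetical single resonance

spectrum = np.abs(np.fft.rfft(voiced))
freqs = np.fft.rfftfreq(len(voiced), 1.0 / fs)
peak_hz = freqs[int(np.argmax(spectrum))]
print("strongest spectral component near", round(peak_hz), "Hz")
```

The excitation alone has equally strong harmonics at multiples of the pitch. After the resonator, the harmonic closest to the resonance dominates, which is the sense in which the filter shapes the sound coming from the source.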

The human peripheral auditory system consists of three parts: the outer ear, middle ear, and inner ear. The propagated sound enters the outer ear through the pinna, which helps to localize the sound. Afterward, it travels down the auditory canal and vibrates the eardrum. The middle ear consists of three bones: the malleus, incus, and stapes. These bones transport the vibration of the eardrum to the inner ear. The middle ear is connected to the inner ear by an oval window. The main component of the inner ear is the cochlea, which is a coiled, snail-shaped tube filled with fluid. A basilar membrane exists within the cochlear fluid and is held to the cochlea by a bone. The vibration of the eardrum causes a movement of the oval window to generate a compressed sound

