A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems

Citation: Islam, R.; Abdel-Raheem, E.; Tarique, M. A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems. Appl. Sci. 2022, 12, 2398. https://doi.org/10.3390/app12052398
Academic Editors: Keun Ho Ryu and Nipon Theera-Umpon
Received: 14 December 2021
Accepted: 21 February 2022
Published: 25 February 2022
Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Applied Sciences
Article
A Novel Pathological Voice Identification Technique through
Simulated Cochlear Implant Processing Systems
Rumana Islam 1,*, Esam Abdel-Raheem 1 and Mohammed Tarique 2
1 Department of ECE, University of Windsor, Windsor, ON N9B 3P4, Canada; eraheem@uwindsor.ca
2 Department of ECE, University of Science and Technology of Fujairah (USTF), Fujairah P.O. Box 2202, United Arab Emirates; m.tarique@ustf.ac.ae
* Correspondence: islamq@uwindsor.ca; Tel.: +1-(519)-903-8834
Abstract: This paper presents a pathological voice identification system employing signal processing
techniques through cochlear implant models. The fundamentals of the biological process for speech
perception are investigated to develop this technique. Two cochlear implant models are considered
in this work: one uses a conventional bank of bandpass filters, and the other one uses a bank of
optimized gammatone filters. The critical center frequencies of those filters are selected to mimic
the human cochlear vibration patterns caused by audio signals. The proposed system processes
the speech samples and applies a CNN for final pathological voice identification. The results show
that the two proposed models adopting bandpass and gammatone filterbanks can discriminate the
pathological voices from healthy ones, resulting in F1 scores of 77.6% and 78.7%, respectively, with
speech samples. The obtained results of this work are also compared with those of other related
published works.
Keywords: bandpass; cochlear implants; classifier; deep learning; filterbank; gammatone; voice pathology
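The abstract names a gammatone filterbank whose center frequencies follow cochlear behavior, but it gives no filter parameters. The following is a minimal sketch of one common way to build such a filterbank, not the authors' optimized design. Everything specific here is an assumption made for illustration: the Glasberg and Moore ERB formulas for spacing the center frequencies, the fourth-order gammatone with bandwidth factor 1.019, the channel count and frequency range, and all function names.

```python
import math

def erb(fc_hz):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore formula)."""
    return 24.7 * (4.37 * fc_hz / 1000.0 + 1.0)

def hz_to_erb_rate(f_hz):
    """Map frequency in Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def erb_spaced_centers(f_low, f_high, n_channels):
    """Center frequencies equally spaced on the ERB-rate scale (assumed spacing)."""
    lo, hi = hz_to_erb_rate(f_low), hz_to_erb_rate(f_high)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + i * step) for i in range(n_channels)]

def gammatone_ir(fc_hz, fs_hz, duration_s=0.03, order=4):
    """Peak-normalized gammatone impulse response:
    t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t), with b = 1.019 * ERB(fc)."""
    b = 1.019 * erb(fc_hz)
    n_taps = int(round(duration_s * fs_hz))
    ir = []
    for k in range(n_taps):
        t = k / fs_hz
        ir.append(t ** (order - 1)
                  * math.exp(-2.0 * math.pi * b * t)
                  * math.cos(2.0 * math.pi * fc_hz * t))
    peak = max(abs(v) for v in ir)
    return [v / peak for v in ir]

def fir_filter(signal, ir):
    """Causal FIR filtering by direct convolution (output has the input's length)."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k in range(min(n + 1, len(ir))):
            acc += ir[k] * signal[n - k]
        out.append(acc)
    return out

def rms(x):
    """Root-mean-square level of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))
```

As a usage example, filtering a pure tone placed at one of the center frequencies through every channel and comparing the RMS outputs should show the matching channel responding most strongly. The per-channel outputs (or their envelopes) are the kind of representation that could then be passed to a classifier such as the CNN mentioned in the abstract.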
1. Introduction
Humans use speech to convey information in their daily life. A human speaker encodes
information into a continuously time-varying waveform that can be stored, manipulated,
and transmitted during speech production. Finally, the message is decoded by a listener.
The whole human communication process can be broadly divided into four main parts:
speech production, auditory feedback, sound wave transmission, and speech perception [1].
As illustrated in Figure 1, the human voice generation system consists of the lungs,
larynx, and vocal tract. The speech production process originates from the lungs. During
the speech production process, humans inhale air and then expel it. The most critical
components of the human voice generation system are the vocal folds. The larynx controls the
vocal folds by using its ligaments, cartilages, and muscles. The vocal folds ultimately open
the glottis (a slit between the vocal folds) depending on three conditions, namely breathing,
unvoiced, and voiced [2]. The lips, tongue, palate, and cheek form the articulators. The
primary function of articulators is to filter the sound emanating from the larynx to produce
a highly intricate sound.
The human peripheral auditory system consists of three parts [3]: the outer ear, middle
ear, and inner ear. The propagated sound enters the outer ear through the pinna, which
helps to localize the sound. Afterward, it travels down the auditory canal and vibrates
the eardrum. The middle ear consists of three bones: the malleus, incus, and stapes. These
bones transport the vibration of the eardrum to the inner ear. The middle ear is connected
to the inner ear by the oval window. The main component of the inner ear is the cochlea,
which is a coiled, snail-shaped tube filled with fluid. The basilar membrane
lies within the cochlear fluid and is attached to the cochlea by bone. The vibration
of the eardrum causes a movement of the oval window to generate a compressed sound