Citation: Islam, R.; Abdel-Raheem, E.; Tarique, M. A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems. Appl. Sci. 2022, 12, 2398. https://doi.org/10.3390/app12052398
Academic Editors: Keun Ho Ryu and Nipon Theera-Umpon
Received: 14 December 2021
Accepted: 21 February 2022
Published: 25 February 2022
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Novel Pathological Voice Identification Technique through Simulated Cochlear Implant Processing Systems
Rumana Islam 1,*, Esam Abdel-Raheem 1 and Mohammed Tarique 2
1 Department of ECE, University of Windsor, Windsor, ON N9B 3P4, Canada; eraheem@uwindsor.ca
2 Department of ECE, University of Science and Technology of Fujairah (USTF), Fujairah P.O. Box 2202, United Arab Emirates; m.tarique@ustf.ac.ae
* Correspondence: islamq@uwindsor.ca; Tel.: +1-(519)-903-8834
Abstract:
This paper presents a pathological voice identification system that applies signal processing techniques derived from cochlear implant models. The fundamentals of the biological process of speech perception are investigated to develop this technique. Two cochlear implant models are considered in this work: one uses a conventional bank of bandpass filters, and the other uses a bank of optimized gammatone filters. The center frequencies of these filters are selected to mimic the vibration patterns that audio signals produce in the human cochlea. The proposed system processes the speech samples through these filterbanks and then applies a convolutional neural network (CNN) for the final pathological voice identification. The results show that the two proposed models, adopting the bandpass and gammatone filterbanks, can discriminate pathological voices from healthy ones, achieving F1 scores of 77.6% and 78.7%, respectively, on speech samples. The results of this work are also compared with those of other related published works.
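To make the gammatone front end more concrete, the sketch below builds a small bank of 4th-order gammatone filters whose center frequencies are equally spaced on the Glasberg and Moore ERB-rate scale, a common way of approximating cochlear frequency resolution. It is a minimal illustration under stated assumptions, not the authors' optimized filterbank: the channel count (16), the frequency range (100 Hz to 7 kHz), the 50 ms impulse-response length, the 1.019 bandwidth factor, and the frame-RMS "cochleagram" that a classifier might consume are all illustrative choices, and the helper names (`erb_space`, `gammatone_ir`, `gammatone_filterbank`, `frame_rms`) are hypothetical.

```python
import numpy as np

# Illustrative constants (assumptions, not taken from the paper)
GT_ORDER = 4          # classic 4th-order gammatone
GT_BW_FACTOR = 1.019  # common ERB-to-gammatone bandwidth scaling


def hz_to_erb_rate(f):
    """Glasberg & Moore (1990) ERB-rate (ERB number) for frequency f in Hz."""
    return 21.4 * np.log10(1.0 + 4.37e-3 * f)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (e / 21.4) - 1.0) / 4.37e-3


def erb_space(f_low, f_high, n_channels):
    """Center frequencies (Hz), ascending, equally spaced on the ERB-rate scale."""
    e = np.linspace(hz_to_erb_rate(f_low), hz_to_erb_rate(f_high), n_channels)
    return erb_rate_to_hz(e)


def gammatone_ir(fc, fs, duration=0.05):
    """Truncated gammatone impulse response, normalized to unit gain at fc."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37e-3 * fc + 1.0)  # equivalent rectangular bandwidth (Hz)
    b = GT_BW_FACTOR * erb
    g = t ** (GT_ORDER - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    # Magnitude of the discrete-time frequency response evaluated exactly at fc
    gain_at_fc = np.abs(np.sum(g * np.exp(-2j * np.pi * fc * t)))
    return g / gain_at_fc


def gammatone_filterbank(x, fs, n_channels=16, f_low=100.0, f_high=7000.0):
    """Filter x through the bank; returns (center_freqs, bands[n_channels, len(x)])."""
    if f_high >= fs / 2:
        raise ValueError("f_high must be below the Nyquist frequency")
    fcs = erb_space(f_low, f_high, n_channels)
    # Causal FIR filtering: keep the first len(x) samples of the full convolution
    bands = np.stack([np.convolve(x, gammatone_ir(fc, fs))[: len(x)] for fc in fcs])
    return fcs, bands


def frame_rms(bands, fs, frame_ms=10.0):
    """Per-channel RMS in non-overlapping frames -> (n_channels, n_frames) image."""
    hop = int(fs * frame_ms / 1000)
    n_frames = bands.shape[1] // hop
    frames = bands[:, : n_frames * hop].reshape(bands.shape[0], n_frames, hop)
    return np.sqrt(np.mean(frames ** 2, axis=2))


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 s, 1 kHz test tone
    fcs, bands = gammatone_filterbank(tone, fs)
    energy = np.mean(bands ** 2, axis=1)
    print("strongest channel fc (Hz):", round(float(fcs[np.argmax(energy)])))
    print("cochleagram shape:", frame_rms(bands, fs).shape)
```

For a 1 s, 1 kHz tone sampled at 16 kHz, the most energetic channel should be one of those centered near 1 kHz, and `frame_rms` reduces the 16-band output to a 16 × 100 time-frequency image of the kind a CNN could take as input. Replacing `gammatone_ir` with a conventional bandpass design would give a rough analogue of the first (bandpass) model.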
Keywords:
bandpass; cochlear implants; classifier; deep learning; filterbank; gammatone; voice pathology
1. Introduction
Humans use speech to convey information in their daily lives. During speech production, a human speaker encodes information into a continuously time-varying waveform that can be stored, manipulated, and transmitted. Finally, a listener decodes the message. The whole human communication process can be broadly divided into four main parts: speech production, auditory feedback, sound wave transmission, and speech perception [1].
As illustrated in Figure 1, the human voice generation system consists of the lungs, the larynx, and the vocal tract. The speech production process originates in the lungs: humans inhale air and then expel it. The most critical components of the human voice generation system are the vocal folds, which the larynx controls by means of its ligaments, cartilages, and muscles. The vocal folds open the glottis (the slit between the vocal folds) according to three conditions, namely breathing, unvoiced, and voiced [2]. The lips, tongue, palate, and cheeks form the articulators, whose primary function is to filter the sound emanating from the larynx to produce a highly intricate sound.
The human peripheral auditory system consists of three parts [3]: the outer ear, the middle ear, and the inner ear. Sound enters the outer ear through the pinna, which helps to localize it. It then travels down the auditory canal and vibrates the eardrum. The middle ear consists of three bones, the malleus, incus, and stapes, which transmit the vibration of the eardrum to the inner ear. The middle ear is connected to the inner ear by the oval window. The main component of the inner ear is the cochlea, a fluid-filled, coiled tube shaped like a snail. The basilar membrane lies within the cochlear fluid and is anchored to the bony structure of the cochlea. The vibration of the eardrum causes a movement of the oval window to generate a compressed sound
Appl. Sci. 2022, 12, 2398. https://doi.org/10.3390/app12052398 https://www.mdpi.com/journal/applsci