Citation: Wang, D.; Wei, Y.; Wang, Y.; Wang, J. A Robust and Low Computational Cost Pitch Estimation Method. Sensors 2022, 22, 6026. https://doi.org/10.3390/s22166026
Academic Editors: Enrico Vezzetti, Gabriele Baronio, Domenico Speranza, Luca Ulrich and Andrea Luigi Guerra
Received: 5 July 2022
Accepted: 10 August 2022
Published: 12 August 2022
Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
A Robust and Low Computational Cost Pitch Estimation Method
Desheng Wang ¹, Yangjie Wei ¹,*, Yi Wang ¹ and Jing Wang ²
¹ Key Laboratory of Intelligent Computing in Medical Image, Ministry of Education, School of Computer Science and Engineering, Northeastern University, Shenyang 110169, China
² School of Information Science and Engineering, Shenyang University of Technology, Shenyang 110870, China
* Correspondence: weiyangjie@cse.neu.edu.cn
Abstract:
Pitch estimation is widely used in speech and audio signal processing. However, the harmonic-structure models on which current pitch estimation methods rely do not always match the harmonic distribution of real signals. Owing to the structure of the vocal tract, the acoustic properties of musical instruments, and spectral leakage, the harmonic frequencies of speech and audio signals often deviate slightly from integer multiples of the pitch. Starting from the summation of residual harmonics (SRH) method, this paper makes two main modifications. First, the constraint that spectral peaks lie at strict integer multiples of the pitch is relaxed to allow slight deviations, which helps capture harmonics. Second, a low-computational-cost main pitch segment extension scheme is proposed to exploit the smoothness prior of pitch more efficiently. In addition, the pitch segment extension scheme is integrated into the voiced/unvoiced decision of the SRH method to reduce short-term errors. Accuracy comparisons with ten pitch estimation methods show that the proposed method achieves better overall accuracy and robustness. Timing experiments show that, on the experimental computer, the time cost of the proposed method is reduced to about 1/8 of that of the state-of-the-art fast NLS method.
Keywords: pitch estimation; harmonic structure; harmonic summation (HS); smooth prior
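The first modification described in the abstract, relaxing the strict integer-multiple constraint on harmonic peak positions, can be illustrated with a minimal sketch. It assumes a plain magnitude spectrum (no LPC residual) and a hypothetical relative tolerance `tol`: a strict harmonic sum reads the FFT bin nearest each integer multiple k·f0, whereas a tolerant sum takes the largest spectral value within ±tol·k·f0. The function names, `tol = 0.03`, and the synthetic 2% harmonic offset are illustrative choices, not the authors' formulation.

```python
import numpy as np

# Hypothetical sketch: illustrates tolerant vs. strict harmonic picking,
# not the paper's exact formulation.

def strict_harmonic_sum(spectrum, freqs, f0, n_harm=5):
    """Sum the spectrum at the bins nearest to exact integer multiples k*f0."""
    total = 0.0
    for k in range(1, n_harm + 1):
        idx = np.argmin(np.abs(freqs - k * f0))  # single bin closest to k*f0
        total += spectrum[idx]
    return total

def tolerant_harmonic_sum(spectrum, freqs, f0, n_harm=5, tol=0.03):
    """Sum the largest spectral value within +/- tol*k*f0 of each harmonic k*f0."""
    total = 0.0
    for k in range(1, n_harm + 1):
        lo, hi = k * f0 * (1 - tol), k * f0 * (1 + tol)
        window = spectrum[(freqs >= lo) & (freqs <= hi)]
        if window.size:
            total += window.max()  # picks the peak even if it is off k*f0
    return total

# Synthetic frame: 100 Hz fundamental, upper harmonics shifted 2% above k*f0
fs, n = 16000, 4096  # bin spacing of about 3.9 Hz
t = np.arange(n) / fs
f0 = 100.0
x = np.sin(2 * np.pi * f0 * t)
for k in range(2, 6):
    x += np.sin(2 * np.pi * k * f0 * 1.02 * t) / k

spectrum = np.abs(np.fft.rfft(x * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / fs)

strict = strict_harmonic_sum(spectrum, freqs, f0)
tolerant = tolerant_harmonic_sum(spectrum, freqs, f0)
print(f"strict: {strict:.1f}, tolerant: {tolerant:.1f}")
```

Because the shifted harmonic peaks fall one or more bins away from k·f0, the strict sum samples the flanks of the peaks, while the tolerant sum recovers their maxima.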
1. Introduction
Pitch is a subjective psychoacoustic percept synthesized by the auditory system, from the ear to the auditory cortex [1]. As a basic feature, pitch is widely used in speech interaction [2–6], music signal processing [7–11], and medical diagnosis [12,13]. Pitch estimation has been studied for decades, and estimating pitch from clean speech is widely regarded as a solved problem, because many methods achieve high accuracy at high signal-to-noise ratios (SNRs). However, the robustness of pitch estimation under noise and reverberation still needs to be improved. Drugman (University of Mons, Belgium) and Alwan (University of California, Los Angeles), the authors of the well-known summation of residual harmonics (SRH) pitch estimation method, emphasize that performance under noisy conditions will be the focus of pitch estimation research over the next decade [14,15].
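Robustness under noise is usually assessed by adding noise to a clean signal at a controlled SNR, defined as SNR = 10·log10(P_signal / P_noise) from average powers. The sketch below scales a noise signal to a target SNR; the function names are hypothetical, and white Gaussian noise and a pure 100 Hz tone are assumed stand-ins for real noise and speech.

```python
import numpy as np

def mix_at_snr(clean, noise, target_snr_db):
    """Scale `noise` so that 10*log10(P_clean / P_noise) equals target_snr_db, then add it."""
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise power is P_clean / 10^(SNR/10); rescale the noise accordingly
    scale = np.sqrt(p_clean / (p_noise * 10 ** (target_snr_db / 10)))
    scaled_noise = scale * noise
    return clean + scaled_noise, scaled_noise

def measured_snr_db(clean, noise):
    """SNR in dB computed from the average powers of the clean signal and the noise."""
    return 10 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
clean = np.sin(2 * np.pi * 100 * t)  # stand-in for a clean voiced signal (assumption)
noise = rng.standard_normal(fs)      # white Gaussian noise (assumed noise type)
noisy, scaled_noise = mix_at_snr(clean, noise, target_snr_db=0.0)
print(f"achieved SNR: {measured_snr_db(clean, scaled_noise):.2f} dB")
```

Sweeping `target_snr_db` downward (for example from 20 dB to −5 dB) gives the kind of test conditions under which clean-speech accuracy no longer transfers.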
The robustness of pitch estimation depends on the accuracy of the underlying model, and the models of almost all pitch estimation methods depend directly or indirectly on the harmonic structure, since harmonic structure is an essential feature of audio signals. Figure 1 shows the harmonic structure of an audio signal. The spectral peak at 100 Hz is the pitch, and the higher-frequency spectral peaks located near integer multiples of 100 Hz constitute its harmonic structure. A fundamental assumption of the harmonic-structure models used in pitch estimation is that the harmonic components lie strictly at integer multiples of the pitch [14,16–18]. Formally, this model places the $l$-th harmonic at the product of the integer $l$ and the pitch $f_0$:
$$ f_l = l\,f_0, \qquad l = 2, \ldots, L \qquad (1) $$
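Equation (1) and a harmonic-summation score built on it can be sketched as follows. This is a simplified illustration, assuming the SRH criterion as it is usually stated, E(f) + Σ_{l=2}^{L} [E(l·f) − E((l − 1/2)·f)], but applied here to the plain magnitude spectrum of a synthetic frame rather than to the LPC residual spectrum. The frame length, the 60 to 400 Hz search range, and the function names are hypothetical.

```python
import numpy as np

def harmonic_frequencies(f0, L):
    """Eq. (1): f_l = l * f0 for l = 2, ..., L."""
    return np.arange(2, L + 1) * f0

def harmonic_score(spectrum, freqs, f0, L=5):
    """SRH-style score on a magnitude spectrum (assumption: no LPC residual).

    E(f0) + sum_l [E(l*f0) - E((l - 1/2)*f0)]; the subtracted terms penalize
    energy halfway between the harmonics predicted by Eq. (1).
    """
    def E(f):
        return np.interp(f, freqs, spectrum)  # linear interpolation between FFT bins

    f_l = harmonic_frequencies(f0, L)
    return E(f0) + np.sum(E(f_l) - E(f_l - f0 / 2))

# Synthetic 100 ms frame with a 100 Hz pitch and 10 harmonics of amplitude 1/l
fs, n = 16000, 1600  # bin spacing of 10 Hz, so every harmonic falls on a bin
t = np.arange(n) / fs
x = sum(np.sin(2 * np.pi * l * 100.0 * t) / l for l in range(1, 11))

spectrum = np.abs(np.fft.rfft(x * np.hanning(n)))
freqs = np.fft.rfftfreq(n, d=1 / fs)

candidates = np.arange(60.0, 400.0, 1.0)  # hypothetical search range, 1 Hz steps
scores = np.array([harmonic_score(spectrum, freqs, f) for f in candidates])
print(f"estimated pitch: {candidates[np.argmax(scores)]:.0f} Hz")
```

On this idealized frame the harmonics sit exactly where Eq. (1) predicts; if they were shifted slightly, as described in the abstract, E(l·f) would sample the flanks of the peaks and the score would drop.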
Sensors 2022, 22, 6026. https://doi.org/10.3390/s22166026 https://www.mdpi.com/journal/sensors