Visualizing pitch detection


Some plots to attempt to understand and visualize the intermediate results of autocorrelation-based pitch detection algorithms

I have a repo where I maintain 3 different autocorrelation-based pitch detection algorithms.

The original purpose of this repository was to be an every-man’s pitch detection suite. Too much pitch detection code out in the world uses very domain-specific language and doesn’t give enough usage examples. When I set out to implement these, it took a lot of sweat and trial and error to figure out how to get from raw audio to a pitch estimate.

The algorithms roughly follow this pattern:

  1. Create a modified version of the original audio signal (autocorrelation + other transformations, e.g. normalization, variance)
  2. Estimate pitch based on the modified signal
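The two steps above can be sketched in C++. This is an illustrative, naive version of the pattern (the function names and the peak-picking heuristic are mine, not the repo's):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Step 1: plain (un-normalized) autocorrelation of the input buffer.
std::vector<double> autocorrelate(const std::vector<double> &x) {
    std::vector<double> acf(x.size(), 0.0);
    for (size_t tau = 0; tau < x.size(); ++tau)
        for (size_t i = 0; i + tau < x.size(); ++i)
            acf[tau] += x[i] * x[i + tau];
    return acf;
}

// Step 2: naive pitch estimate -- walk down from the lag-0 maximum,
// then take the biggest remaining peak's lag as the period.
double estimate_pitch(const std::vector<double> &acf, double sample_rate) {
    size_t tau = 1;
    while (tau + 1 < acf.size() && acf[tau] > acf[tau + 1])
        ++tau; // descend past the lag-0 peak
    size_t best = tau;
    for (size_t i = tau; i < acf.size(); ++i)
        if (acf[i] > acf[best])
            best = i;
    return sample_rate / static_cast<double>(best);
}
```

Real estimators (YIN, MPM) are much more careful about which peak they pick, but this captures the transform-then-estimate shape.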

I wanted to put together an article with some gnuplot snapshots of these modified signals. Although the modification is similar for all 3 algorithms (ultimately they are all autocorrelation), the subsequent pitch estimation varies, which is why I can’t interchange the autocorrelation among the 3 algorithms.


The source signal used for this article is an artificially generated sine wave (source code) with a frequency of 650Hz.
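Generating such a test signal is straightforward; this is a minimal sketch (the sample rate and buffer size here are illustrative, and the repo's actual generator may differ):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Generate one buffer of a pure sine tone at the given frequency.
std::vector<double> make_sine(double freq, double sample_rate, size_t n) {
    const double pi = 3.14159265358979323846;
    std::vector<double> out(n);
    for (size_t i = 0; i < n; ++i)
        out[i] = std::sin(2.0 * pi * freq * static_cast<double>(i) / sample_rate);
    return out;
}
```

A pure sine has a single, unambiguous period, which is exactly why it makes a friendly first test case for these algorithms.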

Real audio (instruments, voices) is not as easy to work with. Part 2 will use an actual mp3 clip of a guitar, which presents several additional challenges (e.g. overtones).

Autocorrelation - time-domain vs. frequency domain

Background reading on autocorrelation.

Plot: time-domain vs. frequency-domain autocorrelation of the 650Hz sine wave.

The time-domain result has an ugly pattern of the peaks “getting weaker” as the lag increases. It would benefit from being appropriately normalized. This is the time-domain autocorrelation code:

std::vector<double> acf_real {};
for (int tau = 0; tau < size; ++tau) {
        double acf = 0.0;
        for (int i = 0; i < size - tau; ++i) {
                acf += array[i] * array[i + tau];
        }
        acf_real.push_back(acf);
}

By contrast, this is how the MPM time-domain autocorrelation does it:

for (int tau = 0; tau < size; tau++) {
    double acf = 0;
    double divisorM = 0;
    for (int i = 0; i < size - tau; i++) {
        acf += audio_buffer[i] * audio_buffer[i + tau];
        divisorM += audio_buffer[i] * audio_buffer[i] + audio_buffer[i + tau] * audio_buffer[i + tau];
    }
    nsdf.push_back(2 * acf / divisorM);
}

Dividing each autocorrelation value by this divisor (the summed energy of the two windows being compared) evens out the peaks and bounds the result between -1 and 1. This is something I should incorporate into the plain autocorrelation in the future.



Interestingly, YIN doesn’t work if you substitute in the MPM autocorrelation - i.e., YIN’s follow-up steps don’t seem to work with an autocorrelation normalized to (-1, 1).


One day I would like to have one shared autocorrelation step among all 3 algorithms, and have them only differ by their subsequent pitch estimation step.

A way to achieve this is to have all of the algorithms rely on a perfectly normalized autocorrelation - that is, one whose peaks are even and bounded between -1 and 1.