Blow sound recognition without using machine learning

Question

I am building an experience in Unity that will recognize when a user blows into the microphone. I have a working implementation that recognizes sounds above a certain loudness threshold, but I want to take it a step further and identify only blow sounds.

My current implementation is:

I extract FFT data from a 1-second sound clip and measure how much of the frequency range the sound occupies. Judging by its spectrogram, a blowing sound covers more frequencies than typical speech, so I expected its frequency coverage to be close to 100%. As might be expected, this also picks up speech and other sounds whose coverage is close to 100%, but that's what I have so far.
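
One way to read the "frequency range" measure described above is as the fraction of FFT bins that carry noticeable energy. The sketch below is only an illustration of that reading, written in Python rather than Unity's C#; the function name `occupied_fraction` and the `rel_threshold` parameter are hypothetical and do not come from the original implementation.

```python
def occupied_fraction(magnitudes, rel_threshold=0.1):
    """Fraction of FFT bins whose magnitude is at least
    rel_threshold times the largest bin (hypothetical measure)."""
    if not magnitudes:
        return 0.0
    peak = max(magnitudes)
    if peak <= 0.0:
        # Silent block: nothing is occupied.
        return 0.0
    cutoff = rel_threshold * peak
    active = sum(1 for m in magnitudes if m >= cutoff)
    return active / len(magnitudes)


# A flat spectrum occupies every bin; a single peak occupies one bin.
flat = occupied_fraction([1.0, 1.0, 1.0, 1.0])
peaky = occupied_fraction([1.0, 0.0, 0.0, 0.0])
```

Because the cutoff is relative to the loudest bin, any broadband sound scores high regardless of its overall level, which is consistent with the false positives described above.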

As you can see in the spectrogram, it's easy to see where the blowing sound occurs. While speech appears more jagged, the blowing sound seems to have a very consistent pattern.

Research that I have done:

I tried to follow the steps in this academic research paper first. While I learned a lot about digital sound, I also realized it was relying on machine learning, and I want to avoid that.
I analyzed the spectrogram above and semi-successfully implemented sound recognition based on the frequency range for a 1-second sound clip. I used FFT information for this implementation.
I tried to treat a frequency histogram of my audio sample as an n-dimensional vector. I calculated the dot product of an audio sample and a pre-recorded blowing sound based on one of the answers from this post. The results were inconsistent, so I abandoned the idea.
I found a question that seemed the most similar to my situation. The comments helped solidify my belief that machine learning would be overkill here. The answers seem to suggest classification algorithms as a solution, and, if I'm not mistaken, that's machine learning. (Correct me if I'm wrong, though.)
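
For the dot-product comparison mentioned in the list above, a raw dot product grows with the loudness of both recordings, which may be one source of inconsistent results (an assumption; the question does not say whether the vectors were normalized). A minimal Python sketch of the normalized variant, cosine similarity, follows. The function name is hypothetical, and the histograms are assumed to be equal-length lists of non-negative values.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths.
    Returns a value in [0, 1] for non-negative histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # An all-zero histogram has no direction to compare.
        return 0.0
    return dot / (norm_a * norm_b)


# The same spectral shape at twice the loudness still matches.
same_shape = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
no_overlap = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```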

My question is:

Which other audio features can I extract from the audio clip/FFT to identify a blowing sound more precisely?

I can visually see the blowing sound in the spectrogram, which leads me to believe it's possible to identify it with an algorithm without using machine learning.

I am relatively new to sound processing, so any information is highly appreciated.

I recommend synchrosqueezing for inspection and scattering for classification. — OverLordGoldDragon, Apr 13 '21 at 17:21
Thank you @OverLordGoldDragon! Just to clarify, is scattering a machine learning technique? — Anya, Apr 13 '21 at 18:26
It's inspired by convolutions and greatly improves performance in absence of big data; main difference is filters are pre-designed and fixed, but we can stack learned layers (e.g. conv) on top. Helpful lecture. — OverLordGoldDragon, Apr 13 '21 at 18:40
Ah, got it. Thank you so much, I will check out the lecture. — Anya, Apr 13 '21 at 18:45
I misread the question as "with"; in the without case you can use extracted scattering features with pattern matching algorithms (if you know the pattern) - though minimally, a one-layer SVM would be simpler (and likely more effective). Regardless I'd wait for others' responses, many options here. — OverLordGoldDragon, Apr 13 '21 at 22:31
To distinguish speech from stochastic noise-like blow sounds, a basic approach like computing the spectral flatness in a blockwise manner could potentially be a promising first attempt. — applesoup, Apr 14 '21 at 21:21
@applesoup thank you, that does sound promising! I implemented a spectral flatness calculation based on the FFT, as in this link: https://www.johndcook.com/blog/2016/05/03/spectral-flatness. However, whenever any FFT value is 0 (which is always the case), the spectral flatness comes out as 0. Do you have any suggestions for either improving the spectral flatness calculation or using another similar feature? — Anya, Apr 14 '21 at 23:39
I'd start by excluding any zero bins from the calculation. Alternatively, you could add a very small nonzero value to those bins. — applesoup, Apr 15 '21 at 06:39
I'm wondering, however, why (at least?) one of the DFT bins is always zero. To find out more: is it always the same bin(s) that's zero? — applesoup, Apr 15 '21 at 06:42
Sounds good, I'll try that.
At least one of the FFT bins is always zero because of the code I'm using. It apparently outputs some 0 values because of garbage collection or something similar. I am using the code from here: https://answers.unity.com/questions/974565/how-to-do-a-fft-in-unity.html

I actually thought that was normal until you pointed it out, so thanks for catching that! — Anya, Apr 15 '21 at 16:36
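
The spectral flatness suggestion from the comments, with zero bins excluded as proposed, might be sketched as follows. This is a minimal Python illustration, not the code from the linked pages: the function names and the `threshold` value are hypothetical, and each block is assumed to be a list of non-negative power-spectrum values for one short frame of audio.

```python
import math


def spectral_flatness(power):
    """Geometric mean divided by arithmetic mean of the power spectrum.
    Zero bins are skipped, since log(0) is undefined and a single zero
    would otherwise force the geometric mean to 0."""
    bins = [p for p in power if p > 0.0]
    if not bins:
        return 0.0
    log_mean = sum(math.log(p) for p in bins) / len(bins)
    arith_mean = sum(bins) / len(bins)
    return math.exp(log_mean) / arith_mean


def fraction_of_flat_blocks(blocks, threshold=0.5):
    """Share of blocks whose flatness reaches a hypothetical threshold."""
    if not blocks:
        return 0.0
    flat = sum(1 for b in blocks if spectral_flatness(b) >= threshold)
    return flat / len(blocks)


# Noise-like block (flat spectrum, one spurious zero) vs. a tonal block.
noise_like = spectral_flatness([1.0, 0.0, 1.0, 1.0])
tonal = spectral_flatness([100.0, 0.001, 0.001, 0.001])
```

A flat, noise-like spectrum gives a value near 1, while a spectrum dominated by a few peaks gives a value near 0. Any threshold would need tuning against real recordings of blowing and speech.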

0 Answers