I am building an experience in Unity that will recognize when a user blows into the microphone. I have a working implementation that recognizes sounds above a certain loudness threshold, but I want to take it a step further and identify only blow sounds.
My current implementation is:
I extract FFT information from a 1-second sound clip and identify the range of frequencies involved. The idea is that a blowing sound covers a larger number of frequencies than typical speech (based on its spectrogram), so I expected its frequency range to be close to 100%. This implementation also picks up speech and other sounds whose frequency range is close to 100%, which I suppose is to be expected, but it's what I have so far.
As you can see in the spectrogram, it's easy to see where the blowing sound occurs. While speech appears more jagged, the blowing sound seems to have a very consistent pattern. 
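To make the frequency-range check described above concrete, here is a minimal sketch in plain Python rather than Unity C#. It is an illustration only, not the code from my project: the function names, the naive DFT, and the `rel_threshold` value are all assumptions, and "frequency range" is interpreted here as the fraction of FFT bins whose magnitude is a meaningful share of the peak.

```python
import cmath
import math
import random


def magnitude_spectrum(samples):
    """Naive O(n^2) DFT; returns magnitudes for bins 0 .. n/2 - 1.

    A real project would use an FFT library; this keeps the sketch
    dependency-free.
    """
    n = len(samples)
    mags = []
    for k in range(n // 2):
        acc = 0j
        for t, x in enumerate(samples):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def frequency_coverage(samples, rel_threshold=0.1):
    """Fraction of bins whose magnitude is at least rel_threshold * peak.

    rel_threshold is a hypothetical tuning value, not a known-good setting.
    """
    mags = magnitude_spectrum(samples)
    peak = max(mags)
    if peak == 0:
        return 0.0  # silence: no bin is active
    active = sum(1 for m in mags if m >= rel_threshold * peak)
    return active / len(mags)


# Synthetic stand-ins: a pure tone (narrow spectrum) and white noise
# (broad spectrum, loosely resembling a blow).
n = 256
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print("tone coverage: ", frequency_coverage(tone))
print("noise coverage:", frequency_coverage(noise))
```

The tone should score close to 0 and the noise close to 1, which also shows the weakness mentioned above: any broadband sound scores high on this measure, not just blowing.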
Research that I have done:
- I tried to follow the steps in this academic research paper first. While I learned a lot about digital sound, I also realized it was relying on machine learning, and I want to avoid that.
- I analyzed the spectrogram above and semi-successfully implemented sound recognition based on the frequency range for a 1-second sound clip. I used FFT information for this implementation.
- I tried to treat a frequency histogram of my audio sample as an n-dimensional vector. I calculated the dot product of an audio sample and a pre-recorded blowing sound based on one of the answers from this post. The results were inconsistent, so I abandoned the idea.
- I found a question that seemed the most similar to my situation. The comments helped solidify my belief that machine learning would be overkill here. The answers seem to suggest classification algorithms as a solution, and, if I'm not mistaken, that's machine learning. (Correct me if I'm wrong, though.)
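For the histogram comparison in the third item above, the idea can be sketched roughly as follows, again in plain Python. This is an illustration under assumptions: the histogram values are made up, and the dot product is normalized (cosine similarity) so that overall loudness does not affect the score, which may differ from what the linked answer suggests.

```python
import math


def cosine_similarity(a, b):
    """Normalized dot product of two equal-length frequency histograms."""
    if len(a) != len(b):
        raise ValueError("histograms must have the same number of bins")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty histogram matches nothing
    return dot / (norm_a * norm_b)


# Hypothetical 4-bin histograms, for illustration only.
reference_blow = [4.0, 5.0, 5.0, 4.0]   # pre-recorded blow: energy spread out
louder_blow = [8.0, 10.0, 10.0, 8.0]    # same shape, twice as loud
speech_like = [9.0, 1.0, 0.0, 0.0]      # energy concentrated in low bins

print(cosine_similarity(reference_blow, louder_blow))
print(cosine_similarity(reference_blow, speech_like))
```

A score near 1 means the two histograms have the same shape. Because each clip is reduced to a single averaged vector, all timing information is lost, which may be one reason the results were inconsistent.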
My question is:
Which other audio features can I extract from the audio clip/FFT to identify a blowing sound more precisely?
I can visually see the blowing sound in the spectrogram, which leads me to believe it's possible to identify it with an algorithm without using machine learning.
I am relatively new to sound processing, so any information is highly appreciated.
At least one of the FFT bins is always zero because of the code I'm using. It apparently produces some 0 values, possibly because of garbage collection. I am using the code from here: https://answers.unity.com/questions/974565/how-to-do-a-fft-in-unity.html
I actually thought that was normal until you pointed it out, so thanks for catching that!
– Anya Apr 15 '21 at 16:36