I want to write code to detect the sound of a baby crying. I am working on Windows. So far I can capture audio samples and plot their frequency spectrum (using an FFT), but I am not sure how to proceed from there.

I wanted to ask what steps I should follow to detect the baby cry sound given its time-frequency plot.

I have seen methods such as a median filter followed by an HMM used in speech recognition. But for simple sound detection, do I need such a sophisticated method?

I will be very grateful if you could help me.

jojeck

2 Answers

Since you have frequency bucket information, you essentially have a frequency histogram.

If you treat the histograms as n-dimensional vectors and scale each one to unit length, their dot product gives a similarity value from 0 to 1 (the bins are non-negative, so the value cannot go below 0). You can then tune a threshold so that sounds scoring above, say, 0.6 are considered a match.

For more information check this out: http://blog.demofox.org/2015/02/13/writing-a-basic-search-engine-aka-calculating-similarity-of-histograms-with-dot-product/
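A minimal sketch of the comparison, assuming you already have one histogram of FFT magnitudes per audio frame and a reference "cry" histogram of the same length. The function names and the idea of a single reference template are illustrative assumptions, not something taken from the linked post; the 0.6 threshold is just the example value above and would need tuning on real recordings.

```python
import math


def cosine_similarity(hist_a, hist_b):
    """Dot product of two equal-length histograms after scaling each to unit length."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    dot = sum(a * b for a, b in zip(hist_a, hist_b))
    norm_a = math.sqrt(sum(a * a for a in hist_a))
    norm_b = math.sqrt(sum(b * b for b in hist_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero (silent) frame is treated as matching nothing
    return dot / (norm_a * norm_b)


def matches_template(frame_hist, template_hist, threshold=0.6):
    """True when the frame's spectral shape is close enough to the template."""
    return cosine_similarity(frame_hist, template_hist) >= threshold
```

Because both vectors are normalized, the score depends only on the shape of the spectrum, not on how loud the frame is.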

Alternatively, you could get a little more advanced and use support vector machines, which require training data with both positive and negative examples in order to learn how to separate hits from misses.

Alan Wolfe
  • If you want to detect when the baby is crying as opposed to sleeping, I would go with a band-pass filter in the frequency region corresponding to baby cries (near 500-5000 Hz I guess) and set a threshold. Basically what Alan is proposing. Speech recognition seems overcomplicated for your use case. SR is required to figure out intelligible words, not featureless cries. – Jul 13 '15 at 18:36
  • Thanks for the reply :). I know the frequency region corresponding to baby cries is near 500-5000 Hz, but it shows a lot of variation between different baby sounds. So how should I proceed? Also, what do you mean by "set a threshold"? – Bhavin Chowksi Jul 14 '15 at 16:09
  • By setting a threshold, I believe he meant watching for the amplitude in that frequency range to go over a specific value. – Alan Wolfe Jul 14 '15 at 16:25
  • Have you noticed a specific "shape" of frequency amplitudes (harmonics) for crying versus other baby sounds? If so, the histogram dot product technique can help you distinguish the crying case from the other cases. If you haven't noticed any such pattern, support vector machines can be used to find non-obvious patterns. – Alan Wolfe Jul 14 '15 at 16:27
  • Wait! Frequency amplitude should be directly proportional to the sound intensity, right? – Bhavin Chowksi Jul 14 '15 at 17:35
  • Yes, but your ears don't work linearly, which is why there is dB (: You could normalize the sound samples if you wanted to focus on the shape instead of the volume, but you probably want to ignore anything that is too quiet. – Alan Wolfe Jul 14 '15 at 17:38
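
The band-plus-threshold idea from these comments can be sketched roughly as follows. This assumes `magnitudes` holds the FFT magnitudes of one frame of a real signal for bins 0 through N/2 (so the last bin sits at half the sample rate). The function names, the 0.7 ratio and the `min_energy` floor are hypothetical values chosen for illustration and would need tuning; the 500-5000 Hz band is the rough guess given above.

```python
def band_energy_ratio(magnitudes, sample_rate, low_hz=500.0, high_hz=5000.0):
    """Fraction of the frame's spectral energy that falls inside [low_hz, high_hz]."""
    n_bins = len(magnitudes)
    if n_bins < 2:
        return 0.0
    # Bins 0..N/2 are evenly spaced from 0 Hz up to sample_rate / 2.
    bin_width = (sample_rate / 2.0) / (n_bins - 1)
    total = 0.0
    in_band = 0.0
    for k, m in enumerate(magnitudes):
        energy = m * m
        total += energy
        if low_hz <= k * bin_width <= high_hz:
            in_band += energy
    return in_band / total if total > 0.0 else 0.0


def is_cry_candidate(magnitudes, sample_rate, ratio_threshold=0.7, min_energy=1e-3):
    """Flag a frame that is loud enough and dominated by the cry band."""
    total_energy = sum(m * m for m in magnitudes)
    if total_energy < min_energy:
        return False  # too quiet to judge, so ignore it
    return band_energy_ratio(magnitudes, sample_rate) >= ratio_threshold
```

Using a ratio rather than a raw amplitude makes the check less sensitive to microphone distance, while the energy floor keeps near-silent frames from triggering it.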

This paper might have exactly what you need if you go the supervised learning route. It has a good comparison of logistic regression versus convolutional neural networks (CNNs):

http://www.ieee.org.il/wp-content/uploads/2016/11/2016_ICSEE_paper_206.pdf

In this work, two machine-learning algorithms were proposed for the detection of baby cry in audio recordings: a logistic regression classifier and a more complex CNN classifier. The results show a considerable advantage of the CNN classifier compared to the logistic regression classifier. As CNNs are naturally suited for large training datasets and for multi-class classification, we plan to train a CNN classifier to detect various types of domestic sounds in addition to cry signals.
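
To give a feel for the simpler of the two approaches, here is a minimal logistic regression sketch in plain Python. It is not the paper's implementation and does not use the paper's features. It assumes each frame has already been reduced to a short list of numeric features (for example, energy ratios in a few frequency bands) with a label of 1 for cry and 0 for anything else. The learning rate and epoch count are arbitrary illustrative values.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit weights and a bias with batch gradient descent on the log loss."""
    n = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Prediction error for this example: p(cry) minus the true label.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_cry_probability(x, weights, bias):
    """Estimated probability that feature vector x comes from a cry frame."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

In practice you would train on labeled recordings of both crying and other household sounds, and hold some recordings back to check how well the classifier generalizes.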

J.A.K.