
To elaborate, take an audible, sinusoidal wave $A\sin(ft)$ with constant $A$ and $f$. That sounds constant to us.

Now, from a few posts linked below, I understand that as the stereocilia of the hair cells in the corresponding region move, the spiral ganglion cells (SGCs) associated with these hair cells start to fire trains of neural signals whose firing rate deviates sinusoidally from the silent baseline rate.

Up to this point, instead of something similar to a power spectrogram, we more or less just have a fake "frequency axis", along with neural signal trains that are still vibration-like. I am guessing one of the following:

  1. Groups of SGCs approximately cover all phases, and the sum of their neural signal trains (due to rate coding) ends up with an approximately constant firing rate (which can be used as an amplitude reference).

  2. Rate coding does not really go beyond giving a better vibrating train than an individual SGC train, and the cochlear nucleus and beyond do some hard work to convert the trains into something that sounds constant.
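
Guess 1 can be sketched as a toy calculation. The setup is an assumption made only for illustration and is not taken from the linked posts: $N \ge 2$ fibers with a common baseline rate $r_0$ and modulation depth $a$, and phases $\phi_k = 2\pi k/N$ spread evenly over one cycle. The summed rate is then

$$\sum_{k=0}^{N-1}\bigl[r_0 + a\sin(ft+\phi_k)\bigr] = N r_0 + a\,\operatorname{Im}\!\Bigl(e^{ift}\sum_{k=0}^{N-1} e^{2\pi i k/N}\Bigr) = N r_0,$$

because the $N$-th roots of unity sum to zero. In this idealized picture the sum is constant in time, but it also no longer depends on $a$, so by itself it would not carry the amplitude.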

https://biology.stackexchange.com/questions/39729/depolarization-and-hyperpolarization-in-stereocilia-of-the-inner-ear

How do hair cells recognize frequencies?

How does the inner ear encode sound intensity?

How is tone volume encoded?

SmoothKen
  • 123
  • 3
  • 1
    Welcome and thanks for the interesting and well-researched post. - I read 2 separate Qs in your post: (1) how is sound level (or the equivalent, a pulse train with action potentials of fixed amplitude) converted to a percept of loudness, and (2) how can a continuously changing signal be perceived as having a constant loudness? Am I right that at least one of them corresponds to your enquiry? Can you clarify your Q by concluding it with an unambiguous Q? That said, Q1 is basically covered by answers in your last 2 links. – AliceD Feb 19 '23 at 11:18
  • @AliceD Yes, Q2 is my enquiry, as till now our spike rate is still a sinusoid or vibration-like object, but the amplitude we perceive is a constant object. – SmoothKen Feb 19 '23 at 16:59
  • 1
    It still seems like your question is answered by the links you provide. Phase coding only occurs for very low frequency sounds, anyways. – Bryan Krause Feb 19 '23 at 22:26
  • 1
    Lightbulbs also flash on and off some 50 or 60 times per second, but one does not perceive it – Rodrigo de Azevedo Feb 20 '23 at 12:48
  • 1
    I agree with Bryan that the answer seems to be given in the provided links. – AliceD Feb 20 '23 at 13:17
  • I thought more about this, and I think that this question probably comes from a misperception about perception. Perception isn't a faithful representation of the outside world; it's better thought of as the result of solving for the most likely model of the world given some evidence. That's why various illusions work the way they do: they are particular examples of cases where an actual stimulus and the best-fit model are in conflict. Our brain doesn't represent sounds as sinusoids, so there's no need for them to be perceived as sinusoidal. – Bryan Krause Feb 20 '23 at 15:46
  • 1
    @RodrigodeAzevedo Yeah, I thought about adding the computer screen as an analogy, which points to something called the flicker fusion threshold, but as with this question, no one says at which stage this processing happens. – SmoothKen Feb 20 '23 at 19:45
  • @BryanKrause No, it is not a misperception about perception. The question is more like where that conversion happens. And based on your words, I am assuming my second guess is correct, and cochlear nucleus and beyond somehow turn a sinusoid representation into a constant representation? – SmoothKen Feb 20 '23 at 19:51
  • @SmoothKen I don't think that question makes much sense from the standpoint of perception. I'd recommend instead asking which brain areas have detectable phase-locked activity and for which frequencies. – Bryan Krause Feb 20 '23 at 19:56

1 Answer


The simplest answer to the question of

How can sinusoid sound be perceived as having constant loudness?

is in the first line of your question

given an audible, sinusoidal wave $A\sin(ft)$ with constant $A$ and $f$. That sounds constant to us.

If $A$ is a constant, why should the loudness vary? The answer, of course, is that $\sin(ft)$ is not a constant. To really answer your question, let's start by considering a different signal, $A\sin(f_mt)\sin(f_ct)$, a sinusoidally amplitude-modulated (SAM) tone, where $f_m$ is the modulation frequency and $f_c$ is the carrier frequency. At low, but not too low, modulation rates, and with carriers within the audible range, the loudness of a SAM tone varies over time.

To go further, we need a piece of mathematical insight. The Hilbert transform lets us decompose a signal into a slowly varying envelope and a rapidly varying fine structure. In the case of a SAM tone, the envelope would be $A\sin(f_mt)$ and the fine structure would be $\sin(f_ct)$, up to sign: the Hilbert envelope itself is the non-negative magnitude $|A\sin(f_mt)|$.
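
To make the decomposition concrete, here is a minimal sketch that computes the Hilbert envelope as the magnitude of the analytic signal, using only the Python standard library. The helper names (`dft`, `hilbert_envelope`) and all parameter values are assumptions chosen for illustration, and frequencies are given in cycles per analysis window so that the signals are exactly periodic.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (fine for short demo signals)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            # Reduce k*t modulo n so the angle stays within one turn.
            angle = sign * 2 * math.pi * ((k * t) % n) / n
            acc += x[t] * cmath.exp(1j * angle)
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and drop the negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [abs(z) for z in analytic]


# Illustrative values (assumptions, not from the answer).
N = 256      # samples in the analysis window
AMP = 1.0    # the constant A
F_M = 2      # modulation frequency, cycles per window
F_C = 40     # carrier frequency, cycles per window

phase = [2 * math.pi * k / N for k in range(N)]
tone = [AMP * math.sin(F_C * p) for p in phase]
sam = [AMP * math.sin(F_M * p) * math.sin(F_C * p) for p in phase]

tone_env = hilbert_envelope(tone)  # flat, equal to AMP
sam_env = hilbert_envelope(sam)    # follows |AMP * sin(F_M * p)|

print("pure tone envelope range:", min(tone_env), max(tone_env))
print("SAM tone envelope range: ", min(sam_env), max(sam_env))
```

In this sketch the pure tone's envelope stays at `AMP` for every sample, while the SAM tone's envelope traces the rectified modulator.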

We have known for a long time that the auditory system is sensitive to both the temporal fine structure and the envelope. In fact, this is a key part of how cochlear implants work. While there are earlier, and more relevant, examples, Smith, Delgutte, & Oxenham (2002) is a really nice example of this duality.

The Hilbert transform is not the only way to get the envelope of a signal. You can also get the envelope by bandpass filtering, followed by half-wave rectification, followed by low-pass filtering. We know the basilar membrane acts as a bandpass filter thanks to the Nobel Prize-winning work of von Békésy. The inner hair cells in the cochlea provide the half-wave rectification. I think the field is still divided about where and how the low-pass filtering happens, but there is pretty good agreement that the auditory system has a representation of the envelope.
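
The three-stage chain can be sketched in a few lines of standard-library Python. This is a signal-processing toy, not a model of the cochlea: the function names, the 16 kHz sample rate, the 1 kHz band centre, the filter Q, the 20 Hz low-pass cutoff and the 4 Hz modulation rate are all assumptions picked for illustration. The bandpass stage is a generic second-order biquad and the low-pass stage is a one-pole smoother.

```python
import math

FS = 16000  # sample rate in Hz (assumed for this demo)


def bandpass(x, f0, q, fs=FS):
    """Second-order bandpass biquad with unity gain at f0."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # the b1 coefficient is zero for this filter
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (b0 * xn + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def half_wave_rectify(x):
    """Keep the positive half of the waveform, zero the rest."""
    return [max(v, 0.0) for v in x]


def lowpass(x, cutoff, fs=FS):
    """One-pole low-pass filter (exponential smoothing)."""
    a = 1 - math.exp(-2 * math.pi * cutoff / fs)
    y, state = [], 0.0
    for v in x:
        state += a * (v - state)
        y.append(state)
    return y


def envelope(x, f0=1000.0, q=4.0, cutoff=20.0):
    """Bandpass, then half-wave rectify, then low-pass."""
    return lowpass(half_wave_rectify(bandpass(x, f0, q)), cutoff)


def fluctuation(env):
    """Peak-to-peak variation relative to the mean, after the start-up transient."""
    steady = env[len(env) // 2:]
    mean = sum(steady) / len(steady)
    return (max(steady) - min(steady)) / mean


# One second of a 1 kHz pure tone and of a 4 Hz SAM tone, both with A = 1.
t = [n / FS for n in range(FS)]
tone = [math.sin(2 * math.pi * 1000 * s) for s in t]
sam = [math.sin(2 * math.pi * 4 * s) * math.sin(2 * math.pi * 1000 * s) for s in t]

tone_env = envelope(tone)
sam_env = envelope(sam)
tone_fluct = fluctuation(tone_env)
sam_fluct = fluctuation(sam_env)

print("pure tone envelope fluctuation:", tone_fluct)
print("SAM tone envelope fluctuation: ", sam_fluct)
```

With these settings the pure tone's output settles near a steady value with only a small residual ripple at the carrier rate, while the SAM tone's output rises and falls with the modulator. A lower cutoff would suppress the ripple further at the cost of a slower response.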

So back to your question, to the extent that the loudness of $A\sin(ft)$ is a constant (adaptation will make the loudness decrease over time), it is because its envelope is a constant.

StrongBad
  • 2,633
  • 14
  • 27