How the cochlea computes (2024)

Posted by izhak 10/30/2025

How the cochlea computes (2024) (www.dissonances.blog)

484 points | 150 comments

antognini 10/30/2025|

If you want to get really deep into this, Richard Lyon has spent decades developing the CARFAC model of human hearing: Cascade of Asymmetric Resonators with Fast-Acting Compression. As far as I know it's the most accurate digital model of human hearing.

He has a PDF of his book about human hearing on his website: https://dicklyon.com/hmh/Lyon_Hearing_book_01jan2018_smaller...

dr_dshiv 10/31/2025||

Neither this article nor this book discusses the fact that hair cells phase lock to sound pulses. While individual neurons can fire at no more than 200 Hz, populations of neurons are capable of phase locking to frequencies up to thousands of hertz.

Because cochlear implants only rely on stimulating the places in the cochlea related to particular frequencies but do not play the actual frequencies themselves (for reasons unknown), people with cochlear implants can detect frequency differences but lose appreciation for music.

antognini 10/31/2025||

Yes this is another important difference between human auditory perception and classical signal processing algorithms. Typically when processing audio we take a Fourier transform and then throw away the phase information. Mostly the amplitude information is all you need to understand a sound, but the ear actually is capable of picking up phase information.

(I thought this was discussed at some point in Lyon's book but it's admittedly been many years since I read it, so I can't remember for sure.)

fluoridation 10/31/2025||

What does that mean, though? If you invert the sign of a waveform it sounds the same, so if it's not picking that up, what phase and relative to what does it pick up?

smolder 11/1/2025|||

This doesn't directly answer your questions, but I can share one example where our sensitivity to phase information becomes apparent: when you play the sound of a pair of hands clapping. It's a great test case for hearing when a set of speakers has phase alignment problems, which is underappreciated as a metric for speaker systems. (It's rare even for experts to put a lot of effort into uniform phase response when designing speakers. It's one of the hardest things to manage. Frequency response, harmonic distortion levels, dispersion and even aesthetics usually take priority.)

Likewise, if you mess with the phase alignment of different frequencies in a hand clap sample and play it through an otherwise phase-coherent source like ear buds or headphones, the misalignment is really obvious.

antognini 10/31/2025|||

The relative phase between the different frequencies. You're correct that the ear can't pick up a global phase change.

As an extreme example, consider a delta function: there is silence, then a brief spike, and then silence again. If you're just looking at the amplitudes of the various frequency components of this signal it is indistinguishable from white noise. The only thing that makes this signal look (and sound) different from white noise is the relative phase between the different frequency components. The ear's ability to detect these phase synchronicities helps it to pick out "peakiness" in waveforms more easily. (This is, in turn, important for understanding consonants in speech, which is extremely important for intelligibility, particularly in noisy environments.)
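A minimal NumPy sketch of the delta-function example above. The signal length and the random seed are arbitrary assumptions, and `crest_factor` is just an illustrative helper, not anything from the article. It keeps the impulse's magnitude spectrum, scrambles only the phases, and compares how "peaky" the two waveforms are:

```python
import numpy as np

n = 1024                      # arbitrary even signal length (assumption)
rng = np.random.default_rng(0)

# Discrete stand-in for a delta function: silence, one spike, silence.
impulse = np.zeros(n)
impulse[n // 2] = 1.0

# Keep every magnitude but replace each phase with a random one.
spectrum = np.fft.rfft(impulse)
random_phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
random_phase[0] = 0.0         # the DC bin must stay real
random_phase[-1] = 0.0        # the Nyquist bin must stay real (n is even)
scrambled = np.fft.irfft(np.abs(spectrum) * np.exp(1j * random_phase), n=n)

def crest_factor(x):
    """Peak amplitude divided by RMS: a crude measure of peakiness."""
    return np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2))

mag_impulse = np.abs(np.fft.rfft(impulse))
mag_scrambled = np.abs(np.fft.rfft(scrambled))
mag_flipped = np.abs(np.fft.rfft(-impulse))   # global sign inversion

print("same magnitudes after phase scrambling:", np.allclose(mag_impulse, mag_scrambled))
print("same magnitudes after sign inversion:  ", np.allclose(mag_impulse, mag_flipped))
print("crest factor, impulse:  ", crest_factor(impulse))
print("crest factor, scrambled:", crest_factor(scrambled))
```

All three signals share one flat magnitude spectrum. Only the relative phases separate the click from the noise-like signal, which shows up as a much lower crest factor for the scrambled version.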

ddingus 10/31/2025||

Thank you! This is an excellent work. Much appreciated

shermantanktop 10/30/2025||

The thesis about human speech occupying less crowded spectrum is well aligned with a book called "The Great Animal Orchestra" (https://www.amazon.com/Great-Animal-Orchestra-Finding-Origin...).

That author details how the "dawn chorus" is composed of a vast number of species making noise, but who are able to pick out mating calls and other signals due to evolving their vocalizations into unique sonic niches.

It's quite interesting but also a bit depressing as he documents the decline in intensity of this phenomenon with habitat destruction etc.

HarHarVeryFunny 10/30/2025||

Birds have also evolved to choose when to vocalize to best be heard - doing so earlier in urban areas where later there will be more traffic noise, and later in some forest environments to avoid being drowned out by the early rising noisy insects.

HaroldBrill 10/31/2025||

My friend, through all these voluminous deafening comments, we certainly heard your loaded minimalist beautiful tune by you simply broadcasting your marvelous name!

kulahan 10/30/2025||

Probably worth mentioning that as evolutions that allow them to compete well in nature die out, ones that allow them to compete well in cities take their place. Evolution is always a series of tradeoffs.

Maybe we don't have sonic variation, but temporal instead.

jibal 10/30/2025|||

The dying out of birds "in nature" and the adaptations to cities are largely independent as they occur in different populations.

kulahan 10/31/2025||

It's about filling open niches. City birds were an open niche for a long time. The ones who adapted to handle that better are thriving in better population numbers than those which can only survive with 13 specific types of trees.

Even still, among the populations of birds not adapting to the city, they are being forcibly adapted in other ways. If the reach is too big, they die.

This is how evolution works, and has always worked. The world shifts, and those who can handle it thrive, while those who can't, suffer. It's the reason mammals are running the planet today when it was lizards just a couple million years ago.

jibal 10/31/2025|||

I know very well how evolution works ... and your comment in no way refutes or even responds to my point about your previous comment about "trade offs". As I pointed out, there are multiple populations of birds, and there's no zero sum game such that what happens to one population determines what happens to another ... that sort of thinking is behind the "if monkeys evolved into humans then why are there still monkeys" confusion of creationists.

jibal 11/4/2025|||

> I think you might be a bit out of your depth here. You really seem to not know much about evolution.

Wow, such rude projection, coupled with bizarre strawmen and an apparent complete lack of understanding of what "zero-sum" means even after I explained the sense in which I was using it.

It seems to be a thing with them, e.g., "In what world do you think our energy needs plateau? [total misrepresentation of what their correspondent said] I'm always so surprised to see this 1970s hippie attitude making a comeback, especially since it makes less sense today than ever before."

kulahan 11/3/2025|||

In what world is it not a zero-sum game? You don't think that the human population has affected others? You are not aware of what keystone species are, like how wolves are so singularly important, they can literally force geographic changes to a region, which obviously has a massive effect on other species?

I think you might be a bit out of your depth here. You really seem to not know much about evolution.

didroe 10/31/2025|||

The problem is that evolution works on a much longer timescale than the pace of change to the environment that humans cause.

paulgerhardt 10/31/2025|||

While I understand the spirit of this comment, if you look at the fossil record you’ll see that’s objectively not true.

Roughly half of the shifts in the last 11 evolutionary periods, over the last 500 million years, were caused by changes that occurred in a-few-hours-to-a-few-thousand-years with 75%-90% species lost.

Evolution did not fail to work then.

jibal 10/31/2025|||

You are tautologically saying that massive shifts resulted from massive changes, but that doesn't contradict the statement about evolution--which is about far more than such "shifts" (not an aspect of nature but rather changes large enough for humans to perceive)--operating over long time periods. Every single instance of offspring is a "shift" from its progenitors.

Also talking about evolution failing to work is a category mistake--evolution is an ongoing process that is the inevitable result of imperfectly replicating biological mechanisms and there's no "succeed" or "fail" about it.

fluoridation 10/31/2025|||

I think GP meant "evolution without catastrophic biodiversity bottlenecks". Of course evolution will "work" as long as a single species survives.

kulahan 10/31/2025|||

Only if our explicit goal is to preserve the exact environment that was in place when humans showed up and gained enough knowledge to decide change wasn't allowed anymore

bitwize 10/30/2025|||

Life uh, finds a way.

kazinator 10/30/2025||

> A Fourier transform has no explicit temporal precision, and resembles something closer to the waveforms on the right; this is not what the filters in the cochlea look like.

Perhaps the ear does something more vaguely analogous to discrete Fourier transforms on samples of data, which is what we do in a lot of signal processing.

In signal processing, we take windowed samples, and do discrete transforms on these. These do give us some temporal precision.

There is a trade off there between frequency and temporal precision, analogous to the Pauli exclusion principle in quantum mechanics. The better we know a frequency, the less precisely we know the timing. Only an infinite, periodic signal has a single precise frequency (or precise set of harmonics) which are infinitely narrow blips in the frequency domain.

The continuous Fourier transform deals with periodic signals only. We transform an entire function like sin(x) over the entire domain. If that domain is interpreted as time, we are including all of eternity, so to speak from negative infinite time to positive.

HarHarVeryFunny 10/30/2025||

> There is a trade off there between frequency and temporal precision

Sure, and the FFT isn't inherently biased towards one vs the other. If you take an FFT over a long time window (narrowband spectrogram) then you get good frequency resolution at the cost of time resolution, and vice versa for a short time window (wideband spectrogram).

For speech recognition ideally you'd want to use both since they are detecting different things. TFA is saying that this is in fact what our cochlea filter bank is doing, using different types of filter at different frequency ranges - better frequency resolution at lower frequencies where the formants are (carrying articulatory information), and better time resolution at the high frequencies generated by fricatives where frequency doesn't matter but accurate onset detection is useful for detecting plosives.
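The narrowband/wideband tradeoff described above comes down to simple arithmetic. Here is a small sketch, where the 16 kHz sample rate and the two window lengths are assumptions chosen for illustration and `stft_resolution` is a hypothetical helper name:

```python
def stft_resolution(fs_hz, window_samples):
    """Return (window duration in seconds, FFT bin spacing in Hz)."""
    return window_samples / fs_hz, fs_hz / window_samples

fs_hz = 16000  # assumed sample rate

# Long window: fine frequency detail, coarse timing (narrowband spectrogram).
narrow_t, narrow_df = stft_resolution(fs_hz, 512)
# Short window: fine timing, coarse frequency detail (wideband spectrogram).
wide_t, wide_df = stft_resolution(fs_hz, 64)

print(f"narrowband: {narrow_t * 1e3:.0f} ms window, {narrow_df:.2f} Hz per bin")
print(f"wideband:   {wide_t * 1e3:.0f} ms window, {wide_df:.2f} Hz per bin")
```

The product of window duration and bin spacing is 1 in both cases, so shortening the window by a factor of 8 widens each frequency bin by the same factor.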

xeonmc 10/30/2025|||

> analogous to the Pauli exclusion principle

Did you mean the Heisenberg Uncertainty Principle instead? Or is there actually some connection of Pauli Exclusion Principle to conjugate transforms that I wasn’t aware of?

kvakkefly 10/30/2025||

They are not connected afaik.

jibal 10/30/2025|||

The ear clearly doesn't operate on "samples of data", it doesn't "take windowed samples" ... there's an ongoing mechanical process.

energy123 10/30/2025||

STFT?

ducttapecrown 10/31/2025||

Yesterday there was an article about how the ear works more like a Gabor transform or a wavelet transform than a Fourier transform, both of which are Short Time Fourier Transforms, so yes!

xeonmc 10/30/2025||

Nit: It’s an unfortunate confusion of naming conventions, but Fourier Transform in the strictest sense implies an infinite “sampling” period, while the finite “sample” period counterpart would correspond to Fourier Series even though we colloquially refer to them interchangeably.

(I had put “sampling” in quotes as they’re actually “integration period” in this context of continuous time integration, though it would be less immediately evocative of the concept people are colloquially familiar with. If we actually further impose a constraint of finite temporal resolution so that it is honest-to-god “sampling” then it becomes Discrete Fourier Transform, of which the Fast Fourier Transform is one implementation.)

It is this strict definition that the article title is rebuking, but it’s not quite what the colloquial usage loosely evokes in most people’s minds when we usually say Fourier Transform as an analysis tool.

So this article should have been comparing to Fourier Series analysis rather than Fourier Transform in the pedantic sense, albeit that’ll be a bit less provocative.

Regardless, it doesn’t at all take away from the salient points of this excellent article which are really interesting reframing of the concepts: what the ear does mechanistically is applying a temporal “weighting function” (filter) so it’s somewhere between Fourier series and Fourier transform. This article hits the nail on the head on presenting the sliding scale of conjugate domain trade offs (think: Heisenberg)

meowkit 10/30/2025||

I was a bit peeved by the title, but I think it's a fair use of clickbait as the article has a lot of little details about acoustics in humans that I was unfamiliar with (i.e. a link to a primer on the transduction implementation of cochlear cilia)

But yeah there is a strict vs colloquial collision here.

BrenBarn 10/30/2025||

Yeah, it's sort of like saying the ear doesn't do "a" Fourier transform, it does a bunch of Fourier transforms on samples of data, with a varying tradeoff between temporal and frequency resolution. But most people would still say that's doing a Fourier transform.

As the article briefly mentions, it's a tempting hypothesis that there is a relationship between the acoustic properties of human speech and the physical/neural structure of the auditory system. It's hard to get clear evidence on this but a lot of people have a hunch that there was some coevolution involved, with the ear's filter functions favoring the frequency ranges used by speech sounds.

aidenn0 10/30/2025|||

> ...it's a tempting hypothesis that there is a relationship between the acoustic properties of human speech and the physical/neural structure of the auditory system.

This seems trivially true in the sense that human speech is intelligible by humans; there are many sounds that humans cannot hear and/or distinguish, and speech does not involve those.

BrenBarn 10/31/2025||

Yes, but at the least it's a bit more than that, because the ear is more sensitive to certain frequency ranges than others, and speech sounds seem to be more clustered in those ranges.

foobarian 10/30/2025|||

This is something you quickly learn when you read the theory in the textbook, get excited, and sit down to write some code and figure out that you'll have to pick a finite buffer size. :-)

edbaskerville 10/30/2025||

To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

crazygringo 10/30/2025||

Yeah, this article feels like it's very much setting up a ridiculous strawman.

Nobody who knows anything about signal processing has ever suggested that the ear performs a Fourier transform across infinite time.

But the ear does perform something very much akin to the FFT (fast Fourier transform), turning discrete samples into intensities at frequencies -- which is, of course, what any reasonable person means when they say the ear does a Fourier transform.

This article suggests it's accomplished by something between wavelet and Gabor. Which, yes, is not exactly a Fourier transform -- but it's producing something that is about 95-99% the same in the end.

And again, nobody would ever suggest the ear was performing the exact math that the FFT does, down to the last decimal point. But these filters still work essentially the same way as the FFT in terms of how they respond to a given frequency, it's really just how they're windowed.

So if anyone just wants a simple explanation, I would say yes the ear does a Fourier transform. A discrete one with windowing.

anyfoo 10/30/2025|||

Since we're being pedantic, there is some confusion of ideas here (even though you do make a valid overall point), and the strawman may not be as ridiculous.

First, I think when you say FFT, you mean DFT. A Fourier transform is both non-discrete and infinite in time. A DTFT (discrete time fourier transform) is discrete, i.e. using samples, but infinite. A DFT (discrete fourier transform) is both finite (analyzed data has a start and an end) and discrete. An FFT is effectively an implementation of a DFT, and there is nothing indicating to me that hearing is in any way specifically related to how the FFT computes a DFT.
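To make the "FFT is an implementation of a DFT" point concrete, here is a minimal sketch. `naive_dft` is a hypothetical helper written for illustration; it evaluates the DFT definition directly and is compared against NumPy's FFT on random test data:

```python
import numpy as np

def naive_dft(x):
    """Direct O(N^2) evaluation of X[k] = sum_n x[n] * exp(-2j*pi*k*n/N)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.size)
    k = n.reshape(-1, 1)
    return np.exp(-2j * np.pi * k * n / x.size) @ x

rng = np.random.default_rng(1)
samples = rng.standard_normal(64)   # arbitrary finite, discrete input

# Same transform, two algorithms: the FFT only gets there faster.
print(np.allclose(naive_dft(samples), np.fft.fft(samples)))
```

Both routines need a finite block of discrete samples as input, which is exactly the assumption being questioned for the ear.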

But more importantly, I'm not sure DFT fits at all? This is an analog, real-world physical process, so where is it discrete, i.e. how does the ear capture samples?

I think, purely based upon its "mode", what's happening is more akin to a Fourier series, which is the missing fourth category completing (FT, DTFT, DFT): Continuous (non-discrete), but finite or rather periodic in time.

But secondly, unlike Gabor transforms, wavelet transforms are specifically not just windowed Fourier anythings (whether FT/FS/DFT/DTFT). Those would commonly be called "short-time Fourier transforms" (STFT, existing again in discrete and non-discrete variants), and the article straight up mentions that they don't fit either in its footnotes.

Wavelet transforms use an entirely different shape (e.g. a haar wavelet) that is shifted and stretched for analysis, instead of windowed sinusoids over a windowed signal.

And I think those distinctions are what the article actually wanted to touch upon.
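As a rough illustration of how a wavelet analysis differs from a windowed sinusoid, here is one level of the Haar transform mentioned above. This is a sketch only: the function names and the toy signal are assumptions, and a real wavelet analysis would repeat the step on the approximation coefficients to reach coarser scales.

```python
import numpy as np

def haar_step(x):
    """One level of the orthonormal Haar transform on an even-length signal."""
    pairs = np.asarray(x, dtype=float).reshape(-1, 2)
    approx = (pairs[:, 0] + pairs[:, 1]) / np.sqrt(2)   # local averages
    detail = (pairs[:, 0] - pairs[:, 1]) / np.sqrt(2)   # local differences
    return approx, detail

def haar_inverse(approx, detail):
    """Undo haar_step exactly."""
    out = np.empty(2 * approx.size)
    out[0::2] = (approx + detail) / np.sqrt(2)
    out[1::2] = (approx - detail) / np.sqrt(2)
    return out

# Flat signal with one abrupt event in the third pair of samples.
signal = np.array([4.0, 4.0, 4.0, 4.0, 9.0, 1.0, 4.0, 4.0])
approx, detail = haar_step(signal)

print("detail coefficients:", detail)
print("event located in pair:", int(np.argmax(np.abs(detail))))
print("perfect reconstruction:", np.allclose(haar_inverse(approx, detail), signal))
```

The only nonzero detail coefficient sits at the pair containing the jump, so the transform reports where the event happened without using sinusoids at all.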

actionfromafar 10/30/2025||

Don’t neurons fire in bursts? That’s sort of discrete I guess.

anyfoo 10/31/2025|||

Even if they do (and I honestly have no idea), isn't it the frequency, i.e. the output of the basilar membrane in the ear, and not a sample in time of the actual sound wave which would correspond to a short-time frequency transform, that gets sampled here?

And the basilar membrane seems like a pretty un-discrete (in time, not in frequency) process to me. But I'm not 100% sure.

Sure, if you go small enough, you end up with discrete structures sooner or later (molecules, atoms, quantum if you go far down enough and everything breaks apart anyway), but without knowing anything, the sensitivity of this whole process still seems better modeled as continuous rather than discrete, the scale at which that happens seems just too small to me.

Balgair 10/31/2025|||

Neuro person here.

Yes, many neurons fire at discrete intervals set by their morphology. In fact, this DFT/FFT/Infinite-FT/whatever-FT is all the hell over neuroscience. Many neurons don't really 'communicate' in just a single action potential. They are mostly firing at each other all the time, and the rate of firing is what communicates information. So neuron A is always popping at neuron B, but that tone/rate of popping is what affects change/information.

Now, this is not nearly true of every single neuron-neuron interaction. Some do use a single action potential (your patella knee reflex), some communicate with hundreds of other neurons (pyramidal cells in your cerebellum), some inhibit the firing of other neurons (gap/dendrite junction/axon interactions), some transmit information in opposite ways. It's a giant mess and the exact sub system is what you have to specify to get a handle on things.

Also, you get whole brain wave activity during different periods of sleep and awake cycles. So all the neurons will sync up their firing rates in certain areas when you're dreaming or taking an SAT or something. And yes, you can influence mass cyclic firing with powerful magnets (TCMS).

For the cochlea here, these hair cells are mostly firing all the time and then when a sound/frequency that they are 'tuned' to is heard, then their firing pattern changes and that information is then transmitted toward the parietal lobes. To be clear too, there are a lot of other brain structures in the way before the info gets to a place where you can be conscious of it. Things like the medial nuclei, the trapezoidal bodies, the calyx of Held, etc. Most of these areas are for discriminating sounds and the location of sounds in space. So like when your fan is on for a long while and you no longer hear it, that's because of the other structures.

a-dub 10/31/2025|||

going all the way out to percept, the response of the system is non-linear: https://en.wikipedia.org/wiki/Mel_scale

this is believed to come from the shape of the cochlea, which is often modeled as a filterbank that can express this non-linearity in an intuitive way.
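For reference, a small sketch of one widely used mel-scale formula, m = 2595 · log10(1 + f / 700). Other variants exist; picking this one is an assumption, and the function names are illustrative:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels using m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

# Equal steps on the mel axis correspond to ever larger steps in Hz.
mel_points = np.linspace(0.0, hz_to_mel(8000.0), 9)
hz_points = mel_to_hz(mel_points)

print(np.round(hz_points))
print(np.round(np.diff(hz_points)))
```

The widening gaps in Hz are the non-linearity in question: equally spaced channels on this scale get broader in frequency as their center frequency rises.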

smallnix 10/31/2025||||

I was also thinking of refractory periods with neurotransmitters. But I don't know much about this.

anyfoo 10/31/2025||

It's a good question, but as elaborated in a sibling comment, I'm not sure it even matters in this case. (Sampling frequency vs. sampling the sound wave itself.)

kragen 10/31/2025||||

I think those bursts ("action potentials") happen at continuously varying times, though.

acjohnson55 10/31/2025|||

Yes. See the volley theory of hearing: https://en.wikipedia.org/wiki/Volley_theory
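A deliberately idealized toy version of the volley idea, not a physiological model. It assumes noise-free neurons that each fire on every fifth cycle of a 1 kHz tone, staggered so that together they mark every cycle. The 200 Hz per-neuron ceiling is the figure quoted earlier in the thread; everything else is an assumption.

```python
import numpy as np

tone_hz = 1000                     # stimulus frequency (assumption)
max_rate_hz = 200                  # per-neuron ceiling quoted in the thread
n_cycles = 50
duration_s = n_cycles / tone_hz
skip = tone_hz // max_rate_hz      # each neuron fires on every 5th cycle

# Time of the same phase point in each stimulus cycle.
cycle_times = np.arange(n_cycles) / tone_hz

# Neuron i fires on cycles i, i + skip, i + 2 * skip, ...
spike_trains = [cycle_times[i::skip] for i in range(skip)]
per_neuron_rate = [len(train) / duration_s for train in spike_trains]

# Pool all spikes, as a downstream reader of the whole population would.
pooled = np.sort(np.concatenate(spike_trains))
pooled_rate = len(pooled) / duration_s

print("per-neuron rates (Hz):", np.round(per_neuron_rate))
print("pooled rate (Hz):", round(pooled_rate))
print("pooled spike interval (ms):", np.round(np.diff(pooled)[:3] * 1e3, 3))
```

No single neuron exceeds 200 spikes per second, yet the pooled train has one spike per stimulus cycle, which is the sense in which a population can phase lock above the single-neuron limit.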

waffletower 10/31/2025||||

The article does a fair job of positing that the ear provides temporal/frequency resolution along a logarithmic scale but doesn't assert clearly that this resolution is fixed with the STFT and the Gabor variant. It hints that wavelets are more akin in terms of perceptual scaling as a function of frequency but not articulately. But it is interesting that the author's thesis, how Fourier mathematics isn't appropriate for describing human perception of sound, relates human hearing to the Gabor transform which is thoroughly a derivative of discrete Fourier mathematics.

kragen 10/31/2025||

Many solutions to differential equations are thoroughly derived from the Fourier transform too, and so is Heisenberg's uncertainty principle. That doesn't mean they're the same thing.

kragen 10/31/2025|||

> turning discrete samples into intensities at frequencies

This description applies equally well to the discrete wavelet, discrete Gabor, and maybe even Hadamard transforms, which are definitely not, as you assert, "95–99% the same in the end" (how would you even measure such similarity?) So it is not something any reasonable person has ever meant by "the Fourier transform" or even "the discrete Fourier transform".

Also, you seem to be confused about what "discrete" means in the context of the Fourier transform. The ear functions in continuous time and does not take discrete samples.

a-dub 10/30/2025|||

> At high frequencies, frequency resolution is sacrificed for temporal resolution, and vice versa at low frequencies.

this is the time-frequency uncertainty principle. intuitively it can be understood by thinking about wavelength. the more stretched out the waveform is in time, the more of it you need to see in order to have a good representation of its frequency, but the more of it you see, the less precise you can be about where exactly it is.
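A numerical sketch of that uncertainty relation, under assumed parameters (sample rate, time span and pulse widths are arbitrary). It measures the RMS duration and RMS bandwidth of Gaussian pulses of different lengths. For Gaussians the product should sit at the lower bound 1/(4π) ≈ 0.0796 when both widths are computed on the squared magnitudes.

```python
import numpy as np

def rms_width(axis, density):
    """Standard deviation of `axis` weighted by a non-negative `density`."""
    p = density / density.sum()
    mean = np.sum(axis * p)
    return np.sqrt(np.sum((axis - mean) ** 2 * p))

fs = 1000.0                              # assumed sample rate in Hz
t = np.arange(-4000, 4000) / fs          # 8 seconds centered on t = 0
freqs = np.fft.fftfreq(t.size, d=1.0 / fs)

products = []
for s in (0.01, 0.05, 0.2):              # pulse width parameter in seconds
    pulse = np.exp(-t ** 2 / (2 * s ** 2))
    spectrum = np.fft.fft(pulse)
    sigma_t = rms_width(t, np.abs(pulse) ** 2)
    sigma_f = rms_width(freqs, np.abs(spectrum) ** 2)
    products.append(sigma_t * sigma_f)
    print(f"s={s:5.2f}  sigma_t={sigma_t:.4f} s  sigma_f={sigma_f:7.3f} Hz  "
          f"product={sigma_t * sigma_f:.4f}")

print("lower bound 1/(4*pi):", 1 / (4 * np.pi))
```

Stretching the pulse by a factor of 20 shrinks its bandwidth by the same factor, so the product does not move.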

> but it does do a time-localized frequency-domain transform akin to wavelets

maybe easier to conceive of first as an arbitrarily defined filter bank based on physiological results rather than trying to jump directly to some neatly defined set of orthogonal basis functions. additionally, orthogonal basis functions cannot, by definition, capture things like masking effects.

> A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

(4) size of the animal.

notably: some smaller creatures have ultrasonic vocalization and sensory capability, sometimes this is hypothesized to complement visual perception for avoiding predators, it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Terr_ 10/30/2025||

> it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Now I'm imagining some alien shrew with vocal-cords (or syrinx, or whatever) that runs the entire length of its body, just so that it can emit lower-frequency noises for some reason.

Y_Y 10/30/2025|||

Sounds like an antenna, if you'll accept electromagnetic noise then there are some fish that could pass for your shrew, e.g. https://en.wikipedia.org/wiki/Gymnotus

bragr 10/30/2025||||

Well without the humorous size difference, this is basically what whales and elephants do for long distance communication.

Terr_ 10/30/2025||

Was playing around with a fundamental frequency calculator [0] to associate certain sizes to hertz, then using a tone-generator [1] to get a subjective idea of what it'd sound like.

Though of course, nature has plenty of other tricks, like how Koalas can go down to ~27hz. [2]

[0] https://acousticalengineer.com/fundamental-frequency-calcula...

[1] https://www.szynalski.com/tone-generator/

[2] https://www.nature.com/articles/nature.2013.14275

fuzzfactor 10/30/2025||

How long would a Dachshund have to be for it to sound like a 60 kilo Great Dane?

taneq 10/31/2025|||

I’m not sure exactly how, but cats can emit a surprisingly low growl when they want to. Like, as deep as a large human would be able to. So there’s more going on than just linear size… And now I’m wondering what the lowest recorded pitch made by a shrew is.

matthewdgreen 10/30/2025|||

If you take this thought process even farther, specific words and phonemes should occupy specific slices of the tradeoff space. Across all languages and cultures, an immediate warning that a tiger is about to jump on you should sit in a different place than a mother comforting a baby (which, of course, it does.) Maybe that even filters down to ordinary conversational speech.

patrickthebold 10/30/2025|||

I think I might be missing something basic, but if you actually wanted to do a Fourier transform on the sound hitting your ear, wouldn't you need to wait your entire lifetime to compute it? It seems pretty clear that's not what is happening, since you can actually hear things as they happen.

bonoboTP 10/30/2025|||

Yes, for the vanilla Fourier transform you have to integrate from negative to positive infinity. But more practically you can put a temporally finite-support window function on it, so you only analyze a part of it. Whenever you see a 2d spectrogram image in audio editing software, where the audio engineer can suppress a certain range of frequencies in a certain time period they use something like this.

It's called the short-time Fourier transform (STFT).

https://en.wikipedia.org/wiki/Short-time_Fourier_transform
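A bare-bones STFT in NumPy to show the windowing idea. This is a sketch under assumed parameters (8 kHz sample rate, 256-sample Hann window, 50% overlap), not how any particular audio editor implements it:

```python
import numpy as np

def stft(x, frame_len, hop):
    """Slice x into overlapping frames, window each one, and FFT it."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)    # shape: (time frames, frequency bins)

fs = 8000                                 # assumed sample rate in Hz
t = np.arange(fs) / fs                    # one second of audio
# Test signal: 500 Hz for the first half second, 2000 Hz afterwards.
x = np.where(t < 0.5, np.sin(2 * np.pi * 500 * t), np.sin(2 * np.pi * 2000 * t))

frame_len = 256
spec = np.abs(stft(x, frame_len=frame_len, hop=128))
peak_hz = spec.argmax(axis=1) * fs / frame_len   # strongest frequency per frame

print("spectrogram shape:", spec.shape)
print("first frame peak:", peak_hz[0], "Hz; last frame peak:", peak_hz[-1], "Hz")
```

Each row only sees 32 ms of signal, so the analysis can report that the pitch changed partway through, which a single transform over the whole second would smear together.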

kragen 10/31/2025||

Yeah. But a really annoying thing about the STFT is that its temporal resolution is independent of frequency, so you either have to have shitty temporal resolution at high frequencies or shitty frequency resolution at low ones, compared to the human ear. So in Audacity I keep having to switch back and forth between window sizes.
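To put numbers on that complaint, here is a sketch comparing one fixed STFT window with a constant-Q style analysis whose window length scales with 1/frequency. The sample rate, the 2048-sample window, the octave-spaced center frequencies and the quality factor `q` are all assumptions picked for illustration:

```python
import numpy as np

fs = 44100                                  # assumed sample rate in Hz
centers_hz = 55.0 * 2.0 ** np.arange(8)     # 55 Hz up to 7040 Hz, one per octave
q = 17.0                                    # hypothetical center frequency / bandwidth

# Constant-Q: every channel spans the same number of cycles of its own frequency.
cq_window_s = q / centers_hz
cq_bandwidth_hz = centers_hz / q

# STFT: one window length, hence one bin spacing, for every frequency.
stft_window_s = 2048 / fs
stft_bin_hz = fs / 2048

for f, w, b in zip(centers_hz, cq_window_s, cq_bandwidth_hz):
    print(f"{f:7.0f} Hz: constant-Q window {w * 1e3:7.2f} ms, bandwidth {b:7.2f} Hz")
print(f"STFT (any frequency): window {stft_window_s * 1e3:.2f} ms, "
      f"bin spacing {stft_bin_hz:.2f} Hz")
```

With these numbers the fixed window is far too short for the lowest channel and far too long for the highest one, which is why a single STFT setting forces the back-and-forth between window sizes.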

IshKebab 10/30/2025||||

Yes exactly. This is a classic "no cats and dogs don't actually rain from the sky" article.

Nobody who knows literally anything about signal processing thought the ear was doing a Fourier transform. Is it doing something like a STFT? Obviously yes and this article doesn't go against that.

xeonmc 10/30/2025||||

You’ll also need to have existed and started listening before the beginning of time, forever and ever. Amen.

cherryteastain 10/30/2025|||

Not really, just as we can create spectrograms [1] for a real time audio feed without having to wait for the end of the recording by binning the signal into timewise chunks.

[1] https://en.wikipedia.org/wiki/Spectrogram

IshKebab 10/30/2025||

Those use the Short-Time Fourier Transform, which is very much like what the ear does.

https://en.wikipedia.org/wiki/Short-time_Fourier_transform

anyfoo 10/30/2025||

Yes, but the article specifically says that it isn't like a short-time fourier transform either, but more like a wavelet transform, which is different yet again.

IshKebab 10/30/2025||

Barely different though. Obviously nobody is saying it's exactly a Fourier transform or a STFT. But it's very like a STFT (or a wavelet transform).

The article is pretty much "cows aren't actually spheres guys".

anyfoo 10/30/2025|||

I'd say the title is like that (and I agree with someone else's assessment of it being clickbait-y). I think the actual article does a pretty good job in distinguishing a lot of these transforms, and homing in on which one matches most.

But the title instead makes it sound (pun unintended) that what the ear does is not about frequency decomposition at all.

jibal 10/31/2025||

The fourth sentence in the article is "Vibrations travel through the fluid to the basilar membrane, which remarkably performs frequency separation", with the footnote

"We call this tonotopic organization, which is a mapping from frequency to space. This type of organization also exists in the cortex for other senses in addition to audition, such as retinotopy for vision and somatotopy for touch."

So the cochlea does frequency decomposition but not by performing a FT (https://en.wikipedia.org/wiki/Fourier_transform), but rather by a biomechanical process involving numerous sensors that are sensitive to different frequency ranges ... similar to how we have different kinds (only 3, or in birds and rare humans 4) of cones in the retina that are sensitive to different frequency ranges.

The claim that the title makes it sound like what the ear does is not about frequency decomposition at all is simply false ... that's not what it says, at all.

kragen 10/31/2025|||

It's very unlike both of those, as the nice diagrams in the article explain; not only is what it is saying not obvious to you, it is apparently something you actively disbelieve.

km3r 10/30/2025|||

> one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

I wonder if these could be used to better master movies and television audio such that the dialogue is easier to hear.

kiicia 10/30/2025||

You are expecting too much, we still have no technology to do that, unless it’s about clarity of advertisement jingles /s

xeonmc 10/30/2025|||

Analogy: when you knock on doors, how do you decide what rhythm and duration to use, so that it won’t be mistaken as accidentally hitting the door?

toast0 10/30/2025||

Shave and a haircut is the only option in my knocking decision tree.

cnity 10/30/2025|||

Thanks for giving your two bits on the matter.

throwaway198846 10/30/2025|||

... What does that mean?

crazygringo 10/30/2025||

https://en.wikipedia.org/wiki/Shave_and_a_Haircut

lgas 10/30/2025|||

> It does this because the sounds processed by the ear are often localized in time.

What would it mean for a sound to not be localized in time?

hansvm 10/30/2025|||

It would look like a Fourier transform ;)

Zooming in to cartoonish levels might drive the point home a bit. Suppose you have sound waves

  |---------|---------|---------|

What is the frequency exactly 1/3 the way between the first two wave peaks? It's a nonsensical question. The frequency relates to the time delta between peaks, and looking locally at a sufficiently small region of time gives no information about that phenomenon.

Let's zoom out a bit. What's the frequency over a longer period of time, capturing a few peaks?

Well...if you know there is only one frequency then you can do some math to figure it out, but as soon as you might be describing a mix of frequencies you suddenly, again, potentially don't have enough information.

That lack of information manifests in a few ways. The exact math (Shannon's theorems?) suggests some things, but the language involved mismatches with human perception sufficiently that people get burned trying to apply it too directly. E.g., a bass beat with a bit of clock skew is very different from a bass beat as far as a careless decomposition is concerned, but it's likely not observable by a human listener.

Not being localized in time means* you look at longer horizons, considering more and more of those interactions. Instead of the beat of a 4/4 song meaning that the frequency changes at discrete intervals, it means that there's a larger, over-arching pattern capturing "the frequency distribution" of the entire song.

*Truly time-nonlocalized sound is of course impossible, so I'm giving some reasonable interpretation.
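To make the "not enough information in a short window" point concrete, here is a minimal sketch. The sample rate, the two tone frequencies (100 Hz and 110 Hz) and the peak-counting rule are all arbitrary choices for illustration, not anything from the article. It takes a naive DFT of the same two-tone signal over a 20 ms window and over a 1 s window and counts the clear spectral peaks in each.

```python
import cmath
import math

SAMPLE_RATE = 1000  # Hz; an arbitrary choice for this sketch


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2, computed with a naive O(N^2) sum."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


def count_peaks(mags, floor_ratio=0.5):
    """Count local maxima that exceed floor_ratio times the largest bin."""
    floor = floor_ratio * max(mags)
    return sum(
        1
        for i in range(1, len(mags) - 1)
        if mags[i] > floor and mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]
    )


def two_tones(num_samples, f1=100.0, f2=110.0):
    """Sum of two sine waves, 10 Hz apart by default."""
    return [
        math.sin(2 * math.pi * f1 * t / SAMPLE_RATE)
        + math.sin(2 * math.pi * f2 * t / SAMPLE_RATE)
        for t in range(num_samples)
    ]


# 20 ms window: bins are 50 Hz apart, so the two tones fall into one lump.
short_peaks = count_peaks(dft_magnitudes(two_tones(20)))
# 1 s window: bins are 1 Hz apart, so the tones land in separate bins.
long_peaks = count_peaks(dft_magnitudes(two_tones(1000)))
print(short_peaks, long_peaks)
```

The short window should report a single peak and the long window two: the frequency detail only appears once you look at a longer stretch of time, which is the trade-off described above.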

jancsika 10/30/2025||

> It's a nonsensical question.

Are you talking about a discrete signal or a continuous signal?

kragen 10/31/2025||||

The 50-cycle hum of the transformer outside your house. Tinnitus. The ≈15kHz horizontal scanning frequency whine of a CRT TV you used to be able to hear when you were a kid.

Of course, none of these are completely nonlocalized in time. Sooner or later there will be a blackout and the transformer will go silent. But it's a lot less localized than the chirp of a bird.

xeonmc 10/30/2025||||

Means that it is a broad spectrum signal.

Imagine the dissonant sound of hitting a trashcan.

Now imagine the sound of pressing down all 88 keys on a piano simultaneously.

Do they sound similar in your head?

The localization sits where the phases of all frequency components align and coherently construct a pulse, while further along in time their phases are misaligned and cancel each other out.
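A rough sketch of that idea, assuming 64 equal-amplitude harmonics and a fixed random seed (both arbitrary choices): the same set of cosines is summed once with all phases at zero and once with scrambled phases. The energy is the same in both cases, but only the aligned version piles up into a pulse.

```python
import math
import random

NUM_HARMONICS = 64   # arbitrary for this sketch
NUM_SAMPLES = 512


def sum_of_cosines(phases):
    """Add one unit-amplitude cosine per harmonic k = 1..len(phases)."""
    return [
        sum(
            math.cos(2 * math.pi * k * t / NUM_SAMPLES + phase)
            for k, phase in enumerate(phases, start=1)
        )
        for t in range(NUM_SAMPLES)
    ]


def rms(samples):
    """Root-mean-square level of the signal."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def crest_factor(samples):
    """Peak absolute value divided by RMS; large values mean 'pulse-like'."""
    return max(abs(x) for x in samples) / rms(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
aligned = sum_of_cosines([0.0] * NUM_HARMONICS)
scrambled = sum_of_cosines(
    [rng.uniform(0.0, 2.0 * math.pi) for _ in range(NUM_HARMONICS)]
)

print(rms(aligned), rms(scrambled))
print(crest_factor(aligned), crest_factor(scrambled))
```

Both signals have the same magnitude spectrum, so a magnitude-only analysis cannot tell them apart; the difference between "click" and "noise-like wash" lives entirely in the phases.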

littlestymaar 10/30/2025|||

A continuous sinusoidal sound, I guess?

dsp_person 10/30/2025|||

Even if it is doing a wavelet transform, I still see that as made of Fourier transforms. Not sure if there's a good way to describe this.

We can make a short-time fourier transform or a wavelet transform in the same way either by:

- filterbank approach integrating signals in time

- take fourier transform of time slices, integrating in frequency

The same machinery just with different filters.
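A small sketch of the two routes to a short-time Fourier transform, assuming a Hann window of 16 samples, a hop of 4 and a made-up chirp as input (all hypothetical choices). One function loops over time slices and takes a DFT of each; the other loops over frequency bins and slides one band-pass kernel per bin along the signal. Both evaluate the same sums, just in a different order.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def stft_by_slices(signal, window, hop):
    """For each time slice: apply the window, then DFT across all bins."""
    size = len(window)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        windowed = [signal[start + n] * window[n] for n in range(size)]
        frames.append([
            sum(windowed[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size))
            for k in range(size)
        ])
    return frames  # indexed as frames[time][freq]


def stft_by_filterbank(signal, window, hop):
    """For each bin: build one modulated-window kernel and slide it in time."""
    size = len(window)
    channels = []
    for k in range(size):
        kernel = [window[n] * cmath.exp(-2j * math.pi * k * n / size) for n in range(size)]
        channels.append([
            sum(signal[start + n] * kernel[n] for n in range(size))
            for start in range(0, len(signal) - size + 1, hop)
        ])
    return channels  # indexed as channels[freq][time]


# Hypothetical test input: a slow chirp.
chirp = [math.sin(2 * math.pi * (0.02 + 0.001 * t) * t) for t in range(128)]
frames = stft_by_slices(chirp, hann(16), hop=4)
channels = stft_by_filterbank(chirp, hann(16), hop=4)

max_gap = max(
    abs(frames[m][k] - channels[k][m])
    for m in range(len(frames))
    for k in range(len(channels))
)
print(max_gap)
```

The only thing distinguishing the STFT from a wavelet-style analysis in this picture is the kernel: here every bin uses the same window length, whereas a wavelet bank would shorten the window as the centre frequency rises.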

psunavy03 10/30/2025|||

Well from an evolutionary perspective, this would be unsurprising, considering any other forms of language would have been ill-fitted for purpose and died out. This is really just a flavor of the anthropic principle.

SoftTalker 10/30/2025|||

Ears evolved long before speech did. Probably in step with vocalizations however.

Sharlin 10/30/2025|||

Not sure about that; I'd guess that vibration-sensing organs first evolved to sense disturbances (in water, on seafloor, later on dry ground and in air) caused by movement, whether of a predator, prey, or a potential mate. Intentional vocalizations for signalling purposes then evolved to utilize the existing modality.

jibal 10/30/2025|||

Ears arose long before speech did. They evolved in response to changes in the environment, e.g., the existence of speech.

FarmerPotato 10/30/2025|||

Is that a human understanding or is it just an AI that read the text and ignored the pictures?

Why do we need a summary in a post that adds nothing new to the conversation?

pests 10/30/2025||

Are you saying your parent post was an AI summary? There is original speculation at the end and it didn’t come off that way to me.

AreYouElite 10/30/2025||

Do you believe it might be possible that the frequency band of human speech is not determined by such factors at all but is more a function of height? Kids have higher voices; adults have deeper voices. Similar to stringed instruments: the viola is high pitched and the bass is low pitched.

I'm no expert in these matters just speculating...

fwip 10/30/2025||

It's not height, but vocal cord length and thickness. Longer vocal cords (induced by testosterone during puberty) vibrate more slowly, with a lower frequency/pitch.

rattan12138 10/31/2025||

Wow, this discussion about how our ears work is mind-blowing! It's amazing how complex sound processing is, and the comparison to signal processing concepts is really illuminating.

tim333 10/30/2025||

Nice to see a video for the tip links and ion channels.

I spent a while reading up on that stuff because I was trying to figure out what causes my tinnitus. My best guess is that if the hairs overbend, that stuff can break and an ion channel can get stuck open, causing the cell to fire continually.

Another fun ear fact is they incorporate active amplification. You can hook an electrical signal to the loudspeaker type cell to make it vibrate around https://youtu.be/pij8a8aNpWQ

adornKey 10/30/2025||

This subject has bothered me for a long time. My question to guys into acoustics was always: if the cochlea performs some kind of Fourier transform, what are the chances that it uses sinus waves as a basis for the vector space? If it did anything like that, it could just as well use slightly different wave-forms as a basis for the transformation. Stiffness and non-linearity will surely ensure that any ideal rubber model in physics differs in reality from the perfect sinus.

kragen 10/31/2025||

Oh, it turns out that complex exponentials are the eigenfunctions of linear time-invariant systems, and sound transmission is full of linear time-invariant systems. So surely ears cannot be perfectly detecting sinusoids, but there's a lot of evolutionary pressure to come as close as possible. That way, you can still recognize a birdsong or the howl of a wolf even if it echoes off a cliff, or recognize your baby crying even if it is facing the other way.
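A minimal sketch of the eigenfunction property, assuming a made-up impulse response that stands in for "direct sound plus two echoes": passing a complex exponential through the filter returns the same exponential multiplied by a single complex number, the frequency response at that frequency.

```python
import cmath
import math

ECHO = [1.0, 0.0, 0.0, 0.6, 0.0, -0.3]  # hypothetical impulse response
OMEGA = 2 * math.pi * 0.07              # radians per sample, arbitrary


def fir_filter(signal, impulse_response):
    """Causal convolution y[n] = sum_m h[m] * x[n - m], with x = 0 before n = 0."""
    return [
        sum(h * signal[n - m] for m, h in enumerate(impulse_response) if n - m >= 0)
        for n in range(len(signal))
    ]


def frequency_response(impulse_response, omega):
    """H(omega) = sum_m h[m] * exp(-j * omega * m)."""
    return sum(h * cmath.exp(-1j * omega * m) for m, h in enumerate(impulse_response))


tone = [cmath.exp(1j * OMEGA * n) for n in range(200)]
filtered = fir_filter(tone, ECHO)
gain = frequency_response(ECHO, OMEGA)

# Skip the start-up samples where the filter has not yet seen a full history.
settled = range(len(ECHO) - 1, len(tone))
max_error = max(abs(filtered[n] - gain * tone[n]) for n in settled)
print(abs(gain), cmath.phase(gain), max_error)
```

The echoes change the tone's level and phase but not its shape, which is why a detector built around sinusoid-like components keeps working after reflections. A triangle or square wave fed through the same filter would come out with a different shape.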

FarmerPotato 10/30/2025|||

I find it beautiful to see the term "sinus wave."

empiricus 10/30/2025||

well, cochlea is working within the realm of biological and physical possibilities. basically it is a triangle through which waves are propagating, and sensors along the edge. smth smth this is similar to a filter bank of gabor filters that respond to rising freq along the triangle edge. ergo you can say fourier, but it only means sensors responding to different freq because of their location.
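A toy version of that Gabor filter bank picture, to show what "sensors responding to different freq because of their location" could look like in code. Everything here is an assumption for illustration: the sample rate, the third-octave spacing of centre frequencies, and the envelope width. It is not a cochlear model in any physiological sense.

```python
import cmath
import math

SAMPLE_RATE = 8000  # Hz, hypothetical
# Channel "positions": centre frequencies spaced a third of an octave apart.
CENTERS = [200.0 * 2 ** (i / 3) for i in range(13)]


def channel_response(signal, center_hz, cycles_per_sigma=1.5):
    """Magnitude of one Gabor filter's output at the middle of the signal.

    The Gaussian envelope gets narrower as the centre frequency rises,
    so every channel spans about the same number of carrier cycles.
    """
    sigma = cycles_per_sigma / center_hz * SAMPLE_RATE  # envelope std in samples
    half = int(3 * sigma)
    mid = len(signal) // 2
    acc = 0j
    norm = 0.0
    for n in range(-half, half + 1):
        envelope = math.exp(-0.5 * (n / sigma) ** 2)
        carrier = cmath.exp(-2j * math.pi * center_hz * n / SAMPLE_RATE)
        acc += signal[mid + n] * envelope * carrier
        norm += envelope
    return abs(acc) / norm


# A pure 800 Hz tone, 0.1 s long.
tone = [math.cos(2 * math.pi * 800.0 * t / SAMPLE_RATE) for t in range(800)]
responses = [channel_response(tone, fc) for fc in CENTERS]
best = max(range(len(CENTERS)), key=lambda i: responses[i])
print(best, CENTERS[best])
```

The channel whose centre frequency matches the tone responds most strongly and its neighbours respond weakly, so "which place lights up" encodes frequency, while the short envelopes at high frequencies keep timing information that a single long Fourier transform would smear out.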

adornKey 10/30/2025||

Yeah, but not only the frequency is important - the wave-form is very relevant. For example, if your wave-form is a triangle, listeners will tell you that it is very noisy compared to a simple sinus. If you use sinus as a base of your vector space, triangles really look like a noisy mix. My question is whether the basic elements are really sinus, or whether the basic Eigen-Waves of the cochlea are other Wave-Forms (e.g. slightly wider or narrower than sinus, ...). If physics in the ear isn't linear, maybe sinus isn't the purest wave-form for a listener.

Most people in Physics only know sinus and maybe sometimes rectangles as a base for transformations, but mathematically you could use a lot of other things - maybe very similar to sinus, but different.

kragen 10/31/2025||

But if you apply a frequency-dependent phase shift to the triangle wave, nobody will be able to tell the difference unless the frequency is very low.
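A sketch of what such a phase shift does to the numbers, assuming a band-limited triangle built from the odd harmonics 1 to 15 and an arbitrary quadratic phase law (0.4 * k^2 radians for harmonic k). The code only checks the signal side: the waveform changes visibly while the magnitude spectrum stays put. Whether the two sound alike is the perceptual claim above, which code cannot test.

```python
import cmath
import math

NUM_SAMPLES = 256
HARMONICS = range(1, 16, 2)  # odd harmonics 1, 3, ..., 15


def triangle_like(phase_shift):
    """Band-limited triangle wave; phase_shift(k) is added to harmonic k."""
    wave = []
    for t in range(NUM_SAMPLES):
        total = 0.0
        for k in HARMONICS:
            sign = (-1) ** ((k - 1) // 2)
            angle = 2 * math.pi * k * t / NUM_SAMPLES + phase_shift(k)
            total += sign * math.sin(angle) / k**2
        wave.append(8 / math.pi**2 * total)
    return wave


def dft_magnitudes(samples):
    """Magnitudes of DFT bins 0..N/2 (naive O(N^2) sum)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


original = triangle_like(lambda k: 0.0)
shifted = triangle_like(lambda k: 0.4 * k * k)  # arbitrary frequency-dependent shift

waveform_gap = max(abs(a - b) for a, b in zip(original, shifted))
spectrum_gap = max(
    abs(a - b) for a, b in zip(dft_magnitudes(original), dft_magnitudes(shifted))
)
print(waveform_gap, spectrum_gap)
```

A phase law that is linear in k would only delay the wave, which is why the sketch uses a quadratic one.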

Cadwhisker 10/30/2025||

Just a warning that the video ends with a loud, high pitched tone that will make you want to rip your headphones off.

Ironic for a video about hearing.

tryauuum 10/30/2025|

man I need to finally learn what a Fourier transform is

TobTobXX 10/30/2025||

3Blue1Brown has a really good explanation here: https://www.youtube.com/watch?v=spUNpyF58BY

It gave me a much better intuition than my math course.

CGMthrowaway 10/30/2025|||

It's a Copy>Paste Special>Transpose on a waveform, converting Rows/Columns that are time/amplitude (with wavelength embedded) into Rows/Columns that are frequency/amplitude (for a snapshot in time).

People love to go on about how brilliant it is and they're probably right but that's how I understand it.

TheOtherHobbes 10/30/2025||

Pretty much, but phase is also included. Which matters for some things.

bobmcnamara 10/30/2025||

But mostly not for ears it turns out!

Phase matters for some wideband signals, but most folks struggle to tell apart audio from hilbert-90-degree-shifted-audio

xeonmc 10/30/2025|||

Phase is required if it is to be a reversible transform. Otherwise would just be a Functional.

anigbrowl 10/30/2025|||

Read this (which is free): The Scientist's and Engineer's Guide to Digital Signal Processing https://www.dspguide.com

It's very comprehensive, but it's also very well written and walks you through the mechanics of Fourier transforms in a way that makes them intuitive.

garbageman 10/30/2025|||

It's an absolutely brilliant bit of maths that breaks a complex waveform into the individual components. Kind of like taking an orchestral song and then working out each individual instrument's contribution. Learning about this left me honestly aghast and in shock that it's not only possible but that someone (Joseph Fourier) figured it out and then shared it with the world.

This video does a great job explaining what it is and how it works to the layman. 3blue1brown - https://www.youtube.com/watch?v=spUNpyF58BY

jama211 10/30/2025|||

Hahaha, I was working on learning these in second year uni… which was also exactly when I switched from an electrical engineering focussed degree to a software one!

Perhaps finally I should learn too…

adzm 10/30/2025|||

the very simplest way to describe it: it is what turns a waveform (amplitude x time) to a spectrogram like on a stereo (amplitude x frequency)

Chabsff 10/30/2025||

And phase. People always forget about the phase as if it was purely imaginary.

JKCalhoun 10/30/2025||

Ha ha, as I understand it, phase is imaginary in a Fourier transform. Complex numbers are used and the imaginary portion does indeed represent phase.

I have been told that reversing the process — creating a time-based waveform — will not resemble (visually) the original due to this phase loss in the round-tripping. But then our brain never paid phase any mind so it will sound the same to our ears. (Yay, MP3!)

xeonmc 10/30/2025|||

Actually, by the Kramers-Kronig relation you can infer the imaginary part just from the real parts, if given that your time signal is causal. So the phase isn’t actually lost in any way at all, if you assume causality.

Also, pedantic nit: phase would be the imaginary exponent of the spectrum rather than the imaginary part directly, i.e., you take the logarithm of the complex amplitude to get log-magnitude (real) plus phase (imag).
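The nit in two lines of arithmetic, using a made-up complex amplitude with magnitude 2.5 and phase 0.75 rad: the complex logarithm splits it into log-magnitude (real part) and phase (imaginary part), whereas the imaginary part of the amplitude itself is something else.

```python
import cmath
import math

# Hypothetical spectral coefficient: magnitude 2.5, phase 0.75 rad.
amplitude = cmath.rect(2.5, 0.75)

log_amplitude = cmath.log(amplitude)
log_magnitude = log_amplitude.real  # ln(2.5)
phase = log_amplitude.imag          # 0.75, the principal value of the phase

# The plain imaginary part is magnitude * sin(phase), not the phase.
imaginary_part = amplitude.imag

print(log_magnitude, phase, imaginary_part)
```

Note that `cmath.log` returns the principal value, so phases come back wrapped into (-pi, pi].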

Chabsff 10/30/2025|||

I'm glad someone picked up on my dumb joke :), I was getting worried.

That being said, round-tripping works just fine, axiomatically so, until you go out of your way to discard the imaginary component.
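A quick sketch of both halves of that statement, assuming a made-up decaying burst as the test signal: a DFT followed by an inverse DFT returns the original samples to within rounding, while throwing away the phase and inverting the magnitudes alone gives a different waveform.

```python
import cmath
import math


def dft(samples):
    """Naive forward DFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT (includes the 1/N factor)."""
    n = len(spectrum)
    return [
        sum(z * cmath.exp(2j * math.pi * k * t / n) for k, z in enumerate(spectrum)) / n
        for t in range(n)
    ]


# Hypothetical test signal: a decaying 5-cycle burst.
signal = [math.exp(-n / 8) * math.sin(2 * math.pi * 5 * n / 64) for n in range(64)]
spectrum = dft(signal)

# Full complex spectrum: the round trip is exact up to rounding.
restored = [z.real for z in inverse_dft(spectrum)]
round_trip_error = max(abs(a - b) for a, b in zip(signal, restored))

# Magnitudes only: same energy per frequency, but a different waveform.
phaseless = [z.real for z in inverse_dft([abs(z) for z in spectrum])]
phaseless_error = max(abs(a - b) for a, b in zip(signal, phaseless))

print(round_trip_error, phaseless_error)
```

The magnitude-only reconstruction is the zero-phase version of the signal: every component peaks at sample 0, so the burst turns into a symmetric blip there.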

DonHopkins 10/30/2025||

Even more complex and reflectively imaginative than the Fourier Transform is the mighty Cepstrum!

https://en.wikipedia.org/wiki/Cepstrum

It’s literally a "backwards spectrum", and the authors in 1963 were having such jolly fun they reversed the words too: quefrency => frequency, saphe => phase, alanysis => analysis, liftering => filtering

The cepstrum is the "spectrum of a log spectrum," where taking the complex logarithm turns multiplicative spectral features into additive ones, laying the foundation of cepstral alanysis, and later, the physiologically tuned Mel-frequency cepstrum used in audio compression and speech recognition.
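A small sketch of why the log matters, using the textbook echo example. As a simplification it computes the real cepstrum (log of the magnitude spectrum) rather than the complex-logarithm version. The decaying pulse, the 40-sample delay and the echo gain of 0.5 are all made-up values. An echo multiplies the spectrum by a ripple; after the log that ripple is simply added, and it shows up as a peak at the echo delay on the quefrency axis.

```python
import cmath
import math

N = 128
DELAY = 40       # echo delay in samples, hypothetical
ECHO_GAIN = 0.5  # hypothetical


def dft(samples):
    """Naive forward DFT."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]


def inverse_dft(spectrum):
    """Naive inverse DFT (includes the 1/N factor)."""
    n = len(spectrum)
    return [
        sum(z * cmath.exp(2j * math.pi * k * t / n) for k, z in enumerate(spectrum)) / n
        for t in range(n)
    ]


def real_cepstrum(samples):
    """Inverse DFT of the log magnitude spectrum."""
    log_magnitude = [math.log(abs(z)) for z in dft(samples)]
    return [z.real for z in inverse_dft(log_magnitude)]


dry = [0.8**n for n in range(N)]  # a decaying pulse
# Circular echo, so the spectrum is exactly the dry spectrum times a ripple.
wet = [dry[n] + ECHO_GAIN * dry[(n - DELAY) % N] for n in range(N)]

# Multiplication in the spectrum became addition in the cepstrum,
# so subtracting the dry cepstrum isolates the echo's contribution.
echo_part = [w - d for w, d in zip(real_cepstrum(wet), real_cepstrum(dry))]
peak_quefrency = max(range(1, N // 2), key=lambda q: echo_part[q])
print(peak_quefrency, echo_part[peak_quefrency])
```

The peak should land at quefrency 40 with a height near ECHO_GAIN / 2, so the echo delay can be read straight off the cepstrum.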

https://en.wikipedia.org/wiki/Mel_scale

>The mel scale (after the word melody)[1] is a perceptual scale of pitches judged by listeners to be equal in distance from one another. [...] Use of the mel scale is believed to weigh the data in a way appropriate to human perception.

As Tukey might say: once you start doing cepstral alanysis, there’s no turning back, except inversely.

Skeptics said he was just going through a backwards phase, but it turned out to work! ;)

https://news.ycombinator.com/item?id=24386845

DonHopkins on Sept 5, 2020 | parent | context | favorite | on: Mathematicians should stop naming things after eac...

I love how they named the inverse spectrum the cepstrum, which uses quefrency, saphe, alanysis, and liftering, instead of frequency, phase, analysis and filtering. It should not be confused with the earlier concept of the kepstrum, of course! ;)

https://en.wikipedia.org/wiki/Cepstrum

>References to the Bogert paper, in a bibliography, are often edited incorrectly. The terms "quefrency", "alanysis", "cepstrum" and "saphe" were invented by the authors by rearranging some letters in frequency, analysis, spectrum and phase. The new invented terms are defined by analogies to the older terms.

>Thus: The name cepstrum was derived by reversing the first four letters of "spectrum". Operations on cepstra are labelled quefrency analysis (aka quefrency alanysis[1]), liftering, or cepstral analysis. It may be pronounced in the two ways given, the second having the advantage of avoiding confusion with "kepstrum", which also exists (see below). [...]

>The kepstrum, which stands for "Kolmogorov-equation power-series time response", is similar to the cepstrum and has the same relation to it as expected value has to statistical average, i.e. cepstrum is the empirically measured quantity, while kepstrum is the theoretical quantity. It was in use before the cepstrum.[12][13]

https://news.ycombinator.com/item?id=43341806

DonHopkins 7 months ago | parent | context | favorite | on: What makes code hard to read: Visual patterns of c...

Speaking of filters and clear ergonomic abstractions, if you like programming languages with keyword pairs like if/fi, for/rof, while/elihw, goto/otog, you will LOVE the cabkwards covabulary of cepstral quefrency alanysis, invented in 1963 by B. P. Bogert, M. J. Healy, and J. W. Tukey:

cepstrum: inverse spectrum

lifter: inverse filter

saphe: inverse phase

quefrency alanysis: inverse frequency analysis

gisnal orpcessing: inverse signal processing

https://en.wikipedia.org/wiki/Cepstrum

https://news.ycombinator.com/item?id=44062022

DonHopkins 5 months ago | parent | context | favorite | on: The scientific “unit” we call the decibel

At least the Mel-frequency cepstrum is honest about being a perceptual scale anchored to human hearing, rather than posing as a universally-applicable physical unit.

https://en.wikipedia.org/wiki/Mel-frequency_cepstrum

>Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. They are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). The difference between the cepstrum and the mel-frequency cepstrum is that in the MFC, the frequency bands are equally spaced on the mel scale, which approximates the human auditory system's response more closely than the linearly-spaced frequency bands used in the normal spectrum. This frequency warping can allow for better representation of sound, for example, in audio compression that might potentially reduce the transmission bandwidth and the storage requirements of audio signals.
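To show what "equally spaced on the mel scale" means in hertz, here is a sketch using one widely used conversion formula, m = 2595 * log10(1 + f / 700). Other variants of the mel formula exist, and the 0 to 8000 Hz range and the 10 bands are arbitrary choices for this example.

```python
import math


def hz_to_mel(hz):
    """One common mel formula; other variants exist."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_band_edges(low_hz, high_hz, num_bands):
    """Edges of num_bands overlapping bands, evenly spaced in mel."""
    low_mel = hz_to_mel(low_hz)
    high_mel = hz_to_mel(high_hz)
    step = (high_mel - low_mel) / (num_bands + 1)
    return [mel_to_hz(low_mel + i * step) for i in range(num_bands + 2)]


edges = mel_band_edges(0.0, 8000.0, 10)
widths = [upper - lower for lower, upper in zip(edges, edges[1:])]

for lower, upper in zip(edges, edges[1:]):
    print(f"{lower:8.1f} Hz to {upper:8.1f} Hz")
```

Equal steps in mel turn into ever wider steps in hertz: the bands are narrow at low frequencies and broad at high ones, which is the frequency warping the quote refers to.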

https://en.wikipedia.org/wiki/Psychoacoustics

>Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branch of science studying the psychological responses associated with sound including noise, speech, and music. Psychoacoustics is an interdisciplinary field including psychology, acoustics, electronic engineering, physics, biology, physiology, and computer science.

dsego 10/30/2025||

humble plug https://dsego.github.io/demystifying-fourier/
