Published On: 25 Jul 2025
Categories: Featured, Technical
The Paradox of Human Hearing
In audio science, one contradiction continues to provoke confusion and debate. According to conventional wisdom, the human ear cannot hear above 20 kHz. That limit has shaped how we design digital audio systems, from sampling rates to filter design and even to how we define ‘high resolution.’
And yet, a wealth of psychoacoustic and physiological research suggests something startling: we can perceive timing differences between sound events with sub-millisecond precision, sometimes even below 10 microseconds. Read as a period, a 10-microsecond interval corresponds to 100 kHz, far above the 20 kHz limit.
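The translation between time and frequency used here is a simple reciprocal, and a minimal Python sketch makes the numbers concrete. It treats each timing interval as one full period, which is a simplifying assumption for illustration rather than a model of the ear; the function names are hypothetical.

```python
def equivalent_frequency_hz(interval_s: float) -> float:
    """Frequency whose full period equals the given interval (assumed mapping)."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return 1.0 / interval_s


def sample_period_us(sample_rate_hz: float) -> float:
    """Spacing between consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


# Timing intervals mentioned in the article: 1 ms, 10 us, and 6 us.
for interval_us in (1000.0, 10.0, 6.0):
    f_khz = equivalent_frequency_hz(interval_us * 1e-6) / 1e3
    print(f"{interval_us:g} us interval -> {f_khz:.1f} kHz")

# Sample spacing at common rates, for comparison.
for rate_hz in (44_100, 96_000, 192_000):
    print(f"{rate_hz} Hz -> {sample_period_us(rate_hz):.2f} us between samples")
```

Under this mapping, a 10-microsecond interval corresponds to 100 kHz, while consecutive samples at 44.1 kHz sit about 22.7 microseconds apart.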
So how can we resolve this contradiction? The answer lies in rethinking what it means to “hear.”
Hearing in Shapes, Not Just Tones
What we perceive as sound is not merely a collection of tones at different frequencies. Real-world audio consists of rapidly evolving acoustic events. Each event, whether a piano keystroke or the snap of a twig, has a particular shape in time: a beginning, a crest, and a decay. This shape is known as the envelope of the sound.
Bob Stuart has described the envelope as the temporal contour that allows us to group sound, interpret its origin, and feel its impact. Unlike the carrier frequencies that create timbre or pitch, the envelope provides the scaffolding upon which perception builds identity, direction, and emotion.
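To make the idea of an envelope concrete, here is a minimal Python sketch that builds a synthetic decaying tone burst and traces its contour with a simple peak follower. The burst parameters, the 5 ms release time, and the function names are illustrative assumptions; this is one common way to estimate an envelope, not a method attributed to Bob Stuart.

```python
import math


def make_burst(sample_rate_hz, duration_s=0.05, freq_hz=1000.0, decay_s=0.01):
    """Synthetic 'pluck': a sine carrier under an exponentially decaying amplitude."""
    n = int(sample_rate_hz * duration_s)
    return [
        math.exp(-i / (sample_rate_hz * decay_s))
        * math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz)
        for i in range(n)
    ]


def envelope(signal, sample_rate_hz, release_s=0.005):
    """Peak follower: jump up to |x| instantly, decay exponentially between peaks."""
    coeff = math.exp(-1.0 / (sample_rate_hz * release_s))
    level = 0.0
    env = []
    for x in signal:
        level = max(abs(x), level * coeff)
        env.append(level)
    return env


SAMPLE_RATE_HZ = 48_000
burst = make_burst(SAMPLE_RATE_HZ)
env = envelope(burst, SAMPLE_RATE_HZ)

# Report the contour at a few points: the crest and the decay that follows.
peak_index = max(range(len(env)), key=lambda i: env[i])
print(f"crest at {1e3 * peak_index / SAMPLE_RATE_HZ:.2f} ms, level {env[peak_index]:.3f}")
for t_ms in (10, 20, 40):
    print(f"level at {t_ms} ms: {env[int(SAMPLE_RATE_HZ * t_ms / 1000)]:.3f}")
```

The carrier sets the pitch of the burst, while the list returned by `envelope` traces the beginning, crest, and decay that the text describes.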
This envelope is where timing matters. Sharp transients, steep attacks, and subtle shifts in arrival time all depend on high time resolution, and that resolution is informed by frequency content well beyond 20 kHz. Even if we do not hear a 40 kHz tone as a tone, we feel its effect when it contributes to the clarity of a sound’s leading edge.
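The link between bandwidth and the sharpness of a leading edge can be illustrated with a standard rule of thumb. The sketch below assumes a single-pole low-pass response, whose 10 to 90 percent rise time is ln(9) / (2π·f_c), roughly 0.35 / f_c. Neither converters nor the ear are single-pole systems, so the numbers are indicative only, and the function name is hypothetical.

```python
import math


def rise_time_s(cutoff_hz: float) -> float:
    """10-90% step rise time of a single-pole low-pass with the given -3 dB cutoff."""
    if cutoff_hz <= 0:
        raise ValueError("cutoff must be positive")
    return math.log(9.0) / (2.0 * math.pi * cutoff_hz)


# Compare the fastest edge each bandwidth can pass under this simple model.
for cutoff_hz in (20_000, 40_000, 80_000):
    rise_us = rise_time_s(cutoff_hz) * 1e6
    print(f"{cutoff_hz / 1e3:.0f} kHz bandwidth -> {rise_us:.1f} us rise time")
```

Under this model, a 20 kHz bandwidth gives a rise time of about 17.5 microseconds, and doubling the bandwidth halves it.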
The Illusion of Inaudibility
Classical hearing tests are based on detecting sine waves. In such tests, most people cannot hear above 20 kHz. But music is not a sine wave. It’s a constantly shifting tapestry of transients, harmonic interplay, and spatial cues. When high-frequency energy shapes the envelope, it becomes perceptually meaningful, even if no single component is audible on its own.
Several studies support this. Research by Milind Kunchur demonstrated that listeners could detect timing differences as small as 6 microseconds. Tsutomu Oohashi showed that adding ultrasonic content to music recordings triggered measurable brain activity and increased listener preference. Brian Moore and Aleksander Sek described how envelope-following and temporal fine structure are essential to pitch perception and spatial discrimination.
These findings don’t contradict the 20 kHz threshold; they simply show that hearing is more than tone detection. It is temporal awareness.
And here’s something often overlooked: even as we age, and our ability to hear high-frequency tones diminishes, sometimes falling below 15 kHz, our perception of music’s timing, space, and emotion often remains intact. Many audiophiles in their 60s or 70s, who may fail an audiogram, still demonstrate exquisite listening sensitivity when it comes to imaging, articulation, and dynamics. Why? Because musical meaning lives largely below 10 kHz in spectral terms, but it depends deeply on how time is shaped and preserved. Our brains remain remarkably tuned to the envelope.
So yes, you can enjoy great sound well into older age, not because you hear everything, but because you perceive what matters.
Time Smear and the Digital Chain
This brings us to one of the most underestimated issues in digital audio: time-domain distortion, or what Bob Stuart calls blur. Unlike frequency distortion or harmonic artifacts, blur is hard to measure, but easy to hear.
Filters used in A/D and D/A converters, especially sharp linear-phase FIR filters, can introduce ringing, pre-echo, and group delay. These effects smear the envelope, dull transients, and compromise spatial integrity. The damage is not dramatic but cumulative: it builds from the original A/D conversion through editing, mixing, sample-rate changes, encoding, and playback.
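A minimal sketch of why a linear-phase FIR filter rings before a transient, assuming a Hamming-windowed sinc design with 101 taps and a 20 kHz cutoff at 44.1 kHz. These parameters and the function names are illustrative and are not taken from any particular converter.

```python
import math


def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def linear_phase_lowpass(num_taps: int, cutoff_hz: float, sample_rate_hz: float) -> list:
    """Hamming-windowed sinc low-pass. Symmetric taps give linear phase."""
    if num_taps % 2 == 0:
        raise ValueError("use an odd number of taps so the peak sits on a sample")
    center = (num_taps - 1) // 2
    fc = cutoff_hz / sample_rate_hz  # cutoff in cycles per sample
    taps = []
    for n in range(num_taps):
        k = n - center  # offset from the center tap
        window = 0.54 + 0.46 * math.cos(2.0 * math.pi * k / (num_taps - 1))
        taps.append(2.0 * fc * sinc(2.0 * fc * k) * window)
    gain = sum(taps)  # normalize to unity gain at DC
    return [t / gain for t in taps]


SAMPLE_RATE_HZ = 44_100
# For an FIR filter, the taps are the impulse response: the output for a single click.
taps = linear_phase_lowpass(num_taps=101, cutoff_hz=20_000, sample_rate_hz=SAMPLE_RATE_HZ)
center = len(taps) // 2

total_energy = sum(t * t for t in taps)
pre_energy = sum(t * t for t in taps[:center])        # arrives before the main peak
post_energy = sum(t * t for t in taps[center + 1:])   # arrives after the main peak

print(f"group delay: {center} samples ({1e3 * center / SAMPLE_RATE_HZ:.3f} ms)")
print(f"energy before peak: {100 * pre_energy / total_energy:.2f}% of total")
print(f"energy after peak:  {100 * post_energy / total_energy:.2f}% of total")
```

Because the taps are symmetric, as much ringing energy precedes the main peak as follows it, and that leading half is the pre-echo described above. Steeper filters generally need more taps, which spreads the ringing over a longer time.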
Listeners may not know why a recording sounds unnatural or fatiguing, but they feel it. The music lacks realism. The stereo image is flat. The emotional punch is missing.
Preserving the Temporal Truth
This is where MQA offers a new direction. Rather than chasing ever-higher bitrates or wider bandwidths, MQA models and corrects the temporal errors introduced at each step of the audio chain. It captures the fingerprint of the original A/D conversion, compensates for known filter behavior, and reconstructs a signal that preserves the envelope — and therefore the essence — of the musical event.
It is not about reproducing inaudible tones. It is about preserving audible shape. The goal is a signal that sounds not just clear, but coherent. Not just accurate, but natural.
Conclusion: The Shape of What We Hear
We are not spectrum analyzers. Our ears and brains are time-sensitive pattern detectors. We listen in shapes, we feel in contours, and we respond to music as something that unfolds, moment by moment.
The 20 kHz boundary, while technically valid in one sense, is irrelevant to our deeper perceptual experience. What matters more is whether the temporal structure of sound – the envelope – has survived the journey from microphone to speaker.
When it does, music becomes not just a signal, but a presence.
Peter Veth
