Kevin Boone

Is "high resolution" audio really worth the effort?

I'm somewhat loath to put this post in the "snake oil" category, because there is at least some merit to high-resolution ("hi-res") audio. Whether it lives up to the extravagant claims made for it by its proponents, I'm not sure. The world of consumer audio, like that of photography, has always been built on a philosophy of "more is better", which often translates to "more expensive is better".

Note:
I'm specifically talking about the consumer market here; what audio representations might be appropriate in a recording studio is a completely different matter.

People need little encouragement to spend money, and it isn't hard to inflate the apparent effectiveness of a product using pseudo-scientific language. The audio and photography markets are both ripe for this kind of manipulation, so I thought it would be worth digging into the claims made by the "hi-res" audio vendors. More (expensive) is often better; the trick is to identify when the expense is out of all proportion to the gains.

In general, hi-res audio recordings command a somewhat higher price than CD-quality recordings, which in turn command a higher price than lossy, MP3-type recordings. In a way, that's fair enough -- hi-res recordings require more storage (more on this later), and storage has a price. Studio equipment that is capable of hi-res recording and editing is more expensive than equipment that works only to CD standards -- an expense which is, of course, passed on to the consumer. Not all suppliers have different charging strategies -- eclassical.com, for example, advertises (at present) a fixed price for each album, regardless of format. Qobuz charges about 50% more for its "hi-res" offerings.

However, if a studio has the capability to record in a hi-res format, it will usually do so, regardless of what format the consumer eventually buys. That means that if you're happy with CD-quality sound (or lower), you're subsidizing the people who think they want hi-res audio -- even though you don't use it, and even if it isn't really justified.

What hi-res audio is, and is not

For the purposes of this post, by "high resolution" I'm referring to audio that is sampled at a higher rate, or with a higher bit depth, than traditional audio CDs. I'm really talking about the quality of digital/analogue conversion. When the sampled data is stored or distributed, it might be in a compressed or uncompressed format; if compressed, the compression might be lossy or lossless. I've come across claims that recordings are "hi-res" simply because they're stored in a lossless format (like FLAC), rather than a lossy format (like MP3). To my mind, this is not a sensible use of the term: traditional CDs are recorded in an uncompressed format, which is, by its very nature, lossless. By "hi-res" I mean "better than CD", regardless of how the data is stored. Of course, it's going to be hard to achieve "better than CD" quality when using a lossy compression format, but it's probably not impossible. In practice, people who care about audio quality will mostly be using uncompressed or lossless data formats, so data compression is not an issue I'll be going into here.

Photo of Astell and Kern Kann
The sign of seriousness: my Astell and Kern "portable" audio player has dual SD card slots and dual USB connections. It sounds pretty good, in all fairness -- but does it sound as good as its extravagant price suggests? By the way, I bought mine second-hand -- I've already sold all the kidneys I can spare.

How high is "high"?

My Astell and Kern audio player claims to support sources with a sample bit depth up to 32 bits, and a sample rate of up to 384kHz (that is, 384,000 samples per second in each stereo channel). In practice, recording studios these days typically distribute their products with 24-bit depth, at up to 96kHz. If any studio actually produces content in a format that will fully exercise my player's digital-to-analogue conversion, I've not come across it.

The capabilities of the Astell and Kern player are pretty typical for portable audio players. In fact, even smartphones are starting to advertise these capabilities. It's not that difficult to produce the hardware -- the real problem with hi-res audio is the distribution and storage. A 24-bit/96kHz recording requires 3-4 times the distribution bandwidth and storage capacity of a CD-quality recording, and perhaps 10-20 times that of a typical streaming service like Spotify.
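As a back-of-envelope check on those figures, here's the arithmetic. This is a minimal sketch; the streaming comparison assumes a lossy rate of about 320kbit/s, which is my assumption rather than any particular service's specification.

```python
# Raw PCM bitrates for stereo audio, in bits per second.

def pcm_bitrate(bits_per_sample, sample_rate_hz, channels=2):
    return bits_per_sample * sample_rate_hz * channels

cd    = pcm_bitrate(16, 44_100)   # CD quality
hires = pcm_bitrate(24, 96_000)   # typical "hi-res" release

print(f"CD:    {cd / 1e6:.2f} Mbit/s")      # ~1.41 Mbit/s
print(f"24/96: {hires / 1e6:.2f} Mbit/s")   # ~4.61 Mbit/s
print(f"hi-res vs CD:     {hires / cd:.1f}x")        # ~3.3x
print(f"hi-res vs stream: {hires / 320_000:.0f}x")   # ~14x
```

Storage scales the same way: an hour of 24-bit/96kHz stereo is a little over 2GB before any lossless compression is applied.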

Given the increased demand on storage and bandwidth, is the extra resolution really justified?

Sample rate

First, let's consider sample rate. If, in principle, the limit of human hearing is 20kHz, then a sampling rate of 40,000 samples per second is adequate. Engineers refer to this figure -- twice the highest frequency to be captured -- as the Nyquist rate. Any more samples than this are wasted, because they represent only frequencies that can't be heard. It isn't an accident that the standard sampling rate for audio CDs was originally set at 44.1kHz -- that's just a bit higher than the Nyquist rate, allowing a little wiggle-room.
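To see what goes wrong below the Nyquist rate, here's a minimal sketch (using NumPy): a 25kHz tone sampled at 44.1kHz produces exactly the same samples as a 19.1kHz tone -- the ultrasonic frequency "folds back" into the audible band as an alias.

```python
import numpy as np

fs = 44_100                    # CD sample rate
t = np.arange(fs) / fs         # one second of sample instants

# 25kHz lies above the 22.05kHz Nyquist frequency (fs/2)...
above = np.sin(2 * np.pi * 25_000 * t)
# ...so its samples coincide with those of a 19.1kHz tone
# (phase-inverted), since fs - 25000 = 19100.
alias = np.sin(2 * np.pi * (fs - 25_000) * t)

print(np.allclose(above, -alias))   # True
```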

In reality, most adult human beings do not have hearing that extends to 20kHz. At my age, my upper limit is about 12kHz. Moreover, there isn't a clean cut-off, with frequencies under 20kHz being perfectly clear, and those higher being completely inaudible. Most people's ears are most sensitive at around 3kHz, and sensitivity gradually rolls off after that point.

On the face of it, then, you might think that there's absolutely no need to use a sample rate higher than that of a traditional CD. Unfortunately, the situation isn't quite that simple. The complication is that the Nyquist limit is theoretical. It assumes that we reconstruct an analogue signal by passing the output of a digital-to-analogue converter through an ideal low-pass filter. This (ideal) filter passes all frequencies below half the sampling rate -- the Nyquist frequency -- and completely blocks all frequencies above that point. This filtering must happen in the analogue domain: we can define a perfect low-pass filter digitally, but that's of no help when we're trying to construct an analogue signal. Real low-pass filters are made from capacitors and resistors that have certain manufacturing tolerances. The more we try to create a "perfect" filter, the more we fail: as we increase the number of components, the tolerances each component must meet get tighter, and less achievable.

So we must accept that we can create only a relatively inexact low-pass filter, whose frequency response is a smooth roll-off, not a sharp cut-off; and then we can set the sampling rate to match the filter that we can actually achieve. If we were to sample audio at (say) 192kHz, the overall digital-to-analogue conversion process would be a lot more tolerant of inexact low-pass filtering. Sampling at 44.1kHz doesn't really give us a lot of headroom, bearing in mind the kinds of analogue filters we can actually build in a commercial piece of equipment (rather than in a laboratory). A lot of studio equipment until recently sampled at 48kHz, and this is the natural sampling frequency of a lot of computer audio equipment. As an aside, I'll point out that conversion between CD-style 44.1kHz sampling and 48kHz sampling, in either direction, is difficult and error-prone, and something we should avoid if possible. But that's a subject for another post.
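The same trade-off is easy to quantify in the digital domain. Here's a sketch using SciPy's Kaiser-window design rule to estimate how long a low-pass filter must be to pass 20kHz but fully attenuate everything above the Nyquist frequency, at various sample rates. The 96dB stopband target is my arbitrary choice for illustration, and real analogue reconstruction filters are a different technology -- but the shape of the trade-off is the same.

```python
from scipy.signal import kaiserord

def taps_needed(fs, passband=20_000, atten_db=96):
    """FIR length estimate for a low-pass filter whose transition
    band runs from `passband` up to the Nyquist frequency (fs/2)."""
    nyquist = fs / 2
    width = (nyquist - passband) / nyquist   # normalised transition width
    numtaps, _beta = kaiserord(atten_db, width)
    return numtaps

for fs in (44_100, 48_000, 96_000):
    print(f"{fs:>6} Hz: ~{taps_needed(fs)} taps")
# The 44.1kHz case needs a filter several times longer (i.e. more
# complex) than the 96kHz case, because its transition band is so narrow.
```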

So, to allow for the fact that we can't construct a perfect low-pass filter -- or even a near-perfect one -- how much faster than the Nyquist rate should we sample? The confounding factor here is that many digital-to-analogue converters can over-sample. Over-sampling is the process of doing interpolation in the digital domain, to convert (say) a 48kHz sample stream to a 96kHz sample stream. This is a relatively simple process, easily accomplished with modern hardware. No new information is created by over-sampling -- we can't create samples that were not there in the first place. However, an over-sampled 96kHz signal is easier to low-pass-filter to 20kHz in the analogue domain than a 48kHz signal is.
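For the curious, here's roughly what the digital side of that looks like -- a sketch using SciPy's polyphase resampler to double a 48kHz stream to 96kHz:

```python
import numpy as np
from scipy.signal import resample_poly

fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1_000 * t)    # a 1kHz test tone at 48kHz

# 2x over-sampling: interpolate to 96kHz. The resampler zero-stuffs
# and low-pass filters in one step; no frequency content is added,
# but the subsequent analogue filter's job becomes much easier.
oversampled = resample_poly(tone, up=2, down=1)
print(len(tone), len(oversampled))      # 48000 96000
```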

Given that the limit of human hearing is about 20kHz (or lower), I strongly suspect that it would be difficult to distinguish music originally sampled at 96kHz from music sampled at 48kHz and over-sampled during digital-to-analogue conversion. Nevertheless, there is a difference, and the claim that 96kHz sampling offers better sound quality than CD-style sampling cannot easily be refuted on theoretical grounds. Given that this is the case, it's worth asking whether sample rates over 96kHz are worthwhile. Since almost nothing in the mainstream recording industry is sampled at these higher rates -- although the capability to do so exists -- there is no way to answer that question in practice; but with no source material to test, perhaps it doesn't need answering at this time. Probably there is some merit to sampling at 96kHz or thereabouts, although I cannot hear the improvement myself, despite having equipment and recordings that should be capable of reproducing it (if it exists).

On the subject of sampling, I do want to take this opportunity to squash a myth that continues to circulate. It is sometimes claimed that, although the human ear is not sensitive to frequencies above 20kHz, the presence of such frequencies in a recording still has some kind of indirect effect. It's true that some musical instruments produce sounds with frequencies over 20kHz, and computer-based sound generation certainly can. But, in the end, the sound that comes out of the loudspeaker or headphone is going to end up in the human ear, with all its limitations. There's simply no basis for a claim that retaining inaudible frequencies in a recording has any effect on sound quality.

Bit width

The subject of bit width is even more contentious than that of sample rate. That's because there isn't really an equivalent of the Nyquist limit for bit width -- we simply have no idea how sensitive the human ear is to minuscule variations of sound intensity. Audio CDs notionally use 16-bit sampling, which allows each sound sample to be represented as one of 65,536 distinct intensity levels (that's 2^16). Can the human ear distinguish such fine gradations of eardrum pressure? Very likely it can, and very likely the rest of the audio chain won't get in its way if it wants to -- it's not at all difficult to record with this kind of precision these days.
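For reference, every extra bit doubles the number of levels and buys about 6dB of dynamic range -- a quick calculation:

```python
import math

for bits in (16, 21, 24, 32):
    levels = 2 ** bits
    dynamic_range_db = 20 * math.log10(levels)   # ~6.02 dB per bit
    print(f"{bits}-bit: {levels:>13,} levels, ~{dynamic_range_db:.0f} dB")
# 16-bit:        65,536 levels, ~96 dB
# 21-bit:     2,097,152 levels, ~126 dB
# 24-bit:    16,777,216 levels, ~144 dB
# 32-bit: 4,294,967,296 levels, ~193 dB
```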

When we get to 24-bit sampling, however, things are less clear-cut. Now we're talking about millions of distinct levels. That raises two questions: first, is the ear sensitive enough to distinguish these levels? Second, is the rest of the audio chain capable of delivering the signal that precisely?

Concerning the first question, I think the answer is: nobody knows. That's not as unfortunate as it might seem, because we do know the answer to the second question, and that renders the first question redundant.

All analogue signal processing is limited by noise: the random fluctuation in voltage level that is simply unavoidable. Over the years we've reduced noise levels more and more, by clever equipment design and meticulous studio practice. In the end, though, everything that is not stored at absolute zero generates some noise. It's a basic fact of the universe -- electrons jiggle about. With the very best equipment and methodology, the total noise level of the complete recording chain (that is, from instrument to storage) is about one part in two million of full scale. That's the same level of uncertainty as a 21-bit sample.

In other words, any improvement in precision beyond what is achievable with a 21-bit sampling scheme will be lost in noise, and contribute nothing to the quality.
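Turning a noise floor into an effective bit count is a one-line calculation (taking the one-part-in-two-million figure quoted above):

```python
import math

noise_floor = 1 / 2_000_000    # best-case noise as a fraction of full scale
usable_bits = math.log2(1 / noise_floor)
print(f"~{usable_bits:.0f} usable bits")   # ~21
```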

Consequently, while a case can be made for using bit depths greater than 16, there's currently little justification for storing more than 21 bits or so. Computers being what they are, we'd usually round that up to 24 bits (three bytes), but that doesn't mean the additional bits are doing much for us. There's absolutely no justification for 32-bit sampling. Of course, the recording industry already knows that and, so far as I know, no studio is recording this way.

While I'm sceptical of hi-res audio claims, I should point out that I can hear the difference between 16-bit and 24-bit recordings in blind listening tests, if I listen very hard. I can't articulate what the difference is, nor am I claiming that I always prefer the hi-res to the CD recording -- only that I can usually tell them apart.

Hi-res audio and format conversion

This is a rather technical point, but I think it's one that deserves a bit of attention. Many of us listen to audio recordings using lossy compression formats like MP3 and AAC. Even fussy listeners may at times have to resort to this -- after all, everything can play MP3. Even my watch can play MP3.

So the question arises: if we have to make a high-bitrate MP3 from an original recording, is it better to start with a CD-quality recording, or a hi-res one? The theoretical answer is that the better the quality of the source, the better the MP3 conversion will be. Whether this effect is significant enough to merit better-than-CD sources is, at present, not at all clear. I suspect that the difference is small, but that doesn't mean it's non-existent.
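Incidentally, the down-conversion step itself is mechanical. Here's a hedged sketch of reducing a 24-bit/96kHz stream (represented as floats) to CD format before handing it to a lossy encoder; the TPDF dither shown is standard practice, though the details are my own illustration rather than any studio's recipe:

```python
import numpy as np
from scipy.signal import resample_poly

def to_cd_format(samples_96k: np.ndarray) -> np.ndarray:
    """Convert float samples (in [-1, 1]) at 96kHz to 16-bit/44.1kHz."""
    # 96000 * 147 / 320 == 44100, so this is an exact rational resample;
    # resample_poly also low-pass filters to the new Nyquist frequency.
    cd_rate = resample_poly(samples_96k, up=147, down=320)

    # Requantise to 16 bits with TPDF dither (the difference of two
    # uniform variables), which decorrelates the quantisation error.
    rng = np.random.default_rng()
    lsb = 1.0 / 32768.0
    dither = (rng.random(cd_rate.shape) - rng.random(cd_rate.shape)) * lsb
    quantised = np.round((cd_rate + dither) * 32767.0)
    return np.clip(quantised, -32768, 32767).astype(np.int16)
```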

In short, the fact that many people listen to suboptimal audio formats doesn't necessarily mean that studios shouldn't strive for better quality. Further research is needed in this area.

Summary

So does hi-res audio deserve its place in the snake oil hall of infamy? To some extent, it probably does not. There is at least some support for the notion that a modest increase in bit depth and sample rate above CD standards is audible. The use of 24-bit/96kHz sampling probably does offer a very, very small improvement in audio quality over CD -- at least when played on top-quality equipment in a quiet environment. Beyond this point, there's little or nothing to gain. The time to move to 32-bit sampling is when we're recording, and listening, in the spaces between the stars; in all practical scenarios even quantum-level effects will be larger than a single bit of a 32-bit sample.

My real concern, though, is that manufacturers are using "hi-res" claims that have only a theoretical benefit. What's the point, for example, of advertising that your smartphone has a 24-bit digital-to-analogue converter, if 12 of those bits are just noise or distortion? This is the same fallacy that is perpetuated by smartphone manufacturers who fit their devices with gazillion megapixel camera sensors, and then put them behind cheap plastic lenses. Sure, you end up with a gazillion megapixel image file, but most of those pixels are just rubbish.

There's more to be gained, I think, from good-quality components and careful electronic design than from the pointless pursuit of absurd sampling rates. Digital-to-analogue converters can be superior or inferior regardless of sampling rate and bit depth. My A&K player plays everything better than my phone does -- even MP3 files. That superiority isn't the result of "hi-res" audio, but of careful design backed up by fat capacitors and a hefty battery.

Unfortunately, it's easier to advertise a high sampling rate (just like a high pixel count) than it is to explain why your choice of capacitors is superior to your competitors'. If left unchecked, the drive for increased audio resolution -- which I believe is driven almost entirely by marketing considerations -- can only push up the cost of recordings, with very little gain. On balance, therefore, I feel somewhat content to put hi-res audio in the "snake oil" section. I accept, though, that it might be effective in moderation.