Can you use ALSA to get ‘bit-perfect’ audio playback on Linux?
While I’m not convinced that the ‘bit-perfect’ audio movement has much merit to it, I think it’s fair to ask that our computers shouldn’t fiddle around with sampled audio more than they really have to. If I have a high-definition audio file or stream from Qobuz, for example, that was mastered with a 24-bit sample size and 96,000 samples per second, and I play that file to a top-quality DAC, I’d hope to hear something pretty close to the original recording.
In the Linux world these days, most mainstream desktop Linux set-ups use the PulseAudio framework. Pulse has a bad reputation for, well, for all sorts of things, really. Among hifi enthusiasts, though, it has a bad reputation for resampling and down-sampling audio. It doesn’t do that on a whim – the job that it thinks it has to do requires manipulating the audio. So I wouldn’t expect things to be better, or easier, with newer audio servers like PipeWire.
Now, ‘bit-perfect’ isn’t really a well-defined term. Still, the kinds of things that Pulse is accused (rightly or wrongly) of doing probably disqualify it from being bit-perfect on any terms. It might not even be “bit-kind-of-OK”, which is a far bigger problem.
A solution that is often proposed to this problem – if we can’t make Pulse, et al., behave themselves – is to have our audio software write directly to the ALSA devices in the kernel. Pulse, after all, is just a wrapper around ALSA, as are JACK, PipeWire, and all the rest. If we use ALSA directly, it’s suggested, we can avoid all the alleged horrors of Pulse, and get bit-perfect audio.
But can we? In order to answer that question, even approximately, we need to take a dive into the murky world of ALSA.
ALSA is comprehensible. No, really, it is
ALSA is a set of audio drivers for the Linux kernel. The drivers make themselves visible to the user through a bunch of pseudo-files in the directories /dev/snd and /proc/asound. In practice, so far as I know, no mainstream audio application uses these interfaces directly – there’s no need to. Audio applications use a library to interface to ALSA. For C/C++ programs, the usual library is libasound.
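To give a flavour of what this looks like, here’s a rough sketch of playing a second of silence through libasound. It isn’t taken from any real player – the ‘default’ device name and the 16-bit/44,100 Hz parameters are just illustrative – but snd_pcm_open, snd_pcm_set_params and snd_pcm_writei are the standard library calls.

/* Minimal libasound playback sketch: open a PCM, set a format, write frames.
   Build with: gcc play_silence.c -o play_silence -lasound
   The device name and parameters are illustrative, not a recommendation. */
#include <alsa/asoundlib.h>

int main (void)
  {
  snd_pcm_t *pcm;
  /* "default" goes through the library's processing chain; an hw:... name
     (discussed below) talks to the driver more directly. */
  if (snd_pcm_open (&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0) < 0)
    return 1;

  /* 16-bit little-endian, stereo, 44100 frames/second, interleaved access;
     the 0 asks the library not to resample; 500000 us of buffering. */
  if (snd_pcm_set_params (pcm, SND_PCM_FORMAT_S16_LE,
        SND_PCM_ACCESS_RW_INTERLEAVED, 2, 44100, 0, 500000) < 0)
    return 1;

  static short silence [44100 * 2];           /* one second, all zeros */
  snd_pcm_writei (pcm, silence, 44100);       /* the count is in frames */

  snd_pcm_drain (pcm);
  snd_pcm_close (pcm);
  return 0;
  }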
This library isn’t just a thin wrapper around the kernel devices: it’s a complete audio processing framework in its own right. The operation of the library is controlled by a bunch of configuration files, the most fundamental of which is usually /usr/share/alsa/alsa.conf. This turgid, undocumented monstrosity defines such things as how to process audio using various plug-ins. One such plug-in is dmix, which allows multiple applications to write to the same audio device at the same time. You’ll appreciate, I’m sure, that this mixing operation is difficult to do without interfering with at least one of the audio streams, and maybe both.
To an application that uses libasound, an ALSA PCM device has a name like this:
hw:CARD=digital,DEV=0
In a lot of older documentation, you’ll see only numbers used:
hw:CARD=0,DEV=0
A ‘card’ is a particular peripheral device, attached to the computer. Cards have devices and, sometimes, sub-devices. A ‘card’ need not be a plug-in card on a motherboard: my external USB DAC is a ‘card’ in ALSA terms.
Both the names and the numbers of the various cards can be seen by looking at the contents of /proc/asound/cards. Here is mine:
0 [digital ]: USB-Audio - Head Box S2 digital
Pro-Ject Head Box S2 digital at usb-0000:00:14.0-9.2, high speed
1 [HDMI ]: HDA-Intel - HDA Intel HDMI
HDA Intel HDMI at 0xf1630000 irq 37
2 [PCH ]: HDA-Intel - HDA Intel PCH
HDA Intel PCH at 0xf1634000 irq 35
Card 0, aka digital, is my external USB DAC. This was an expensive item, and I expect it to produce top-notch sound when it is managed properly (and it does).
Card 1 is a built-in HDMI audio device, which I’ve never used. Card 2 is the device that handles the built-in headphone/microphone socket.
In general, we prefer to name our cards these days, rather than number them, because the numbers tend to change. They will certainly change if USB devices are plugged in or removed. There are some tricks you can do in the kernel to get consistent numbering but, these days, there’s really no need – just use names.
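Incidentally, if you’d rather get this list programmatically than by reading /proc/asound/cards, libasound will enumerate the cards for you. A minimal sketch, with error handling mostly omitted:

/* List ALSA card indexes and names, roughly what /proc/asound/cards shows.
   Build with: gcc cards.c -o cards -lasound */
#include <alsa/asoundlib.h>

int main (void)
  {
  int card = -1;                       /* -1 asks for the first card */
  while (snd_card_next (&card) >= 0 && card >= 0)
    {
    char *name = NULL;
    char *longname = NULL;
    if (snd_card_get_name (card, &name) == 0
         && snd_card_get_longname (card, &longname) == 0)
      printf ("%d: %s (%s)\n", card, name, longname);
    free (name);
    free (longname);
    }
  return 0;
  }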
Each card has one or more devices. The devices are generally numbered, rather than named, because the numbers are consistent. The devices might correspond to specific outputs – analog, optical, coaxial, etc.
The trickiest part of the PCM name, and the one that is frequently explained wrongly, is the hw part. This part of the device name describes a protocol. Sometimes you’ll see the term ‘interface’ instead, but I prefer ‘protocol’. Whatever you call it, this term describes the way in which the ALSA library will interact with the kernel’s audio devices. The same protocol can apply to many different cards and devices.
You don’t have to specify a protocol. I might have formulated my device name like this:
default:CARD=0,DEV=0
This means ‘use whatever the default protocol is, for the specific card and device’. The default will usually be determined by configuration in alsa.conf.
There are several protocols that are almost universally defined. hw is a protocol in which no format conversions of any kind are applied. The application will have to produce output in exactly the format the hardware driver accepts. In particular, the audio parameters have to match in sample rate, bit width, and endianness, among other things.
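In code, using the hw protocol amounts to demanding an exact configuration, and accepting failure if the hardware can’t provide it natively. Here’s a rough sketch – the device name and the S32_LE/96,000 Hz/stereo parameters are just my card’s, not a recommendation:

/* Open a hw: device and demand an exact configuration, with no conversions.
   If the hardware can't do precisely this, the calls fail rather than
   quietly converting. Build with: gcc exact.c -o exact -lasound */
#include <alsa/asoundlib.h>

int main (void)
  {
  snd_pcm_t *pcm;
  if (snd_pcm_open (&pcm, "hw:CARD=digital,DEV=0",
        SND_PCM_STREAM_PLAYBACK, 0) < 0)
    return 1;

  snd_pcm_hw_params_t *params;
  snd_pcm_hw_params_alloca (&params);
  snd_pcm_hw_params_any (pcm, params);     /* start from the full space */

  /* Each call narrows the configuration space; each returns an error if
     the hardware can't comply natively. */
  int err = 0;
  err |= snd_pcm_hw_params_set_access (pcm, params,
           SND_PCM_ACCESS_RW_INTERLEAVED);
  err |= snd_pcm_hw_params_set_format (pcm, params, SND_PCM_FORMAT_S32_LE);
  err |= snd_pcm_hw_params_set_channels (pcm, params, 2);
  err |= snd_pcm_hw_params_set_rate (pcm, params, 96000, 0);

  if (err == 0 && snd_pcm_hw_params (pcm, params) == 0)
    printf ("S32_LE / 96000 Hz / stereo accepted without conversion\n");
  else
    printf ("this device can't take that format natively\n");

  snd_pcm_close (pcm);
  return 0;
  }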
The plughw protocol, on the other hand, is more generous to applications. This protocol routes audio through plug-ins that can do many audio conversions, including resampling. If you really don’t want your samples tampered with, this is something to avoid. But… you guessed it… the ‘default’ protocol almost certainly includes these conversions. It probably also includes the dmix plug-in. Why?
Looking at device capabilities
To the best of my knowledge, there’s no way to find the exact capabilities of an audio device by looking at files on the filesystem. You’ll need some software to do this. Some audio players can do it, but a more general-purpose approach is the alsacap utility, originally by Volker Schatz, and now available on GitHub. This utility may be in your Linux distribution’s repository but, if not, you’ll have to build it from source. It’s very old software, and I’ve sometimes had to fiddle with it to make it compile on some machines. In any event, this is what it says about my USB DAC (Card 0, remember):
Card 0, ID `digital', name `Head Box S2 digital'
Device 0, ID `USB Audio', name `USB Audio', 1 subdevices (1 available)
2 channels, sampling rate 44100..768000 Hz
Sample formats: S32_LE, SPECIAL
Buffer size range from 16 to 1536000
Period size range from 8 to 768000
This device is very flexible in its sampling rate: it will handle everything from CD sample rates upwards. But it’s a stereo device – it won’t handle mono audio or any surround-sound formats. Most problematic, though: the only common data format is S32_LE. That’s 32 bits per sample, in little-endian byte order. Why is this an issue? Because almost no audio files or streams are in this format.
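If you can’t get alsacap to build, you can ask much the same questions through libasound itself. This sketch – again, the device name is just my card’s – opens the hw device and reports which standard sample formats, and which of a few common rates, the driver will accept without conversion:

/* Probe a hw: device for natively supported sample formats and rates.
   Build with: gcc probe.c -o probe -lasound */
#include <alsa/asoundlib.h>

int main (void)
  {
  snd_pcm_t *pcm;
  if (snd_pcm_open (&pcm, "hw:CARD=digital,DEV=0",
        SND_PCM_STREAM_PLAYBACK, 0) < 0)
    return 1;

  snd_pcm_hw_params_t *params;
  snd_pcm_hw_params_alloca (&params);
  snd_pcm_hw_params_any (pcm, params);

  printf ("Formats:");
  for (int f = 0; f <= SND_PCM_FORMAT_LAST; f++)
    if (snd_pcm_hw_params_test_format (pcm, params, (snd_pcm_format_t)f) == 0)
      printf (" %s", snd_pcm_format_name ((snd_pcm_format_t)f));
  printf ("\n");

  unsigned int rates[] = { 44100, 48000, 88200, 96000, 176400, 192000 };
  printf ("Rates:");
  for (size_t i = 0; i < sizeof rates / sizeof rates[0]; i++)
    if (snd_pcm_hw_params_test_rate (pcm, params, rates[i], 0) == 0)
      printf (" %u", rates[i]);
  printf ("\n");

  snd_pcm_close (pcm);
  return 0;
  }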
‘hw’ is a good idea, but we might not be able to use it directly
If we’re looking for bit-perfect audio, or something close to it, it seems that the ‘hw’ protocol will offer the best results. Unfortunately, on my particular set-up – and probably on yours – it won’t work without some fiddling.
For my DAC, we’re going to have to apply at least a zero-padding conversion, and maybe a byte-ordering conversion as well. That is, we’ll need to ensure that, whatever the source material, the data written to the DAC is 32 bits, little-endian. It really doesn’t matter whether we do these conversions in the audio software or in ALSA: there’s only one correct way to do them.
Don’t get me wrong: even from a ‘bit-perfect’ perspective, these conversions are harmless. The number ‘20’, for example, is not a different number from ‘020’, and the same is true in binary. And it doesn’t matter what order we read the digits out, so long as everybody agrees on the order.
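For what it’s worth, the widening conversion my DAC needs really is trivial: each 16-bit (or 24-bit) sample is just shifted to the top of a 32-bit word, with zeros below it. A sketch, not any particular player’s code:

#include <stdint.h>

/* Widen a signed 16-bit sample to the 32-bit layout the DAC wants. The
   value is unchanged: the low-order bits are zero padding, just as '20'
   and '020' are the same number. (Unsigned arithmetic avoids shifting a
   negative value, which C leaves undefined.) */
static inline int32_t s16_to_s32 (int16_t s)
  {
  return (int32_t) ((uint32_t) (uint16_t) s << 16);
  }

/* The same idea for a 24-bit sample held in the low bits of an int32. */
static inline int32_t s24_to_s32 (int32_t s)
  {
  return (int32_t) ((uint32_t) s << 8);
  }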
The problem is that unless we use the ‘hw’ protocol, we lose control over what ALSA is actually doing. And if we do use it, the audio software has to know how to cope. Simple utilities like aplay don’t have the necessary smarts. If I try this:
$ aplay -D hw:CARD=digital /usr/share/sounds/alsa/Front_Center.wav
I just get an error message:
Playing WAVE '/usr/share/sounds/alsa/Front_Center.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono
aplay: set_params:1387: Sample format non available
Available formats:
- S32_LE
- SPECIAL
I’m trying to play a 16-bit mono file into a 32-bit, stereo device. Although the conversions are straightforward, aplay won’t do them, and the hardware can’t do them.
On the other hand, if I do this:
$ aplay -D plughw:CARD=digital /usr/share/sounds/alsa/Front_Center.wav
or even this:
$ aplay -D default:CARD=digital /usr/share/sounds/alsa/Front_Center.wav
the audio clip plays just fine. That’s because the audio processing chain in the ALSA library does whatever conversions are necessary to map the source to the target device. But how do we know whether it’s doing ‘harmless’ conversions, like padding, or nasty ones, like resampling or mixing?
We don’t.
We have to check the capabilities
So if we want bit-perfect audio, or close to it, we need to check the capabilities of the device (e.g., using alsacap), and work out whether there is a safe transformation between the source and device formats.
A particular case where there might not be is where the source is an audio CD or a CD rip, and the target is a device whose sample rate is fixed at 48,000 samples per second. These devices are pretty common in the computer audio world. CD audio is always recorded at 44,100 samples per second. Many, perhaps most, audio files and streams from on-line suppliers are in this format. Playing this CD audio on a DAC that can only do 48,000 samples per second therefore requires a sample-rate conversion. In this case, it actually matters whether the conversion is done in ALSA, or in the music player software, because there are good and bad ways of doing it (naive linear interpolation at one extreme, a properly band-limited resampler at the other). I would guess (and it is just a guess) that ALSA doesn’t use a very sophisticated approach. Why? Because anybody who really cares about audio quality to this extent will be using a specialist music player application that can do the conversion well. And, despite what the hifi snobs say, this conversion can be done well. Well enough that no human hearing will be able to tell the original from the resampled sound, anyway.
But it won’t be ‘bit-perfect’, however we do it. Bit-perfect is a technical consideration, not a subjective one.
We need decent audio hardware, regardless of the Linux set-up
In order to avoid the kinds of problems I mentioned, we need audio hardware that can cope, without conversion, with all the sample rates we need to play. It also has to be able to handle the largest sample size we will play. In practice, this means 24 bits per sample – it’s rare to encounter anything more than this in the wild.
It’s not only the audio hardware – the DAC – that has to be able to cope with these constraints: the driver in the Linux kernel has to cope as well. Oh, and the hardware has to do it without appreciable jitter (timing errors). For USB DACs, we don’t have to worry too much about any of these things – the USB audio driver built into Linux will handle whatever the hardware can handle, and USB is asynchronous and therefore immune to jitter. Audio hardware built into motherboards is a different matter. The Intel PCH integrated audio in my computer has a 16-bit DAC – or so it says. It’s probably honest – 16-bit DACs are commodity items. But if a motherboard audio device claimed more than 16-bit resolution, I think I’d be a bit sceptical.
But 24- and even 32-bit sample sizes are common in serious hifi DACs. Some will even use multiple DACs of this size. I feel reasonably confident that, if I play music through my USB DAC, using hw:CARD=digital as the ALSA device specification, the DAC will receive unmodified audio data. Well, except for the harmless zero-padding, anyway.
Or am I?
We don’t really know what the ALSA driver is doing
If I use the hw protocol with ALSA, the data is still going through the ALSA driver in the kernel. It has to – that’s what the driver is for. But does the driver just forward the unmodified bits to the hardware? Well, no, as it turns out.
I know this because the volume control in alsamixer still has an effect on the volume, even when using the hw protocol. This volume control is implemented in the Linux kernel, not in the software I’m using to play audio. Of course, that software might also fiddle with the audio stream, but that’s a different matter. If I were worried about the software, I could use different software. But I can’t really use a different kernel driver.
The fact that the volume control works means that the kernel driver is manipulating the bit stream mathematically. I’m reasonably confident that if I set the volume control to ‘100%’ (or 0dB) then the math will be benign. But without looking at the source code for the driver, I can’t really be certain.
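You can check this from code, too. The volume control that alsamixer shows for a card is exposed through the ALSA mixer API, quite separately from the PCM data path. In this sketch the card name and the control name ‘PCM’ are assumptions – common for USB audio devices, but yours may differ:

/* Read the kernel-side playback volume for a card's 'PCM' control.
   Build with: gcc vol.c -o vol -lasound */
#include <alsa/asoundlib.h>

int main (void)
  {
  snd_mixer_t *mixer;
  if (snd_mixer_open (&mixer, 0) < 0) return 1;
  if (snd_mixer_attach (mixer, "hw:CARD=digital") < 0) return 1;
  snd_mixer_selem_register (mixer, NULL, NULL);
  snd_mixer_load (mixer);

  snd_mixer_selem_id_t *sid;
  snd_mixer_selem_id_alloca (&sid);
  snd_mixer_selem_id_set_index (sid, 0);
  snd_mixer_selem_id_set_name (sid, "PCM");  /* assumed name; check alsamixer */

  snd_mixer_elem_t *elem = snd_mixer_find_selem (mixer, sid);
  if (elem != NULL)
    {
    long min, max, vol;
    snd_mixer_selem_get_playback_volume_range (elem, &min, &max);
    snd_mixer_selem_get_playback_volume (elem,
        SND_MIXER_SCHN_FRONT_LEFT, &vol);
    /* On my card, only the maximum setting corresponds to 0dB. */
    printf ("PCM volume: %ld (range %ld..%ld)\n", vol, min, max);
    }

  snd_mixer_close (mixer);
  return 0;
  }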
I must point out that I’m talking about my USB DAC here. Your sound device and driver might behave differently – there’s only one way to find out, and that’s by doing the kinds of experiments I’ve described in this article.
So…?
‘Bit-perfect’ is an objective measure, not a subjective one. To be ‘bit-perfect’ means that the audio data goes straight from the file or stream to the DAC without any alteration. I’m prepared to accept that harmless modifications like zero-padding and byte ordering don’t take away the ‘bit-perfect’ status. But mixing and resampling certainly do.
With the tests I’ve described above, I’m reasonably sure that my USB DAC, with suitable player software, driven from ALSA using the hw protocol, probably qualifies as ‘bit-perfect’.
At least, it’s as close as makes no difference.
But there’s no magic way of doing this: there’s no ‘use-bitperfect=true’ property that you can set anywhere that will ensure you get bit-perfect, or even good, audio transmission. You really have to understand how ALSA works, and have a good grasp of digital audio principles.
And when we move on to Pulse, PipeWire, etc., the situation becomes even more complicated; in addition to the uncertainties I’ve described in this article, we have the additional uncertainties introduced by that software.
But let’s not get too despondent. You don’t really need bit-perfect audio transmission. With modern, high-quality audio DACs, and a bit of care, you can still get excellent audio quality. That’s true even for systems using Pulse, if you take enough care over configuration (and you’re not using a buggy version).
I grew up with cassette tape as my main source of recorded music. We’ve come a long way since then, in terms of sound quality. We shouldn’t get hung up on things that don’t really matter.