Kevin Boone

Can you use ALSA to get ‘bit-perfect’ audio playback on Linux?

While I’m not convinced that the ‘bit-perfect’ audio movement has much merit to it, I think it’s fair to ask that our computers shouldn’t fiddle around with sampled audio more than they really have to. If I have a high-definition audio file or stream from Qobuz, for example, that was mastered with a 24-bit sample size and 96,000 samples per second, and I play that file to a top-quality DAC, I’d hope to hear something pretty close to the original recording.

In the Linux world these days, most mainstream desktop Linux set-ups use the Pulse audio framework. Pulse has a bad reputation for, well, for all sorts of things, really. Among hifi enthusiasts, though, it has a bad reputation for resampling and down-sampling audio. It doesn’t do that on a whim – the job that it thinks it has to do requires manipulating the audio. So I wouldn’t expect things to be better, or easier, with newer audio servers like PipeWire.

Now, ‘bit-perfect’ isn’t really a well-defined term. Still, the kinds of things that Pulse is accused (rightly or wrongly) of doing probably disqualify it from being bit-perfect on any terms. It might not even be “bit-kind-of-OK”, which is a far bigger problem.

A solution that is often proposed to this problem – if we can’t make Pulse, et al., behave themselves – is to have our audio software write directly to the ALSA devices in the kernel. Pulse, after all, is just a wrapper around ALSA, as are JACK, PipeWire, and all the rest. If we use ALSA directly, it’s suggested, we can avoid all the alleged horrors of Pulse, and get bit-perfect audio.

But can we? In order to answer that question, even approximately, we need to take a dive into the murky world of ALSA.

ALSA is comprehensible. No, really, it is

ALSA is a set of audio drivers for the Linux kernel. The drivers make themselves visible to the user through a bunch of pseudo-files in the directories /dev/snd and /proc/asound. So far as I know, no mainstream audio application uses these interfaces directly, and there’s no need to: audio applications use a library to interface to ALSA. For C/C++ programs, the usual library is libasound.

This library isn’t just a thin wrapper around the kernel devices: it’s a complete audio processing framework in its own right. The operation of the library is controlled by a bunch of configuration files, the most fundamental of which is usually /usr/share/alsa/alsa.conf. This turgid, undocumented monstrosity defines such things as how to process audio using various plug-ins. One such plug-in is dmix, which allows multiple applications to write to the same audio device at the same time. You’ll appreciate, I’m sure, that this mixing operation is difficult to do without interfering with at least one of the audio streams, and maybe both.
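To see why mixing can’t leave both streams untouched, here is a minimal Python sketch. It is not dmix’s actual algorithm, which I haven’t inspected; mix_s16 is a hypothetical helper that simply sums two 16-bit streams and clips the result to the legal range.

```python
def mix_s16(a, b):
    """Sum two equal-length lists of signed 16-bit samples, clipping to range."""
    lo, hi = -32768, 32767
    return [max(lo, min(hi, x + y)) for x, y in zip(a, b)]


music = [1000, -2000, 30000]
beep = [500, 500, 10000]

# Neither input survives intact, and the last sample has been clipped.
print(mix_s16(music, beep))  # [1500, -1500, 32767]
```

The only case in which this naive mixer leaves a stream alone is when the other stream is pure silence.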

To an application that uses libasound, an ALSA PCM device has a name like this:

hw:CARD=digital,DEV=0

In a lot of older documentation, you’ll see only numbers used:

hw:CARD=0,DEV=0

A ‘card’ is a particular peripheral device, attached to the computer. Cards have devices and, sometimes, sub-devices. A ‘card’ need not be a plug-in card on a motherboard: my external USB DAC is a ‘card’ in ALSA terms.

Both the names and the numbers of the various cards can be seen by looking at the contents of /proc/asound/cards. Here is mine:

 0 [digital        ]: USB-Audio - Head Box S2 digital
                      Pro-Ject Head Box S2 digital at usb-0000:00:14.0-9.2, high speed
 1 [HDMI           ]: HDA-Intel - HDA Intel HDMI
                      HDA Intel HDMI at 0xf1630000 irq 37
 2 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf1634000 irq 35

Card 0, aka digital, is my external USB DAC. This was an expensive item, and I expect it to produce top-notch sound when it is managed properly (and it does).

[Image: The Pro-Ject S2 DAC]

Card 1 is a built-in HDMI audio device that I’ve never used. Card 2 is the device that handles the built-in headphone/microphone socket.

In general, we prefer to name our cards these days, rather than number them, because the numbers tend to change. They will certainly change if USB devices are plugged or removed. There are some tricks you can do in the kernel to get consistent numbering but, these days, there’s really no need – just use names.
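If you want to build name-based device strings automatically, the listing in /proc/asound/cards is easy enough to parse. This is a rough sketch that assumes the file always looks like mine; parse_cards and the regular expression are my own invention, and the DEV=0 in the output is just an assumption that the first device is the one you want.

```python
import re

# Matches lines like " 0 [digital        ]: USB-Audio - Head Box S2 digital"
CARD_LINE = re.compile(r"^\s*(\d+)\s+\[(.+?)\s*\]:\s+(\S+)\s+-\s+(.*)$")


def parse_cards(text):
    """Return {card_id: card_number} from text in /proc/asound/cards format."""
    cards = {}
    for line in text.splitlines():
        m = CARD_LINE.match(line)
        if m:
            cards[m.group(2)] = int(m.group(1))
    return cards


SAMPLE = """\
 0 [digital        ]: USB-Audio - Head Box S2 digital
                      Pro-Ject Head Box S2 digital at usb-0000:00:14.0-9.2, high speed
 1 [HDMI           ]: HDA-Intel - HDA Intel HDMI
                      HDA Intel HDMI at 0xf1630000 irq 37
 2 [PCH            ]: HDA-Intel - HDA Intel PCH
                      HDA Intel PCH at 0xf1634000 irq 35
"""

for card_id, number in parse_cards(SAMPLE).items():
    # The name stays the same even if the number changes after a replug.
    print(f"hw:CARD={card_id},DEV=0  (currently card {number})")
```

On a real system you would pass in the contents of /proc/asound/cards rather than the sample text.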

Each card has one or more devices. The devices are generally numbered, rather than named, because the numbers are consistent. The devices might correspond to specific outputs – analog, optical, coaxial, etc.

The trickiest part of the PCM name, and the one that is frequently explained wrongly, is the hw part. This part of the device name describes a protocol. Sometimes you’ll see the term ‘interface’ instead, but I prefer protocol. Whatever you call it, this term describes the way in which the ALSA library will interact with the kernel’s audio devices. The same protocol can apply to many different cards and devices.
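To make the anatomy of a PCM name concrete, here is a small sketch that splits one into its protocol and its arguments. parse_pcm_name is a hypothetical helper, not part of any ALSA library, and it handles only the simple forms shown in this article. For bare values such as hw:0,0 it assumes the argument order CARD, DEV, SUBDEV.

```python
def parse_pcm_name(name):
    """Split 'hw:CARD=digital,DEV=0' into ('hw', {'CARD': 'digital', 'DEV': '0'})."""
    protocol, _, arg_text = name.partition(":")
    positional = ("CARD", "DEV", "SUBDEV")  # assumed order for bare values
    args = {}
    for i, item in enumerate(a for a in arg_text.split(",") if a):
        key, sep, value = item.partition("=")
        if sep:
            args[key] = value          # KEY=value form
        else:
            args[positional[i]] = item  # bare value, e.g. the "0" in hw:0,0
    return protocol, args


print(parse_pcm_name("hw:CARD=digital,DEV=0"))
print(parse_pcm_name("plughw:CARD=digital"))
print(parse_pcm_name("hw:0,0"))
```

The point is that the part before the colon is independent of the card and device that follow it.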

You don’t have to specify a protocol. I might have formulated my device name like this:

default:CARD=0

This means ‘use whatever the default protocol is, for the specific card’. The default will usually be determined by configuration in alsa.conf.

There are several protocols that are almost universally defined. hw is a protocol in which no format conversions of any kind are applied. The application will have to produce output in exactly the format the hardware driver accepts. In particular, the audio parameters have to match in sample rate, bit width, and endianness, among other things.

The plughw protocol, on the other hand, is more generous to applications. This protocol routes audio through plug-ins that can do many audio conversions, including resampling. If you really want to avoid your samples being tampered with, this is something to avoid.

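But… you guessed it… the ‘default’ protocol almost certainly includes these conversions. It probably also includes the dmix plug-in. Why? Because most applications just want their audio to play, whatever its format, and whatever else happens to be playing at the same time.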

Looking at device capabilities

To the best of my knowledge, there’s no way to find the exact capabilities of an audio device by looking at files on the filesystem. You’ll need some software to do this. Some audio players can do it, but a more general-purpose approach is the alsacap utility, originally by Volker Schatz, and now available on GitHub. This utility may be in your Linux distribution’s repository but, if not, you’ll have to build it from source. It’s very old software, and I’ve sometimes had to fiddle with it to make it compile on some machines. In any event, this is what it says about my USB DAC (Card 0, remember):

Card 0, ID `digital', name `Head Box S2 digital'
  Device 0, ID `USB Audio', name `USB Audio', 1 subdevices (1 available)
    2 channels, sampling rate 44100..768000 Hz
    Sample formats: S32_LE, SPECIAL
    Buffer size range from 16 to 1536000
    Period size range from 8 to 768000

This device is very flexible in its sampling rate: it will handle everything from CD sample rates upwards. But it’s a stereo device – it won’t handle mono audio or any surround-sound formats. Most problematic though: the only common data format is S32_LE. That’s 32 bits per sample, in little-endian byte ordering. Why is this an issue? Because almost no audio files or streams are in this format.

‘hw’ is a good idea, but we might not be able to use it directly

If we’re looking for bit-perfect audio, or something close to it, it seems that the ‘hw’ protocol will offer the best results. Unfortunately, on my particular set-up – and probably on yours – it won’t work without some fiddling.

For my DAC, we’re going to have to apply at least a zero-padding conversion, and maybe a byte-ordering conversion as well. That is, we’ll need to ensure that, whatever the source material, the data written to the DAC is 32 bits, little-endian. It really doesn’t matter whether we do these conversions in the audio software or in ALSA: there’s only one correct way to do them.

Don’t get me wrong: even from a ‘bit-perfect’ perspective, these conversions are harmless. The number ‘20’, for example, is not a different number from ‘020’, and the same is true in binary. And it doesn’t matter what order we read the digits out, so long as everybody agrees on the order.
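Here is a minimal sketch of why padding is harmless: it can be undone exactly. The helper names to_s32le and from_s32le are mine, and I’m assuming the padding goes in the low-order bits, so that a full-scale 16-bit sample is still full-scale as a 32-bit one.

```python
import struct


def to_s32le(samples, bits):
    """Left-justify signed `bits`-wide samples in 32-bit little-endian words."""
    shift = 32 - bits
    return b"".join(struct.pack("<i", s << shift) for s in samples)


def from_s32le(data, bits):
    """Undo to_s32le: drop the padding and return the original integers."""
    shift = 32 - bits
    return [value >> shift for (value,) in struct.iter_unpack("<i", data)]


cd_samples = [0, 1, -1, 32767, -32768]
wire = to_s32le(cd_samples, 16)

print(wire[4:8].hex())                     # 00000100: the sample '1', padded
print(from_s32le(wire, 16) == cd_samples)  # True: nothing was lost
```

The same functions work for 24-bit material by passing bits=24. No information is created or destroyed either way, which is the sense in which I’m calling the conversion harmless.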

The problem is that unless we use the ‘hw’ protocol, we lose control over what ALSA is actually doing. And if we do use it, the audio software has to know how to cope. Simple utilities like aplay don’t have the necessary smarts. If I try this:

$ aplay -D hw:CARD=digital /usr/share/sounds/alsa/Front_Center.wav

I just get an error message:

Playing WAVE '/usr/share/sounds/alsa/Front_Center.wav' : Signed 16 bit Little Endian, Rate 48000 Hz, Mono
aplay: set_params:1387: Sample format non available
Available formats:
- S32_LE
- SPECIAL

I’m trying to play a 16-bit mono file into a 32-bit, stereo device. Although the conversions are straightforward, aplay won’t do them, and the hardware can’t do them.

On the other hand if I do this:

$ aplay -D plughw:CARD=digital /usr/share/sounds/alsa/Front_Center.wav

or even this:

$ aplay -D default:CARD=digital /usr/share/sounds/alsa/Front_Center.wav

the audio clip plays just fine. That’s because the audio processing chain in the ALSA library does whatever conversions are necessary, to map the source to the target device. But how do we know whether it’s doing ‘harmless’ conversions, like padding, or nasty ones, like resampling or mixing?

We don’t.

We have to check the capabilities

So if we want bit-perfect audio, or close to it, we need to check the capabilities of the device (e.g., using alsacap), and work out whether there is a safe transformation between the source and device formats.
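That check can be written down as a few lines of Python. This is only a sketch of my own reasoning: the dictionary keys and function names are made up, the ‘harmless’ flags follow the definitions in this article, and treating mono-to-stereo duplication as harmless is an assumption. Note also that a rate range reported by a tool like alsacap doesn’t guarantee that every rate in between is supported.

```python
def conversions_needed(src, dev):
    """List the conversions needed to play `src` on `dev`.

    Each entry is (description, harmless), using this article's notion of harmless.
    """
    needed = []
    if not dev["min_rate"] <= src["rate"] <= dev["max_rate"]:
        needed.append(("resample", False))
    if src["bits"] < dev["bits"]:
        needed.append(("zero-pad samples", True))
    elif src["bits"] > dev["bits"]:
        needed.append(("reduce sample size", False))
    if src["channels"] != dev["channels"]:
        # Assumption: copying mono to both channels is harmless; anything else is not.
        harmless = src["channels"] == 1 and dev["channels"] == 2
        needed.append(("remap channels", harmless))
    return needed


def bit_perfect_possible(src, dev):
    """True if every conversion needed is one of the harmless kind."""
    return all(harmless for _, harmless in conversions_needed(src, dev))


my_dac = {"min_rate": 44100, "max_rate": 768000, "bits": 32, "channels": 2}
fixed_48k = {"min_rate": 48000, "max_rate": 48000, "bits": 16, "channels": 2}
cd_rip = {"rate": 44100, "bits": 16, "channels": 2}

print(conversions_needed(cd_rip, my_dac), bit_perfect_possible(cd_rip, my_dac))
print(conversions_needed(cd_rip, fixed_48k), bit_perfect_possible(cd_rip, fixed_48k))
```

For my DAC the only conversion is padding; for the hypothetical 48,000-only device, resampling is unavoidable.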

A particular case where there might not be, is where the source is an audio CD or CD rip, and the target is a device whose sample rate is fixed at 48,000 per second. These devices are pretty common in the computer audio world. CD audio is always recorded at 44,100 samples per second. Many, perhaps most, audio files and streams from on-line suppliers are in this format. Playing this CD audio on a DAC that can only do 48,000 samples per second must use a conversion. In this case, it actually matters whether the conversion is done in ALSA, or in the music player software, because there are good and bad ways of doing it. I would guess (and it is just a guess) that ALSA doesn’t use a very sophisticated approach. Why? Because anybody who really cares about audio quality to this extent will be using a specialist music player application that can do the conversion well. And, despite what the hifi snobs say, this conversion can be done well. Well enough that no human hearing will be able to tell the original from the resampled sound, anyway.
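A little arithmetic shows why this particular conversion can never be bit-perfect. The sketch below only does the sums; it doesn’t resample anything, and says nothing about how ALSA or any player actually does the job.

```python
from fractions import Fraction

# Output samples per input sample when going from 44,100 to 48,000 per second.
ratio = Fraction(48000, 44100)
print(ratio)  # 160/147

# Position of each of the first 160 output samples, measured in input samples.
positions = [n / ratio for n in range(160)]

# Which output samples coincide exactly with an input sample?
on_grid = [n for n, p in enumerate(positions) if p.denominator == 1]
print(on_grid)  # [0]
```

Only one output sample in every 160 lines up with an original sample. The other 159 fall between two input samples, so their values have to be calculated, and how well that calculation is done is what separates a good resampler from a bad one.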

But it won’t be ‘bit-perfect’, however we do it. Bit-perfect is a technical consideration, not a subjective one.

We need decent audio hardware, regardless of the Linux set-up

In order to avoid the kinds of problems I mentioned, we need audio hardware that can cope, without conversion, with all the sample rates we need to play. It also has to be able to handle the largest sample size we will play. In practice, this means 24 bits per sample – it’s rare to encounter anything more than this in the wild.

It’s not only the audio hardware – the DAC – that has to be able to cope with these constraints: the driver in the Linux kernel has to cope as well. Oh, and the hardware has to do it without appreciable jitter (timing errors). For USB DACs, we don’t have to worry too much about any of these things – the USB audio driver built into Linux will handle whatever the hardware can handle, and most modern USB DACs use asynchronous transfer, where the DAC’s own clock sets the pace, so they are largely immune to jitter from the computer. Audio hardware built into motherboards is a different matter. The Intel PCH integrated audio in my computer has a 16-bit DAC – or so it says. It’s probably honest – 16-bit DACs are commodity items. But if a motherboard audio device claimed more than 16-bit resolution, I think I’d be a bit sceptical.

But 24 and even 32-bit sample sizes are common in serious hifi DACs. Some will even use multiple DACs of this size. I’m reasonably confident that, if I play music through my USB DAC, using hw:CARD=digital as the ALSA device specification, the DAC will receive unmodified audio data. Well, except for the harmless zero-padding, anyway.

Or am I?

We don’t really know what the ALSA driver is doing

If I use the hw protocol with ALSA, the data is still going through the ALSA driver in the kernel. It has to – that’s what the driver is for. But does the bit stream reach the DAC’s converter unmodified? Well, not necessarily, as it turns out.

I know this because the volume control in alsamixer still has an effect on the volume, even when using the hw protocol. This volume control operates below the software I’m using to play audio: either in the kernel driver, or in the DAC itself, acting on a control message from the driver. I can’t tell which from the outside. Of course, the player software might also fiddle with the audio stream, but that’s a different matter. If I was worried about the software, I could use different software. But I can’t really use a different kernel driver.

The fact that the volume control works means that something beyond my player is manipulating the bit stream mathematically. I’m reasonably confident that if I set the volume control to ‘100%’ (or 0dB) then the math will be benign. But without looking at the source code for the driver, or the firmware in the DAC, I can’t really be certain.
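Whatever it is that applies the volume control, the arithmetic presumably looks something like the sketch below. This is generic digital gain, not code taken from any driver or DAC; apply_gain is a hypothetical function, and the rounding strategy is an assumption.

```python
def apply_gain(samples, gain_db):
    """Scale integer samples by a gain in dB, rounding back to integers."""
    factor = 10 ** (gain_db / 20)
    return [round(s * factor) for s in samples]


samples = list(range(-4, 5))

# At 0dB the factor is exactly 1.0, so every sample comes back unchanged.
print(apply_gain(samples, 0.0) == samples)  # True

# At -6dB, neighbouring values collapse together and can't be recovered.
print(apply_gain(samples, -6.0))  # [-2, -2, -1, -1, 0, 1, 1, 2, 2]
```

That is why I’m fairly relaxed about a control set to 0dB, and why anything below that can’t be called bit-perfect.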

I must point out that I’m talking about my USB DAC here. Your sound device and driver might behave differently – there’s only one way to find out, and that’s by doing the kinds of experiments I’ve described in this article.

So…?

‘Bit-perfect’ is an objective measure, not a subjective one. To be ‘bit-perfect’ means that the audio data goes straight from the file or stream to the DAC without any alteration. I’m prepared to accept that harmless modifications like bit padding and byte ordering don’t take away the ‘bit-perfect’ status. But mixing and resampling certainly do.

With the tests I’ve described above, I’m reasonably sure that my USB DAC, with suitable player software, driven from ALSA using the hw protocol, qualifies as ‘bit-perfect’. At least, it’s as close as makes no difference.

But there’s no magic way of doing this: there’s no ‘use-bitperfect=true’ property that you can set anywhere that will ensure you get bit-perfect, or even good, audio transmission. You really have to understand how ALSA works, and have a good grasp of digital audio principles.

And when we move on to Pulse, PipeWire, etc., the situation becomes even more complicated; in addition to the uncertainties I’ve described in this article, we have the additional uncertainties introduced by that software.

But let’s not get too despondent. You don’t really need bit-perfect audio transmission. With modern, high-quality audio DACs, and a bit of care, you can still get excellent audio quality. That’s true even for systems using Pulse, if you take enough care over configuration (and you’re not using a buggy version).

I grew up with cassette tape as my main source of recorded music. We’ve come a long way since then, in terms of sound quality. We shouldn’t get hung up on things that don’t really matter.