A comprehensive guide to bit perfect digital audio using Linux

Schematic drawing of the processing of digital audio throughout various components.

Among audio and music lovers who use digital audio, bit perfect audio playback is hot. Time to explore the term and its backgrounds and guide you in setting up your own system.

What’s in a name?

Often the notion of “bit perfect” is used to describe audio playback systems that don’t alter audio on purpose, like non-optimized convenience-oriented systems often do. They mostly include plug-n-play features like resampling, volume leveling, resolution changes and channel mixing, all on-the-fly.

Bit perfect audio systems shouldn’t do that; a digitally encoded 1 which is retrrieved from a file and sent from one system like a playback computer to another like a DAC, should arrive as the original 1, not 0. In the abstract realm of computers, bit perfect is as simple as that; input == output. But, in reality, as with all things, the concept or definition of perfectness is bit more complex. Likewise, the definition above does not quite cover things in our audio reality.

Bit perfect transport and storage of digital audio files

As long as the audio lives inside raw PCM file formats –like WAV and AIFF, lossless compressed file containers –like ZIP and FLAC, or DSD inside a DXD file, one may use of-the-shelve checksum algorithms and tools to easily verify that a transported, processed or stored audio file is exactly the same as its source.

To see this in action, one may conduct the experiment below. It involves the downloading, storing and compressing of a file downloaded from the internet, and comparing the processed file to the original source file. We’ll be using md5 checksums to verify that the processed files are (bit) perfect, ie exactly the same as the original source file.

First, make sure all needed programs are installed. Do this by opening a terminal screen by pressing T while holding CTRL+ALT or by searching for and starting a terminal from the menu. The following commands should work for Debian (and derivatives) and Arch:

Now we’re ready for the real thing.

  1. Create a temporary working directory and change to it:
  2. Download the upstream source flac file and the file containing md5 checksums as supplied by archive.org/the uploader by copying and pasting the following in the terminal screen and pressing ENTER:
  3. Check if the download and storage of the source flac file succeeded, ie. is bit perfect:

    The output of latest command should be:
  4. While using the -c-switch, like we did in the former example, is a great way to check a file against predefined checksums and to ensure that your download succeeded, there are other ways to make sure the files on archive.org, and the ones downloaded to your computer and mine are exactly the same. This time we’ll run md5sum without the -c switch, directly against the file that should be checked:

This shows that file transport through TCP/IP connections and storage on any local or network storage device, can –and should– be bit perfect for digital audio files.

Still not convinced (and not faint of heart)? You might want to try the following, to see what happens when one changes a single byte (ie. 8 bits) in the source wav file.

  1. Install the hex editor dhex:
  2. Decode/Unpack the digital audio file tsp2007-10-02.sp-c4.d2t02.wav from the flac container file, which we downloaded and verified in the first excercise:
  3. Make a backup of the source wav file
  4. Open the source wav file using dhex (the first time you start dhex, it will ask you to confirm keyboards keys, just do what the program asks you to do)
  5. While in dhex, with the wav file loaded, press ENTER
  6. Next, change the first byte of the header, 'R' (for RIFF) to 'Q' (for QIFF, by typing the number 51 and confirm by pressing ENTER.
  7. Save the modified file by pressing [F10].
  8. Prove that such single bit changes can’t be detected by simply listing file sizes:
  9. And finally the prove that md5sum is invaluable in such cases:

While file sizes can be the same across different files, md5 checksums are hard to fake.

Is flac bit perfect?

In the former excercises we established proof that md5 is a great way to verify the integrity of downloads (and subsequent storage and retrieval) of files. Next we’ll try how accurate or bit perfect our beloved Free Lossless Audio Coder (FLAC) is.

First we’ll extract the source wav file from the flac container, then repack it with flac and extract it once again. This way, we can see if flac does something with digital audio it packs so efficient, or, in other words, if it is really lossless.

  1. Clean up from previous step:
  2. Decode the downloaded flac file:
  3. Create the md5sum of the resulting uncompressed wav file and store in a file:
  4. Backup the original downloaded flac file:
  5. Repack the wav file creating a new flac file:
  6. Create a checksum of the resulting compressed flac file and append it to the self created checksums file:
  7. Remove the source wav file:
  8. Decode the self created flac file, thereby restoring the wav file once again:
  9. Check the authenticity of both the wav and self generated flac files:
  10. Check if the self generated flac is the same as the original downloaded flac:

As you can see from the last command, the self generated flac created above, differs from the original downloaded flac. At the same time, the wav file extracted from the original downloaded flac file is exactly the same as the one extracted from the self generated flac file.

When you regard flac for what it really is –a lossless compression container for audio files– that makes perfect sense. For instance, changing the compression level will lead to different flac files, while the contained audio file stays intact. When changing a letter inside a metadata field, although the file size stays the same, the md5 checksum will be altered.
But flac is more than a mere zip file. For instance, in each flac file, a unique digital fingerprint of the audio it contains is stored inside the flac file itself. One may view (and compare) such fingerprints, thereby verifying the integrity of the audio (but not the metadata etc.):

  1. Extract and display the fingerprint of the wav file stored inside both the downloaded and regenerated flac files; they should be the same:
  2. The good folks at archive.org sometimes offer the original fingerprints in a downloadable file which you can use to visually establish the integrity of the audio within flac containers. Of course, with some grep magic, it is easy to not having to trust on your eyes. First download the file containing the fingerprints:
  3. Next, use metaflac and grep to create your own fingerprint checker:
  4. And the same for our self generated flac file:

So flac is a bit perfect encoder/decoder for audio files containing PCM-audio. At the same time, two flac files containing exactly the same wav file can differ amongst each other, caused by differences in non-audio data and compression parameters. Does this mean that flac is suited for the purpose of this guide; creating a bit perfect audio chain? No, unfortunelately it isn’t.

While flac is a bit perfect encoder/decoder for digital audio, the extra decompression that takes place when playing back the audio within a flac file, adds to the total load of the playback computer, which is something we’re trying to avoid. Therefore flac should not be considered suitable for usage in a bit perfect audio playback chain, although it is a great tool for efficient and accurate archiving and transport.

The devil is in the details

This is where the difficulties and subtleties come in. In current standards based audio, although the pristine source files are stored perfectly on your NAS, and retrieved by your computer through network interfaces, switches and cables in a perfect manner, the output of a digital playback system is always AES/EBU-format (or S/PDIF, like it was called before), as long as we’re talking about PCM, like with WAV or AIFF file formats.

Even when using the highest quality standards like USB Audio Class 2 (UAC2) or IEC 60958 Type I Balanced XLR. The clock signal is always multiplexed with the audio data by the playback computer.

A computer is essentially a very complex device built upon billions of very simple on/off switches. This high speed switching influences all kinds of power related properties of the computer itself, as well as any connected device. This also applies when CPU, RAM and busses inside the computer are instructed to construct a stream of bits consisting of perfect audio data source packets and near perfect source clock signals. The resulting audio output signal is never perfect.

Regardless whether the source clock signal comes from an external femto clock generator inside your $10.000 external DAC connected to your computer using UAC2, the $10 clock on your high quality audiophile attached to the PCIe bus of your PC, or the $0.03 crystal on your computers mainboard, the resulting audio signal can and will suffer audible from system load.

How about USB Audio?

In the “USB Audio Class 1” (UAC1) standard, both the computer and the DAC are allowed to drop USB-packets, while the clock signal can only be generated by the computer itself. The timing errors induced by the cheap clock generator inside the pc and the fact that packets might get lost, means that the output signal won’t be the same as the original signal which results in audible artifacts.

UAC1 should not be considered a bit perfect standard for digital audio.

Schematic drawing of the processing of digital audio throughout various components.

But surely UAC2 in isochronous asynchronous mode is bit perfect? In a proper UAC2 chain, consisting of a computer for audio playback and an UAC2-capable DAC, each USB packet –a so called “USB Request Block” (URB)— should arrive at the other end in perfect non-altered form. Another nice feature of UAC2 is that the clock signal can be fed from an external DAC, like an out-of-this-planet $20.000 DAC with a femto clock. Both, the transport of PCM encoded audio data packets inside URBs to the DAC and the clock signal separating the USB packets to the computer should be perfect! Sadly, that isn’t the problem.

Although UAC2 facilitates the separtion of timing and audio data, UAC2 by definition needs to round the audio timing information to the nearest USB timing frame. While the URB’s themselves and therefore the audio data frames are transported perfectly, their timing is a non-perfect fit.

The bottom line is that all current standards for transferring digital audio streams from one system to another, have non-perfect synchronisation of timing information. Therefore, there is no such thing as standards based bit perfectness!

But there must be a solution, right?

With all the (power) switching going on inside the device that generates the AES/EBU stream, whether it’s a computer or a dedicated audio device, that does have an audible effect on the resulting audio output signal.

Some manufacturers started to use their own implementations of i2s instead of AES/EBU. That makes perfect sense, in that it is designed to be a discrete transport mechanism for digital audio using three parallel streams, seperating clock from audio data, so it doesn’t suffer from the muxing-issues tight to AES/EBU. There is one rather big problem with these solutions. The i2s standard is only designed to be used inside gear, like inside a DAC, CD player or smartphone. There it performs the task of discrete transport for feeding audio sources from audio generating sources –like the USB receiver inside a DAC, the pickup of a CD player or the mic in a smartphone– to its internal DAC chip. The standard does not have provisions for transporting source digital audio signals to external equipment.

That’s why now and again ones sees proprietary interfaces and protocols emerging. Most of them (mis)use HDMI as the interface, while using non-HDMI-compliant internal wiring. Some try it with CAT5e/6 network cable and interfaces, while others use BNC connectors. Other manufactures try to completely de- and reconstruct the incoming AES/EBU signal, replacing the incoming clock signal by there own high quality clock signal. I don’t like those solutions and I don’t believe them to be sustainable.

We’re depending on the audio industry to agree upon a new standard, incorperating the insights bit perfect audio lovers gathered over recent yearsfor which I would suggest the inspiring name “USB Audio Class 3 (UAC3)”.

Back to reality

Unfortunately, there are not even traces of debate on USB Audio Class 3. In the mean time, we’re left with:

  1. UAC2 as the standard of choice,
  2. using meta tagged AIFF files (whenever Musicbrainz Picard supports that)
  3. stored on a proper –preferrably dedicated– file server with a Gigabit network port, like a NAS (any proper NAS should suffice) using NFS as its high level protocol,
  4. coupled to a solid –preferrably dedicated– gigabit TCP/IP network,
  5. a designed-for-audio dedicated playback computer,
  6. and last but not least, a designed-for-audio OS and software chain, running with minimal system load.
A dedicated/isolated audio network
For less then $100/€80, one can buy a professional managed ethernet layer-2/3 8-port gigabit switch, like the HP Procurve 1810-8G v2 (J9802A). Such a device enables one to implement the most complex and robust of networks. Using VLAN’s, virtual isolated networks can be created, for example a 900Mbit dedicated network for audio and a 100Mbit network for all others duties.
A dedicated designed-for-audio computer (hardware part)
On the hardware side we’re looking for a fanless, diskless and headless industrial grade PC with two CPU cores and two network interfaces.Fanless: We aim to minimize noise and vibration.
Diskless: We don’t want spinning disks inside our box, because they cause noise, vibrations and power fluctations. The only disk inside the PC will be a small mSATA solid state disk, to store the OS and music playing software on. Those wille be loaded in memory at power up, after which everything is booted and executed from RAM. We will be storing user files, like audio files, settings and preferences, on a network storage device. This way, we’ll complete eliminate activity on the SATA-bus and controller.
Headless: We only want the PCIe/USB-busses and controllers be dealing with the handling of audio. So there will be now screens or input devices attached to our box and we have no need for resource hungry 2D/3D graphics systems and their complex and error prone drivers. Instead we’ll be controlling the OS and music playing software from native applications on other devices, like desktops, laptops, smartphones or tablets, or, if desired, with any webbrowser on the local network using a webserver on the music playing PC. Of course, that would require careful implementation and resource assignement, as we would want to minimize it’s influence on audio related processes.
Industrial grade We’re looking for a system that lasts at least ten years with extensive use, without active cooling and with minimum EMI/RF radiation and vibration. We want extended lifespan components from well established suppliers. We want a well build non-vented box with as few holes as possible. We only need holes for connectors, two for ethernet, two for usb and one for power.
Two CPU cores We want a dedicated core to which we will tie all processes related to audio with realtime priority. Here processess will run like audio file retrieval from the network including the network stack itself, the usb stack and the processes needed for the music playback software. All other processes, like logging, controlling and the optional webserver, will be tied to the second core.
Two network interfaces We want a single dedicated gigabit network adapter with hardware TCP/IP offloading for retrieval of audio files on the network. All other network traffic, like control sequences, will be redirected tot the second (built-in) network interface.While the C.A.P.S. proposals are great, things can be simpler, cheaper and even better.When your on a tight budget (aren’t we all?) buy yourself a fanless industrial Intel dual core Atom based based system, like the Logic Supply AG150, configured with 2GB RAM, an idustrial 32GB msSATA drive and a best-in-class Seasonic switching power supply for around $390/€260, and you have a great starting point for this purpose.More speed means less switching, so if you can afford it, you might want to spend around double that money and buy a fanless industrial Intel dual core i5 Haswell based system, like the Logic Supply ML320, configured with 4GB RAM, a 32GB internal mSATA drive and a best-in-class Seasonic switching power supply for ~$750/€560. This system features the-best-in class NUC-design, coupling the CPU directly, so without heatsinks, to the upper side of the box. The upper part of the box is a folded sandwich construction of thick aluminium and thinner iron, which is great because it not only keeps the cores cool but minimizes RF/EMI radiation and vibration as well.You might improve on the rather good basics by replacing the switched mode adapter with a proper linear audio supply. I still haven’t come around to listen to the effect of such an upgrade, and I’am currently working with a local engineer to get such a beast built.The use of a dedicated USB PCIe card designed for audio in the PC, like the ones offered by Sotm (~$300/€350) or Paul Pang / PPA Studio (~$130) (who also offers great audio PC’s and other tweaks) did do some good in the Atom based system, but did not have any audible effect in my Core i5 system. This probably is due to the fact that I didn’t use an external linear power supply to feed the cards.

Other tweaks, like dedicated audiophile SATA-controllers and cables, do not apply to our system, as we only use our solid state mSATA disk to boot the OS and music playing software. After that everything will be executed from RAM, thereby bypassing the SATA-bus and controllers completely.

A dedicated designed-for-audio computer (software part)
As Microsoft has a long and bad track record of proprietary, hidden and non-sustainable “standards” and technology while frustrating open standards, they are not the supplier I want to attach myself to. But there are those who do and some of them have created some nice offerings, which can be divided in two categories.The first type consists of stuff that’s meant to be used like a desktop, connected to a TV and input devices or touchscreen, like JRiver Media Center (~$50/€40) and the free (as in free beer) closed source and proprietary Foobar. Of course that price is without a valid Windows (desktop) license, which those users –knowingly of course– bought as part of an OEM-installation for about $100/€100. For reasons described in this article, I don’t like all-in-one solutions like these and I’m not interested in them.The headless ones (based on Windows Server) are –as designs– more to my taste, like Audiophile Optimizer (~$100/€80) and JPlay (~$130/€100). Apart from that, you will of course need to buy a proper Windows Server license, which is an art in its own, that will set you back more than $300/€300 (just an estimation).Apart from the price, the Windows based “products” all suffer from two intrinsic problems. The first one is that Windows seized supporting USB Audio after Class 1 was defined, back in 2006. A a result, there’s no native UAC2 support in Windows, which means you have to revert to third party (and closed source) drivers, which is something I’m surely not after. The other problem is that they can only go forward by going backwards, ie. by reverse engineering. Thereby they’re battling their supplier of choice, which seems silly in my opinion. Generally these “products” consists of registry tweaks and scripts that disable standard services or tweak the system in some way. Their developers bet they can get and keep the OS, drivers and software in control that way, and hopefully the 25.000 remaning settings, proprietary drivers and their updates don’t interfere with their plans. The same applies to Apple, although the underlying OS does offer more possibilities.

On the other hand, using free and open source software one can design and build a custom dedicated OS with playback software for a single purpose; getting the AES/EBU signal from the files on the network to your external UAC2 DAC in the best possible way.

Some of my fellow enthusiasts have created some great things based on free and open software. AudioPhile Linux is in active development and uses Arch, which is fitted with a custom realtime kernel and mpd. Voyage MPD, the first audio oriented system in a single compressed image, together with Vortexbox are geared towards small and cheap embedded DiY platforms, like Beaglebone and RaspberryPi.

Mine consists of a fully automated silent installation of a heavily customized (not reverse engineered) Debian with a custom compiled kernel based of the stock backported realtime kernel. It uses stock mpd and alsa modules and libraries and achieves great results.

Related articles