Kevin Boone

Sorting files and directories on a FAT-formatted USB memory stick

disk logo I recently bought a Leak CD player. It has a USB slot, and can play audio files from a FAT-formatted memory stick. The radio in my car has a similar feature. By using a folder for each music album, it's relatively easy to provide a large music collection on a single USB stick.

Both the Leak CD player and my car radio have the same limitation: they don't sort the files or folders on the memory device. This makes it very difficult to find a particular album to play and, even then, the tracks aren't always played in the right order, even if the files are named methodically (e.g., 01 foo.mp3, 02 bar.mp3...)

A simple way to overcome this limitation is to copy all the audio files and folders to a local folder first, and then copy the whole folder structure to the memory stick using a utility that can copy in alphanumerical order (e.g., rsync). However, a large memory stick can take hours to fill and, even then, it's impossible to update its contents later without breaking the ordering. What is needed is a way to sort the directories in the FAT filesystem in place, without moving any files. The method has to be able to cope with the fact that files and directories might be hierarchically nested. That is, files have to be sorted within their directories, not globally.

Basic principle

The process I describe in this article is based on a utility called YAFS, maintained by Luis Rios. YAFS can read and write the structure of a FAT filesystem as an XML file. Each entry in the XML file contains, along with the long and short names of each file or directory, an order attribute. This is a number that represents the sequence of files in the filesystem.

We can change the ordering of files and folders by changing the order attribute in the XML, and writing the XML back to the filesystem using yafs. However, while it's conceptually simple, it's actually rather awkward to rewrite just the attributes in the XML file. We can't just hack the XML about using sed and awk, because it is hierarchically structured.

My approach, therefore, is to use xsltproc with a stylesheet transformation. This process reads the XML generated by yafs, and writes a new XML file with exactly the same file/directory details, but with each directory's contents sorted by alphanumeric order, and with new order attributes.

Because the entries have been sorted, the order attribute for each file or folder is easy to derive: it is taken from the position of the entry within its parent element in the XML. XSLT has a built-in function position() that returns this sequence number. This approach is a little ugly, because the positions of the files in a folder don't start at "1" -- entries "1" and "2" are the long and short names of the folder. Still, nothing but yafs will ever see these numbers so, provided they are sequential, it doesn't matter what they actually are.

In summary, the basic principle is:

1. Read the FAT directory structure using yafs -r.

2. Sort the entries in the resulting XML file using xsltproc and a stylesheet.

3. Write the new layout from the XML file using yafs -w.

On the YAFS website there is a link to a pre-compiled Linux binary, and a reference to the source code. There's also a Windows binary, but I'm not sure how you'd run xsltproc on Windows. I guess you could use the Windows Subsystem for Linux or, on earlier systems, Cygwin. Building from source (on Linux) is not difficult, but I found that the binary worked for me.

The set-up for the procedure in this article is a little tedious. However, once it's done, the actual sorting just involves three Linux commands, which could easily be scripted. Even with a 64Gb memory stick, the sorting operation takes under a second on my system.

Preparation -- pre-compiled Linux binary

Unpack the binary distribution to any convenient directory. Then copy the executable and the XSD schema file to any convenient, permanent location. I'm using /usr/bin but any directory will be fine. The application expects to be able to find the XSD schema in the same directory as the binary. You'll have to invoke the binary using its full pathname, so it doesn't really need to be in a location in the $PATH.

$ sudo cp -p yafs/yafs /usr/bin/
$ sudo cp yafs/fat_file_system_tree.xsd /usr/bin/

Preparation -- building from source

This step is only necessary if the binary doesn't work, or you need to modify something. Unpack the source bundle into any convenient directory.

Install Xerces-C library development header 
(method depends on the distribution)

$ sudo dnf install xerces-c-devel

Compile

$ mkdir bin deps
$ make -f makefile_unix

Copy the binary and schema to /usr/bin

$ sudo cp -p bin/yafs /usr/bin/
$ sudo cp bin/fat_file_system_tree.xsd /usr/bin/

As with the binary bundle, the executable and the XSD schema need to be installed in the same directory.

Preparation -- source and binary

Cut and paste the following XSLT stylesheet to a text file, and save it as (for example) yafs.xsl

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="root">
    <root>
      <xsl:apply-templates>
        <xsl:sort select="long_name"/>
      </xsl:apply-templates>
    </root>
  </xsl:template>

  <xsl:template match="directory">
    <directory>
      <xsl:attribute name="order"> <xsl:value-of select="position()"/></xsl:attribute>
      <xsl:apply-templates>
        <xsl:sort select="long_name"/>
      </xsl:apply-templates>
    </directory>
  </xsl:template>

  <xsl:template match="file">
    <file>
      <xsl:attribute name="order"> <xsl:value-of select="position()"/></xsl:attribute>
      <xsl:apply-templates>
        <xsl:sort select="long_name"/>
      </xsl:apply-templates>      
    </file>
  </xsl:template>

</xsl:stylesheet>

Copy this file to some convenient, permanent location, such as /usr/share.

$ sudo cp yafs.xsl /usr/share 

Usage

1. Insert or connect the device containing the FAT filesystem. Get the entry in /dev/ that represents this device, perhaps by looking at the output from dmesg.

2. Ensure that the device is unmounted. Many Linux systems auto-mount removeable devices when they are connected. You can unmount it using the umount command, or from a file manager -- details depend on the distribution. In fact, I've found that the procedure works with the device mounted, but this isn't something that should be relied on.

3. Get the existing filesystem layout as an XML file. I've called it fat.xml here, but the name is arbitrary.

$ sudo /usr/bin/yafs -d /dev/sda1 -r -f fat.xml

Note that the input to yafs is a partition containing a FAT filesystem (sda1), not the block device sda.

4. Sort the entries in the XML file into alphanumerical order using the XSLT stylesheet. Save the output in a new XML file, which I've called fat_sorted.xml here.

$ xsltproc /usr/share/yafs.xsl fat.xml > fat_sorted.xml

5. Write the new layout to the device.

$ sudo /usr/bin/yafs -d /dev/sda1 -w -f fat_sorted.xml

Note that yafs must be invoked as a full pathname, so the utility can find its schema file in the same directory.

Closing remarks

It's seems a little odd to me that, in 2022, there are still media players that can't sort folders or play audio files in filename order. It's excusable, I guess, for a car radio or a pocket MP3 player, but a little surprising in premium hifi equipment.

Nevertheless, the procedure described in this article has made things a lot more convenient. It's fiddly to set up but, once prepared, sorting the memory stick after changing its contents only takes a second or two.

It would not be difficult, I think, to implement a FAT sorter that sorted by alphanumeric order. It would even be possible to set the filesystem order based on the track number in the audio metadata, so we don't have to worry about numbering the tracks. Still, the approach I've described here works well enough for my purposes, and doesn't require writing any new software.