Command-line hacking: querying an Internet radio database

This is another in my series of articles on doing off-beat and (I hope) interesting things with standard Linux command-line tools. In this post I'll demonstrate how to query the database of Internet radio stations at radio-browser.info using a bash script. The query (if successful) returns one or more URLs that can be passed to a command-line audio player like mplayer.

For example, to search for radio stations matching "bbc radio 4", I run the script like this:

$ radio-browser-query bbc radio 4
BBC Radio 4 Extra 
http://a.files.bbci.co.uk/media/live/manifesto/.../bbc_radio_four_extra.m3u8 

BBC Radio 4 
http://a.files.bbci.co.uk/media/live/manifesto/.../bbc_radio_fourfm.m3u8 

...

The resulting URLs can be cut-and-pasted directly to mplayer (and probably other audio players) to play the radio station. Note that I've shortened the URLs in the sample output above, because the details are not particularly interesting.

Note:
The script I describe here depends on various utilities that are installed by default on most Linux distributions, like dig and sed, and some that might need to be installed, like curl and xmllint.

About the radio-browser.info database

There are thousands of Internet radio stations operating world-wide; the problem is finding them. Stations come and go, and even long-lived ones like those provided by the BBC change their formats and access details periodically.

Many commercial products that offer Internet radio features use one of the proprietary databases, like vTuner or Airable. The collaborative database at radio-browser.info is, in my experience, as inclusive and accurate as any of the commercial offerings, and updated more quickly. And it's free to use.

The radio-browser database provides a REST API, that is, a method of querying the database using HTTP requests with specific URLs. The service can return data in a variety of formats, including XML and comma-separated values (CSV). It might be thought that CSV would be easy to parse in a shell script but, in fact, it's surprisingly difficult to do in a robust way. What happens, for example, if the data fields themselves contain commas? They might be surrounded by quotes, for example, but what happens if the data fields contain quotes? This kind of output can be parsed using sed, but it's really ugly. I'd rather get the search results in XML, and use an external tool like xmllint to parse them.

The radio-browser service checks at regular intervals that radio stations are still working. This is vital, given how transient they often are. However, the fact that a station responds to a simple request doesn't necessarily mean that it is broadcasting real audio. Even if it is, some stations broadcast silence for at least part of the day. It's not safe to assume that every station returned in a query will actually be available -- this is a general problem with Internet radio.

The radio-browser.info API

The format of the radio-browser REST URL that performs a search by station name is:

/xml/stations/byname/[search text]

The search text can include spaces and punctuation, but these characters need to be URL-encoded, that is, replaced by a percent sign followed by the character's code in hexadecimal. For example, to search for "radio 4" we need:

/xml/stations/byname/radio%204

because the space is character 32, or 20 in hexadecimal. The result will be an XML document with the following form:

<result>
  <station name="..." url_resolved="..." .../>
  <station name="..." url_resolved="..." .../>
</result>

Each station element contains a name, several URLs, and a heap of other information that I haven't shown, but which might be useful in other applications. The url_resolved field should be the URL of the actual audio stream.

Note:
Searches for station name are case-insensitive by default.

About the script

In outline, here's what the script will do.

Randomly select one of the radio-browser.info servers to issue the query to. The list of servers is obtained from a DNS query.
Concatenate the script's command-line arguments, which form the search expression, and encode them in URL format.
Issue the necessary HTTP request using curl or wget, passing the encoded search expression in the URL.
Select the relevant elements from the XML results returned by the server.
Format the results for display.

Selecting the server

The operators of radio-browser.info prefer clients to distribute requests among their servers, to balance load. For our purposes, we'll get the list of servers, and then select one at random.

The list of servers is obtained by querying the service's SRV DNS record. We can do that using dig:

SERVER_DNS=_api._tcp.radio-browser.info
dig +short $SERVER_DNS SRV

This lookup returns one line per server. Each line holds the priority, weight, and port of the SRV record, followed by the server's hostname, so only the last field is of interest. We can then use shuf to randomise the list (which is fast when the list is short), and head -1 to select the first item. This provides a random selection each time the script is executed.
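The selection step can be sketched as a small filter. This is only an illustration: pick_server is a hypothetical helper name, and it assumes that each line of dig's short SRV output has the form "priority weight port target." with a trailing dot on the hostname. It uses shuf -n 1, which should be equivalent to piping shuf into head -1.

```shell
# Sketch: choose one radio-browser server at random.
# pick_server is a hypothetical helper, not taken from the original script.

SERVER_DNS=_api._tcp.radio-browser.info

# Read "dig +short ... SRV" output on stdin and print one hostname.
# Assumes each line looks like: priority weight port target.
pick_server() {
  awk '{ print $4 }' |   # keep only the target hostname
    sed -e 's/\.$//' |   # drop the trailing dot of the fully-qualified name
    shuf -n 1            # print one line chosen at random
}

# Usage (needs network access):
#   SERVER=$(dig +short "$SERVER_DNS" SRV | pick_server)
```

Keeping the DNS lookup outside the function means the filter can be tried on canned input without touching the network.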

Processing the command line

To make the script easier to use, we'll concatenate all the command-line arguments into a single string; this means that the user can run

$ radio-browser-query bbc radio 4

rather than

$ radio-browser-query "bbc radio 4"

This simple manipulation makes the script more convenient to use but, of course, it is only appropriate if the script doesn't take any other kind of command-line argument. Then we'll replace spaces in the string with the URL escape sequence "%20". Note that my simple script only handles spaces; other punctuation symbols will break it. It wouldn't be difficult to extend it to handle other kinds of punctuation if necessary. So we have:

ARGS="$*"
QUERY=$ARGS
ENC_QUERY=$(echo "$QUERY" | sed -e 's/\s/%20/g')

The \s token matches any whitespace, while the trailing /g applies the transformation to %20 wherever the whitespace appears in the line.
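For search text that contains punctuation as well as spaces, a fuller encoder might look like the sketch below. The urlencode function is a hypothetical helper, not part of the original script. It percent-encodes every byte except letters, digits, and the four characters . ~ _ - (the "unreserved" set of RFC 3986), and it should work in any POSIX shell, bash included.

```shell
# urlencode is a hypothetical helper; it is not part of the original script.
# It percent-encodes every byte except the RFC 3986 unreserved characters.
# The body runs in a subshell so that LC_ALL=C (byte-wise processing)
# does not leak into the caller.
urlencode() (
  LC_ALL=C
  s="$*"                   # join all arguments with single spaces
  out=""
  while [ -n "$s" ]; do
    rest=${s#?}            # everything after the first byte
    c=${s%"$rest"}         # the first byte itself
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;   # e.g. space becomes %20
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Usage:
#   ENC_QUERY=$(urlencode "$@")
```

This trades speed for simplicity: it starts one subshell per encoded character, which is of no consequence for a short search string.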

Making the request

The request URL is built from the server name, the REST path described above, and the value of ENC_QUERY derived previously.

API="https://$SERVER/xml/stations/byname/$ENC_QUERY"
API="$API?order=votes&offset=0&limit=100&hidebroken=true"

I've hard-coded a limit of 100 stations here, just in case the user accidentally enters something that would otherwise match the entire database. hidebroken=true excludes from the results stations that are known to be broken.

Then we can make the request using curl or wget (the full script will use either, whichever is available).

API_RESPONSE=$(curl --silent "$API")

The --silent switch prevents progress information being mixed up with the XML returned by the server.
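The choice between curl and wget could be made along the following lines. This is a sketch only: build_url and fetch are hypothetical helper names, and the wget options shown are simply the ones that make it write the response quietly to standard output, as curl --silent does.

```shell
# build_url and fetch are hypothetical helpers, not from the original script.

# Assemble the search URL from a server name ($1) and an encoded query ($2).
build_url() {
  printf 'https://%s/xml/stations/byname/%s?order=votes&offset=0&limit=100&hidebroken=true\n' "$1" "$2"
}

# Fetch a URL with curl if it is installed, otherwise with wget.
fetch() {
  if command -v curl >/dev/null 2>&1; then
    curl --silent "$1"
  elif command -v wget >/dev/null 2>&1; then
    wget --quiet --output-document=- "$1"
  else
    echo "fetch: neither curl nor wget found" >&2
    return 1
  fi
}

# Usage (needs network access):
#   API_RESPONSE=$(fetch "$(build_url "$SERVER" "$ENC_QUERY")")
```

Quoting the URL matters here, because it contains ? and & characters that the shell would otherwise be free to interpret.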

Selecting and forming the XML attributes

We need the name and url_resolved attributes from each station element. We can parse the XML by supplying an XPath expression to xmllint. To select the name attribute, for example, the XPath expression is result/station/@name.

xmllint has an --xpath switch for evaluating XPath expressions, and several expressions can be combined into one with the XPath union operator (|), so that both attributes are selected in the same operation.
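One way to select both attributes in a single xmllint call is to join two XPath expressions with the union operator. The sketch below assumes that approach; extract_attrs is a hypothetical helper name, and the exact whitespace xmllint puts between the attributes it prints may vary between versions.

```shell
# extract_attrs is a hypothetical helper, not from the original script.
# It reads the XML response on stdin ("-") and prints the name and
# url_resolved attributes of every station element.
extract_attrs() {
  xmllint --xpath '/result/station/@name | /result/station/@url_resolved' -
}

# Usage:
#   echo "$API_RESPONSE" | extract_attrs
```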

The result of the XPath evaluation is of the form:

name="..." url_resolved="..."
name="..." url_resolved="..."
...

To form the final output, I apply a bunch of sed and tr operations, which are too prosaic to be worth describing in detail.
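For readers who want a starting point, the formatting step might be sketched as follows. This is not the code from the full script: format_stations is a hypothetical helper, it assumes one station per input line in exactly the form shown above, and it relies on GNU sed accepting \n in the replacement text. It also turns &amp; back into &, since ampersands in XML attribute values are escaped.

```shell
# format_stations is a hypothetical helper, not from the original script.
# Input:  one line per station, of the form  name="..." url_resolved="..."
# Output: the station name, its URL, and a blank line, for each station.
format_stations() {
  sed -n -e 's/^ *name="\([^"]*\)" *url_resolved="\([^"]*\)".*$/\1\n\2\n/p' |
    sed -e 's/&amp;/\&/g'    # undo the XML escaping of ampersands
}

# Usage:
#   printf '%s\n' 'name="Example FM" url_resolved="http://example.org/fm"' | format_stations
```

Lines that do not match the expected form are silently dropped, which seems a reasonable default for display purposes.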

Further work

There are all sorts of ways that the script might be improved. It's possible to limit the results in particular ways -- by particular music genre, or geographical location, for example. The search could also be made exact, rather than flexible, by invoking the bynameexact API rather than byname. With this method, if you know the exact name of the station, you could return the single URL for that station. Better still, you could invoke an audio player directly, passing the URL, which would make playing the station a one-command operation.
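As a rough illustration of the last two ideas, the fragment below builds a bynameexact URL and picks the first stream URL out of the formatted results. Everything in it is an assumption: exact_url and first_url are hypothetical helpers, the bynameexact path is assumed to have the same shape as the byname one, and mplayer is just one possible player.

```shell
# exact_url and first_url are hypothetical helpers.

# Build an exact-match search URL from a server name ($1) and an encoded
# station name ($2). Assumes bynameexact mirrors the byname path.
exact_url() {
  printf 'https://%s/xml/stations/bynameexact/%s\n' "$1" "$2"
}

# Print the first line on stdin that looks like a stream URL.
first_url() {
  grep -E '^https?://' | head -n 1
}

# Usage (needs network access and an audio player):
#   URL=$(radio-browser-query bbc radio 4 | first_url)
#   [ -n "$URL" ] && mplayer "$URL"
```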

Download

Download the full script here: radio-browser-query.sh.