Command-line hacking: querying an Internet radio database
This is another in my series of
articles on doing off-beat and (I hope) interesting things
with standard Linux command-line tools. In this post I'll demonstrate
how to query the database of Internet radio stations at
radio-browser.info using a bash
script.
The query (if successful) returns one or more
URLs that can be passed to a command-line audio player like
mplayer.
For example, to search for radio stations matching "bbc radio 4", I run the script like this:
$ radio-browser-query bbc radio 4
BBC Radio 4 Extra
http://a.files.bbci.co.uk/media/live/manifesto/.../bbc_radio_four_extra.m3u8
BBC Radio 4
http://a.files.bbci.co.uk/media/live/manifesto/.../bbc_radio_fourfm.m3u8
...
The resulting URLs can be copied and pasted directly into an mplayer command (and probably the commands of other audio players) to play the radio station. Note that I've shortened the URLs in the sample output above, because the details are not particularly interesting.
Note:
The script I describe here depends on various utilities that are installed by default on most Linux distributions, like dig and sed, and some that might need to be installed, like curl and xmllint.
About the radio-browser.info database
There are thousands of Internet radio stations operating world-wide; the problem is finding them. Stations come and go, and even long-lived ones like those provided by the BBC change their formats and access details periodically.
Many commercial products that offer
Internet radio features use one of the proprietary databases, like
vTuner or Airable. The collaborative database at radio-browser.info
is, in my experience, as inclusive and accurate as any of the commercial
offerings, and updated more quickly. And it's free to use.
The radio-browser database provides a REST API, that is, a method
of querying the database using HTTP requests with specific URLs.
The service can return data in a variety of formats, including XML and
comma-separated values (CSV). It might be thought that CSV
would be easy to parse in a shell script but, in fact, it's surprisingly
difficult to do robustly. What happens, for example, if
the data fields themselves contain commas? Such fields might be surrounded
by quotes, but then what happens if the data fields
contain quotes? This kind of output can be parsed using sed, but the result is really ugly. I'd rather get the
search results in XML, and use an external tool like xmllint to parse them.
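To make the CSV problem concrete, here's a made-up record (not real radio-browser output) whose first field is quoted because it contains a comma. Splitting it naively with cut, as a quick shell script might, gets the fields wrong:

```shell
#!/usr/bin/env bash
# A made-up CSV record: the first field is quoted because it contains a comma.
line='"Radio One, London",http://example.org/stream'

# cut knows nothing about quoting, so it splits at the embedded comma.
first=$(printf '%s\n' "$line" | cut -d, -f1)
second=$(printf '%s\n' "$line" | cut -d, -f2)

printf '[%s]\n' "$first"    # prints: ["Radio One]
printf '[%s]\n' "$second"   # prints: [ London"]
```

Handling quoted fields properly (and quotes inside quoted fields) is exactly the kind of thing that turns into ugly sed.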
The radio-browser service checks at regular intervals that radio stations are still working. This is vital, given how transient they often are. However, the fact that a station responds to a simple request doesn't necessarily mean that it is broadcasting real audio. Even if it is, some stations broadcast silence for at least part of the day. It's not safe to assume that every station returned in a query will actually be available -- this is a general problem with Internet radio.
The radio-browser.info API
The format of the radio-browser REST URL that performs a search by station name is:
/xml/stations/byname/[search text]
The search text can include spaces and punctuation, but these need to be URL-encoded (percent-encoded), that is, each such character is replaced by a % sign followed by its character code as two hexadecimal digits. For example, to search for "radio 4" we need:
/xml/stations/byname/radio%204
because the space is character 32, or 20 in hexadecimal. The result will be an XML document with the following form:
<result>
  <station name="..." url_resolved="..." .../>
  <station name="..." url_resolved="..." .../>
</result>
Each station element has a name, several URLs, and a heap of other attributes that I haven't shown, but which might be useful in other applications. The url_resolved attribute should hold the URL of the actual audio stream.
Note:
Searches for station name are case-insensitive by default.
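The %20 arithmetic is easy to check in bash, because printf converts a numeric argument that starts with a quote character into the code of the character that follows it (standard printf behaviour, nothing specific to radio-browser):

```shell
#!/usr/bin/env bash
# printf turns an argument that begins with a quote into the numeric code
# of the character after the quote; here that character is a space.
code=$(printf '%d' "' ")    # decimal
hex=$(printf '%02X' "' ")   # hexadecimal

printf 'space: %s decimal, %s hex\n' "$code" "$hex"
# prints: space: 32 decimal, 20 hex
printf '/xml/stations/byname/radio%%%s4\n' "$hex"
# prints: /xml/stations/byname/radio%204
```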
About the script
In outline, here's what the script will do.
Randomly select one of the radio-browser.info servers to issue the query to. The list of servers is obtained from a DNS query.
Concatenate the script's command-line arguments, which form the search expression, and encode them in URL format.
Issue the necessary HTTP request using curl or wget, passing the encoded search expression in the URL.
Select the relevant elements from the XML results returned by the server.
Format the results for display.
Selecting the server
The operators of radio-browser.info prefer clients to distribute requests among their servers, to balance load. For our purposes, we'll get the list of servers, and then select one at random.
The list of servers is obtained by querying the service's SRV DNS
record. We can do that using dig:
SERVER_DNS=_api._tcp.radio-browser.info
dig +short $SERVER_DNS SRV
This lookup returns the server list, one SRV record per line, each giving a priority, weight, port, and host name.
Then we can use shuf to shuffle the list (the list is short, so this is fast), and head -1 to select the first line. This provides a random selection each time the script is executed.
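The host name still has to be pulled out of the selected line. Here's one possible sketch of that step (not necessarily what the downloadable script does): the target is the fourth field, and it ends with a dot, which I strip before putting it in a URL. The function name pick_server is my own, for illustration:

```shell
#!/usr/bin/env bash
SERVER_DNS=_api._tcp.radio-browser.info

# pick_server (hypothetical name): read SRV lines of the form
# "priority weight port target." on stdin, shuffle them, keep the first,
# and print its target host name without the trailing dot.
pick_server() {
  shuf | head -n 1 | awk '{ sub(/\.$/, "", $4); print $4 }'
}

# Typical use (needs network access):
#   SERVER=$(dig +short "$SERVER_DNS" SRV | pick_server)
```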
Processing the command line
To make the script easier to use, we'll concatenate all the command-line arguments into a single string; this means that the user can run
$ radio-browser-query bbc radio 4
rather than
$ radio-browser-query "bbc radio 4"
This simple manipulation makes the script more convenient to use but, of course, it is only appropriate if the script doesn't take any other kind of command-line argument. Then we'll replace spaces in the string with the URL escape sequence "%20". Note that my simple script only handles spaces; other punctuation symbols will break it. It wouldn't be difficult to extend it to handle other kinds of punctuation if necessary. So we have:
ARGS="$*"
QUERY=$ARGS
ENC_QUERY=$(echo "$QUERY" | sed -e 's/\s/%20/g')
The \s token (a GNU sed extension) matches any whitespace character, while the trailing /g makes the substitution global, so every whitespace character in the line is replaced by %20, not just the first.
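One way to extend the encoding to punctuation, as suggested above, is a small bash function that percent-encodes every byte except the "unreserved" characters of RFC 3986 (letters, digits, and - . _ ~). This is a sketch, not the code in the downloadable script, and the name urlencode is my own:

```shell
#!/usr/bin/env bash
# urlencode (hypothetical name): percent-encode every byte of $1 except the
# RFC 3986 "unreserved" characters: letters, digits, and - . _ ~
urlencode() {
  local LC_ALL=C        # treat the string as bytes, so UTF-8 is encoded per byte
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) printf -v c '%%%02X' "'$c"   # e.g. space -> %20
         out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}

# Usage, in place of the sed command:
#   ENC_QUERY=$(urlencode "$*")
```

Working byte by byte in the C locale means a non-ASCII station name is encoded as its UTF-8 bytes, which is what URLs expect.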
Making the request
The request URL is built from the REST search URL described above and the value of ENC_QUERY derived previously:
API="https://$SERVER/xml/stations/byname/$ENC_QUERY\
?order=votes&offset=0&limit=100&hidebroken=true"
I've hard-coded a limit of 100 stations here, just in case the user
accidentally enters something that would otherwise match the
entire database. The hidebroken=true parameter excludes stations that are known to be broken from the results.
Then we can make the request using curl
or wget
(the full script will use either, whichever is available).
API_RESPONSE=$(curl --silent "$API")
The --silent switch stops curl from printing its progress meter (which goes to stderr), where it would be mixed up on the terminal with the script's own output.
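The full script uses curl or wget, whichever is available. A minimal sketch of that selection might look like this; the function name fetch_url is hypothetical, and the --fail switch (which makes curl return an error status, rather than printing an error page, when the server reports an HTTP error) is my addition:

```shell
#!/usr/bin/env bash
# fetch_url (hypothetical name): write the body of the URL in $1 to stdout,
# using curl if it is installed and wget otherwise.
fetch_url() {
  if command -v curl >/dev/null 2>&1; then
    # --fail: return an error status instead of printing an HTTP error page
    curl --silent --fail "$1"
  elif command -v wget >/dev/null 2>&1; then
    # -q suppresses wget's messages; -O - writes the document to stdout
    wget -q -O - "$1"
  else
    echo "fetch_url: neither curl nor wget is installed" >&2
    return 1
  fi
}

# Usage:
#   API_RESPONSE=$(fetch_url "$API")
```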
Selecting and formatting the XML attributes
We need the name
and url_resolved
attributes
from each station
element. We can parse the XML by supplying an XPath expression to xmllint. To select the name attribute, for example, the XPath expression is result/station/@name.
xmllint has an --xpath switch for evaluating XPath expressions, and several expressions can be evaluated in the same operation by combining them with the XPath union operator, |.
The result of the XPath evaluation is of the form:
name="..." url_resolved="..." name="..." url_resolved="..." ...
To form the final output, I apply a bunch of sed
and tr
operations, which are too prosaic to be
worth describing in detail.
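For anyone who wants a starting point, here's an alternative sketch that uses grep -o and sed rather than the sed and tr steps in my script. Some versions of xmllint print each attribute on its own line rather than all on one line; grep -o copes with either layout. The function names extract_attrs and format_results are hypothetical, and XML entities such as &amp; are left undecoded:

```shell
#!/usr/bin/env bash
# extract_attrs (hypothetical name): print the name and url_resolved
# attributes of every station element in the XML read from stdin.
extract_attrs() {
  xmllint --xpath '/result/station/@name | /result/station/@url_resolved' -
}

# format_results (hypothetical name): turn xmllint's name="..." and
# url_resolved="..." output, on one line or several, into bare values,
# one per line. XML entities such as &amp; are not decoded.
format_results() {
  grep -oE '(name|url_resolved)="[^"]*"' |
    sed -E 's/^[a-z_]+="(.*)"$/\1/'
}

# Usage:
#   printf '%s\n' "$API_RESPONSE" | extract_attrs | format_results
```

Because the union returns nodes in document order, each station's name is followed by its URL, which gives the alternating layout of the sample output at the top.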
Further work
There are all sorts of ways that the script might be improved. It's
possible to limit the results in particular ways -- by particular
music genre, or geographical location, for example. The search could also be made exact, rather than flexible, by invoking the bynameexact API rather than byname.
With this method, if you know the exact name of the station, you
could return the single URL for that station. Better still, you
could invoke an audio player directly, passing the URL, which would
make playing the station a one-command operation.
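As a sketch of the one-command idea, here's a small bash function that runs the query script and hands the first URL it prints to mplayer. It assumes the script is on the PATH as radio-browser-query and prints names and URLs on alternating lines, as in the sample output at the top; play_station is a hypothetical name. It's most useful combined with an exact-match search, since with byname the first result may not be the station you meant.

```shell
#!/usr/bin/env bash
# play_station (hypothetical name): run the query script with the given
# search words and play the first stream URL it prints with mplayer.
play_station() {
  local url
  # Keep only the first line that looks like an http(s) URL.
  url=$(radio-browser-query "$@" | grep -m 1 -E '^https?://')
  if [ -z "$url" ]; then
    echo "play_station: no matching station found" >&2
    return 1
  fi
  mplayer "$url"
}

# Example:
#   play_station bbc radio 4 extra
```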
Download
Download the full script here: radio-browser-query.sh.