Using media keys in a Linux console application

I've been working on an embedded Linux media application that runs with a small screen in console mode. The screen only displays text, because the main user interface is on a different system, so there's no need for the overhead of a graphical display or desktop. However, the embedded system does require some keyboard interaction. I don't want (or need) a full keyboard, so I thought to use one of those cheap Windows Media Centre remote controls. These have a USB connection, and look exactly like a USB keyboard to the host computer. They cost only a few pounds, and have about a dozen keys: cursor keys, page up/down, tab, backspace, and a bunch of keys with no conventional interpretation.

Note:
This article is about the Linux kernel console -- what you get when there is no graphical interface at all. It does not apply in any way to a console application running in a terminal emulator on a graphical desktop.

These unconventional keys on the Media Centre remote control have labels like 'email' and 'web', along with a set of media transport keys (play, stop, forward, etc). It probably isn't entirely fair to describe these keys as 'unconventional', as most mainstream Linux distributions assign some sort of functions to these keys. For example, the volume up/down keys often do, in fact, raise and lower volume. The media playback keys might also do something. However, they are unconventional in that they did not exist on the kinds of dumb terminals on which Linux keyboard handling is modelled.

In a mainstream Linux desktop system, the fact that these unconventional keys do something at all is thanks to the graphical desktop. The desktop intercepts these keys, and takes some action, which may (or may not) be configurable by the user.

My application has no graphical user interface and, therefore, no graphical desktop. It uses ncurses to read the keyboard and to generate the display. The usual function in the ncurses library to read from the keyboard is getch(). This function returns a number corresponding to the key pressed. For ordinary alphanumeric keys, the value returned is simply an ASCII value. However, ncurses will also interpret the terminal escape sequences generated by the console or a terminal emulator. For example, pressing the 'down' key will usually generate 'escape open-bracket B', just as dumb terminals did back in the day. The ncurses getch() function quietly decodes all this legacy silliness, and returns a single number. The value of this number is arbitrary, but there's a constant KEY_DOWN defined in ncurses.h that applications can test for.
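To make the decoding step concrete, here is a minimal Python sketch of the kind of translation getch() performs. It is not how ncurses is actually implemented (the real library consults the terminfo database and uses timeouts to tell a lone Esc from the start of a sequence). The table and the function name decode_keys are invented for illustration, and only the four cursor-key sequences are covered.

```python
# Illustrative only: a tiny stand-in for the escape-sequence decoding
# that ncurses does inside getch(). The names mirror the ncurses
# constants, but this table is hand-written, not read from terminfo.
ESCAPE_SEQUENCES = {
    b"\x1b[A": "KEY_UP",
    b"\x1b[B": "KEY_DOWN",   # 'escape open-bracket B'
    b"\x1b[C": "KEY_RIGHT",
    b"\x1b[D": "KEY_LEFT",
}


def decode_keys(data: bytes) -> list:
    """Turn a raw byte stream into a list of key values.

    Known escape sequences become symbolic names; every other byte is
    passed through as its numeric (ASCII) value.
    """
    keys = []
    i = 0
    while i < len(data):
        for sequence, name in ESCAPE_SEQUENCES.items():
            if data.startswith(sequence, i):
                keys.append(name)
                i += len(sequence)
                break
        else:
            # No sequence matched at this position: an ordinary byte.
            keys.append(data[i])
            i += 1
    return keys
```

Feeding this the bytes for 'a', cursor down, 'q' gives two plain ASCII values with a single symbolic key between them, which is the view of the keyboard that an ncurses application gets.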

So, in my naivety, I assumed that the unconventional keys on the Media Centre remote control would produce some code that my application could capture and process, even if I had to find out by trial-and-error what that code was.

Boy, was I wrong.

This isn't the fault of the ncurses library, and the rest of this article has nothing to do with ncurses. The sad fact is that, in the console, these extended keys generate absolutely nothing at all. You can see this simply by going to the console and running

$ cat -v
and pressing keys until something happens. Or doesn't. If the kernel's keyboard support doesn't generate any keycode, then ncurses can't do anything, and nor can anything else.

In the end, I was able to get all the keys on the remote control to generate something that my application could use. But doing so required a fairly deep dive into the murky world of Linux console keyboard handling.

Kernel console key mapping

So what's going on here?

The raw keyboard driver generates a scan code when a key is pressed or released. The scan code is a completely arbitrary number that represents the position of the key on the keyboard. USB keyboards have standardized scan codes -- the 'Esc' key, for example, is 0x29. The letter 'A' on my keyboard has scan code '5', but that's dependent on the keyboard layout. The scan code depends on the position of the key, not its function.

Linux was originally developed for IBM PC hardware, generally using PS/2 keyboards. As a result, the kernel normalizes raw scan codes from the various keyboard devices into the scan codes that would be used by a PS/2 keyboard. On such a keyboard, the 'Esc' key has scan code 0x01, not 0x29. It is this PS/2-style scan code that is important in keyboard mapping, because most likely you won't easily be able to change what happens to the raw USB codes. It's also important to understand that, even after this first stage of translation, we're still talking about scan codes -- numbers representing where keys are positioned in space. Applications will expect the kernel to translate these codes into something they can understand. These final translated codes are usually called 'key codes', but terminology in this area is notoriously vague. Whatever we call the final result, the translation has to take account of the fact that the key code will be affected by any modifiers (shift, ctrl) that are held down at the same time, and by the status of stateful keys like 'caps lock' and 'num lock'.

Translation of scan codes to key codes is not a one-to-one mapping. Some scan codes will generate multiple key codes, and some key codes require multiple scan codes. As discussed, 'cursor down' will become escape open-bracket B. But, unless caps lock is on, the letter 'A' requires four scan codes -- shift down, 'a' down, 'a' up, shift up.
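The many-to-one and one-to-many cases can be shown with a toy translator. This is a sketch of the idea only, not the kernel's algorithm. The numeric codes (42 for left shift, 30 for 'a', 108 for cursor down) are the values I believe the kernel uses for those keys, but treat them, and the function name translate, as illustrative assumptions.

```python
# Toy model of scan code -> key code translation. Each event is a
# (code, pressed) pair, where pressed is True for key-down and False
# for key-up. The code values below are assumptions for illustration.
LEFT_SHIFT = 42
LETTERS = {30: "a"}                 # code -> unshifted character
SEQUENCES = {108: b"\x1b[B"}        # cursor down -> escape [ B


def translate(events) -> bytes:
    """Return the bytes an application would read for these events."""
    output = bytearray()
    shift_held = False
    for code, pressed in events:
        if code == LEFT_SHIFT:
            # A modifier produces no output; it only changes state.
            shift_held = pressed
        elif pressed and code in LETTERS:
            char = LETTERS[code]
            output += (char.upper() if shift_held else char).encode("ascii")
        elif pressed and code in SEQUENCES:
            # One key press expands into several bytes.
            output += SEQUENCES[code]
        # Key releases of ordinary keys generate nothing.
    return bytes(output)
```

Four events (shift down, 'a' down, 'a' up, shift up) collapse into the single byte for 'A', while the two events for cursor down expand into three bytes.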

Some of this complexity is necessary, just because of the way keyboards work. Some, however, exists because there is so strong a requirement for backward compatibility. Almost all Linux applications will expect cursor keys to generate ancient VT100-style escape sequences, although there isn't the slightest reason why they should -- this assumption is so entrenched in Linux that it would be almost impossible to dislodge now.

The keyboard translation table

Be that as it may, the kernel maintains a rather complex keyboard translation table to deal with the vagaries of scan code to key code translation. The table is manipulated using the loadkeys utility. This utility takes a filename as an argument, but usually the utility will search a set of well-defined directories to find the specified file -- it isn't usually necessary to give a full pathname, but it's certainly possible to.

In modern Linux installations, the keyboard translation tables will be compressed in GZIP format. However, when uncompressed, they turn out to be ordinary text files. loadkeys will work with both compressed and uncompressed keyboard mapping files. Here are the first few lines from the standard US keyboard layout file, us.kmap.gz. The location of this file varies with the Linux distribution. I've seen /usr/lib/kbd/keymaps/legacy/i386/qwerty/ and /usr/share/keymaps/legacy/i386/qwerty/, and no doubt there are other possibilities.

# us.map
keymaps 0-2,4-6,8-9,12
alt_is_meta
include "qwerty-layout"
include "linux-with-alt-and-altgr"
strings as usual

keycode   1 = Escape
...

Note that this file is not a complete key map -- it uses include to load more general maps. Note also -- and this is the most important part -- that it uses PS/2-style scan codes, not USB scan codes. This standardization on PS/2 codes means that a single translation table can be used for different keyboard types even though, in practice, they are all USB these days.

In the keyboard mapping file, the 'Esc' key is mapped to a token Escape, which eventually ends up as the ASCII code 27. So far as I know, you can't put actual numeric ASCII codes in key map files -- we have to use the tokens that are defined.

To get a list of these constants, use `dumpkeys --long-info`. Irritatingly, this command has to be run on a console, even though we just want a list of recognized tokens, and this does not depend on the current key mappings.

Fortunately, the mapping tokens are reasonably self-explanatory. For example, the key code corresponding to the key combination Ctrl+N is Control_n. I wouldn't have guessed this, but it's easy to spot in the list.

Finding scan codes

To create or edit a keyboard mapping file, we need to know the (PS/2-style) scan codes of the keys. These are documented, but it's easier just to run showkey while poking the keys. This utility displays the PS/2-style scan codes of keys as they are pressed.

Here are the scan codes of the non-standard keys on a Media Centre remote control. So far as I can tell, ordinary keyboards with these extended keys generate the same codes.

mute           113
volume down    114
volume up      115
power off      116
mail           155
next track     163
play/pause     164
previous track 165
stop           166
www            172

The other keys on the Media Centre remote control all generate recognizable key codes, so might not need any further processing. For example, the fast-forward and rewind keys generate 'cursor right' and 'cursor left'.

It's also interesting to note that the A, B, C, and D keys on the remote control generate key codes for F1 - F4. On a Linux console, these function keys switch virtual terminals. These keys can be remapped as the others can, but it might be useful in some applications to leave them with their default functions.

Putting it all together

So how do we use this information to make it possible for a console application to use the non-conventional (media) keys? Simple: we need to change the keyboard mapping table so that these keys generate actual key codes. What key codes? That's entirely up to the application. If the application doesn't use any other keyboard, then the choice of mappings is completely arbitrary. We could map the remote control scan codes to letters, for example, and then code the application to expect these letters.

On the other hand, if your application might use a real keyboard as well, you need to pick key codes with that in mind. In my application, for example, I use the '+' and '-' keys to control volume. These keys make some kind of sense on a regular keyboard, and can be mapped to the volume up/volume down keys on the remote control. I use ctrl+P and ctrl+N to play the previous and next items; I could just use 'P' and 'N', but the application can actually accept letters as input, so the raw 'P' and 'N' would clash. To stop playback I use ctrl+X. 'Pause' is just the space bar.
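On the application side, these choices boil down to a small lookup from the value read from the keyboard to an action. The sketch below is in Python rather than the language of my application, and the action names and the helpers ctrl and action_for are made up for illustration. The only real fact it relies on is that ctrl plus a letter arrives as the letter's ASCII code with the top bits cleared (so ctrl+N is 14).

```python
def ctrl(letter: str) -> int:
    """Return the code read for ctrl+<letter>, e.g. ctrl('N') is 14."""
    return ord(letter.upper()) & 0x1F


# Hypothetical action names; the keys are the values the application
# would read from the keyboard (or from the remapped remote control).
ACTIONS = {
    ord("+"): "volume_up",
    ord("-"): "volume_down",
    ord(" "): "pause",
    ctrl("N"): "next",
    ctrl("P"): "previous",
    ctrl("X"): "stop",
}


def action_for(key: int):
    """Map a key value to an action name, or None if it has no action."""
    return ACTIONS.get(key)
```

Plain letters such as 'n' are deliberately absent from the table, so they remain available as ordinary text input.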

Of course, these are just my application choices. Since I'm writing the application, and setting the keyboard translation, I have complete freedom in this area. However, it makes sense to use mappings that would make some kind of sense on a regular keyboard, even if only for testing the application.

So, having decided the mappings to use, we need to set these mappings in the kernel. The way I do this is to edit one of the existing keyboard map files. Since the 'real' keyboards I use generally have US layout, I'm basing my custom keyboard map on the stock US layout.

To make the changes I uncompress the US layout file, us.kmap.gz, and copy it to a file with a name of my choice. Then I edit this file to add my custom mappings. For the settings I described above, I'm adding these lines:

keycode  115 = plus
keycode  114 = minus
keycode  164 = space
keycode  163 = Control_n
keycode  165 = Control_p
keycode  166 = Control_x

Again, the tokens plus, minus, etc., are found by inspecting the output of dumpkeys --long-info.
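I make this edit by hand, but the uncompress, copy and append steps could equally be scripted. The following is a sketch under that assumption: the function names, the output file name, and the choice of latin-1 (used only because it round-trips any byte unchanged) are mine, not anything loadkeys requires.

```python
import gzip
from pathlib import Path

# The custom mappings described above: remote-control code -> token.
CUSTOM_KEYS = {
    115: "plus",
    114: "minus",
    164: "space",
    163: "Control_n",
    165: "Control_p",
    166: "Control_x",
}


def keymap_lines(mapping) -> list:
    """Format each entry as a 'keycode N = token' line."""
    return [f"keycode {code:4d} = {token}" for code, token in mapping.items()]


def build_custom_keymap(stock_gz, output, mapping):
    """Uncompress a stock key map and append the custom lines to a copy.

    stock_gz is the path of the compressed stock layout; output is the
    path of the new, uncompressed key map file to write.
    """
    with gzip.open(stock_gz, "rt", encoding="latin-1") as source:
        text = source.read()
    if not text.endswith("\n"):
        text += "\n"
    text += "\n".join(keymap_lines(mapping)) + "\n"
    Path(output).write_text(text, encoding="latin-1")
    return output
```

Calling build_custom_keymap with the path of the stock US layout, a destination such as "media.kmap" (a made-up name), and CUSTOM_KEYS would produce a file ready to pass to loadkeys.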

To test this mapping, we can just load the edited file using loadkeys. Poke the keys whilst running cat -v to check that recognizable codes are produced. The final step is to ensure that the key map is loaded at boot time. How to do this depends on the kind of Linux you're running; I just use one of the start-up scripts to run loadkeys.
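When checking the keys with cat -v, it helps to know what to expect: control characters are shown in caret notation, so ctrl+N appears as ^N. The sketch below reproduces that notation in Python, as I understand it, so the expected output for any byte can be worked out in advance. The function name cat_v is mine, and this is an approximation of the tool's behaviour rather than a copy of it.

```python
def cat_v(data: bytes) -> str:
    """Render bytes roughly the way 'cat -v' displays them."""
    out = []
    for byte in data:
        if byte in (0x09, 0x0A):
            # Tab and newline are passed through unchanged.
            out.append(chr(byte))
            continue
        prefix = ""
        if byte >= 128:
            # High-bit bytes get an 'M-' prefix, then the low 7 bits.
            prefix, byte = "M-", byte - 128
        if byte < 32:
            out.append(f"{prefix}^{chr(byte + 64)}")   # e.g. 14 -> ^N
        elif byte == 127:
            out.append(prefix + "^?")
        else:
            out.append(prefix + chr(byte))
    return "".join(out)
```

With the mappings above loaded, the six remote-control keys should show up as +, -, a space, ^N, ^P and ^X, while an unmapped cursor-down key still shows as ^[[B.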

Closing remarks

It turns out that mapping the extended media keys onto real key codes is straightforward, when you know how: it's just a matter of editing a single text file, and processing it using loadkeys. The problem is that this process is largely undocumented. If you do a web search for keyboard mapping in Linux, nearly all the results you get will be about keyboard mapping in X.

This is unhelpful here, because X uses a totally different process for keyboard mapping. None of the methods described for keyboard mapping with a graphical desktop are remotely useful for the console.