Embedding resources into executables built using GCC

It's often useful to be able to include blocks of data, or whole files, in the compiled binary of a C or C++ program. Native Windows C/C++ compilers provided ways to do this, along with proprietary API calls to extract the data. A good example was the inclusion of language-specific textual elements, like menus and string tables: an application could readily support different languages, simply by providing different text files. In Windows jargon these text files were called "resource files", and the tool that made them accessible to the linker was a "resource compiler".

The C standard library provides no mechanism for loading raw data from an executable file, or from its memory image. Compilers do not necessarily even provide a way to incorporate this kind of data.

GCC, however, does. That is, it provides a way to include raw files in the output of the linker, in such a way that they form part of the image of the executable in memory. It also provides symbols for the addresses of these data blocks, which can be used by programs.

That's all it does, however -- if you want to provide Windows-like binary resource management, you'll have to implement the code yourself.

In this article I describe the minimum you need to know to incorporate raw files into an executable, and locate their images in memory. I will use a string table as an example. This will be provided in plain text, in a file stringtable.txt. I'll show how to link this with the executable, and in outline how to read it in code. The method will accommodate any type of file: images, sound clips, XML files, HTML files, as the application requires.

The methods I'm describing here are non-portable. The "nice" way to include raw data is to use a tool that will convert it to a C source file that contains an enormous C array definition. The GCC method needs no additional tooling, and will work with files of any size.

First, we need to include the raw data files in the linker output. If you're using the ld linker explicitly, you can do this:

$ ld ... \
        --format=binary {raw files} \
        --format=default {object files} 

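This tells the linker that the "raw" files are not to be interpreted as object files (with their own relocations, etc), but just written to the output unchanged. The second --format option switches back to the default, so that the object files and libraries that follow are handled normally.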

If you're using gcc to link as well as to compile, you'll need to pass ld arguments to gcc, like this:

$ gcc ... \
        -Wl,--format=binary {raw files} \
        -Wl,--format=default {object files} 

For example, I want to include a string table in text format. All the data is in stringtable.txt. I would link it like this:

$ gcc ... \
        -Wl,--format=binary stringtable.txt \
        -Wl,--format=default {object files}

That takes care of the linking, but what about access to the data in the program?

The key point here is that the raw data will be loaded into memory along with the rest of the program. The linker defines symbols that mark the beginning and end of the data block in memory. These symbols have names _binary_{filename}_start and _binary_{filename}_end, where {filename} is the file name as given to the linker, with every non-alphanumeric character replaced by an underscore. For the file stringtable.txt, the symbols are therefore _binary_stringtable_txt_start and _binary_stringtable_txt_end, and we can make the following declarations in the C/C++ source:

extern char stringtable_start[] asm("_binary_stringtable_txt_start");
extern char stringtable_end[] asm("_binary_stringtable_txt_end");

It's important to understand that these declarations are of arrays, and not pointers. It's not correct to say:

extern char *stringtable_start asm("_binary_stringtable_txt_start");

This will compile, and the program will start, but it will fail catastrophically as soon as the pointer is used. The reason is that the assembler symbols are the addresses of the places in memory where the relevant data starts (or stops). They are not variables that hold a pointer: with the pointer declaration, the compiler reads the first few bytes of the data itself and treats them as an address. Unfortunately, the fact that arrays and pointers can be used interchangeably in many places in C/C++ obscures the subtle distinction between the two.
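
To make the difference concrete, the following sketch imitates both declarations using an ordinary array as a stand-in for the embedded data. The names fake_blob, address_as_array and value_as_pointer are invented for this illustration, and nothing here touches the real linker symbols.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the embedded bytes (hypothetical; in the real case the
   bytes come from the file the linker included). */
static const char fake_blob[] = "ABCDEFGHIJKLMNOP";

/* Array declaration: the name denotes the symbol's address, which is
   where the first data byte lives. */
static const char *address_as_array(void)
{
    return fake_blob;
}

/* Pointer declaration: the compiler loads a pointer-sized value stored AT
   the symbol, so the first few data bytes are treated as an address.
   memcpy reproduces that load without undefined behaviour. */
static uintptr_t value_as_pointer(void)
{
    uintptr_t bogus;
    assert(sizeof fake_blob >= sizeof bogus);
    memcpy(&bogus, fake_blob, sizeof bogus);
    return bogus;
}
```

Here address_as_array() gives a usable pointer to the 'A', whereas value_as_pointer() gives a number built from the character codes of the first few letters. Dereferencing that number is what makes the pointer declaration crash.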

The size of the string table in memory is just:

size_t stringtable_size = stringtable_end - stringtable_start;
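
Because the length comes from the two symbols rather than from a terminator, it is safest to pass it explicitly to whatever consumes the data. Here is a minimal sketch, assuming a hypothetical helper write_block that copies the bytes between a start and an end address to a stdio stream:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: write the bytes in [start, end) to 'out'.
   Returns the number of bytes actually written. */
static size_t write_block(FILE *out, const char *start, const char *end)
{
    assert(out != NULL && start != NULL && end >= start);
    /* Pointer subtraction yields a ptrdiff_t; it is non-negative here,
       so the conversion to size_t is safe. */
    size_t size = (size_t)(end - start);
    return fwrite(start, 1, size, out);
}
```

With the declarations shown earlier, a call such as write_block(stdout, stringtable_start, stringtable_end) would print the whole table.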

And that, really, is all there is to it. Once you know where the data is in memory, it's just a matter of writing the appropriate code to process it. In my GitHub repository there is a more complete example of implementing a string table.
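
As a rough illustration only (this is not the code from that repository), suppose the string table stores one string per line, separated by newline characters. That layout, and the function name table_lookup, are assumptions made for this sketch. A lookup by line index could then look something like this:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: find line number 'index' (0-based) in the text
   block [start, end). On success, returns a pointer to the first
   character of the line and stores its length (excluding the newline)
   in *len. Returns NULL if the table has fewer lines.
   The line is NOT null-terminated, so callers must use *len. */
static const char *table_lookup(const char *start, const char *end,
                                size_t index, size_t *len)
{
    assert(start != NULL && end >= start && len != NULL);
    const char *line = start;
    while (line < end) {
        /* memchr is bounded, so it never reads past the end marker. */
        const char *nl = memchr(line, '\n', (size_t)(end - line));
        const char *line_end = nl ? nl : end;
        if (index == 0) {
            *len = (size_t)(line_end - line);
            return line;
        }
        if (nl == NULL)
            break;          /* last line had no trailing newline */
        index--;
        line = nl + 1;      /* move past the newline */
    }
    return NULL;
}
```

A result can be printed without a terminator using printf("%.*s\n", (int)len, s), where s and len are the values obtained from the lookup.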

There are a couple of things to watch out for.

First, with the set-up described, the raw data in the files, once loaded into memory, is read/write for the program. That may, or may not, be what is required. If the application should not be modifying the data, you can reduce the likelihood of accidentally doing so by declaring it const:

extern const char stringtable_start[] asm("...");

Second, the linker will not insert any specific terminating character. It would be convenient, with text data, to be able to treat it like an ordinary C string -- but care has to be taken because the data won't necessarily be null-terminated.

Of course, you could insert a terminating null into the text file, and it will then be included like any other byte. However, this could make the text hard to edit. This is a problem that is really specific to text -- most other file types have a length encoded, or have some way to work it out. If not, you can always rely on the end marker to calculate the length of the included file.
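
If null-terminated access is genuinely needed, one option is to copy the block into a fresh buffer with a terminator appended, using the end marker for the length. The helper name block_to_cstring below is hypothetical, and the sketch assumes the extra heap copy is acceptable:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd, null-terminated copy of the
   bytes in [start, end), or NULL if allocation fails.
   The caller is responsible for calling free() on the result. */
static char *block_to_cstring(const char *start, const char *end)
{
    assert(start != NULL && end >= start);
    size_t size = (size_t)(end - start);
    char *copy = malloc(size + 1);   /* one extra byte for the '\0' */
    if (copy == NULL)
        return NULL;
    memcpy(copy, start, size);
    copy[size] = '\0';
    return copy;
}
```

The copy can then be passed to strlen, strtok and the other string functions, leaving the original text file free of embedded nulls.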