You be the linker -- building "Hello, world" from scratch, in hexadecimal

gears

In this article I'm going to describe how to do something that is quite difficult, produces a trivial result, and has no practical benefit whatsoever. I'm going to explain how to produce a "Hello, world" program for Linux, from scratch, in hexadecimal. This is about the most low-level method of programming there is, unless we throw away the operating system and work on bare metal.

Why would anybody want to do such a thing? Well, there is no better way to understand how a program is structured, and how the operating system loads and runs it, than to build one in this most fundamental way.

To make life a bit easier, I'm going to focus on only one architecture: 64-bit Linux on AMD64 (x64) architecture. This is a fairly popular architecture and, in fact, there's little conceptual difference in the way any modern Linux system works. Of course, different CPUs will have different registers and different instruction codes, and I'm not even going to begin to discuss that. In addition, I'm not going to get into how to accommodate the different kinds of endianness, different segment alignments, etc., etc -- the job is difficult enough even for a single architecture.

To follow this exercise you don't need a compiler, or a linker, or even a standard library. Yes, that's right -- the program's output will be straight to the kernel. I will present and describe in detail all the "source code", which is files full of hexadecimal numbers. You should be able to cut-and-paste this into a file and convert it to binary using the xxd utility but, to save time, all the necessary files are on github.

Inputs and outputs

The input to the process will be a file, or many files, full of hexadecimal numbers in ASCII format. I'll include comments in the example but, of course, the comments won't contribute to the output. The output will be an executable program file in ELF format.

For simplicity, I've tried to implement the simplest program I could think of -- it just writes "Hello, world" to standard out -- and create the smallest, well-formed ELF file I could come up with. In fact, Linux will load defective ELF files, so long as the program code is correct and the header specifies the correct memory mapping parameters. All the other stuff is used by tools. So we should be able, for example, to disassemble a valid ELF file using objdump. The kernel is, in fact, more tolerant of a broken ELF file than most tools are, but I wanted to create one that was actually complete, if minimal.

Structure of the minimal ELF executable

Our ELF executable needs the following components:

An ELF header. This is always the first thing in the file and, in a 64-bit binary, is always exactly 64 bits long. The ELF header is the only part of the ELF file with a fixed location -- the location of all the other parts of the file are set out in the ELF header.
One program header (more are allowed). This header indicates, among other things, how much of what part of the file needs to be loaded into memory.
A section for the compiled program code. Again, more are allowed, but we need only one. This section will be given the name .text
A string table section. This is not for program data, but to hold the names of the sections, including itself. Our string table will contain two entries: .text for the program code, and .shstrtab for itself.
Three section headers of 64 bytes in size. The first will be all zeros. This null header exists to allow for applications with huge numbers of sections, and is not relevant here. The section section header will define the location and size in the file of the .text section. The third defines the location and size of the string table.

It's important to understand that the only part of the ELF file that is in a fixed location is the program header. All other parts can appear anywhere, in any order. The layout I have chosen is reasonably typical, but by no means universal (see figure). To build the file we will need to know the position (offset in the file) of every component. Let's see how to work the sizes out.

Building the executable file components in hexadecimal

In this section I describe how to write the hex codes for the various components of the executable file I outlined above. I will make frequent references to the offsets in the figure, and it will be difficult to follow the descriptions without having it handy.

Calculating the sizes and offsets of the sections

Easy ones first: the ELF header, program header, and section headers are all of fixed size. I've put the program header directly after the ELF header, so it's offset is 0x40. I'm using hexadecimal numbers here, because these are what we need to poke into the file.

The compiled (that is, compiled by me) program code is 42 bytes long (keep reading to see why), and follows the program header, so it's at offset 0x78. The text "Hello, world" immediately follows the program code, so it's offset is 0xA2. It has a line feed and a terminating null (which isn't actually required here, but once a C programmer, always a C programmer). So it's size is 14 bytes. This puts the start of the string table, immediately after, at 0xB0.

The stringtable has two strings: .shstrtab and .text. Both are terminated by zeros, so the total size of this section is 9 + 1 + 5 + 1 = 16 bytes. The offset into the table of the second string, .text is 10 bytes -- we'll need this information when setting up its section header.

Adding 16 to the offset of the start of the string table gives 0xC0 for the first section header. We don't need to know the offsets of the other section headers, because they are all the same size, and we will specify how many there are in the ELF header. Putting all these sizes together gives a total size of the ELF file of 384 bytes.

There's one more thing we need to know -- the amount of the file to load into memory. We need to load up to the start of the string table -- nothing else is needed at runtime. So we need to load 64 + 56 + 42 + 14 + 16 = 192 = 0xC0 bytes, starting from the beginning of the file.

A note on memory mapping

To write real applications (or a compiler for them) we need to know the memory mapping model in detail, that applies to our target architecture. For the present example, however, all we need to know is that the AMD64 architecture assumes that executable code will start at virtual address 0x400000 (4 MiB). There are a combination of technical and historical reasons for this. Essentially this address is one memory page up from zero, for the largest page size in common use.

We don't want anything useful in memory starting at 0x0 -- we want a null pointer genuinely not to point to anything.

Now, although the AMD64 architecture notionally has a 48-bit virtual address space, most instructions take only 32-bit direct operands (as we'll see in the program code later). Special techniques are needed to address memory beyond the 32-bit offset range. So we want to put the program code in the lowest virtual memory page as we can, other than the zero page.

Whatever the reason, we need to set the program header to load the relevant part of the binary file into memory starting at 0x400000. Because this will include the ELF header and program header, the program code that starts at offset 0x78 in the file will end up at 0x400078 in memory.

0x400078 is such a common starting address that the GNU linker ld assumes it by default, if you don't specify a symbol called _start. Of course, we aren't using a linker here -- we are the linker; so we have to specify this address correctly.

Building the ELF header

We now have all the information we need to populate the ELF header: we know what all the relevant offsets are, and we know where to load the binary data, and how much. We know that there will be three section headers, and that entry 2 contains the string table.

The ELF header is well documented, so I don't propose to describe what every entry does. I've put comments in the hex data below to explain the various settings but, of course, the comments will never form part of the executable file -- we'll have to strip them out.

# ELF header -- always 64 bytes

# Magic number
7f 45 4c 46 

# 02 = 64 bit
02

# 01 = little-endian
01

# 01 = ELF version
01

# 00 = Target ABI -- usually left at zero for static executables
00

# 00 = Target ABI version -- usually left at zero for static executables
00

# 7 bytes undefined
00 00 00 00 00 00 00 

# 02 = executable binary
02 00 

# 3E = amd64 architecture
3e 00

# 1 = ELF version
01 00 00 00 

# 0x400078 = start address: right after this header and the program
#  header, which take 0x78 bytes, if the binary is mapped into 
#  memory at address 0x400000)
78 00 40 00 00 00 00 00

# 0x40 = offset to program header, right after this header which is 0x40 bytes long 
40 00 00 00 00 00 00 00

# 0xC0 = offset to section header, which is after the program text and the 
#  string table
c0 00 00 00 00 00 00 00 

# 0x00000000 = architecture specific flags
00 00 00 00

# 0x40 = size of this header, always 64 bytes
40 00

# 0x38 = size of a program header, always 56 bytes
38 00

# 1 = number of program header
01 00

# 0x40 = size of a section header, always 64 bytes
40 00

# 3 = number of sections headers 
03 00

# 2 = index of the section header that references the string table
02 00

Constructing the program header

We also have enough information to construct the program header. Note that it replicates some of the information already provided in the ELF header. Again, I've put comments to explain how the various settings are derived in this example, but I don't want to explain every one.

# Program header -- always 56 bytes

# 1 = type of entry: loadable segment
01 00 00 00 

# 0x05 = segment-dependent flags: executable | readable
05 00 00 00 

# 0x0 = Offset within file
00 00 00 00 00 00 00 00 

# 0x400000 = load position in virtual memory
00 00 40 00 00 00 00 00 

# 0x400000 = load position in physical memory
00 00 40 00 00 00 00 00

# 0xB0 = size of the loaded section the file
B0 00 00 00 00 00 00 00

# 0xB0 = size of the loaded section in memory 
B0 00 00 00 00 00 00 00

# 0x200000 = alignment boundary for sections
00 00 20 00 00 00 00 00

"Compiling" the program

The program will do two things:

Print "Hello, world" to standard out.
Exit.

With high-level languages we don't necessarily think of "exit" as a compulsory operation -- the program typically stops when it runs out of things to do. In machine language, though, we always need a specific exit operation.

Both the print and the exit are achieved by 'syscalls', that is, calls into the kernel. The AMD64 syscalls are not as well-documented as they should be -- except in the kernel source. For reference I usually look at this article by Ryan Chapman.

The way syscalls work is fairly straightforward -- load the syscall number into the %rax register, load the arguments into various other registers (all documented in the previous link), and then execute the syscall function.

For our application we need just two syscalls: sys_write (1) nd exit (60).

Writing out the code in GNU assembler language, we have:

    mov     $1, %rax     # sys_write
    mov     $1, %rdi     # file descriptor, 1=stdout
    mov     $hello, %rsi # buffer
    mov     $13, %rdx    # byte count
    syscall

    mov     $60, %rax    # exit
    xor     %rdi, %rdi   # return value, 0
    syscall

But wait -- all the values being put into registers are literals, except $hello. This is where we get to play "you be the linker". Lacking a linker we need to replace the value of $hello with the (32-bit) memory address of the string "Hello, world". You'll remember that I decided to put this string directly after the program code. The program code will turn out to be be 42 bytes long, and it starts at 0x40078. The location of the string is therefore 0x40078 + 82, or 0x4000A2. We'll need to insert this manually into the hexadecimal code at the appropriate position.

There are two ways to go about turning the assembly code into real, hexadecimal machine instructions. The first is to spend a great deal of time studying x86 reference material and other documents. The second is to cheat, and use a real assembler for this part. The choice is yours ;) If you grew up with 8-bit microprocessors as I did, you'll find that the enormous, convoluted instruction set of AMD64 devices is quite painful to work with. Of course, until in the 8-bit days, modern CPUs are not designed to be human-readable.

However you achieve the assembly process, the program code, followed by the text string, looks like this:

# Program code -- 42 bytes

# mov 0x01,%rax -- sys_write
48 c7 c0 01 00 00 00 

# mov 0x01,%rdi -- file descriptor, stdout
48 c7 c7 01 00 00 00

# mov 0x4000a2,%rsi -- location of string (see text above)
48 c7 c6 a2 00 40 00

# mov 0x0d,%rdx -- size of string, 13 bytes
48 c7 c2 0d 00 00 00

# syscall
0f 05

# mov 0x3c,$rax -- exit program
48 c7 c0 3c 00 00 00 

# xor %rdi,%rdi -- exit code, 0
48 31 ff 

# syscall
0f 05 

# Text "Hello, world\n\0" -- total 14 bytes including the null
48 65 6c 6c 6f 2c 20 77 6f 72 6c 64 0a 00

Constructing the string table

This, at least, is dead easy. It's just the ASCII values for the two strings .shstrtab and .text, null terminated, one after the other. Here it is:

# String table. ".shstrab\0.text\0" -- 16 bytes
2e 73 68 73 74 72 74 61 62 00 2e 74 65 78 74 00

Constructing the section header table

As I described above, there are three section headers, all 64-bits. The first one is easy -- it's all zeros. The next section header describes the .text section, and looks like this:

# Section header table, section 1 -- program text -- 64 bytes

# 0x0A = offset to the name of the section in the string table
0a 00 00 00

# 1 = type: program data
01 00 00 00

# 0x06 flags = executable | occupies memory
06 00 00 00 00 00 00 00

# 0x400078 address in virtual memory of this section
78 00 40 00 00 00 00 00

# 0x78 = offset in the file of this section (start of program code)
78 00 00 00 00 00 00 00

# 0x38 = size of this section in the file: 56 bytes
38 00 00 00 00 00 00 00

# sh_link -- not used for this section
00 00 00 00 00 00 00 00

# 0x01 = alignment code: default??
01 00 00 00 00 00 00 00

# sh_entsize: not used
00 00 00 00 00 00 00 00

Note that the section header describes where the section is located in the file, how large it is, and where it is to be loaded into memory.

Here's the section header for the string table:

# Section header table, section 2 -- string table

# 0x0 = offset to the name of the section in the string table
00 00 00 00

# 3 = type: string table
03 00 00 00

# 0x0 = flags
00 00 00 00 00 00 00 00

# 0x0 = address in virtual memory (not used)
00 00 00 00 00 00 00 00

# 0xB0 = offset in the file of this section (start of string table)
b0 00 00 00 00 00 00 00

# 0x10 = size of this section in the file: 16 bytes
10 00 00 00 00 00 00 00

# sh_link -- not used for this section
00 00 00 00 00 00 00 00

# 0x01 = alignment code: default??
01 00 00 00 00 00 00 00

# sh_entsize: not used
00 00 00 00 00 00 00 00

Note that there is no memory address in this section -- it will never be loaded into memory.

Putting it all together

That's the hard part done. At this stage, we have a number of files of somewhat incomprehensible hexadecimal text or, perhaps, one large, totally incomprehensible one.

We can use the standard xxd utility to convert these hex files to binary, but we'll need to strip the comments first. Here is the script I use to build the executable file from my hex files:

BINARY=elfdemo
cat 0_elf_header.hex \
  1_program_header.hex \
  2_program_code.hex \
  3_string_table.hex \
  4_section_header_0.hex \
  5_section_header_1.hex \
  6_section_header_2.hex \
  | grep -v "#" > $$.hex
xxd -p -r $$.hex $BINARY
chmod 755 $BINARY
rm $$.hex

This script just concatenates all the individual hex files into one large file, strips the comments, and passes the result to xxd to turn into binary. If all is well, the resulting elfdemo will be 384 bytes long, and if you run it, you'll see:

$ ./elfdemo
Hello, world