Raspberry Pi Pico: loading code into RAM and running it -- part 2

In the first article in this series I introduced the idea of having a program written using the Raspberry Pi Pico C SDK load additional code into the Pico's RAM, and execute it there. I suggested that there were few good reasons for doing this, apart from the sheer pleasure of doing something unnecessarily difficult. One good reason, however, might be to break up a large application into small modules, each of which can be loaded at runtime. The Pico only has 2MB of flash ROM but, with an SD card, you can make gigabytes of binary code available as modules.

Of course, with only about 260kB of RAM available, these binary modules would have to be comparatively small -- but there could be an unlimited number of them.

Moreover, if you can load arbitrary code and run it, you have the beginnings of an operating system. Whether the world needs another operating system for ARM-based single-board computers is, of course, questionable; still, creating one is an interesting educational exercise, if nothing else.

In this article I will describe how to compile and link a C program, using gcc, in such a way that it can be loaded into the Pico's RAM, and executed there. There is a lot of complexity that will be needed in a real application, which there isn't space to cover here. However, I will try to give at least an outline of the process.

Note:
gcc will produce a binary in ELF format, which is non-trivial to process. Part of the job of running a program produced using gcc will be processing the ELF file on the Pico. I will later describe how to set up the linker to make this job tractable.

Recap

First, let's recap the conclusions from the first article. The Pico RAM occupies memory addresses 0x20000000 to 0x20040000. Above this are (conventionally) the stacks for the two CPUs. The memory heap for the Pico SDK program, which will be running from flash, occupies some memory at 0x20000000 -- the exact amount has to be worked out by testing. I suggested in the previous article that we would load the code to execute at address 0x20004000, leaving 16kB for the work area for the program in flash.
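The layout recapped above can be written down as a handful of constants. This is only a sketch: the macro names and the module_fits helper are my own, chosen for illustration, and are not Pico SDK definitions. The addresses are the ones used in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names, not SDK definitions. */
#define RAM_BASE    0x20000000u  /* start of the Pico's main RAM */
#define RAM_END     0x20040000u  /* first address past main RAM */
#define MODULE_BASE 0x20004000u  /* where loaded modules are placed */

/* RAM left below the module for the program running from flash. */
#define FLASH_WORK_AREA (MODULE_BASE - RAM_BASE)

/* True if a module image of 'size' bytes fits between MODULE_BASE
   and RAM_END. */
static bool module_fits(uint32_t size)
{
    return size <= RAM_END - MODULE_BASE;
}
```

With these values, a module (code, initialized data and zeroed variables together) can occupy at most 0x3C000 bytes, which is 240kB.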

We'll also have to set aside some RAM for some software interrupt vectors, by means of which the program loaded in RAM can call into the program in flash -- unless, of course, the program modules that are loaded into RAM are completely self-contained. In practice, it seems reasonable to call into flash to get access to hardware, just as we would do with a full-scale operating system. After all, the relevant code will exist in the program in flash already, else we would not have a way to get a program into RAM in the first place. I won't be covering that subject in this article -- here I'm assuming that we're loading a program that is completely self-contained.

Compiling for the Pico

If you're used to working with the Pico C SDK, then you already have tools that are capable of compiling C code for the Pico. The arm-none-eabi-gcc compiler can do this, with the right switches. The essential compiler switches are -mthumb, to generate ARM thumb code, and -mcpu=cortex-m0plus.

In most cases, gcc can be used as a linker as well as a compiler. I confess that I have not yet found a way to make this work for this particular application. I compile using gcc as usual, and then link using arm-none-eabi-ld, specifying exactly what libraries (if any) are required. I'll have more to say about this subject later.

In any case, if we aren't using gcc as the linker, it will usually be necessary to run the compiler with the additional switches -nostdlib -nostartfiles -ffreestanding to avoid the compiler trying to tell the linker to include libraries and start files that won't work in this application.

So a basic compilation command will look something like this, given that the C code is in main.c.

$ arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb \
     -nostdlib -nostartfiles -ffreestanding \
     -o main.o -c main.c

This will give us main.o, containing ARM Thumb code ready to be linked. Note that this code is relocatable; that is, it doesn't yet refer to specific memory locations. It is the linker's job to resolve the specific addresses.

Start-up modules -- a digression

I'll have more to say about this subject later; for now, it's worth bearing in mind that running your program from the first byte of the compiled main.c probably won't give great results. If we take no other steps, that's what's going to happen -- the linker will stuff the compiled code into the binary output file, in pretty much the order it appears in main.c. It's conventional to start a C program in a function called main. You could just put your main function first in main.c, and ensure that main.o is the first object file seen by the linker. That would actually work, so long as you don't need to initialize anything before main runs.

In any non-trivial application, there will be some pre-initialization to do. If your program relies on a standard C library, there will definitely be some pre-initialization to do.

So we usually put the start-up code in a separate module, which does the initialization and then calls main. We need to ensure that the linker knows to put this code at the place where execution will start. We also need to know exactly where to find it in the ELF file generated by the linker, so the Pico program can load it into memory in the right order. I tackle this latter job simply by having the linker write a monolithic, compact ELF file -- more on this later.

Linking for the Pico

This is where the real fun begins.

The first point to note is that we have to link the program in such a way that it can be loaded at a specific memory address -- 0x20004000 is the address I use. That's easily done using the linker switch -Ttext=0x20004000, like this:

$ arm-none-eabi-ld -Ttext=0x20004000 {startfile} main.o {libraries} -o myprog.elf

Note the startfile -- whatever that turns out to be -- is the first object file supplied to the linker. This will be loaded into memory first on the Pico.

This will produce a binary myprog.elf in ELF format, with addresses resolved for a program in memory starting at the specified address.

But if we set only the start of the text segment, as above, the linker will use its defaults to construct the binary ELF file. That's fine if (a) it doesn't matter how large the file is, and (b) we will implement a sophisticated program loader for the Pico.

In practice, we probably do care how large the ELF file is, especially if we're processing it using Pico code. The bigger problem, though, is the program layout in the ELF file.

The GCC linker assumes that the ELF file will be loaded by a sophisticated loader, probably on Linux. The compiled program will be made up of many -- perhaps thousands of -- segments, each with its own physical and virtual memory addresses. The program loader will need to position these segments in RAM in a way that makes sense of the addresses.

This is too big a job on the Pico. What we really want is to create an ELF file that can be loaded as one, simple chunk directly into the Pico's RAM. This can partly be achieved using a linker script to combine all the initialized data -- code and variables -- into one great big segment, which can be loaded contiguously into RAM at the starting address.

I should point out that this approach works on the Pico because it has no memory protection. On Linux, for example, you'd have to separate out your code and data, because the code will be loaded into a read-only memory region. On the Pico, we won't worry about this.

For a simple configuration, the following linker script should be sufficient:

SECTIONS
{
.text : ALIGN(0x04) { *.o(.text .rodata .data) }
.bss :
  {
  __bss_start__ = .;
  *(.bss*)
  *(COMMON)
  __bss_end__ = .;
  }
}

To use the linker script we will, again, use the -T switch to the ld utility.

What this linker script does is to group all of the segments that contain initialized code and data into one huge text segment, to which we will give the start address 0x20004000.

We also have to handle uninitialized data. This is data -- usually variables -- that has not been assigned specific values by the programmer. Of course, these variables still have locations in memory. The compiler will not generate code that initializes this memory, because it is not the job of a specific code module. You may remember that I referred to pre-initialization earlier? Well, this is one of the pre-initialization tasks you'll need to do: setting the uninitialized data area to zeros. The linker script defines the symbols __bss_start__ and __bss_end__ at the start and end of this memory area; your initialization code will use these symbols to determine the area of memory to be initialized.
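As a rough illustration of that pre-initialization step, a start-up module might look something like the sketch below. The names module_start and zero_bytes are my own inventions; only __bss_start__ and __bss_end__ come from the linker script. The entry function is deliberately the first function defined in the file, on the assumption that the compiler emits functions in source order (GCC's -fno-toplevel-reorder switch asks for this explicitly).

```c
/* Forward declaration, so that module_start can be the first
   function defined in this file. */
static void zero_bytes(volatile unsigned char *start,
                       volatile unsigned char *end);

#if defined(__arm__)
/* Defined by the linker script; only their addresses are meaningful. */
extern unsigned char __bss_start__[];
extern unsigned char __bss_end__[];

int main(void);

/* Hypothetical entry point: clear the uninitialized data area,
   then run the program proper. */
void module_start(void)
{
    zero_bytes(__bss_start__, __bss_end__);
    main();
}
#endif

/* Set every byte in [start, end) to zero. The pointers are volatile
   so that the compiler keeps the loop, rather than replacing it with
   a call to memset, which may not have been linked in. */
static void zero_bytes(volatile unsigned char *start,
                       volatile unsigned char *end)
{
    while (start < end)
        *start++ = 0;
}
```

The target-specific part is wrapped in a check for __arm__ only so that the zeroing helper can be compiled and tried out on a desktop machine as well.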

Stripping the linker output

Unfortunately, packing all the code and data into one segment is not sufficient to give an ELF file that is of suitable size for use on the Pico. The reason is that the GCC compiler and linker will generate masses of debug-related data. Some of this can be removed using the arm-none-eabi-strip utility, but most cannot. We can, instead, use objcopy to remove all the unnecessary segments, like this:

$ arm-none-eabi-objcopy --remove-section=.comment \
   --remove-section=.note '--remove-section=.debug*' myprog.elf myprog

This stripped program will be of a much more manageable size. As I hinted before, we wouldn't worry about this on a full-scale computer because, not only will it have much more memory than a Pico, it will have a sophisticated program loader that can inspect the ELF file to work out which parts of it need to be loaded. What we're really doing, by bending the linker to our will, is to make life easier for the Pico programmer, who will only have to implement a dumb program loader.

Loading on the Pico

With the ELF file constructed as I described above, loading the executable into the Pico's RAM will be easy. The Pico program needs to do some basic sanity checks on the ELF file -- we don't want to try to load a text file, for example, and execute it. However, we can assume -- knowing how the GCC linker works -- that the program's one and only segment will start at offset 0x4000 in the ELF file, and will extend to the end of the file (because we've stripped everything else out).
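The sanity checks might be sketched as follows. This assumes only the standard layout of the 32-bit ELF file header; the function names are hypothetical, and a real loader may well want to check more than this.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ELF32_HEADER_SIZE 52u  /* size of the ELF32 file header */

/* Read a little-endian 16-bit value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Return true if the first bytes of a file look like the header of
   a 32-bit, little-endian ARM executable. */
static bool elf_header_ok(const uint8_t *hdr, size_t len)
{
    if (len < ELF32_HEADER_SIZE)
        return false;
    if (hdr[0] != 0x7f || hdr[1] != 'E' || hdr[2] != 'L' || hdr[3] != 'F')
        return false;               /* not an ELF file at all */
    if (hdr[4] != 1)                /* EI_CLASS: 1 = 32-bit */
        return false;
    if (hdr[5] != 1)                /* EI_DATA: 1 = little-endian */
        return false;
    if (read_le16(hdr + 16) != 2)   /* e_type: 2 = ET_EXEC */
        return false;
    if (read_le16(hdr + 18) != 40)  /* e_machine: 40 = EM_ARM */
        return false;
    return true;
}
```

A text file, for example, fails at the first comparison, because it does not begin with the four ELF magic bytes.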

So all we have to do on the Pico is to read all but the first 16kB (0x4000) of the ELF file into memory, starting at address 0x20004000. I'm not going to discuss this part of the process in any more detail, because it depends entirely on where the binary data has been stored (SD card, probably). Once the ELF file has been loaded into memory, we just execute a function call to address 0x20004000 and away we go.
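Leaving aside where the bytes come from, the copy-and-call step might be sketched like this. The function names are hypothetical, and I'm assuming the file is read in chunks from some storage device. One detail worth noting: the Pico's CPU only executes Thumb code, and a call through a function pointer must have bit 0 of the address set to indicate this, so the pointer actually called is 0x20004001 even though the code sits at 0x20004000.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset in the ELF file at which the single segment starts. */
#define ELF_PAYLOAD_OFFSET 0x4000u

typedef void (*module_entry_t)(void);

/* Copy one chunk of the file into the module area at 'dest'.
   'file_pos' is the offset within the file of chunk[0]. Bytes that
   lie before the payload offset are skipped, and nothing is written
   past 'dest_cap' bytes. Returns the number of bytes stored. */
static size_t place_chunk(uint8_t *dest, size_t dest_cap, size_t file_pos,
                          const uint8_t *chunk, size_t chunk_len)
{
    size_t stored = 0;
    for (size_t i = 0; i < chunk_len; i++) {
        size_t pos = file_pos + i;
        if (pos < ELF_PAYLOAD_OFFSET)
            continue;               /* still in the header area */
        size_t off = pos - ELF_PAYLOAD_OFFSET;
        if (off >= dest_cap)
            break;                  /* would overrun the module area */
        dest[off] = chunk[i];
        stored++;
    }
    return stored;
}

/* Turn a load address into a callable pointer, setting bit 0 to
   select Thumb state. */
static module_entry_t module_entry(uintptr_t load_addr)
{
    return (module_entry_t)(load_addr | 1u);
}
```

On the Pico, dest would be (uint8_t *)0x20004000, each chunk would come from whatever reads the SD card, and the final step would be module_entry(0x20004000)().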

If only it were really that simple...

Standard libraries

The GCC compiler does not, in fact, generate freestanding code. For example, it won't generate ARM code to do arithmetic operations beyond what the CPU provides. I've fudged around this subject so far but, in fact, if you did try to compile and link any non-trivial program using the methods I've described so far, you'd have been rewarded with a whole slew of linker errors, related to functions with names like "__aeabi_idiv". These are the function calls that the compiler generates for operations that the ARM CPU can't do on its own.
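It doesn't take much to provoke this. The function below (the name is just an example) looks entirely self-contained, but the Cortex-M0+ has no divide instruction, so when compiling for the Pico GCC turns the division into a call to a helper function.

```c
/* Integer division. There is no divide instruction on the
   Cortex-M0+, so GCC compiles the '/' below into a call to a helper
   (__aeabi_idiv when targeting the Pico) rather than inline code. */
int average(int total, int count)
{
    return total / count;
}
```

Linking an object file containing this function, without supplying an implementation of the helper, fails with an undefined reference.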

Where are the implementations of these functions? Unless you want to provide your own -- and this would be no mean feat -- they are in the libgcc library provided by the GCC compiler. Actually, the ARM GCC provides a whole stack of these libraries, one for each supported ARM architecture.

For the record, the libgcc required for Pico applications is thumb/v6-m/nofp/libgcc.a. Where this file is depends on which version of GCC you have, and how it's installed. On my Fedora system, it is at /usr/lib/gcc/arm-none-eabi/12.2.0/thumb/v6-m/nofp/libgcc.a.

Using this library is as easy as adding it to the ld command:

$ arm-none-eabi-ld -Ttext=0x20004000 {startfile} main.o /path/to/libgcc.a -o myprog.elf

But...

libgcc is not a standard C library. It won't provide implementations of widely-used functions like printf or strcpy, or the hundreds of other standard functions that C programmers are accustomed to using.

That may not be a problem for very simple applications but, for anything non-trivial, having to implement everything from scratch is a bear.

For most ARM projects, the standard C library is provided by Newlib. There are other possibilities: Android uses its own C library called Bionic, and musl is well-established. However, Newlib has the advantage of usually being packaged alongside the arm-none-eabi GCC toolchain.

If you use a standard C library, your own program will need to provide "stubs" for the platform-specific functions that the library can't implement. For example, Newlib provides robust implementations of malloc and related functions, but it doesn't know anything about the layout of memory on the host system. You'll need to provide your own implementation of the sbrk system call, which can change and report the size of the available data space. Newlib will use that function to control the way malloc behaves.
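To give a flavour of what such a stub involves, here is a minimal sketch of an sbrk-style function. It rests on several assumptions of mine: that the heap starts where the uninitialized data ends (__bss_end__ in the linker script shown earlier), that it may grow to the top of main RAM at 0x20040000, and that the library looks for a function called _sbrk with this signature. The struct and the heap_grow helper are purely illustrative.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Bookkeeping for a heap that grows upwards from 'start' to 'limit'. */
struct heap {
    uint8_t *start;
    uint8_t *brk;    /* current end of the heap */
    uint8_t *limit;
};

/* Move the end of the heap by 'incr' bytes (which may be negative).
   Returns the previous end, or (void *)-1 if the request would take
   the heap outside its bounds. */
static void *heap_grow(struct heap *h, ptrdiff_t incr)
{
    uint8_t *old = h->brk;
    if (incr > h->limit - old || incr < h->start - old)
        return (void *)-1;
    h->brk = old + incr;
    return old;
}

#if defined(__arm__)
extern uint8_t __bss_end__[];   /* defined by the linker script */

/* Assumption: the heap runs from the end of the uninitialized data
   to the top of main RAM. */
void *_sbrk(ptrdiff_t incr)
{
    static struct heap h;
    if (h.start == NULL) {
        h.start = h.brk = __bss_end__;
        h.limit = (uint8_t *)0x20040000u;
    }
    void *old = heap_grow(&h, incr);
    if (old == (void *)-1)
        errno = ENOMEM;
    return old;
}
#endif
```

Keeping the arithmetic in a helper that takes the heap bounds as arguments makes it possible to test the logic on a desktop machine before trusting it on the Pico.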

You'll also need to initialize the memory allocator and, perhaps, other parts of the library. There's no easy way to know how to do this on a particular system -- you can get a clue by looking at how this initialization is done on other systems. In any case, Newlib comes in a "nano" variant which is particularly suitable for use on the Pico. Sadly, the initialization required for the full and "nano" versions is (sigh) completely different.

Although any non-trivial application module will almost certainly need a standard C library, integrating Newlib with a Pico application is too big a topic to include in this article. I might write it up separately, if there is any interest.

Summary

To load and run binary code from a Pico program, we can use the same compiler and linker tools that the Pico SDK itself uses. Compiling and linking is a bit fussy, particularly if you want to make the operation of loading the code on the Pico straightforward.