Raspberry Pi Pico: loading code into RAM and running it -- part 2
In the first article in this series I introduced the idea of having a program written using the Raspberry Pi Pico C SDK load additional code into the Pico's RAM, and execute it there. I suggested that there were few good reasons for doing this, apart from the sheer pleasure of doing something unnecessarily difficult. One good reason, however, might be to break up a large application into small modules, each of which can be loaded at runtime. The Pico only has 2MB of flash memory but, with an SD card, you can make gigabytes of binary code available as modules.
Of course, with only about 260kB of RAM available, these binary modules would have to be comparatively small -- but there could be an unlimited number of them.
Moreover, if you can load arbitrary code and run it, you have the beginnings of an operating system. Whether the world needs another operating system for ARM-based single-board computers is, of course, questionable; still, creating one is an interesting educational exercise, if nothing else.
In this article I will describe how to compile and link a C program, using gcc, in such a way that it can be loaded into the Pico's RAM, and executed there. A real application will need a lot of extra complexity that there isn't space to cover here. However, I will try to give at least an outline of the process.
Note: gcc will produce a binary in ELF format, which is non-trivial to process. Part of the job of running a program produced using gcc will be processing the ELF file on the Pico. I will later describe how to set up the linker to make this job tractable.
Recap
First, let's recap the conclusions from the first article. The Pico RAM occupies memory addresses 0x20000000 to 0x20040000. Above this are (conventionally) the stacks for the two CPUs. The memory heap for the Pico SDK program, which will be running from flash, occupies some memory starting at 0x20000000 -- the exact amount has to be worked out by testing. I suggested in the previous article that we would load the code to execute at address 0x20004000, leaving 16kB (0x4000 bytes) as the work area for the program running from flash.
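To keep these numbers straight in the Pico-side code, it may help to write them down once as constants and let the compiler check the arithmetic. This is a minimal sketch: the macro names and the module_fits() helper are my own, not part of the Pico SDK, and the figures are simply those given above.

```c
#include <assert.h>
#include <stdint.h>

/* Memory map figures from the text (macro names are hypothetical). */
#define RAM_BASE          0x20000000u  /* start of Pico RAM */
#define RAM_MAIN_END      0x20040000u  /* CPU stacks conventionally live above this */
#define MODULE_LOAD_ADDR  0x20004000u  /* where loaded code will run */

/* RAM left below the module for the flash program's data and heap. */
#define FLASH_WORK_AREA   (MODULE_LOAD_ADDR - RAM_BASE)

/* Most RAM a loaded module (code, data and .bss together) may occupy. */
#define MODULE_MAX_SIZE   (RAM_MAIN_END - MODULE_LOAD_ADDR)

/* Catch arithmetic slips at compile time. */
static_assert(FLASH_WORK_AREA == 16u * 1024u, "expected a 16kB work area");
static_assert(MODULE_MAX_SIZE == 240u * 1024u, "expected 240kB for modules");

/* Return nonzero if a module needing mem_size bytes of RAM will fit. */
static inline int module_fits(uint32_t mem_size)
{
    return mem_size <= MODULE_MAX_SIZE;
}
```

So a module, including its uninitialized data, has at most 240kB to play with, unless the stacks are moved.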
We'll also have to set aside some RAM for some software interrupt vectors, by means of which the program loaded in RAM can call into the program in flash -- unless, of course, the program modules that are loaded into RAM are completely self-contained. In practice, it seems reasonable to call into flash to get access to hardware, just as we would do with a full-scale operating system. After all, the relevant code will exist in the program in flash already, else we would not have a way to get a program into RAM in the first place. I won't be covering that subject in this article -- here I'm assuming that we're loading a program that is completely self-contained.
Compiling for the Pico
If you're used to working with the Pico C SDK, then you already have tools that are capable of compiling C code for the Pico. The arm-none-eabi-gcc compiler can do this, with the right switches. The essential compiler switches are -mthumb, to generate ARM Thumb code (the only instruction set the Pico's Cortex-M0+ cores can execute), and -mcpu=cortex-m0plus, to target those cores.
In most cases, gcc can be used as a linker as well as a compiler. I confess that I have not yet found a way to make this work for this particular application. I compile using gcc as usual, and then link using arm-none-eabi-ld, specifying exactly what libraries (if any) are required. I'll have more to say about this subject later.
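In any case, if we aren't using gcc as the linker, it is usually worth running the compiler with the additional switches -nostdlib -nostartfiles -ffreestanding. The -ffreestanding switch tells the compiler not to assume a hosted C environment, with a full standard library and a conventional program start-up. The other two stop gcc telling the linker to include libraries and start files that won't work in this application; strictly, they only matter when gcc drives the link, but they do no harm alongside -c.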
So a basic compilation command will look something like this, given that the C code is in main.c:
$ arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb \
    -nostdlib -nostartfiles -ffreestanding \
    -o main.o -c main.c
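This will give us main.o, containing ARM Thumb code ready to be linked. Note that this code is relocatable: it doesn't yet refer to specific memory locations. Wherever it uses the address of a function or variable, the object file contains a placeholder and a relocation entry instead. It is the linker's job to resolve these to specific addresses.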
Start-up modules -- a digression
I'll have more to say about this subject later; for now, it's worth bearing in mind that running your program from the first byte of the compiled main.c probably won't give great results. If we take no other steps, that's what's going to happen -- the linker will stuff the compiled code into the binary output file, in pretty much the order it appears in main.c. It's conventional to start a C program in a function called main. You could just put your main function first in main.c, and ensure that main.o is the first object file seen by the linker. That would actually work, so long as you don't need to initialize anything before main runs.
In any non-trivial application, there will be some pre-initialization to do. If your program relies on a standard C library, there will definitely be some pre-initialization to do.
So we usually put the start-up code in a separate module, which does the initialization and then calls main. We need to ensure that the linker knows to put this code at the place where execution will start. We also need to know exactly where to find it in the ELF file generated by the linker, so the Pico program can load it into memory in the right order. I tackle this latter job simply by having the linker write a monolithic, compact ELF file -- more on this later.
Linking for the Pico
This is where the real fun begins.
The first point to note is that we have to link the program in such a way that it can be loaded at a specific memory address -- 0x20004000 is the address I use. That's easily done using the linker switch -Ttext=0x20004000, like this:
$ arm-none-eabi-ld -Ttext=0x20004000 {startfile} main.o {libraries} -o myprog.elf
Note that the startfile -- whatever that turns out to be -- is the first object file supplied to the linker. Its code will be loaded into memory first on the Pico.
This will produce a binary myprog.elf in ELF format, with addresses resolved for a program in memory starting at the specified address.
But if we set only the start of the text segment, as above, the linker will use its defaults to construct the binary ELF file. That's fine if (a) it doesn't matter how large the file is, and (b) we will implement a sophisticated program loader for the Pico.
In practice, we probably do care how large the ELF file is, especially if we're processing it using Pico code. The bigger problem, though, is the program layout in the ELF file.
The GCC linker assumes that the ELF file will be loaded by a sophisticated loader, probably on Linux. The linked program may be made up of many sections, grouped into a number of segments, each with its own physical and virtual memory addresses. The program loader will need to position these segments in RAM in a way that makes sense of the addresses.
This is too big a job on the Pico. What we really want is to create an ELF file that can be loaded as one, simple chunk directly into the Pico's RAM. This can partly be achieved using a linker script to combine all the initialized data -- code and variables -- into one great big segment, which can be loaded contiguously into RAM at the starting address.
I should point out that this approach works on the Pico because it has no memory protection. On Linux, for example, you'd have to separate out your code and data, because the code will be loaded into a read-only memory region. On the Pico, we won't worry about this.
For a simple configuration, the following linker script should be sufficient:
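SECTIONS
{
  .text : ALIGN(0x04)
  {
    /* code, read-only data (including string literals) and initialized data */
    *.o(.text .text.* .rodata .rodata.* .data .data.*)
  }
  .bss :
  {
    /* uninitialized data, to be zeroed by the start-up code */
    __bss_start__ = .;
    *(.bss*)
    *(COMMON)
    __bss_end__ = .;
  }
}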
To use the linker script we again use the -T switch to ld, this time followed by the name of the script file.
What this linker script does is to group all of the input sections that contain initialized code and data into one huge .text section, which we will give the start address 0x20004000.
We also have to handle uninitialized data. This is data -- usually variables -- that has not been assigned specific values by the programmer. Of course, it still has locations in memory, and C requires global and static variables without an initializer to start out as zero. The compiler will not generate code that initializes this memory, because it is not the job of any specific code module. You may remember that I referred to pre-initialization earlier. Well, this is one of the pre-initialization tasks you'll need to do: setting the uninitialized data area to zeros. The linker script assigns the symbols __bss_start__ and __bss_end__ to the start and end of this memory area; your initialization code will use these symbols to determine the area of memory to be initialized.
Stripping the linker output
Unfortunately, packing all the code and data into one segment is not sufficient to give an ELF file of a suitable size for use on the Pico. The reason is that the GCC compiler and linker will generate masses of debug-related data. Some of this can be removed using the arm-none-eabi-strip utility, but most cannot. We can, instead, use objcopy to remove all the unnecessary sections, like this:
$ arm-none-eabi-objcopy --remove-section=.comment \
    --remove-section=.note --remove-section='.debug*' \
    myprog.elf myprog
This stripped program will be of a much more manageable size. As I hinted before, we wouldn't worry about this on a full-scale computer because, not only will it have much more memory than a Pico, it will have a sophisticated program loader that can inspect the ELF file to work out which parts of it need to be loaded. What we're really doing, by bending the linker to our will, is to make life easier for the Pico programmer, who will only have to implement a dumb program loader.
Loading on the Pico
With the ELF file constructed as I described above, loading the executable into the Pico's RAM will be easy. The Pico program needs to do some basic sanity checks on the ELF file -- we don't want to try to load a text file, for example, and execute it. However, we can assume -- knowing how the GCC linker works -- that the program's one and only segment will start at offset 0x4000 in the ELF file, and will run to (very nearly) the end of the file, because we've stripped almost everything else out. What remains after it, such as the table of section headers, is small, and copying it into RAM along with the program usually does no harm.
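The sanity checks might look something like the following sketch, which works on an ELF image already read into a buffer. The function and structure names are hypothetical; the field offsets are those of the standard 32-bit ELF header and program header. Rather than trusting that the segment starts at offset 0x4000, it reads the segment's file offset, load address and sizes from the first PT_LOAD program header, so the loader can also confirm that the load address is 0x20004000 and copy exactly p_filesz bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Field values from the ELF specification. */
#define ELF32_EHDR_SIZE 52u  /* sizeof(Elf32_Ehdr) */
#define ELF32_PHDR_SIZE 32u  /* sizeof(Elf32_Phdr) */
#define ET_EXEC         2u   /* executable file */
#define EM_ARM          40u  /* ARM machine type */
#define PT_LOAD         1u   /* loadable segment */

/* What the loader needs to know about the segment (hypothetical type). */
struct load_segment {
    uint32_t offset;  /* p_offset: where the segment starts in the file */
    uint32_t vaddr;   /* p_vaddr: where it must be placed in RAM */
    uint32_t filesz;  /* p_filesz: bytes to copy from the file */
    uint32_t memsz;   /* p_memsz: bytes it occupies in RAM, including .bss */
};

/* Read little-endian fields byte by byte (no alignment assumptions). */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Check that buf holds a 32-bit little-endian ARM executable and describe its
   first PT_LOAD segment. Returns 0 on success, -1 if anything looks wrong. */
int elf_find_load_segment(const uint8_t *buf, size_t len, struct load_segment *seg)
{
    if (len < ELF32_EHDR_SIZE)
        return -1;
    if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
        return -1;                              /* not an ELF file at all */
    if (buf[4] != 1 || buf[5] != 1)
        return -1;                              /* need ELFCLASS32, ELFDATA2LSB */
    if (rd16(buf + 16) != ET_EXEC || rd16(buf + 18) != EM_ARM)
        return -1;                              /* e_type, e_machine */

    uint32_t phoff = rd32(buf + 28);            /* e_phoff */
    uint16_t phentsize = rd16(buf + 42);        /* e_phentsize */
    uint16_t phnum = rd16(buf + 44);            /* e_phnum */
    if (phentsize < ELF32_PHDR_SIZE || phoff > len ||
        phnum > (len - phoff) / phentsize)
        return -1;                              /* table must lie inside buf */

    for (uint16_t i = 0; i < phnum; i++) {
        const uint8_t *ph = buf + phoff + (size_t)i * phentsize;
        if (rd32(ph) != PT_LOAD)                /* p_type */
            continue;
        seg->offset = rd32(ph + 4);
        seg->vaddr  = rd32(ph + 8);
        seg->filesz = rd32(ph + 16);
        seg->memsz  = rd32(ph + 20);
        if (seg->offset > len || seg->filesz > len - seg->offset)
            return -1;                          /* segment runs past the end */
        return 0;
    }
    return -1;                                  /* no loadable segment */
}
```

A loader built on this would reject any file whose vaddr isn't 0x20004000, or whose memsz is larger than the RAM set aside for modules.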
So all we have to do on the Pico is to read all but the first 16kB (0x4000 bytes) of the ELF file into memory, starting at address 0x20004000. I'm not going to discuss this part of the process in any more detail, because it depends entirely on where the binary data has been stored (SD card, probably). Once the ELF file has been loaded into memory, we just make a function call to address 0x20004000 and away we go. One detail matters here: because the Cortex-M0+ only executes Thumb code, the function pointer used for the call must have its lowest bit set (0x20004001); branching to an even address causes a fault.
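Here is a minimal sketch of those two steps, assuming the whole ELF image is already in a buffer (a real loader would more likely read it from the SD card in chunks). The names copy_module_image(), module_entry_at() and module_entry_t are hypothetical, and I've assumed the module's entry point takes no arguments and returns an int.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODULE_LOAD_ADDR   0x20004000u  /* where the module was linked to run */
#define ELF_SEGMENT_OFFSET 0x4000u      /* where the linker put the segment */

/* Assumed signature of the loaded module's entry point. */
typedef int (*module_entry_t)(void);

/* Copy everything after the first 16kB of the ELF image to dest.
   Returns the number of bytes copied, or 0 if there is nothing to copy. */
size_t copy_module_image(const uint8_t *elf, size_t elf_len, uint8_t *dest)
{
    if (elf_len <= ELF_SEGMENT_OFFSET)
        return 0;
    size_t n = elf_len - ELF_SEGMENT_OFFSET;
    memcpy(dest, elf + ELF_SEGMENT_OFFSET, n);
    return n;
}

/* Make a callable pointer for code at load_addr. The Cortex-M0+ only runs
   Thumb code, so bit 0 of the branch target must be set. */
module_entry_t module_entry_at(uintptr_t load_addr)
{
    return (module_entry_t)(load_addr | 1u);
}
```

On the Pico, the two would be used along the lines of copy_module_image(buf, len, (uint8_t *)MODULE_LOAD_ADDR) followed by int rc = module_entry_at(MODULE_LOAD_ADDR)(); with a check beforehand that the image is no bigger than the RAM reserved for it.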
If only it were really that simple...
Standard libraries
The GCC compiler does not, in fact, generate completely self-contained code. For example, it won't generate ARM code to do arithmetic operations beyond what the CPU provides. I've fudged around this subject so far but, in fact, if you did try to compile and link any non-trivial program using the methods I've described so far, you'd have been rewarded with a whole slew of linker errors, related to functions with names like "__aeabi_uidiv". These are the function calls that the compiler generates for operations that the ARM CPU can't do on its own: the Cortex-M0+ has no divide instruction, for example, and no floating-point hardware.
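To see this for yourself, compile a few lines like the following with the switches shown earlier, then list the object file's undefined symbols with arm-none-eabi-nm -u main.o. The functions are just illustrations of my own; the names in the comments are the ARM EABI run-time helpers GCC uses when targeting a Cortex-M0+, which has neither a divide instruction nor floating-point hardware.

```c
#include <stdint.h>

/* Unsigned 32-bit division: no UDIV instruction on the Cortex-M0+, so GCC
   emits a call to __aeabi_uidiv from libgcc. */
uint32_t average_u32(uint32_t total, uint32_t count)
{
    return count ? total / count : 0;
}

/* Signed 32-bit division becomes a call to __aeabi_idiv. */
int32_t divide_i32(int32_t value, int32_t divisor)
{
    return divisor ? value / divisor : 0;
}

/* With no FPU, single-precision multiplication becomes a call to
   __aeabi_fmul. */
float scale_f(float x, float k)
{
    return x * k;
}
```

Compiled for a desktop machine, the same code uses native instructions, which is why these link errors only show up when cross-compiling.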
Where are the implementations of these functions? Unless you want to provide your own -- and this would be no mean feat -- they are in the libgcc library provided by the GCC compiler. Actually, the ARM GCC provides a whole stack of these libraries, one for each supported ARM architecture.
For the record, the libgcc required for Pico applications is thumb/v6-m/nofp/libgcc.a. Where this file is depends on which version of GCC you have, and how it's installed. On my Fedora system, it is at /usr/lib/gcc/arm-none-eabi/12.2.0/thumb/v6-m/nofp/libgcc.a. Running arm-none-eabi-gcc -mcpu=cortex-m0plus -mthumb -print-libgcc-file-name should report the right path for your installation.
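Using this library is as easy as adding it to the ld command. It has to come after the object files that need it, because ld only searches an archive for symbols that are still undefined at the point where the archive appears on the command line:
$ arm-none-eabi-ld -Ttext=0x20004000 {startfile} main.o /path/to/libgcc.a -o myprog.elf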
But...
libgcc is not a standard C library. It won't provide implementations of widely-used functions like printf or strcpy, or the hundreds of other standard functions that C programmers are accustomed to using.
That may not be a problem for very simple applications but, for anything non-trivial, having to implement everything from scratch is a bear.
For most ARM projects, the standard C library is provided by Newlib. There are other possibilities: Android uses its own C library called Bionic, and musl is well-established. However, Newlib has the advantage of usually being shipped alongside the ARM GCC toolchain.
If you use a standard C library, your own program will need to provide "stubs" for the platform-specific functions that the library can't implement. For example, Newlib provides robust implementations of malloc and related functions, but it doesn't know anything about the layout of memory on the host system. You'll need to provide your own implementation of the sbrk system call (Newlib looks for it under the name _sbrk), which can change and report the size of the available data space. Newlib will use that function to control the way malloc behaves.
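One simple way to provide it, sketched below, is to reserve a fixed-size heap as a static array inside the module, so that it lands in .bss. The 32kB size and the names MODULE_HEAP_SIZE, module_heap and heap_used are arbitrary choices of mine, and the signature follows the one used in Newlib's own stub implementations, void *_sbrk(ptrdiff_t incr); it's worth checking against the Newlib version you actually build with.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Heap size for this sketch: an arbitrary choice. */
#define MODULE_HEAP_SIZE (32u * 1024u)

/* A static array ends up in .bss, inside the module's own RAM. */
static uint8_t module_heap[MODULE_HEAP_SIZE] __attribute__((aligned(8)));
static size_t heap_used;  /* current break, as an offset into module_heap */

/* Move the break by incr bytes and return its previous position, or
   (void *)-1 with errno set if the request can't be met. */
void *_sbrk(ptrdiff_t incr)
{
    size_t prev = heap_used;

    if (incr >= 0) {
        if ((size_t)incr > MODULE_HEAP_SIZE - heap_used) {
            errno = ENOMEM;                        /* heap exhausted */
            return (void *)-1;
        }
        heap_used += (size_t)incr;
    } else {
        size_t shrink = (size_t)0 - (size_t)incr;  /* magnitude of incr */
        if (shrink > heap_used) {
            errno = EINVAL;                        /* would go below the heap */
            return (void *)-1;
        }
        heap_used -= shrink;
    }
    return module_heap + prev;
}
```

Because heap_used is itself in .bss, the start-up code's zeroing of that area also resets the heap each time a module is loaded.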
You'll also need to initialize the memory allocator and, perhaps, other parts of the library. There's no easy way to know how to do this on a particular system -- you can get a clue by looking at how this initialization is done on other systems. In any case, Newlib comes in a "nano" variant which is particularly suitable for use on the Pico. Sadly, the initialization required for the full and "nano" versions is (sigh) completely different.
Although any non-trivial application module will almost certainly need a standard C library, integrating Newlib with a Pico application is too big a topic to include in this article. I might write it up separately, if there is any interest.
Summary
To load and run binary code from a Pico program, we can use the same compiler and linker tools that the Pico SDK itself uses. Compiling and linking is a bit fussy, particularly if you want to make the operation of loading the code on the Pico straightforward.