ARM assembly-language programming for the Raspberry Pi
13. Elementary number formatting
In this example and the two that follow I will demonstrate how to format a register, whose contents are (signed) binary, for display as a base-10 number. If you're used to programming in high-level languages, you'll probably not have given much thought to low-level display operations like this -- the compiler or some library just takes care of it. Of course, we can use libraries in assembly programs, too, but doing things like this manually is more educational.
Our basic approach to formatting a binary will be repeated division by ten, printing the remainder at each stage. If you think about it, you'll appreciate that this will give the result in the wrong order -- the least-significant digit will be printed first. In a later article I'll explain how we can correct that.
This example further illustrates the technique of using the stack for local data storage, that I introduced in an earlier article.
The example
The listing below only shows those parts of the program that are new;
the print_str
and strlen
functions are
the same as in previous examples.
/* =========================== putchar =====================================*/ // Write to standard output the character in %r0. The sys_write function // requires an address in memory, not a register value. So we need somehow // to store the %r0 value locally before it can be displayed. putchar: PUTCHAR_LOCAL = 8 // How much data to reserve on the stack - 8 bytes push {r0, r1, fp, lr} // Store the registers we will overwrite sub sp, $PUTCHAR_LOCAL // Move the stack _down_ to allow for our data mov %fp, %sp // %fp will reference the start of our 8-byte area mov %r1, %fp // Use %r1 to count the position we are writing strb %r0,[%r1] // Set the character to memory in the stack add %r1, $1 // And increment the offset by one byte mov %r0, $0 // Store the terminating null character strb %r0,[%r1] // And write it out mov %r0, %fp // print_str needs an address in %r0 bl print_str // Print the string add sp, $PUTCHAR_LOCAL // Move the stack pointer over our data area pop {r0, r1, fp, lr} // and restore the registers bx lr /* =========================== itoa1 ========================================*/ // Outputs the integer as ASCII digits in reverse order, because that's // easiest at this stage. itoa1: push {r0-r2, lr} mov %r2, %r0 // Keep the running total in %r2 mov %r1, $10 // Keep the const 10 in r1 itoa_loop: mov %r0, %r2 bl mod // Divide running total by 10; remainder left in %r0 add %r0, $48 // Add '0' to make the number into an ASCII digit bl putchar // Print the digit sdiv %r2, %r1 // Divide the running total by 10 cmp %r2, $0 // If there's anything left, repeat the division bne itoa_loop pop {r0-r2, lr} bx lr /* =========================== start ========================================*/ _start: mov %r0, $12345 // Number to be converted goes in %r0 bl itoa1 // Convert it ldr %r0, =EOL // Print a newline, to make the output clearer bl print_str // exit mov %r0, $0 b exit
The putchar
function outputs a single character whose
ASCII value is in the r0
register. It does this by
forming a null-terminated string in local memory in the stack, and
then calling print_str
on that string. The string will
always be exactly two bytes long -- one for the character, and one
for the null; but the stack has to be aligned correctly, as we've
discussed before.
If you look carefully at this function, you should see an area of
inefficiency that can easily be improved. The function print_str
is designed to take a null-terminated string, and calculate its length
usingstrlen
. But we already know the length -- it's always
exactly one byte, because putchar
is designed to print a
byte.
In my example I've implemented the putchar
method in terms
of print_str
-- but wouldn't it be more efficient the
other way around? Wouldn't it be better to implement print_str
in terms of repeated calls on putchar
? If we did this, we
could have an efficient putchar
that always printed one
byte.
The answer is, in fact, no -- this wouldn't be more efficient. The reason is that syscalls are slow. Implementing a syscall for every character would be hugely inefficient. Printing "Hello, World" would require twelve syscalls, instead of one.
Of course, outputting our decimal digits one-by-one is also inefficient, but I'll fix that in a later example.
We could still improve efficiency by having putchar
call
sys_write
directly, passing a single byte, rather than having
it call print_str
as my example does. The problem with
this approach is that we'b be splitting low-level operations across multiple functions.
Suppose, for example, that we wanted to implement a scheme to buffer
characters, so that sys_write
only got called for a complete
line of text, or when the buffer was full. In production code you'd
almost certainly want to do this. If all the low-level output is in
a single function, you'd only have to change this one function to
change the entire program to use buffered output.
In the end, there are trade-offs to be made between comprehensibility, maintainability, and efficiency in assembly language, just as there are in any form of programming. However, the very reason for using assembly language -- speed and efficient storage -- argues against creating comprehensible and maintainable code. Balancing competing requirements is particularly difficult in assembly programming.
Number conversion
The textbook way to convert a number from one base to another is to divide repeatedly by the base. In this case, we want to convert the binary number whose decimal value is 12345 to a sequence of decimal digits. It isn't sufficient just to get the decimal value for each digit, we also have to get the ASCII code for that digit.
The first step is to divide 12345 mod 10, that is, to get the remainder on dividing by 10. This yields the smallest digit, 5. ASCII codes for digits start at (decimal) 48 for the character '0', so we add 48 to the remainder and print it. We then divide the whole number by 10, which shifts the decimal digits down, giving 1234. We repeat this process until we've divided the number down to zero. The output of the program is "54321" which is, of course, reversed.
A useful feature of this method is that it works for any number base. For example, we could convert to hexadecimal, rather than decimal, by dividing by 16. However, we can't just convert a hexadecimal digit to an ASCII code by adding 48 as we did for decimal -- the ASCII codes for letters don't follow on directly from the digits. However, it isn't particularly difficult to write a function that outputs a hexadecimal digit. Modifying the code to output the result in octal (base 8), for example, is trivially easy.
Signed and unsigned conversion
Theitoa1
function carries out an unsigned base
conversion. It will give the correct response if the argument in
the r0
register is being treated as unsigned. In practice,
if we're working in decimal, we probably want to be able to handle
signed numbers. I'll come back to this point in a later example.
Summary
Working in assembler often means implementing functionality that we take completely for granted with high-level languages.
There's a standard method for base conversion by repeated division, which works with any number base. However, it relies on modulo division, which we have to implement ourselves.
As a program gets more complicated, the more it becomes apparent that design trade-offs have to be made. Ironically, assembly language is often chosen for reasons of efficiency, but the most efficient code is often the least maintainable. This is true in all programming languages, but it's particularly acute with assembly language.
- Previous: 12. Basic arithmetic operations
- Table of contents
- Next: 14. More complicated string processing