Why 'int x = 0' is uninitialized data to the GNU C compiler

This article is about a curiosity of gcc that will interest almost nobody. I came across it whilst implementing a program loader that could load binary code into RAM on a Raspberry Pi Pico. I suspect that this is the only application where the initialization behaviour of the C compiler is of any significance whatsoever. Still, I thought it worth writing up because, apart from anything else, it's such bizarre behaviour that I can imagine myself being caught by it again later, when my own intracranial RAM needs to be refreshed.

I noticed the problem when different (global) variables in my programs, although of the same type, were being treated differently at runtime. For example, I had variables defined like this:

int max_columns=80;
int starting_column=0;

I couldn't work out why max_columns seemed to take the correct value, 80, but starting_column had a crazy value.

Now, I'm writing the program loader myself so, of course, I could well understand why all the global variables would end up with crazy values. It would just be a bug in my loader. I could even understand why variables of a particular type or storage class might have ended up with crazy values. However, after much experimentation I realized that the crazy values were all assigned to variables which were initialized to zero in my code. It turned out I could set any value except zero.

Let's look at a simple example to see why this happens. Although I'm working on an embedded device, the problem can be seen quite easily using the ordinary Linux gcc.

Compile and link this trivial program:

int aa1 = 42;
int aa2 = 6;
int aa3 = 0;
int aa4;
int main() { return 0; }

$ gcc -o test test.c

I've chosen these "aa" names just to make it easy to find the variables in the ELF file created by gcc:

$ objdump --all test | grep aa 

The output, on my system, is:

000000000040401c g     O .data	0000000000000004              aa1
0000000000404020 g     O .data	0000000000000004              aa2
0000000000404028 g     O .bss	0000000000000004              aa3
000000000040402c g     O .bss	0000000000000004              aa4

Notice that aa1 and aa2 -- variables that are assigned non-zero values in my program -- are in .data segments. But aa3, which is assigned the value 0, and aa4, which is not assigned a value at all, are in .bss segments.

The problem was that my program loader was not initializing the BSS segments correctly. So variables whose values were stored in those segments were not getting zeroed. Since aa3 specifically had to be zero, not zeroing it was a significant error.
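To make the fix concrete, here is a minimal sketch of what a loader has to do for each region it places in RAM. It is not the code from my loader: the struct and its field names are hypothetical, loosely modelled on the ELF idea that a region can occupy more bytes in memory than the file supplies. The extra bytes are the BSS data, and they must be zeroed explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one loadable region. The field names are
   illustrative only, loosely modelled on ELF's p_filesz and p_memsz. */
struct region {
    const unsigned char *file_bytes; /* bytes present in the image file */
    size_t file_size;                /* how many bytes the file supplies */
    size_t mem_size;                 /* how many bytes the region needs in RAM */
};

/* Copy the stored bytes into RAM, then zero whatever remains.
   The zeroed tail is where variables like aa3 and aa4 would live. */
static void load_region(unsigned char *dest, const struct region *r)
{
    assert(r->mem_size >= r->file_size);
    memcpy(dest, r->file_bytes, r->file_size);
    memset(dest + r->file_size, 0, r->mem_size - r->file_size);
}
```

Leaving out the memset() call is, in effect, the mistake I made: the initialized data arrives intact, and everything after it keeps whatever happened to be in RAM.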

But why are different variables of the same type in different segments? What's so special about the value zero? The fact is that I don't really know. I presume that this is some kind of optimization carried out by gcc, in an attempt to save a few bytes somewhere.

Conventionally, 'data' segments are used to store values that have been initialized specifically by the programmer. It matters what the values are, and they are stored in the executable file generated by the compiler.

'BSS' segments, however, are traditionally used to "store" variables that have not explicitly been given an initial value. "Store" isn't really the right word here: the linker does not have to allocate any space in the executable file for values of these variables -- they are just placeholders.

Although no values are stored in the executable, values still have to be set in memory, because the C language standards stipulate that uninitialized global variables take the value zero at runtime. So somebody has to initialize them. Setting these values to zero is not the responsibility of the compiler -- at least, it is not in the GCC world. Instead, some start-up code, executed before main() is invoked, has to zero all this data.
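As a rough illustration of what such start-up code does, here is a sketch of a word-by-word zeroing loop. The function name is my own invention. On a real embedded target the bounds would normally come from symbols defined in the linker script; the names shown in the comment are an assumption and vary between toolchains.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in the half-open range [start, end). */
static void zero_words(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end) {
        *start++ = 0;
    }
}

/* On a real target the range would be supplied by the linker script, e.g.
 *     extern uint32_t __bss_start__, __bss_end__;   (symbol names assumed)
 *     zero_words(&__bss_start__, &__bss_end__);
 * and the call would run before main(). */
```

On a hosted system the same function can be tried out on an ordinary array, which is enough to check the loop bounds.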

I can only imagine that gcc assumes that, if the programmer sets a global variable to zero, it doesn't actually need to store the zero value in the executable. If it assigns it to a BSS segment, the start-up code will zero it along with all the other BSS data. And, frankly, neither the program nor the programmer usually cares about the exact addresses in memory where data is stored. So this slightly odd behaviour potentially results in a modest saving of executable size, at least if large numbers of variables are involved.

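It isn't an accident that gcc behaves this way. It turns out that, on some platforms, gcc has a specific switch to control this behaviour: -fzero-initialized-in-bss. It is enabled by default; the negated form, -fno-zero-initialized-in-bss, tells gcc to put zero-initialized variables in the data segment instead.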
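A compiler switch applies to a whole compilation. For a single variable, gcc's section attribute offers a more targeted alternative. The sketch below assumes gcc, or a compatible compiler, on an ELF target, and it is not something I used in my loader. The helper function is hypothetical and exists only so that both variables are referenced.

```c
/* Assumption: gcc (or a compatible compiler) targeting ELF.
   The section attribute asks for this variable to be emitted in .data,
   so its zero is stored in the executable rather than left to start-up code. */
int starting_column __attribute__((section(".data"))) = 0;

/* For comparison: with the default settings described above,
   this one is placed in .bss. */
int other_column = 0;

/* Hypothetical helper, present only so that both variables are used. */
int columns_sum(void)
{
    return starting_column + other_column;
}
```

Running objdump on the result, as before, should show which segment each variable ended up in.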

As I said, this is an oddity that will affect almost nobody. The default behaviour seems, well, wrong to me; but, unless you're implementing a program loader, you probably won't even notice.