Getting back into C programming for CP/M -- part 2

For reasons I've discussed elsewhere, I've recently become interested in using, and programming, CP/M again, after an interval of 40 years. I've even bought a real, Z80-based, CP/M machine to experiment with. Using real hardware mercilessly exposes poor programming practices, in a way that emulation cannot, as I will explain.

In an earlier article I pontificated about C programming for CP/M, mostly using the Aztec C compiler from the early 80s. In this article I describe my experiences with programming for CP/M using HI-TECH C -- a somewhat more modern compiler. It's from -- gasp -- 1989. I will explain why modernity is not necessarily a good thing, when it comes to retrocomputing.

HI-TECH C for Z80

HI-TECH is a more modern C compiler implementation than Aztec's offering. Version 3.09 (there was no later Z80 release, so far as I know) was released in 1989. Since then, a small band of enthusiasts have maintained it, and created bug fixes and improvements. Patched versions are widely available on GitHub and elsewhere.

HI-TECH supports an early ANSI-standard C. Unlike Aztec, it understands proper function prototypes, rather than the old K-and-R style. This relatively minor enhancement should not be underestimated, when it comes to adapting code written more recently. Sadly, HI-TECH still doesn't understand const as a parameter declaration. Apart from this, code written for HI-TECH can look like recognizably modern C.

Although there was a Z80 release of Aztec, I was never able to get it to work. The 8080 version produced code that ran on Z80, but I always had a sneaking suspicion that Z80 code would be smaller, or faster, or something. In practice, it doesn't seem to make a huge difference.

The HI-TECH build process

The Aztec C compiler had a simple, three-step build process: (1) compile to assembly language; (2) assemble; (3) link. The developer was expected to coordinate these steps manually, or use platform CP/M tools to chain the steps together. HI-TECH has a much more complex build pipeline: (1) preprocess; (2) compile to p-code; (3) compile p-code to assembly language; (4) optimize the assembly language; (5) assemble; (6) link.

Performing all these steps manually is a drag. Consequently, the compiler package contains a single meta-tool c.com that does all the steps in one pass -- as a modern C compiler would.

The problem, though, is that CP/M only has the most rudimentary means of allowing one program to invoke another -- this is a single-tasking system, after all. For one program to invoke another, the first program creates a file $$$.SUB and then calls the 'reboot' vector. When CP/M boots, it reads the file, and executes the command it contains. This command probably runs another program, which reboots CP/M, and so it goes on.

It's a crude, error-prone method of executing programs sequentially, but CP/M 2.2 never offered a better one. It was so crude, in fact, that some CP/M flavours did not support it (the one I use does not). CP/M emulators almost certainly don't support it. Thus, trying to carry out a complete build using c.com might not actually work, particularly on an emulator.

So if you want to speed up your application build by running the HI-TECH compiler on, say, a Linux system using a CP/M emulator, it might be more difficult than you expect. It's possible to get around this problem, by writing a Makefile that invokes all the HI-TECH tools in the right order on the right files, but this requires some ingenuity -- these tools were never designed to be run this way.

Comparison -- Aztec or HI-TECH?

In my tests, I've found little to choose between the quality of output of the Aztec and HI-TECH C compilers -- on real, Z80 hardware running at 4MHz. Raw CPU speed -- such as it was -- is usually not the limiting factor in program performance on such a system anyway, as I'll discuss later.

Aztec is much easier to integrate into a modern-style build pipeline (using Makefiles, for example), than HI-TECH is. However, the fact that HI-TECH supports a more modern C language standard than Aztec is really the clincher for me.

Why we can't use modern methods, when programming for CP/M

Unfortunately, it's all too easy to be drawn into using modern programming paradigms with CP/M, and this rarely ends well. Using a comparatively modern variant of the C language might make this situation worse.

Consider a program that can write its output to the console, or to a disk file. The output is the same in either case -- only the destination differs. It's tempting -- for ease of programming if for nothing else -- to use the same code for both file and console output. After all, we're writing the same data. This is the outline of the relevant section of code:

  int handle; /* file handle */
  if (something)
    handle = open ("my_file", 1); /* Write to a file */
  else
    handle = 1; /* write to standard out */

  char *s = "lots of data to write....";
  for (i = 0; ...)
    write (handle, &s[i], 1); /* one byte per call */

Here we're using the write() call to send data either to a file or the console (standard out).

This is a 'modern' programming paradigm. But why?

Console output is inherently character-based -- it works byte-by-byte. Disk output, though, is block-based. It's generally not even possible to write a file on disk byte-by-byte.

That it seems to be possible to do this follows from the Unix 'everything is a file' concept. In Unix (and, more recently, Linux) files are represented in the kernel as integer file handles. A file handle could refer to a disk file, or a printer, or a network socket, or whatever. This is an alien idea in CP/M -- CP/M has specific system calls for writing to the console, writing to a tape drive, writing to a file, and so on. There's no such thing as a file handle in CP/M.

Of course, the C programming language is inextricably tangled with Unix. C calls like write() use integer file handles because Unix does. In a CP/M C compiler, the C runtime library must take care of the mapping between Unix concepts and CP/M concepts.

So: how can we write one byte to a disk file in CP/M? Easy: (1) read the relevant sector into memory; (2) modify one byte in that sector; (3) write the whole sector back to disk. What this means is that each one-byte write to disk requires reading and writing an entire sector. It is not obvious, without careful testing and experience, that this is going on.
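To make the cost concrete, here is a minimal sketch of that read-modify-write cycle. It is a simulation meant to run on a modern host, not code taken from any CP/M C library: the 'disk' is just an array in memory, and the names disk, read_sector, write_sector, write_byte and write_bytewise are all invented for illustration. The only CP/M fact it relies on is the 128-byte record size.

```c
#include <string.h>

#define SECTOR_SIZE 128   /* CP/M record size */
#define NUM_SECTORS 16    /* arbitrary size for the simulated disk */

/* Simulated disk, plus counters for the transfers we perform */
static unsigned char disk[NUM_SECTORS][SECTOR_SIZE];
static long sector_reads = 0;
static long sector_writes = 0;

/* Hypothetical stand-in for a real sector read */
static void read_sector (int n, unsigned char *buf)
  {
  memcpy (buf, disk[n], SECTOR_SIZE);
  sector_reads++;
  }

/* Hypothetical stand-in for a real sector write */
static void write_sector (int n, unsigned char *buf)
  {
  memcpy (disk[n], buf, SECTOR_SIZE);
  sector_writes++;
  }

/* Write one byte at a file offset the unbuffered way:
   read the sector, change one byte, write the sector back */
void write_byte (long offset, unsigned char c)
  {
  unsigned char buf[SECTOR_SIZE];
  int sector = (int) (offset / SECTOR_SIZE);
  read_sector (sector, buf);
  buf[offset % SECTOR_SIZE] = c;
  write_sector (sector, buf);
  }

/* Write len bytes one at a time, starting at offset zero.
   Returns the total number of sector transfers that took. */
long write_bytewise (char *s, int len)
  {
  int i;
  sector_reads = 0;
  sector_writes = 0;
  for (i = 0; i < len; i++)
    write_byte ((long) i, (unsigned char) s[i]);
  return sector_reads + sector_writes;
  }
```

In this model, writing a twelve-byte string costs twenty-four sector transfers, even though all twelve bytes land in the same sector. A real C runtime may well be smarter than this, but the sketch shows the worst case that a one-byte write can conceal.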

If we test our CP/M programs only on modern hardware using an emulator, the problems with complacent I/O programming are not really apparent. Modern disks are fast and, in any case, the kernel will buffer disk operations. The modern kernel won't really read and write a disk sector for a one-byte update -- it will work on a buffered copy of the sector in memory.

This demonstrates, I think, two crucial points.

First: we can't test a program designed for a real retrocomputer by running it under emulation. To be fair, we can test the basic logic, but we can't test how it will perform on real hardware.

Second: we can't program as if we were working on a modern platform, even if we're using the same programming language. C programming is tied to Unix in ways that are so entrenched that we don't even think about them. Once testing has revealed that our program performs badly on real hardware, we'll have to go back and think about the tacit assumptions that we made when writing it -- like the assumption that everything is a file.
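One way to drop the everything-is-a-file assumption for disk output is to collect bytes in memory and only touch the disk when a whole 128-byte record is ready. The following is a rough sketch of that idea, not a description of how any particular C library does it. The names put_byte, finish and flush_record are made up for this example, and flush_record only counts records where a real program would actually write one to disk.

```c
#include <string.h>

#define RECORD_SIZE 128   /* CP/M record size */

static unsigned char record[RECORD_SIZE];
static int used = 0;              /* bytes currently held in record */
static long records_written = 0;  /* how often we went to 'disk' */

/* Hypothetical stand-in: a real program would hand the full
   record to the C library or the operating system here */
static void flush_record (void)
  {
  records_written++;
  used = 0;
  }

/* Add one byte to the output; only a full record causes a write */
void put_byte (unsigned char c)
  {
  record[used++] = c;
  if (used == RECORD_SIZE)
    flush_record ();
  }

/* Call once at the end, to write out any partial record */
void finish (void)
  {
  if (used > 0)
    {
    /* Pad with ctrl-Z (0x1A), the conventional CP/M
       end-of-file marker for text files */
    memset (record + used, 0x1A, RECORD_SIZE - used);
    flush_record ();
    }
  }
```

With this arrangement, 300 bytes of output cost three record writes rather than 300 read-modify-write cycles. Console output would still go byte-by-byte through a separate code path, which is exactly the duplication that the 'modern' approach tries to avoid.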

Here are some other 'modern' C programming practices that will perform badly on retrocomputing hardware.

Using large automatic variables. For example:

void foo (void)
  {
  char buffer [2048];
  ...
  }

The problem here is not the size of the buffer (necessarily), but all the additional math that is required to index it, when its starting position in memory is not known until runtime.
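A possible alternative, where the function does not need to be reentrant, is to make the buffer static, so that its address is fixed when the program is linked. This is only a sketch: the function fill_static and the constant BUF_SIZE are invented for illustration, and how much time it saves will depend on the compiler.

```c
#define BUF_SIZE 2048

/* Copy a string into a work buffer and return its length.
   The buffer is static, so it has a fixed address and is
   not allocated on the stack at each call. */
int fill_static (char *src)
  {
  static char buffer [BUF_SIZE];
  int n = 0;
  while (src[n] != '\0' && n < BUF_SIZE - 1)
    {
    buffer[n] = src[n];
    n++;
    }
  buffer[n] = '\0';
  return n;
  }
```

The trade-off is that the static buffer occupies its 2048 bytes for the whole life of the program, and the function can no longer safely call itself, directly or indirectly, while the buffer is in use.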

Calling functions that might not do anything. For example:

void log (char *format,...)
  {
  if (logging_enabled)
    {
    ...
    }
  }

log ("The value of x is %d\n", x);
...

The problem here is that function log() might not do anything; but whether it does or not is not known until runtime. Even if it does nothing, the program still has to put the function arguments onto the stack or into registers to make the call. This takes time and, if the function ends up doing nothing, that's time wasted. An alternative approach might be:

void log (char *format,...)
  {
  ...
  }

if (logging_enabled)
  log ("The value of x is %d\n", x);
...

In fact, it's best to avoid function calls completely if that is remotely practicable -- every call has overheads associated with managing parameters. Of course, there's a trade-off here between performance and maintainability that we rarely have to worry about in modern practice.

Careless use of floating-point math.

There's no sense in performing a floating-point calculation if the required result is an integer. For example:

int speed = 100;
...
speed = (int) (speed * 1.20); /* Increase speed by 20% */
set_motor (speed);

This formulation is nicely readable -- multiplying by 1.2 is a natural way to express 'increase by 20%'. However, the same calculation can be done entirely in integers:

int speed = 100;
speed = speed * 120 / 100; /* Increase speed by 20% */
set_motor (speed);

This kind of thing can't be done haphazardly, though, because of the risk of an integer overflow. There are ways to perform integer calculations that avoid that risk, but they are somewhat specialized. Still, that's the fun of working on ancient hardware.
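To illustrate the overflow risk: with a 16-bit int, speed * 120 no longer fits once speed is above 273, because 274 * 120 is 32880 and the largest 16-bit signed value is 32767. Here are two possible ways around that, as a sketch rather than a recommendation. The function names are invented for this example. The first uses a long for the intermediate result. The second relies on the fact that multiplying by 120/100 is the same as multiplying by 6/5, and that speed * 6 / 5 gives the same result as speed + speed / 5 in integer arithmetic.

```c
/* Increase by 20% using a wider intermediate type. Simple, but
   32-bit arithmetic is likely to be slow on an 8-bit CPU. */
int increase_20_long (int speed)
  {
  return (int) ((long) speed * 120L / 100L);
  }

/* Increase by 20% with no large intermediate value:
   speed * 6 / 5 is the same as speed + speed / 5 */
int increase_20_split (int speed)
  {
  return speed + speed / 5;
  }
```

The second form overflows only if the final result itself is too large for an int, and it needs just one division and one addition. It is less obvious to the reader, though, so it probably deserves a comment wherever it is used.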

Unnecessary case conversions. CP/M is case-insensitive in most things. Command-line arguments passed to a program, for example, will be presented in upper-case, regardless of what the user typed. So, while it will often be appropriate in a modern application to process user input in a case-insensitive way, CP/M makes the choice of case for us.

I could go on. Oddly, I suspect that it's working in C that makes us (well, me) more likely to write sub-optimal program code for CP/M. I don't think I make the same kinds of mistakes in assembly language, and I suspect that's because I'm in the right mind-set.

The moral here, I guess, is: don't be fooled by inefficient modern development practices into thinking that the same techniques can be applied to 40-year-old hardware.