Did aliens really talk to us in "binary code" at Rendlesham Forest?

Spoiler alert: no.

I have to confess to a fascination with the Rendlesham Forest UFO incident, widely regarded as one of the most significant UFO encounters of all time. It took place during my youth, at a place not far from where I lived. I've visited the site several times in the intervening forty years -- it's now a popular tourist destination, complete with a life-sized model of the alleged UFO. Last time I visited, the UFO was covered in graffiti and crude drawings of genitalia -- presumably not the work of aliens.

I don't know the explanation for the Rendlesham incident, but it's interesting whatever the explanation. Even if it's an outright hoax, it's fascinating for what it says about human psychology.

Of particular relevance to me, with my interest in coding theory and information technology, is Jim Penniston's notebook. It seems that, about thirty years after the incident, Sergeant (as he then was) Penniston revealed that his notebook contained sixteen pages of "1" and "0" digits that he says he received telepathically. Since then the notebook has been "decoded" and found to contain a cryptic message, in English. The message contains obscure phrases like "eyes of your eyes", in capital letters with no punctuation. Embedded in the message are the geographical locations of places of interest to believers in alien visitation, including the pyramids of Egypt. One snippet of text reads "origin date 8100", which has led some people to speculate that the message is a communication from the future, rather than from space travellers.

Over the last ten years or so, many people have commented on this message, but it's clear to me that few, if any, of these people have the first idea what "decoding" such a message amounts to. In general, to decode a message requires that we understand the scheme by which it was encoded, or are able to discern it. The scheme of encoding often says as much about the sender as the message text itself; that, I think, is very much the case here.

I firmly believe that the binary-encoded message in Mr Penniston's notebook is a hoax. I'm not saying that Penniston is the hoaxer; conceivably it was a hoax perpetrated on him, by persons unknown, with unclear motives. To understand why I think so, it's necessary to understand something about how we encode text using binary numbers. What's striking to me is that the Penniston notebook is only slightly cryptic. The huge dump of binary digits initially looks imposing but is actually relatively easy to decode. However, it's only easy to decode in a very specific cultural context which, I think, is what makes the whole business suspicious.

In this article, I explain how to decode the Penniston notebook, with actual examples. With this understood, whether you believe the notebook is a genuine alien communication or not, you'll at least be able to argue coherently about it. Otherwise, the argument just turns on references to what various "experts" have claimed, which is never very convincing in a controversial subject like this.

I will set out my explanation from first principles, assuming no particular mathematical knowledge beyond basic arithmetic. If you're a computer programmer, you probably already know the background.

What "binary" actually is

In this context, "binary" is a system for representing numbers of any size using only the digits "0" and "1". There are potentially many ways to do this, but the easiest -- the most conceptually straightforward, at least -- is to use exactly the same approach that we do with our everyday decimal numbers. We can represent numbers of any size in decimal, even though we only have the digits "0" to "9" available. To represent larger numbers than 9, we use a "tens" column, which will give us numbers up to 99. For numbers larger than this we use a "hundreds" column, which will take us up to 999, and so on.

In the simple binary system, we don't use hundreds and tens -- which are powers of ten -- but twos, fours, eights, and so on; that is, powers of two. So the binary number "111" is one four, one two, and one one; that is, 7 in total. I'm going to call this method of writing binary numbers the "conventional" scheme, because it's the scheme that's used in a computer's memory, and the way it's taught to computer science students. It isn't the only approach, and that's important, as I will later explain.
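If you'd like to check this kind of arithmetic without doing it by hand, here's a minimal sketch in Python (my choice of language for the examples in this article; nothing about the argument depends on it). It works through the digits of a binary string, doubling as it goes, then confirms the result with Python's built-in converter.

# Reading the binary string "111" the "conventional" way:
# one four, plus one two, plus one one.
bits = "111"
total = 0
for digit in bits:
    total = total * 2 + int(digit)   # shift the running total left, add the new digit
print(total)           # 7

# Python's built-in conversion agrees; the 2 means "base two".
print(int("111", 2))   # 7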

Binary numbering is popular in the computing field because it's comparatively easy to fabricate large-scale electronic devices that can work with voltages or currents that represent only the digits "0" and "1". Binary coding is so prevalent in IT that it's easy to forget that it is not, in fact, universal -- early computers used other methods of representing numbers.

The "conventional" system for representing numbers in binary -- the system that closely resembles the way we work in decimal -- is rarely used in communication systems. It's particularly inappropriate for long-distance communication over unreliable channels. The reason for this is that this simple scheme does not provide any error detection or correction mechanism. In practice, for communication, we need something more subtle. Determining efficient ways to do error detection and correction on streams of data is the key part of the branch of mathematics known as "coding theory". For our present purposes we don't need to worry about the details of coding theory, because the Penniston notebook uses "conventional" binary coding. Penniston's coding is actually easy to understand; the binary data used in practical communication systems usually is not.

From numbers to sentences

Computers are all about numbers. Much that is of interest to us, however, is not numeric. Therefore we need some way to represent as numbers things that are not inherently numeric.

A good example is text -- in English we use symbols like "A", "z", and "?", none of which is inherently a number. Whether we represent text using binary numbers, decimal numbers, or anything else, we must first agree on a numbering scheme.

A reasonable approach to developing such a scheme would be to start with A=1, B=2, C=3, etc. Some early computers did, in fact, do this. For any numbering scheme to be useful, of course, it must be widely adopted or, at least, agreed by all parties to the communication.
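Here's a sketch of that home-made scheme in Python, just to show there's nothing mysterious about it (the variable names are mine, for illustration only):

# A home-made numbering scheme: A=1, B=2, ..., Z=26.
# It works perfectly well -- provided everyone involved agrees on it.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
encode = {ch: i + 1 for i, ch in enumerate(letters)}
decode = {i + 1: ch for i, ch in enumerate(letters)}

numbers = [encode[ch] for ch in "HELLO"]
print(numbers)                                # [8, 5, 12, 12, 15]
print("".join(decode[n] for n in numbers))    # HELLO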

One of the earliest character numbering schemes to become a kind of standard in computing was IBM's EBCDIC, developed in the early 1960s. Outside the IBM world, however, it was the ASCII system that came to dominate, and ASCII remains well-understood today. A full list of ASCII codes is available at asciitable.com -- we will need to refer to these later.

The ASCII system provided numberings from 0 to 127 for English upper-case and lower-case letters, digits, and a smattering of punctuation symbols. The letter "A", for example, is 65. Why? It just is. It is technically irrelevant how textual characters are numbered -- all that matters is that everybody agrees on the numbering.

Although ASCII dominated text representation in computers for about forty years, it lacks any way to represent non-English symbols. Various methods were employed to handle European letters, but there was never a high degree of standardization. For about the last twenty years the dominant method for numbering characters has been Unicode, which can accommodate most of the world's languages. Of course, Unicode requires much larger numbers than ASCII, simply because of the number of characters in its scope.
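Python happens to expose the agreed-upon numbering directly, which makes it easy to see both the ASCII values and the much larger Unicode ones:

# ord() gives the number assigned to a character; chr() goes the other way.
print(ord("A"))    # 65 -- the ASCII (and Unicode) code for "A"
print(chr(65))     # A
print(ord("?"))    # 63
print(ord("é"))    # 233 -- beyond 127, so outside ASCII; Unicode covers it
print(ord("漢"))   # 28450 -- Unicode needs much larger numbers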

The binary code in the Penniston notebook

Mr Penniston's notebook contains sixteen pages of hand-written binary data -- strings of "1" and "0" characters. The digits start off with a reasonably consistent grouping into sets of eight digits, but this is not maintained. Generally, each line in the notebook contains three to twenty binary digits. It no longer seems to be possible to find scans of the actual notebooks online, although they are preserved in various printed books. The binarydecoder.info website provides a transcript of the notebooks which I believe is accurate (although I confess I have not checked every digit).

This is how the first page of the binary data begins:

01000101 01011000
01010000 01001100
01001111 01010010
01000001 01010100
...

If we assume that these are eight-digit binary numbers with "conventional" encoding, they convert to the following ordinary decimal numbers:

69 88
80 76
79 82
65 84
...

The first binary number -- 01000101 -- equates to 69 because it consists of one 64, plus one 4, plus one one: 64 + 4 + 1 = 69. It's easy to find conversion utilities online, if you don't want to do the arithmetic.
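Or, if you prefer, a couple of lines of Python does the same conversion for the first eight groups of the transcript:

# Convert the first eight-digit groups to decimal, treating each group
# as a "conventional" binary number.
groups = ["01000101", "01011000", "01010000", "01001100",
          "01001111", "01010010", "01000001", "01010100"]
print([int(g, 2) for g in groups])
# [69, 88, 80, 76, 79, 82, 65, 84]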

To convert these numbers to text we must ignore the spacing in the notebook, which we take to be irrelevant. If we look up the ASCII symbols that correspond to these numbers (e.g., at asciitable.com) we find these letters:

E X
P L
O R
A T
...

Continuing in the same way, the first part of the "message" turns out to be "EXPLORATIONOGHUMANITY..." Yes, it's "og" humanity, not "of" humanity, although that's an error of only one binary digit (remember what I said about error detection? There is none, in this coding scheme).
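The whole chain -- strip the spacing, cut the stream into eight-digit groups, convert each group to a number, look each number up as an ASCII character -- can be sketched in a few lines of Python:

# Decode the opening of the notebook data: spacing removed, eight digits
# per group, each group interpreted as an ASCII code.
raw = "01000101 01011000 01010000 01001100 01001111 01010010 01000001 01010100"
bits = raw.replace(" ", "")
chars = [chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8)]
print("".join(chars))   # EXPLORAT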

The straightforward decoding fails after this point, because we end up with numbers that are greater than 127, and therefore not ASCII. It's reasonable to assume that errors have been made at some point in the transcription, and it's actually necessary to insert a half-dozen extra digits into the data to end up with something that is convincingly ASCII text. There's no magic way to correct the data -- we just have to look for patterns in the binary digits, and experiment until the decoded message makes some kind of sense. Of course, it's a vague kind of sense, with references to "eyes of your eyes" and the location of the pyramids, and so on.

It's not a difficult problem, though -- we've already seen that the message starts with upper-case English letters, and it's reasonable to assume that it continues in much the same way. In 8-digit groups, upper-case English letters start with "010", and the digits "0" to "9" start with "0011". Although there is no particular system to the way the binary digits are laid out in the notebook, these patterns are relatively easy to spot.
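For anyone curious what that kind of experimentation might look like in practice, here is one possible sketch: slide along the digit stream, decode eight digits at a time, and score how plausible the result looks as upper-case ASCII text. This is my own illustration of the general approach, not the method used to produce the published decodings; the function name and the scoring rule are inventions for the example.

import string

# Characters we'd expect in a message like this one: upper-case letters,
# digits, spaces, and full stops.
EXPECTED = set(string.ascii_uppercase + string.digits + " .")

def score(bits: str, offset: int) -> float:
    """Fraction of 8-digit groups, starting at `offset`, that decode to
    plausible characters."""
    chars = [chr(int(bits[i:i + 8], 2))
             for i in range(offset, len(bits) - 7, 8)]
    return sum(c in EXPECTED for c in chars) / max(len(chars), 1)

stream = "010001010101100001010000010011000100111101010010"   # "EXPLOR"
print(score(stream, 0))   # 1.0 -- every group decodes to a letter
print(score(stream, 1))   # 0.0 -- the groups are misaligned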

Problems with the notebook

On the face of it, everything makes a kind of sense. So why is the message not a genuine transmission from aliens? There are a number of evidential problems with Mr Penniston's notebook, but here I'm concerned only with problems related to information technology.

First, the binary data is encoded using a simplistic "conventional" number representation which, although widely used in computing, is rarely used in communication. That's because it does not allow for any error detection or correction. If an alien civilization or people from the future wanted to send us an important message, I can't help thinking that they would use a system of representation that is reliable. The fact that there are errors in the coding that we can detect, just from the context, has to raise questions about how many errors we couldn't detect. We can only correct "og" to "of" in the first page because "of" makes sense, and "og" doesn't. In the strings of coordinates that appear in the message, it's unlikely that we could detect errors reliably just based on context.
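The "OG"/"OF" slip illustrates the point neatly: the two letters differ by a single binary digit, and nothing in this coding scheme can catch that.

# "F" and "G" differ only in their final binary digit.
print(format(ord("F"), "08b"))   # 01000110
print(format(ord("G"), "08b"))   # 01000111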

Second, the text is encoded as numbers using ASCII coding -- a system that had a measure of universality from the 1970s to the start of the 21st century. Other coding systems existed in the past, and others exist now. Julius Caesar would not have recognized ASCII, although he was familiar with manipulating text mathematically. ASCII is a 20th-century phenomenon. Moreover, ASCII is really only useful for English text, which brings me to...

Third, the message is delivered in English only. English is widely spoken, but by no means universal.

Fourth, the binary digits are grouped into blocks of eight. This is a common thing to do when working on a computer, because memory comes in groups of eight binary digits (bytes). However, in the Penniston notebook, one of the digits in each group of eight is always a "0". ASCII coding only requires seven binary digits -- there are only 128 symbols, numbered 0 to 127, and the largest code, 127, has the binary value "111 1111". But it's very wasteful in communications terms to use eight digits when only seven are required -- about 13% of the data sent is completely redundant.
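It's easy to confirm how much of each group is padding:

# Every ASCII code fits in seven binary digits, so the leading digit of
# each eight-digit group carries no information at all.
print(format(127, "b"))          # 1111111 -- the largest ASCII code, 7 digits
print(format(ord("A"), "08b"))   # 01000001 -- the leading 0 is pure padding
print(1 / 8)                     # 0.125 -- the fraction of wasted digits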

Fifth, given that whoever encoded the message was prepared to waste one whole binary digit in every eight, it seems peculiar that the message has no punctuation, or even spacing. That can't be to improve transmission or storage efficiency, since, as we've already seen, efficiency isn't a consideration here.

Sixth, geographical coordinates are represented using latitude and longitude, which is a highly particular way of coding a location. It relies on an understanding of where the Greenwich Meridian is, and that a degree is one 360th of a circle, among other things. Even among contemporary Western civilizations there's no exact agreement on the way in which latitude and longitude should be specified (because the Earth is, in fact, not a sphere). Would aliens understand these subtleties?

Finally, what about the year "8100"? Is that an AD/CE year? Or a Chinese year? Or something else? Different cultures use different calendars.

The reality is that the "message" in Mr Penniston's notebook is encoded in a way that was used in Western civilization with specific information technology at a particular point in time, using the English language. There's nothing universal about it, and the encoding is unreliable. I think that space aliens could do better.

So where does that leave us?

It's unusual that low-level concepts in communications and coding theory can be used to investigate claims of UFO phenomena; yet that seems to be the case here. All the evidence points to the conclusion that the notebook's binary data was created at about the same time it was presented to the public, in the same cultural context.

Cynically, we might even conclude that the 8-digit groupings presented at the start of the binary data -- and not continued thereafter -- were there to give a strong hint that ASCII coding was implied. The first few numbers are very obviously binary-coded ASCII, to anybody who does any low-level computer programming. However, it takes some effort -- not a huge amount, but some -- to decode the rest of the message, using the assumption of ASCII set up at the start. If it were too easy to decode, we might argue, it would look like an obvious hoax. When I first "decoded" the message, I have to confess to feeling a sense of satisfaction, like solving a moderately difficult crossword puzzle.

Whether or not you have the math skills -- and the patience -- to decode the message yourself, I think it's important that anybody who refers to it as a piece of evidence related to the Rendlesham incident should have some understanding of how it was encoded. This, I think, is more telling than the content of the message itself.