Converting a floating-point number to a fraction (approximately) using continued fraction expansion


This article describes how to convert a double-precision floating-point number to a fraction, within a specified tolerance. This is a commonplace operation, both in computing and in day-to-day life. For example, in carpentry and construction it's often helpful to know that pi can be approximated to two decimal places as 22/7, or to five decimal places as 355/113. Both these numbers can be used in back-of-the-envelope calculations (or even in your head, if you have good concentration) in a way that "3.1415926" can't. It's often helpful to rationalize the result of a calculation, because it can give an insight into the meaning of the result that a decimal number to ten decimal places can't.

Algorithms and code samples for rationalizing a decimal number abound. However, they are rarely well-explained, so limitations in their use are not apparent. In this article I will describe the use of continued fractions to rationalize a number, and I will explain it from first principles.

The code examples in this article are in C, but I think the basic principles should be comprehensible to people familiar with other programming languages. See the Download section at the end of the article for a link to the full source code for the examples.

The problem

Given a decimal number (say, 1.625), convert it into a fraction (1 5/8), or an improper fraction (13/8). The conversion should yield a fraction with the smallest total number of digits, within the bounds of a particular precision.

Basic principles

A real number (in the mathematical analysis sense) cannot necessarily be converted into a fraction, except by accepting some inaccuracy. For example, the number sqrt(2) is irrational -- by definition, it cannot be expressed exactly as a ratio of two integers.

For a floating-point number stored in a computer, this is not true. A floating-point number can always be rationalized, because there is only so much precision in the computer representation.

For example, the number "0.124456432" can be rationalized very easily as 124456432 / 1000000000. The problem is that this probably isn't a very useful rationalization -- it doesn't tell us anything new. On the other hand, with a precision of 8 digits, the rationalization is 2261 / 18167. Whether this is a more useful representation depends, I suppose, on whether the numerator or denominator is recognizable in its own right.

However, if we rationalize with a three-figure precision, we see that 0.124456432 is, in fact, 1 / 8. If the decimal number came from a calculation involving measurements, it's quite likely that "1 / 8" is the "correct" answer, and all the other digits are just from the measurement error.

So this problem really isn't just a matter of rationalizing a decimal number -- that part is trivial with floating-point numbers. It's about finding the most appropriate representation: that is, the rationalization that uses the smallest number of digits compatible with a specified precision.

To perform this task we will use continued fractions. We will find the smallest set of continued fraction coefficients that represents the original number (within the specified precision), and then convert that continued fraction to an ordinary fraction. Both these processes are conceptually straightforward, but there are complications related to computational accuracy that have to be addressed.

Continued fractions

A continued fraction is a fraction of this form.

            1
    a + -----------
                1
        b + -------
            c + ...

The concept appears to have been developed by ancient Greek mathematicians, for specifying numbers of arbitrary precision. For some reason, the modern concept of a decimal number seems not to have occurred to them; or perhaps it was not considered interesting enough.

The fraction can continue indefinitely, to provide whatever precision is required. The terms a, b, c, etc., are referred to as the coefficients of the fraction. To save space, these days a continued fraction is usually written in a compact form:

[a; b, c, ...]

There are well-developed methods to convert a decimal number into a continued fraction; and there are well-developed methods to convert a continued fraction to a normal fraction. So, to convert a decimal number to a normal fraction, we can first convert to a continued fraction, and then to a normal fraction. In practice, we'll do these two steps together in a loop, to avoid the need to store any intermediate results.

Converting a decimal number to a continued fraction

The algorithm for converting from a decimal number to a continued fraction is simple, but let's see a specific example first. Let's convert 1.75 to a continued fraction. The final result will be:

            1
    1 + ---------
              1
        1 + ---
             3

In the more compact representation, we might write this:

[1; 1, 3]

Note that this representation is exact: the expansion terminates, because the remainder after the third coefficient is exactly zero. We only need three coefficients to represent 1.75 precisely. In practice, though, we will often want to stop the extraction of coefficients when the result is already within the specified precision. Remember that the problem to be solved amounts to finding the smallest numerator and denominator, within a particular precision.

The first coefficient of the continued fraction is 1 -- just the whole-number part of 1.75. So we extract that as the first coefficient, leaving 0.75. Now we want to represent 0.75 as a fraction and, in the continued fraction formulation, the numerator is always 1. So:

  0.75 = 1 / denominator;
  denominator = 1/0.75 = 1.333...

Notice that the result is a recurring decimal in base 10 (and might be in other number bases as well). It can only be represented approximately. The errors involved in doing repeated computations on numbers of limited precision are something I'll come back to later.

Anyway, the second coefficient is 1, the whole-number part of 1.3333..., leaving 0.333.... Continuing:

  0.333 = 1 / denominator;
  denominator = 3.000

This denominator is a whole number, 3, which means that we're done, and the final coefficient is 3. The C code that most closely represents the procedure I described above is the following, using floating-point arithmetic throughout. The variable a represents the extracted continued fraction coefficient.

  double x = ... // Number to be converted
  double a = floor (x); // First coefficient
  while (x - a > something)
    {
    x = 1 / (x - a);
    a = floor (x); // Subsequent coefficients
    }

The "something" in the code above is a term that represents the precision we're aiming for, and will usually be a number that is much smaller than the number we're converting.

Converting the continued fraction to a normal fraction

We could store the continued fraction coefficients in an array, and use any of the well-documented methods to convert from a continued fraction to a normal fraction. However, we don't need to store them: the method of "left-to-right expansion" builds the final result incrementally, working through the coefficients in the order we generate them. So we can do the expansion at the same time as we generate the coefficients, and need store nothing but the last few values.

To understand how left-to-right expansion works, let's write some sequences of coefficients in full, and compare them.

   [a]      a
   [a;b]    a + 1/b, or (ab + 1)/b
   [a;b,c]  a + 1/(b + 1/c), or a + c/(bc + 1), or (c(ab + 1) + a)/(cb + 1)

I've written the expansions in such a way that each is expressed in terms of a numerator and a denominator. There is a pattern here.

Notice that the expansion of [a; b,c] has a numerator c(ab + 1) + a and a denominator cb + 1. Each of these terms can be made from terms in the previous continued fraction expansion. That's the key and, if you're anything like me, you'll have to look at the formulation for a while before the "aha!" moment. As we increase the length of the expansion, each new expansion needs only the current coefficient, and the numerators and denominators from the previous two expansions.

That is, in any iteration of the loop shown in the code snippet above, if we call the numerators from the previous two expansions n1 and n2, and likewise the denominators d1 and d2, then the current approximations for the numerator and denominator are simply:

   n = n2 + a n1
   d = d2 + a d1

This is, to my mind at least, a surprisingly elegant and helpful formulation, and one that I think is not immediately obvious from looking at how continued fractions are formulated.

So, as we loop around the expansion of the original number into continued fraction coefficients, we shift the n, n1, n2 terms "to the left", building up a better approximation of the normal fraction as we go, using the two expressions immediately above. Of course, the addition and multiplication of positive numbers ensures that the numerator and denominator can only get bigger with each new iteration; but that is what is expected: to get the increasing accuracy, we need to use larger numerators and denominators.

The source code for this floating-point implementation is in the file rationalize.1.c in the source code bundle.

What's wrong with this method?

The method described above is adequate for most practical purposes. If you want to convert a number retaining, say, 3-5 digits of precision, it's fine. For more specialist purposes, however -- if you want 8-figure precision, for example -- there's a problem. Well, two problems, but they have the same cause.

The problems stem from the repeated use of floating-point numbers in calculations. Each time a calculation is performed, some accuracy is lost. Since the method is iterative, and results from one iteration become the input to the next, the errors accumulate. In addition, floating-point math is slow. Of course, "slow" is a relative term and, on a modern computer, the floating-point method will be fast enough, unless you have to convert millions of numbers.

What follows, therefore, might be academic. I'll describe how to implement the algorithm entirely using integer math.

Doing continued fraction expansion in integers

Clearly, the mathematical formulation above makes no sense in integer arithmetic. We can't meaningfully take the reciprocal of an integer, or apply the floor() function to it.

What we can do, though, is to apply a scaling factor to the number to be converted, and then do all the subsequent math on the scaled number. Let's say we scale by 10,000 (but there's nothing special about this number). What is the equivalent of floor(x) in this formulation?

If we think of "floor" as "divide by 1 and discard the remainder", we can see that in the scaled system, we can replace:

    double x = ... // Number to be converted
    double a;
    a = floor (x); 

with

    int64_t x = ... // Number to be converted, times 10000
    int64_t a;
    a = x / 10000; 

The "discard the remainder" part of the operation is implicit when we do integer division. What about the reduction operation

    x = 1 / (x - a);

in the floating-point example? In principle, the scaled equivalent ought to be:

    x = 10000 / (x / 10000 - a)

That won't work, however -- in integer arithmetic, x / 10000 - a evaluates to exactly zero, because a was itself computed as x / 10000. We must scale twice here, so that the new x is itself a scaled value and keeps its fractional digits. This gives us:

    x = scale * scale / (x - a * scale); 

Note that the continued fraction coefficient a is not scaled -- it's an ordinary integer, and for typical inputs a small one.

There's nothing special about my choice of 10,000 as the scaling factor. However, it's not arbitrary, either. If we want to get a result with five figures of precision, 10,000 is a good scaling factor. Suppose the number to be converted is "3.1415926535...". When scaled and rounded to an integer this gives us "31415". We don't need to worry about the loss of the rest of the fractional digits, since they are below the specified precision anyway. The scaling factor needs to be sufficiently large that any digits we need from the original fraction end up in the integer.

In the complete source code, rather than hard-coding a scaling factor, I've allowed the user to specify the order of the conversion (that is, the approximate number of digits of precision), and then calculated the scaling factor:

    int64_t scale = pow (10, order); 

This is really the only floating-point math operation in the program, and it could be replaced by a multiplication loop to give an all-integer-math solution.
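Such a multiplication loop is a one-liner; the function name here is mine, and it assumes order is between 0 and 18 so that the result fits in 64 bits.

```c
#include <stdint.h>

/* All-integer replacement for pow (10, order).
   Assumes 0 <= order <= 18, so the result fits in an int64_t. */
int64_t make_scale (int order)
  {
  int64_t scale = 1;
  for (int i = 0; i < order; i++)
    scale *= 10;
  return scale;
  }
```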

There is another point to note about the integer implementation. If we have selected, say, five digits of precision, then only a limited number of continued fraction coefficients can ever be useful -- the denominator grows with every coefficient, so once it exceeds the scaling factor, further coefficients add nothing. So, rather than expanding the continued fraction indefinitely, we can cap the number of coefficients. We'll still need to test whether the required precision has been reached, because that might happen before the whole quota of coefficients has been generated.

It's worth bearing in mind that the all-integer representation can't ever work with more than nine digits of precision. That's because the term scale * scale in the computation described above will be too large to fit into a 64-bit integer. The method might fail with fewer digits, too -- if the input number is large enough, there might still be an arithmetic overflow.
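Putting the scaled pieces together, here is an integer-only sketch of the whole conversion. The function name is mine, and the stopping rule -- rejecting a step once the denominator would exceed the scale -- is my simplification; the article's rationalize.2.c may test precision differently.

```c
#include <stdint.h>

/* Integer-only rationalization. 'x' is the input already multiplied
   by 'scale' (e.g., 17500 with scale 10000 for 1.75). Assumes
   scale * scale fits in 64 bits, i.e. scale <= 10^9. */
void rationalize_int (int64_t x, int64_t scale,
                      int64_t *num, int64_t *denom)
  {
  int64_t a = x / scale;            // floor() becomes integer division
  int64_t n1 = a, n2 = 1;
  int64_t d1 = 1, d2 = 0;
  for (;;)
    {
    int64_t rem = x - a * scale;    // Scaled fractional part
    if (rem == 0)
      break;                        // Expansion terminated exactly
    x = scale * scale / rem;        // Scaled reciprocal
    a = x / scale;                  // Next coefficient (unscaled)
    int64_t n = n2 + a * n1;
    int64_t d = d2 + a * d1;
    if (d > scale)
      break;                        // No more precision to be had
    n2 = n1; n1 = n;
    d2 = d1; d1 = d;
    }
  *num = n1;
  *denom = d1;
  }
```

With x = 17500 and scale = 10000, this returns 7/4; with x = 16250 (i.e. 1.625) it returns 13/8. The denominator bound also absorbs the spurious large coefficients that truncated integer division eventually produces.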

The source code for this "integer scale by power of ten" method is implemented in the file rationalize.2.c in the source code bundle.

Can we use powers of two instead of powers of ten?

In the integer-only example I described above, I suggested making a fractional number into an integer with sufficient significant figures simply by multiplying by a power of ten (10,000 in the example). But what's special about powers of ten? Nothing really -- it's highly unlikely that the computer's floating-point representation will be in base 10. We could have used anything for the scaling factor, so long as it was large enough.

If we choose a power of two for the scaling factor (e.g., 65536, or 2^16), then we can re-cast most of the multiplication and division operations as shift-left or shift-right operations. A shift-left of three places, for example, is equivalent to multiplying by eight.
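As a small illustration (a fragment, not the article's implementation; the shift amount of 16 is my choice), division and remainder by a power-of-two scale reduce to a shift and a mask for non-negative values:

```c
#include <stdint.h>

#define SHIFT 16
#define SCALE (INT64_C(1) << SHIFT)   /* 65536, i.e. 2^16 */

/* x / SCALE as a right shift (valid for non-negative x) */
int64_t coeff (int64_t x)     { return x >> SHIFT; }

/* x % SCALE as a bit mask */
int64_t frac_part (int64_t x) { return x & (SCALE - 1); }
```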

The advantage of working this way is that shift operations are computationally much cheaper, and thus faster, than divisions (and, on many machines, multiplications). We can also, if we're very careful, design an algorithm that allows high levels of precision, by avoiding the need for very large scaling factors. The problem with very clever implementations is that they tend to be machine-specific.

The problem with using a shift-based algorithm -- other than the tendency to be machine-specific -- is that it's difficult to specify the required precision in human-comprehensible terms. Everybody knows what it means to get an answer to three significant figures in decimal arithmetic; it's much less obvious what a precision in terms of binary bits amounts to.

Download

Full C code for the examples used in this article is available from my GitHub repository.