Your calculator has buttons for all sorts of special functions, like
square root, sine, cosine, and logarithms. How does the calculator
know how to calculate these functions? In arithmetic, you learned to
add, subtract, multiply and divide "by hand". With only those four
basic operations available, where would you even start if you had to
calculate a square root or transcendental function? Today we are going to
explore a little bit of
Numerical Analysis, which is the branch
of math that focuses on actually calculating things numerically, with
high accuracy and efficiency.
To be concrete, let's focus on the base two logarithm function, which
we will abbreviate L.
You can review
Logarithms
if you are a little rusty on the concept, but do not worry:
you won't have to calculate with them yourself. The whole point is to
look behind the curtain and see how the computer does it.
All we care about is that L is a curved function, not a
straight line, and that there is no simple formula for calculating it,
which makes it challenging.
Let's start with a graph of the L function. We will create it using R,
which (as regular readers will recall) is a free, high quality, open
source programming environment well suited to statistics and related
calculations. Read
Koch
Snowflakes for detailed instructions on downloading and
installing R. Now run the following code:
L <- log2
x <- 1:400/10
plot(x, L(x), type="l", cex.axis=1.5,
cex.lab=1.5, xlab="x", ylab="L(x)")
abline(0,0)
This produces the following graph.
Logarithms in any base are zero at x=1. They go down to negative
infinity as x approaches zero, as shown by the steep part at the left of the
graph. They grow without limit as x grows, but
very slowly, as
indicated by the relatively flat part toward the right side of the
graph. Recall that L(2^n) is n, so for instance, L(16) is 4, L(32) is
5, and L(64) is 6. Even L(1024) is just 10, so the curve grows ever
flatter as you move further right.
Now to the question of
computing L. Recall that the basic property
of logarithms is that they translate multiplication into addition. In
symbols:
L(a*b) = L(a) + L(b)
We can use this property of L to our advantage. All we really need to figure
out is how to compute L(x) when x is between 1 and 2. That's because
if x is outside that range, we can double or halve it repeatedly until
we get into that range; every time we do so, we adjust the answer up
or down by one.
More precisely, suppose x =
f*(2^n) where n is an integer and f is a number between one
and two. Then L(x) = L(f) + n. So all we really need to figure
out is how to compute L(f), where f is between 1 and 2. Adding 'n' is
easy because computers represent real numbers internally using
floating point, essentially storing both n and f directly.
If we redraw the graph of L over just the range from 1 to 2, it
looks much straighter. For comparison, the red line shows the
approximation y = x-0.95, which seems like a decent fit.
So as a first approximation, we could say that L(x) is roughly x-0.95, at
least on the interval from x=1 to x=2 that is our focus. However, that
is not nearly good enough: computer calculations with real numbers
typically are accurate to 16 decimal digits, so we should aim for
that as well.
However, this gives us an idea. If a straight line was a pretty good
fit, might we be able to find a quadratic that is an even better fit?
Indeed we can. In fact, calculus provides a useful tool here, called
Taylor series. A Taylor expansion is just a polynomial that is a
"good" fit to the function at a particular point. We can ask for a
linear fit, a quadratic fit, or even higher order polynomials. The
higher the order, the better the match, since the Taylor polynomial is
designed to match the function value, its slope, and as many higher
order derivatives (e.g. curvature) as it can based on its order.
Recall the formula for Taylor series from Calculus:
To use it, we need to know the value of the function and its
derivatives at a specific point x0. To avoid circularity, we assume
that we do not know how to compute logarithms yet, so we need to
choose x0 to be someplace where we know the answers in
advance. Choosing x0=1 works well, since log(1) is just zero.
Since we have to calculate derivatives, it is handy to switch to the
natural log function log(x) instead of the base two log L.
(Some people write 'ln' for natural log, instead of 'log'.)
Since
log2(x) = log(x)/log(2)
we can easily get back to log base 2 (or any other base), once we know
how to compute natural logs on values of x between 1 and 2.
Taylor polynomials tend to be most accurate close to the expansion
point x0, and less accurate further away. Since we want to expand
around x0=1, we will get better results if we shift the range of x
values to put x0 near the center rather than at an end. We know that
log(x) = log(x/sqrt(2)) + log(2)/2
so it is enough to be able to calculate log(x) for values of x between
1/sqrt(2) and sqrt(2), or 0.7 to 1.42.
Putting all these pieces together, the Taylor series for natural log
of x around x0=1 is
The first few terms are a good approximation near x=1.
The next chart shows log(x) in black, with the
quadratic x-1-(x-1)^2/2 in red.
The red line seems to be a very good fit except near the left and
right edges of the chart. It is good
enough that we need to switch to plotting the error, rather than
the curves themselves, in order to really see what is happening.
The next plot therefore shows the error, A(x)-log(x), where A is our
approximation, here the Taylor quadratic centered at x=1.
x <- 0.7+0.72*(0:400)/400
A <- function(x) { x-1-(x-1)^2/2 }
plot(x, A(x)-log(x), type="l", xlab="x",
ylab="A(x)-log(x)", lwd=3)
abline(0,0)
Now we can see that the error is smaller near x0=1 than far
away. Even at the edges of the graph it is never more than 0.02 in
size. Alas, this is still much too big to be practical.
However, we can go further and try a cubic Taylor polynomial. The next
chart adds a red curve to the previous chart. The red curve is the
error when we add the (x-1)^3/3 term to our approximation.
Clearly, the error is smaller, but not enormously so. This suggests
that to get errors as small as 10^(-16) will require using a lot of
terms in the Taylor series.
An easy way to compute Taylor polynomials is to use Maxima,
a free, high quality, open source symbolic mathematics program
described in detail
in
Calculus
Calculator.
The following code computes the 4th order polynomial around x0=1.
taylor(log(x), x, 1, 4);
string(%);
The first line computes the Taylor series. The second line
prints the polynomial in "input" notation suitable for graphing in R.
The result is
x-1-(x-1)^2/2+(x-1)^3/3-(x-1)^4/4
The problem is apparent: the coefficient to the (x-1)^n term is only
1/n, rather than 1/(n!). The symbol n! means n-factorial, i.e. the
product of the numbers from 1 to n. So for example, 5! is 1*2*3*4*5,
or 120. Factorials get big fast: 10! is 3,628,800. If the expansion
for log involved coefficients that went to zero like 1/(n!), it would
converge much more quickly. As it stands, 1/n gets small slowly, which
fits with our earlier observation that going from a quadratic to a
cubic did not help that much.
What if we use the first 10 terms, instead of a quadratic or cubic?
We can reduce the error even further, as shown in the next chart.
Now the vertical scale is 4e-06, which means 4 times 10^(-6), or
0.000004. That is a big improvement over the 0.02 we had before, but
it suggests that we will
still need a lot more terms before we
finally get to 10^(-16) accuracy.
Very high order polynomials with large numbers of terms can be numerically
unstable: small amounts of error in the coefficients can get magnified
into big errors in the result. So
adding a lot more terms might not actually meet our goal, though it would be
worth a try. Also, using lots of terms takes a long time to calculate, so if we could
find a more efficient process, that would be nice.
Is there something else we can try?
The inverse of the natural log function is the exponential function
'exp'. The Taylor series for exp converges much more rapidly, because
the coefficients scale like 1/(n!) instead of like 1/n. That suggests we
can use Taylor series to efficiently calculate exp. Once we have done
that, can we use it to somehow find natural logs?
Indeed we can.
Newton's Method is a powerful algorithm for
finding roots (zeros) of functions. It uses the linear Taylor
polynomial: since f(x) is approximately f(x0)+f'(x0)(x-x0), then if
f(x) is zero, we can solve for the root at x = x0 - f(x0)/f'(x0).
If the function f is not exactly linear, this new
value of x will not be exactly the root, but if x0 was a good initial
guess, the new x will be an even better approximation to the true root. This
means that if we iterate a few times, using
we should quickly reach a very accurate answer.
Suppose we want to calculate log(c) for some number c.
We choose f(x) = exp(x) - c. This function will be zero when x =
log(c). Applying Newton's method means using the iteration
xnew = xold - 1 + c*exp(-xold)
Try the following R code:
A <- function(c) {
x <- c - 1 # initial guess
for(i in 1:5) x <- x - 1 + c*exp(-x)
x
}
plot(x, A(x)-log(x), type="l", xlab="x",
ylab="A(x)-log(x)", lwd=3)
Starting from c-1 as our initial guess and iterating 5 times yields an answer
accurate to 10^(-16), just as we had wanted, as shown in the following chart:
The jagged, noisy pattern means we have reached the limit of accuracy
on this computer: all that is left is random-looking
round-off error due to
the finite nature of computer arithmetic.
To achieve this result, we did need to call the exponential function
five times. That could be expensive, so we should examine how many
terms we need in the Taylor series for 'exp'. Let's check the error
when we use a 10th degree polynomial.
The Taylor expansion for 'exp' is simplest if we expand around x0=0,
so we will once again shift the range of interest.
Since exp(-x) is just 1/exp(x), and since exp(x-1) is exp(x)/e,
we can now focus on the range from x = -0.3 to x = 0.42, assuming we know
the value e = 2.718282... accurately already.
We are pleased to see errors in the 10^(-12) range, a great
improvement over the 10^(-6) we had with the log function at ten
terms. Why the difference? Look at the Taylor series for exp:
Notice the coefficients. The 24 is 4! and the 120 is 5!, so we know
that by the tenth term, the coefficients are getting very
small. Therefore, dropping the eleventh and higher order terms does
not introduce much error. As a result, we can get a highly accurate
approximation to 'exp' just by using a few more terms in this series.
We have used graphs of numerical experiments to assess the accuracy of
our approximations. This is of course somewhat circular: if my
computer did not already have a working version of the log function, I
would not have been able to make these graphs. The
mathematical
side of Numerical Analysis, therefore, consists of proving inequality
bounds on approximations. For example, using Taylor's theorem in the
form that includes a "remainder" term allows us to put an upper bound
on how large the error can possibly be, even if we do not yet know how
to calculate log or exp. That breaks the circularity and gives us (or
computer designers, anyhow) peace of mind.
If you liked this article, you may also like
System
Dynamics: Feedback Models, or check out
the
Contents
page for a complete list of past topics.
Please post questions, comments and other suggestions using the box
below, or email me directly at the address given by the clues at the
end of the
Welcome
post. Remember that you can sign up for email alerts about
new posts by entering your address in the widget on the sidebar. If
you prefer, you can follow
@ingThruMath on
Twitter,
where I will tweet about each new post to this
blog. The
Contents page has a complete list of previous articles in
historical order. Now that there are starting to be a lot of
articles, you may also want to use the 'Topic', 'Search' or 'Archive'
widgets in the side-bar to find other articles of related interest.
I hope you enjoyed this discussion. You can click the "M"
button below to email this post to a friend, or the "t" button to
Tweet it, or the "f" button to share it on Facebook, and so on. See
you next time!
Do you know whether this is exactly how calculators and computers typically compute logarithms (except perhaps with a different root finding method), or is the reality more complicated?
ReplyDeleteWhatever the answer, it was a nice, clear write-up!
Thanks, I am glad you liked it. As to your question...
ReplyDelete... with a lot more detail than you probably wanted ...
The approach will presumably vary by manufacturer. On Linux computer systems, most programs that use logarithms ultimately call the GNU Standard C library "glibc", which implements a wide variety of standard functions. The source code is available for browsing on the web, since it is open source, so you could find out *exactly* how they do it.
MS Windows is closed source, so it would be hard to find out what it does or even how accurate it is: computations on Windows and Linux sometimes give different results, possibly due to differences in the implementation of special functions like logs.
There are (or were) programs (I think one was called 'paranoia') that attempt to push floating point math to the limits to check the accuracy of an implementation. Back in the day they focused on arithmetic, but since the IEEE floating point standard became widely adopted, you can usually count on arithmetic to work correctly. Special functions, however, are less standardized. Generally you want a function that returns the closest floating point number to the true answer no matter where in its range it is called, but this is difficult to guarantee while still being efficient. (There is also the possibility of using fixed point arithmetic, or interval arithmetic, or arbitrary precision arithmetic, as alternatives and as ways to assess errors).
On some CPU architectures there might be an instruction to compute logs or other special functions, but you'd want to read the assembly language manual to see what sort of accuracy it guarantees, if any, and if it exists.
The log function is simpler than some because we can focus on just the range from 1 to 2; with something like a trig function, you have to try to *accurately* remove multiples of 2pi before you can even begin to use something like a Taylor series.
The real algorithm in a particular device might center around a different expansion point or use a different number of terms. There could be some other trick that avoids having to call the 'exp' function, if one is sufficiently clever; or that makes calls to 'exp' more efficiently; I'd be interested to hear about such if anyone knows...
So, all in all, I'd be surprised if any particular calculator or computer uses this exact approach in all its literal glory; nonetheless, I would also be surprised if they take a radically different approach.
Logarithms and other special functions are "hard" to compute, in the sense that you can't just express them in terms of a simple linear or quadratic function. Series expansions, such as Taylor, and root finding methods like Newton, are useful tools, but are not the only tools. For some special functions (such as the cumulative normal probability distribution) there are some clever approximations based on the *ratio* of two polynomials - see Abramowitz & Stegun for details. Many special functions arise as solutions to differential equations, which means that numerical methods for solving differential equations are another potential approach. The field of Numerical Analysis has a variety of these sorts of tools and algorithms available. But ultimately, they all have this sort of "flavor" - using approximations, and paying attention to the accuracy of those approximations, and refining them (by iteration or by adding more terms) until they become accurate enough, while at the same time trying to be efficient, since these functions often wind up being called enormous numbers of times inside the loops of complicated programs.
For ln(x) I would use approximation by 2*ln(sqrt(x)) recursively. When x is close enough to 1, then ln(x) is slightly less than x-1. Regarding sin(x), there is formula for sin(3x) in terms of sin(x) polynomially. For small x sin(x) is around x. cos(2x) can be also expresed through sin(x), reducing cos to sin.
ReplyDelete