I'm going to try to explain the example of estimating the derivative of $f(x)=e^x \sin x$ at the point $a=2.2$.

Here we are using the estimate $$f'(a) \approx \frac{f(a+h) - f(a)}{h}.$$ The main difficulty is figuring out approximately what $h$ is best. There are two primary sources of error:

- Loss of precision due to subtracting numbers at a similar scale
- Mathematical error in our approximation (the difference between the actual values of $f'(a)$ and $\frac{f(a+h) - f(a)}{h}$)

There is a trade off here because if we were doing exact arithmetic, we'd be best off choosing $h$ very small, but choosing $h$ very small will cause much more loss of precision. We analyze these two errors separately below, and then combine the analyses to decide which $h$ is best.

This example was discussed in the Numerical Differentiation Notebook, but I hope this discussion will more clearly explain why $h \approx 2^{-27}$ is best.

We are using floating point numbers which have $53$ bits of precision. We are interested in the loss of precision in computing $f(a+h)-f(a)$. But a basic principle is that if $f$ is a nice function (differentiable with continuous derivative), then the loss of precision in computing $f(a+h)-f(a)$ will roughly match the loss of precision in computing in $(a+h) - a$.

To simplify things, we'll assume that $h=2^{-k}$ with $k \geq 0$ an integer. (We could more generally assume that $2^{-k-1} < h \leq 2^{-k}$ and the analysis would be basically the same.)

Because $a=2.2$ satisfies $2^1 \leq a < 2^2$, its most significant bit is the place value $2^1$. Then the numbers $a$ and $a+2^{-k}$ will agree (roughly) in place values $2^1, 2^0, 2^{-1}, \ldots, 2^{1-k}$ but disagree in place value $2^{-k}$. So they agree in $k+1$ bits. So, by our assumption, $f(a+h)$ and $f(a)$ also agree in about $k+1$ bits. This means that computing $f(a+h)-f(a)$ results in the loss of $k+1$ bits of precision. This means we are left with $53-(k+1)=52-k$ bits of precision.

We can convert this to an estimate of the error. We know $f'(a)$ is between $1=2^0$ and $2=2^1$. So, the most significant bit is in the $2^0$ place value. We have $52-k$ bits of precision, and our error is on the order of the place value of the least significant bit of precision. Since $2^0$ is the place value of the most significant bit, $2^{k-51}$ is the place value of our least significant bit of precision. So, we have estimated that the error in our computation due to loss of precision is $$2^{k-51}.$$

The absolute mathematical error in our estimate is $$\left|f'(a) - \frac{f(a+h) - f(a)}{h}\right|,$$ assuming we are doing exact arithmetic.

We can understand this error through Taylor's Theorem. Applying the Taylor's theorem for the degree one Taylor polynomial gives that $$f(a+h) = f(a) + f'(a) h + \frac{f''(y)}{2} h^2$$ for some $y$ between $a$ and $a+h$. (The first two terms are the degree one Taylor polynomial, and the last term is the Lagrange form of the remainder.)

Solving this expression for $f'(a)$ yields $$f'(a) = \frac{f(a+h)-f(a)}{h} - \frac{f''(y)}{2} h.$$ So the absolute error in our estimate is $$\left|f'(a) - \frac{f(a+h) - f(a)}{h}\right|=\left|\frac{f''(y)}{2} h\right|,$$ for some $y$ between $a$ and $a+h$. Since the interval $(a, a+h)$ is small and the second derivative is a continuous function, the second derivative doesn't vary much on the interval. We can just assume that the second derivative is a reasonable number (it works out to be about -10, but we don't need this level of detail).

So, the mathematical error is about $$ch = c 2^{-k}$$ where $c$ is a reasonable positive number. (We actually have $c \approx \frac{f''(a)}{2} \approx 5$ in this case.)

The best choice for $h$ is the value which makes the mathematical error and the error due to loss of precision about equal. (If the mathematical error is larger than the error due to loss of precision, then we can shrink $h$ and get better results. If the error due to loss of precision is larger, then we should grow $h$ to get better results.)

So we are looking to choose a value of $k$ which makes $$2^{k-51} \approx c 2^{-k}.$$ Simplifying, we see that $$2^{2k} \approx c 2^{51}.$$ Then taking a log to base $2$, we see that $$2k \approx 51 + \log_2 c$$ and $$k \approx 25.5 + \frac{1}{2} \log_2 c.$$ Ignoring the term $\frac{1}{2} \log_2 c$ because it is small, we can guess that the best choice for $k$ would be $k=25$ or $k=26$.

*Remark:* We can do a bit better if we knew that $c \approx 5$, because $\log_2 c \approx 2.3$ is then slightly bigger than $2$. So, we get the better approximation
$$k \approx 25.5 + \frac{1}{2} 2.3 \approx 26.6,$$
which rounds up to $27$, the best value we found in the Numerical Differentation notebook. While this might seem unfair (unsing the second derivative $f''(a)$ to estimate $f'(a)$), we certainly don't need an accurate estimate of $c$, only the scale of $c$ ($\log_2 c$) plays a role, so even a poor estimate of $f''(a)$ could be useful.

Let's just assume we are using $k=27$ so $h=2^{-27}$ and that we know that with this choice we have at most $52-k=25$ bits of precision. We have a mathematical error of similar size, so it is more reasonable to assume we have $24$ bits of precision (i.e., we loose another bit through mathematical error). We can convert from bits of precision to digits of precision by dividing by $$\frac{\log 10}{\log 2} \approx 3.3,$$ so we can expect to have about $7$ digits of precision. We will check that this is correct.

In [1]:

```
# Our function and its derivative:
import math as m
def f(x):
return m.exp(x) * m.sin(x)
def df(x):
return m.exp(x) * (m.sin(x) + m.cos(x))
```

In [2]:

```
# The actual derivative
a = 2.2
der = df(a)
der
```

Out[2]:

In [3]:

```
# Compute our estimated derivative
k = 27
h = 2**-k
est = (f(a+h) - f(a))/h
est
```

Out[3]:

Comparing the numbers above, we can see they agree to 8 digits, a bit more that our estimated 7 digits.

I believe the following is a general observation you should take away from this:

- When $a$ and $f''(a)$ are not very large, a good choice for estimating the derivative with the formula using floats $\frac{f(a+h) - f(a)}{h}$ is $h=2^{-26}$. In greater generality, if you are doing computation with $N$-bit floating point arithmetic, then $h=2^{-N/2}$ would be a good choice.*

If $a$ or $f''(a)$ are very large, you might have to think harder about what the best choice of $h$ would be.