This treatment loosely follows TAK, Chapter 2.
Mathematical topics covered:
Programming concepts covered:
- assert for checking input
- try statement
- float128 type
- mpf
In Python a floating point real number is of type float. You can define a float by writing a number that includes a decimal point. Floats are also the type of number produced by (standard) division, even if the quotient is an integer. You can test whether a number is a float by evaluating its type, e.g., type(num).
type(3.14)
type(2/1)
type(2//1)
from math import pi
type(pi)
Floats in Python are floating point numbers following the IEEE Floating Point Standards. We will learn what this means. First we should review numbers in other bases.
Let $\beta \geq 2$ be an integer. Then we can represent each non-negative integer by a finite sequence of numbers in the set $\{0,1,2,\ldots,\beta-1\}$. Namely, if $(a_n,a_{n-1}, \ldots, a_0)$ is a finite sequence with each $a_i \in \{0,1,2,\ldots,\beta-1\}$, it represents the number $$\sum_{k=0}^n a_k \beta^k=a_n \beta^n + \ldots + a_2 \beta^2 + a_1 \beta + a_0.$$ As in decimal, the digits are written in order from most significant to least significant. When writing numbers in everyday life, we assume that the most significant digit in the sequence (i.e., $a_n$) is non-zero. Often computers store numbers using sequences of a fixed length, and in this case the most significant digit stored can be a zero.
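The sum above can be evaluated one digit at a time. Here is a minimal sketch; the function name from_digits is a hypothetical helper introduced only for illustration, and the digits are assumed to be given most significant first.

```python
def from_digits(digits, base):
    """Return the integer represented by digits (most significant first)."""
    assert base >= 2, "base must be at least 2"
    value = 0
    for digit in digits:
        assert 0 <= digit < base, "each digit must lie in {0, ..., base-1}"
        # Shift everything found so far one place to the left, then add the digit.
        value = value * base + digit
    return value

print(from_digits([1, 1, 1, 0], 2))   # 14 in binary
print(from_digits([1, 9, 15], 16))    # 415 in base 16
```

Leading zeros do not change the value, which matches the remark about fixed-length sequences.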
There are some important special cases:
Examples: The number $14$ has binary representation $1110$ since $14=8+4+2$.
In base 16, we represent the numbers $\{0,\ldots, 15\}$ by switching to letters once we run out of digits. So, we use
$$\{0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f\}$$
with $a=10$, $b=11$, $c=12$, $d=13$, $e=14$ and $f=15$. So for example, $415$ would be represented as 19f
since
$$415 = 1 \cdot (16)^2 + 9 \cdot 16 + 15.$$
Python has built in support for converting integers into hexadecimal through the hex
function. Observe:
hex(12)
hex(415)
hex(-43)
hex(0)
Note that the output of hex is a string. If the number is negative, the first character is - and the next two characters are 0x. For a non-negative number, the first two characters are 0x. The 0x is there to indicate that we are dealing with a hexadecimal number.
Python also accepts integers input in hexadecimal as illustrated below.
0x25
-0x1f
To convert a string holding a hexadecimal representation to an integer, you simply need to get Python to evaluate the string as input. This can be done with the eval
command as shown below.
eval('0x24')
eval('-0xff')
In particular, for any integer n, the operation eval(hex(n)) returns n.
eval(hex(23))
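Since eval runs whatever Python code the string contains, it is risky to use on strings you did not write yourself. As an alternative sketch, the built-in int function accepts a base as a second argument and only parses numbers:

```python
# int(string, base) parses the string as a number in the given base.
print(int('0x24', 16))    # the 0x prefix is allowed when the base is 16
print(int('-0xff', 16))
# Base 0 tells int to infer the base from the prefix (0x, 0b or 0o).
print(int('0b1011', 0))

# Round trip: converting to hex and back recovers the integer.
n = 23
assert int(hex(n), 16) == n
```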
Binary numbers can be worked with in a similar way to hexadecimal numbers. Python has a built in function bin
which converts to binary. For example:
bin(11)
The above output makes sense since $11=8+2+1$. The following checks the behavior of negative numbers and zero.
bin(-11)
bin(0)
Again, integers can be entered in binary.
0b1011
-0b1011
eval("0b1011")
Python doesn't have a built-in way to convert an integer into its representation in an arbitrary base. We will do it by hand in base $3$, and learn some basic programming skills in the process.
A base $3$ representation is also called a ternary representation. Digits in ternary are known as trits. They take the value 0, 1, or 2.
First let's discuss an example. Let's think about how we would find the ternary representation of $64$ as part of a repetitive process.
- The rightmost trit is 1, because $64=3\times 21+1$. The remainder of the expansion coincides with the expansion of $21$.
- The next trit is 0, because $21$ is a multiple of three. In fact, $21=3 \times 7$, so the remainder of the expansion coincides with the expansion of $7$.
- The next trit is 1, because $7=3\times 2+1$. The remainder of the expansion coincides with the expansion of $2$.
- The last trit is 2. Since $2=3 \times 0 + 2$, we can stop.
From the above we see that the ternary expansion of $64$ is 2101. We have found the trits from right to left following the bullets above.
In particular observe that we can find the rightmost trit of a non-negative number using modular division by $3$. For example:
64 % 3
Note that we will want to output a string instead of an integer, so we will make use of the str
function to convert an object or number to a string.
str(64 % 3)
This gives the right-most trit. The remainder of the trits give the ternary expansion of the number obtained through integer division by $3$. Namely,
64 // 3
We can implement this repetitive process using a while loop. A while loop has the following syntax.
while condition:
    statement1
    statement2
The while command as above repeatedly runs the indented statements until condition becomes false.
Here we make use of the while statement to carry out the above computation of the trits for $64$.
n = 64
while n != 0: # Run the following loop until n becomes zero.
    trit = n % 3
    print(f"Found trit is '{trit}'.")
    n = n // 3 # Update the value of n.
    print(f"Changed n to {n}.")
We have used print statements above to keep track of the values of the variables in the loop. We can see, for instance, that when we order the trits from right to left, we get 2101, which is what we wanted.
We can form 2101 by repeated concatenations of strings. We concatenate strings with the + operator. This was discussed earlier. For example 'abc'+'def' will return abcdef.
Below, we take what we did above and include it in a function. We also perform the concatenation of trits using string concatenation.
def ternary(n):
    rep = '' # This will store the ternary representation.
    while n != 0:
        trit = n % 3
        rep = str(trit) + rep # add the new trit on the left.
        print(f"Our current representation is '{rep}'.")
        n = n // 3
    return rep
ternary(64)
From the output above, we can see we have correctly concatenated. The output above 2101
is the ternary expansion of $64$ as desired.
It would be good to remove the print statement in the final version. But there are some other issues with the code.
What happens if $n$ is zero? What about if $n$ is negative?
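In the negative case, $n$ will stay negative! Integer division rounds down, so n eventually reaches -1 and stays there (since -1 // 3 is -1), and never becomes zero. This means our while loop will never terminate. I attempted to run it anyway and saw the following error:
I had to go to the menu entry Kernel > Interrupt to get execution to stop.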
To avoid the possibility of this, we will test our input before entering the while loop. The assert statement is good for this. The syntax is
assert condition, error_string
The error_string is printed and execution is stopped if the condition is false. If condition is true, then execution continues unhindered and nothing is printed.
assert 1+1 == 2, "But, 1+1==2 should always be true!"
print("I guess one plus one is two.")
assert 1+1 == 3, "Error: 1+1 is not three!"
print("I guess one plus one is three.")
We can check if n is an integer with the condition type(n) == int and check if n is positive with the condition n > 0. Including corresponding assertions gives the following function:
def ternary(n):
    # Below we perform sanity checks on the input.
    assert type(n) == int, 'n must be an integer'
    assert n > 0, 'n must be a positive integer.'
    rep = '' # This will store the ternary representation.
    while n != 0:
        trit = n % 3
        rep = str(trit) + rep # add the new trit on the left.
        n = n // 3 # Update the value of n.
    return rep
Here we test that it works.
ternary(64)
ternary(5)
ternary('dog')
ternary(-4)
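The assertions reject zero and negative integers. As a rough sketch of one way to accept them, we could wrap the function and mimic how bin and hex print a leading minus sign. The name ternary_signed is hypothetical and is not used elsewhere in these notes. The built-in int(string, 3) parses a ternary string, which gives a convenient round-trip check.

```python
def ternary(n):
    # Same function as above: ternary representation of a positive integer.
    assert type(n) == int, 'n must be an integer'
    assert n > 0, 'n must be a positive integer.'
    rep = ''
    while n != 0:
        rep = str(n % 3) + rep  # add the new trit on the left.
        n = n // 3
    return rep

def ternary_signed(n):
    """Hypothetical wrapper that also accepts zero and negative integers."""
    assert type(n) == int, 'n must be an integer'
    if n == 0:
        return '0'
    if n < 0:
        return '-' + ternary(-n)  # convert the absolute value, then add the sign.
    return ternary(n)

for n in [-64, -1, 0, 5, 64]:
    rep = ternary_signed(n)
    assert int(rep, 3) == n  # parsing the string in base 3 recovers n.
    print(n, rep)
```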
Because Python supports arbitrarily large integers, our ternary
function works with very large input.
print(f'2^100 is {2**100}')
print(f'2^100 in ternary is {ternary(2**100)}')
Why would the ternary expansion be approximately twice as long as the decimal expansion?
Let $\beta$ be a base, i.e., an integer greater than or equal to two. Then, just like in the decimal system every real number can be expressed as an infinite sum of the form $$x = \pm \left(\sum_{k=-\infty}^0 a_k \beta^k\right) \times \beta^n,$$ where each $a_k \in \{0,1,\ldots, \beta-1\}$. We also assume that $a_0 \neq 0$ except in the case that $x=0$.
This representation of $x$ consists of a sign ($\pm 1$), the mantissa $\sum_{k=-\infty}^0 a_k \beta^k$ and the exponent $n$.
A simplified and not entirely correct description of how a computer stores double precision floating point numbers follows. Base $2$ is used, so data is stored in bits. The sign takes up one bit. Some number of bits, say $K$ (often $K=52$), are used to store the mantissa, which then has the form $\sum_{k=-K}^0 a_k \beta^k$. Note that because $a_0=1$ (unless $x=0$), this bit does not need to be stored. Then some number of bits, say $E$ (e.g., $E=11$), are used to store the exponent, which means that roughly the exponent $n$ satisfies $$-2^{E-1}+2 \leq n \leq 2^{E-1}-1.$$
Note that there are only $2^E-2$ values available for $n$, which leaves two values available for handling special values, including zero, infinity, and "not a number".
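To make this concrete, here is a sketch that pulls apart the 64 bits of a Python float using the standard struct module. It assumes that floats are IEEE 754 double precision numbers (true on mainstream platforms) and that, as in that standard, the stored exponent is the true exponent plus 1023. The function name float_fields is a hypothetical helper.

```python
import struct

def float_fields(x):
    """Return the stored (sign, exponent, mantissa) bit fields of a float."""
    # Reinterpret the 8 bytes of the double as a 64-bit unsigned integer.
    (bits,) = struct.unpack('>Q', struct.pack('>d', x))
    sign = bits >> 63                        # 1 sign bit
    stored_exponent = (bits >> 52) & 0x7FF   # E = 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)        # K = 52 mantissa bits (leading 1 not stored)
    return sign, stored_exponent, mantissa

# -0.15625 is -1.25 times 2 to the power -3.
sign, stored_exponent, mantissa = float_fields(-0.15625)
# For ordinary numbers the exponent n is the stored value minus 1023.
print(sign, stored_exponent - 1023, hex(mantissa))
```

The stored exponent values 0 and 2047 are the two values set aside for the special cases.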
Large and small floating point numbers are printed in scientific notation.
from math import pi
pi * 10**50
Here the e+50 at the end should be interpreted as $\times 10^{50}$. This number is not exact. For example, even though $1/5$ has an exact representation in decimal as $0.2,$ it does not have an exact representation as a float. This explains:
1/5 + 1/5 + 1/5 == 3/5
1/5 + 1/5 + 1/5
Remark 1. The most important thing to take away from this discussion of floats is that rounding is performed in every step, and only rarely is a computation exact. It is important to be aware that errors can accumulate in repetitive computations.
Remark 2. A safer way than checking for equality of two numbers is to check if the numbers are very close. To compute the distance between two numbers $x$ and $y$, you can use the absolute value of $x-y$. In Python this is abs(x-y). So, to check that 1/5 + 1/5 + 1/5 is very close to 3/5 we can do:
x = 1/5 + 1/5 + 1/5
y = 3/5
abs(x-y) < 10**-10
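The standard library also has math.isclose, which performs a similar test. By default it uses a relative tolerance, so a comparison with zero needs an explicit abs_tol. A short sketch:

```python
import math

x = 1/5 + 1/5 + 1/5
y = 3/5
print(x == y)               # exact comparison
print(math.isclose(x, y))   # comparison up to a relative tolerance (default 1e-09)

# Near zero a relative tolerance is too strict, so pass an absolute tolerance.
print(math.isclose(1e-12, 0.0))
print(math.isclose(1e-12, 0.0, abs_tol=1e-10))
```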
Humans aren't so great at reading numbers in binary. Too many ones and zeros give us headaches. For example: $$\frac{-1}{9} = -0.000111000111\overline{000111}\ldots_2.$$
Hexadecimal is slightly better, because more information is represented in a single character.
If you want to understand the exact information representing your floating point number, you can use the .hex()
method. For example:
x = -1/9
x.hex()
Remark. The .hex()
method isn't giving the base 16 floating point representation. Rather, it is giving the base $2$ representation and using hexadecimal to encode the mantissa. In particular, the exponent still represents a power of $2$.
In the hex expression for $-1/9$:
-0x1.c71c71c71c71cp-4
the - in front is the negative sign, the 0x marks the digits as hexadecimal, and the exponent (after the p) is -4. After the leading 1, the mantissa in hex is c71c71c71c71c. To convert this number recall that c represents $12$. This means that
$$x = - \left(1+
\frac{12}{16}+\frac{7}{(16)^2}+\frac{1}{(16)^3}+
\frac{12}{(16)^4}+\frac{7}{(16)^5}+\frac{1}{(16)^6}+
\frac{12}{(16)^7}+\frac{7}{(16)^8}+\frac{1}{(16)^9}+
\frac{12}{(16)^{10}}+\frac{7}{(16)^{11}}+\frac{1}{(16)^{12}}+
\frac{12}{(16)^{13}}
\right)2^{-4}.$$
Simplifying by pulling out a factor of $16^{-13}$, we see that
$$x = - \left(16^{13}+
12 \cdot 16^{12} + 7 \cdot 16^{11} + 1 \cdot 16^{10} +
12 \cdot 16^9 + 7 \cdot 16^8 + 1 \cdot 16^7 +
12 \cdot 16^6 + 7 \cdot 16^5 + 1 \cdot 16^4 +
12 \cdot 16^3 + 7 \cdot 16^2 + 1 \cdot 16^1 +
12\right)2^{-4} \cdot 16^{-13}.$$
Thus, we can get back our exact number as follows:
sign = -1
numerator = 16**13 + \
12 * 16**12 + 7 * 16**11 + 1 * 16**10 + \
12 * 16**9 + 7 * 16**8 + 1 * 16**7 + \
12 * 16**6 + 7 * 16**5 + 1 * 16**4 + \
12 * 16**3 + 7 * 16**2 + 1 * 16**1 + \
12
denominator = 2**4 * 16**13
computed_x = sign * numerator / denominator
if x == computed_x:
    print("They are equal.")
else:
    print("They are not equal.")
Remark: The \
at the end of a line indicates that Python should continue reading an expression on the next line. This allows us to make long expressions more readable.
A simpler way to get a floating point number as an exact fraction is to use the .as_integer_ratio()
method.
x.as_integer_ratio()
This information agrees with what we computed for the numerator and denominator after canceling a common factor of four:
print(sign*numerator//4)
print(denominator//4)
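If you want to keep computing with the exact value, the standard fractions module can convert a float to the exact rational number it stores. The sketch below assumes IEEE double precision floats and measures how far the stored value is from the true $-1/9$.

```python
from fractions import Fraction

x = -1/9
exact = Fraction(x)   # the exact rational value stored in the float
print(exact)
print(exact == Fraction(*x.as_integer_ratio()))   # same information

# The stored value differs from the true -1/9 by a tiny rounding error.
error = exact - Fraction(-1, 9)
print(float(error))
```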
You can also convert from a hex representation back into a float. E.g.,
y = float.fromhex('-0x1.aaap+0')
y
float_info package
You can access some information about floating point numbers on your Python 3 instance using the sys.float_info package.
import sys
The largest number that can be stored as a float is sys.float_info.max
.
print(f"The largest number in decimal is {sys.float_info.max:e}.")
print(f"The largest number in hex is {sys.float_info.max.hex()}.")
Remark: The :e
in the first print
statement above tells Python to print a number using exponential notation. See the Format Specification for more details. (I suggest concentrating first on the examples on that page.)
The smallest positive number that can be stored at full precision is given by sys.float_info.min.
print("The smallest positive number in decimal is " + \
f"{sys.float_info.min:e}.")
print(f"It is {sys.float_info.min.hex()} in hex.")
This shows that the range of exponents is from $-1022$ to $+1023$. The reason I said "at full precision" above is that you can obtain smaller positive numbers (on my computer at least) at a loss of precision:
min_div_2 = sys.float_info.min / 2
print(f"Dividing by two, we get {min_div_2:e}.")
print(f"And in hex we get {min_div_2.hex()}.")
You can see the loss of precision because the mantissa begins with 0. rather than 1.
The machine unit, $\epsilon$, is the smallest positive number for which $1+\epsilon$ can be stored exactly as a float. In other words, it is the gap between $1$ and the next larger float. It can be obtained as sys.float_info.epsilon.
epsilon = sys.float_info.epsilon
print(f"The machine unit is {epsilon:e} in decimal " + \
f"and {epsilon.hex()} in hex.")
You can see why this is the smallest number by considering the hex representation of $1+\epsilon$.
print(f"One plus the machine unit is {(1+epsilon).hex()} in hex.")
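As a quick experiment, we can check what happens when something smaller than the machine unit is added to $1$. This sketch assumes IEEE double precision floats with the default rounding (round to nearest, ties to even).

```python
import sys

epsilon = sys.float_info.epsilon

# Adding epsilon lands exactly on the next float after 1.
print(1 + epsilon > 1)

# Adding half of epsilon falls midway between two floats and rounds back to 1.
print(1 + epsilon / 2 == 1)

# With 52 stored mantissa bits, the gap after 1 is 2 to the power -52.
print(epsilon == 2**-52)
```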
Python has some special float values. These can be accessed using the float function, which is built into Python. (You can find out about it by typing float?.)
First surprisingly, it has two different values of zero:
pos_zero = float("+0")
pos_zero
neg_zero = float("-0")
neg_zero
While they print differently, they are still "equal".
pos_zero == neg_zero
You can get negative zero by dividing a negative number by a huge positive number. For example:
-1 / 2**5000
There are also two versions of infinity.
pos_infinity = float("+Infinity")
pos_infinity
neg_infinity = float("-Infinity")
neg_infinity
But, these are not equal:
pos_infinity == neg_infinity
There is also the value "not a number".
not_a_number = float("NaN")
not_a_number
This notion corresponds to an "indeterminate form" in math. For example:
pos_infinity + neg_infinity
pos_zero * pos_infinity
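One consequence worth knowing is that "not a number" is not equal to anything, including itself, so == cannot be used to detect it. The standard math module has functions for testing the special values. A brief sketch:

```python
import math

not_a_number = float("NaN")
print(not_a_number == not_a_number)   # NaN is not equal to itself
print(math.isnan(not_a_number))       # the reliable test for NaN
print(math.isinf(float("-Infinity"))) # test for either infinity

# The two zeros compare equal, but copysign reveals the sign of negative zero.
print(math.copysign(1.0, float("-0")))
```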
An OverflowError
should arise if a number exceeds the storable range. For example:
2.0**1300
A ZeroDivisionError
should arise when dividing by zero:
1/0
A try
and except
statement can be used to handle errors, which otherwise stop your program from running. For example:
try:
    x = 1/0
except ZeroDivisionError as error:
    print(f"Well that failed with an error: {error}")
For a more elaborate example, we will start with $x = 2$ and keep squaring it until we see an overflow error. We will see how big $x$ is when it overflows.
x = 2.0
try:
    while True: # run until an error arises.
        x = x**2
except OverflowError:
    pass # pass means do nothing
print(f"x is {x:e} in decimal and {x.hex()} in hex.")
Let me explain what happens in the above code. The line x = x**2 tells the computer to first square x. Then it is supposed to store the result back in the x variable. At some point (when $x=2^{512}$) the value of x is so large that computing its square causes an overflow error. This stops the evaluation of x = x**2 immediately, so the value of x is not updated.
So, the value of x printed above is the first value for which computing x**2 causes an overflow error.
x**2
We will briefly describe how to print floats in various formats. Please read LL § 1.6.2 for more detail. You can also learn more in the Python documentation.
from math import pi
Print 3 places after the decimal:
f"{pi:.3f}"
Same, but pad with spaces so that it takes up 6 characters:
f"{pi:6.3f}"
Print the 5 most significant digits, in scientific notation if appropriate:
f"{pi:.5g}"
Pad to $7$ characters.
f"{pi:7.5g}"
The e
format specifier indicates that it should always be printed in scientific notation. The following prints with $5$ digits after the decimal in scientific notation.
f"{pi:.5e}"
Note that f
will not convert to scientific, while g
will if necessary:
f"{pi*10**10:.3f} = {pi*10**10:.3g}"
NumPy has a higher precision float type numpy.float128. There are also some lower precision float types. A list is available in the NumPy documentation.
import numpy as np
You can convert to this type by using np.float128 as a function. E.g.,
x = np.float128(7)
type(x)
You should be careful when converting to it. For example, this is the wrong way to compute $1/3$ as a float128
since it first computes $1/3$ as a float and then converts it to a float128
.
np.float128(1/3)
This would be better:
np.float128(1)/3
You can learn more about float128
in the NumPy documentation.
There is also the mpmath package for arbitrary precision (meaning it gives as many bits or digits of precision as you request). I describe some basics of using the library below. You may also want to see the basic usage section of the mpmath documentation.
You can import the arbitrary precision library with:
from mpmath import mp
You can set the number of bits of precision as follows:
mp.prec = 4
Then you can represent $4/3$ like this:
x = mp.mpf(4)/3
x
Here mp
stands for multiple precision, and the mpf
for multiple precision float.
We can see that this is representing $1 \frac{3}{8}$, which is the closest number to $4/3$ that can be represented with $4$ bits. Note that both $4$ and $3$ can be stored exactly, and the $3$ will be converted to a number with the same precision before dividing. At high precision, this is better than writing mp.mpf(4/3)
since 4/3
will be computed as a float which is then converted to mpf
, so this expression will not be any more accurate than a float.
The way numbers are stored is a bit different. You can access the details like this:
x.man_exp
This tells you that the number stored is $$11 \times 2^{-3} = \frac{11}{8} = 1 \frac{3}{8}.$$ Here $11$ is the mantissa and $-3$ is the exponent. The exponent can store any integer (like Python 3's integer type) while the size of the binary representation of the mantissa is limited by our choice of precision. Here $11$ can be written as $1011$ in binary. These are the four bits we need. Note however that the bit in the ones place should always be one (unless zero is stored) since otherwise the number is even and we can divide by $2$ and incorporate the factor of $2$ into the exponent. On the other hand, we also need a bit for storing the sign.
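To see where the pair $(11, -3)$ comes from, here is a rough pure Python sketch that rounds a positive rational number to a given number of bits. It is not how mpmath is implemented. The helper name man_exp_sketch is hypothetical, and the sketch assumes rounding to the nearest representable value.

```python
from fractions import Fraction

def man_exp_sketch(value, prec):
    """Round a positive Fraction to prec bits.

    Returns (mantissa, exponent) with mantissa odd and
    value approximately equal to mantissa * 2**exponent.
    """
    assert value > 0 and prec >= 1
    # Choose the exponent so the scaled value lies in [2**(prec-1), 2**prec).
    exp = 0
    while value / Fraction(2)**exp >= 2**prec:
        exp += 1
    while value / Fraction(2)**exp < 2**(prec - 1):
        exp -= 1
    # Round the scaled value to the nearest integer; this is where precision is lost.
    man = round(value / Fraction(2)**exp)
    # Move factors of two into the exponent so that the mantissa is odd.
    while man % 2 == 0:
        man //= 2
        exp += 1
    return man, exp

print(man_exp_sketch(Fraction(4, 3), 4))
```

Since the exponent is an ordinary Python integer, the same function handles enormous inputs such as Fraction(2)**100 without any overflow.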
A great thing about this is that even at $4$ bits of precision, there is no possibility of overflow or underflow.
y = mp.mpf(2)**(10**100)
y
y.man_exp
You could try to compute that with a float by typing 2.0**(10**100)
. You should get an OverflowError
.
You can set the number of digits of precision instead of bits. This will change the number of bits of precision to match.
mp.dps = 50
Then you can access the first 50 digits of pi:
mp.pi()
There are also the usual functions for working with these numbers:
x = mp.sin(mp.pi()/3)
x
The above quantity should be $\sqrt{\frac{3}{4}}$ and we can check that with:
x**2