In this notebook, we will look at importing data from CSV files. A good reference is the Python documentation for the csv module.
We'll confine ourselves here to importing data and some basic plotting.
To import Python's csv module, we'll use:
import csv
Apparently there was a squirrel census in Central Park in 2018, and the data from that census is available for download.
I've downloaded the file 2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv and placed it in the same folder as this notebook.
It might be useful to open the CSV file in a spreadsheet program before analyzing it, just to get comfortable with the format. When you do this, you'll notice there is a header row followed by rows of data, one for each squirrel in the census.
Here we store off the filename:
filename = '2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv'
To open this file for reading we use the following (where 'r' indicates that the file is being opened for reading):
f = open(filename, 'r')
Since this is a CSV file, we can then pass f to a reader from the csv library:
reader = csv.reader(f)
To read the first row you use:
header_row = next(reader)
Here I display the header_row.
header_row
This is the first row in the data, the header information. It will be useful to know the index of each column. Here is a more useful presentation of the information in the header row.
for i in range(len(header_row)):
    print(f'{i}: {header_row[i]}')
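The loop above indexes into the list by position. As an alternative, here is a minimal sketch using Python's built-in enumerate, which hands us each index together with its entry. The list sample_header is a made-up stand-in for the real header row, so that the sketch runs on its own.

```python
# Made-up header row, standing in for the one read from the census file.
sample_header = ['X', 'Y', 'Squirrel ID', 'Color']

# enumerate yields (index, entry) pairs, so no explicit indexing is needed.
lines = [f'{i}: {name}' for i, name in enumerate(sample_header)]
for line in lines:
    print(line)
```

Both versions print the same thing; enumerate just avoids writing range(len(...)).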
Let's read the next row.
row = next(reader)
row
You can see that the entries in this row are all strings. It might be more useful to print this data with the header.
for i in range(len(header_row)):
    print(f'{i}: {header_row[i]}: {row[i]}')
You can see from this that the squirrel associated with this row (Squirrel 37F-PM-1014-03) is not running, chasing, climbing, eating, or foraging. He or she doesn't seem to be doing anything at all...
We can also see that the first two entries in a row give the position of the squirrel: its longitude followed by its latitude. (Compare the first two entries with the last.)
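Pairing each entry with its header by hand works, but the csv module also provides csv.DictReader, which does the pairing for us. Here is a minimal sketch, assuming a small made-up CSV held in memory (the column names and values below are invented for illustration and are not the census data).

```python
import csv
import io

# Made-up sample data; the real census file has many more columns and rows.
sample_csv = io.StringIO(
    'X,Y,Squirrel ID,Color\n'
    '-73.95,40.79,A-1,Gray\n'
    '-73.96,40.78,B-2,Cinnamon\n'
)

# DictReader reads the header row itself and uses it as the dictionary keys.
dict_reader = csv.DictReader(sample_csv)
first = next(dict_reader)

# Entries are looked up by column name instead of by index.
print(first['Squirrel ID'])
print(first['Color'])
```

The values are still strings, just as with csv.reader. The trade-off is that we look things up by column name rather than by position.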
Once we are done reading from a file we should close it. We do this with:
f.close()
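As an alternative to calling f.close() by hand, Python's with statement closes the file for us when the block ends, even if an error occurs partway through. The sketch below is only an illustration: it first writes a tiny made-up CSV file (sample_squirrels.csv, a hypothetical name) to a temporary folder so that it runs on its own. It also passes newline='' to open, which the csv documentation recommends for files handed to a reader or writer.

```python
import csv
import os
import tempfile

# Write a tiny made-up CSV file so this sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, 'sample_squirrels.csv')
with open(sample_path, 'w', newline='') as out:
    csv.writer(out).writerows([
        ['X', 'Y', 'Color'],
        ['-73.95', '40.79', 'Gray'],
    ])

# The file is closed automatically when the with block ends.
with open(sample_path, 'r', newline='') as f:
    reader = csv.reader(f)
    header_row = next(reader)
    first_row = next(reader)

print(header_row)
print(f.closed)
```

All reading has to happen inside the with block, since the reader can no longer pull rows from the file once it is closed.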
Maybe we are interested in counting the number of squirrels of each color. From the above, the squirrel's primary color is in position 8 of each row. We'll keep track of the number of squirrels of each color using a dictionary that maps each color to the number of squirrels of that color.
We'll loop through the rows, skipping the first, and update the dictionary as needed.
Initialize an empty dictionary:
squirrel_color_dictionary = {}
Open the CSV file for reading.
filename = '2018_Central_Park_Squirrel_Census_-_Squirrel_Data.csv'
f = open(filename, 'r')
reader = csv.reader(f)
Store off the header row:
header_row = next(reader)
To iterate through the remaining rows, we'll use a for loop:
for row in reader:
    # Do something with the row here.
    pass
In the loop we can access the data for each row using the row variable. We'll want to access the color with row[8]. Then we either add that color to the dictionary (representing the first squirrel of that color) or increase the number that the dictionary stores for that color by one.
for row in reader:
    color = row[8]
    if color in squirrel_color_dictionary:
        # Increase the number of squirrels of this color by one.
        squirrel_color_dictionary[color] += 1
    else:
        # This is our first squirrel of this color.
        squirrel_color_dictionary[color] = 1
Now let's look at the results.
squirrel_color_dictionary
I'm guessing '' represents an unidentified color. Here we print the percentage of squirrels of each color.
total = 0
for color in squirrel_color_dictionary:
    total += squirrel_color_dictionary[color]
print(f'There are a total of {total} squirrels in the census.')
for color in squirrel_color_dictionary:
    if color == '':
        fixed_color = "Unidentified color"
    else:
        fixed_color = color
    percentage = 100*squirrel_color_dictionary[color]/total
    print(f'{fixed_color} squirrels make up {percentage:.2f}% of the census.')
f.close()
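The counting we did by hand with a dictionary can also be done with collections.Counter from the standard library. Here is a minimal sketch, assuming a small made-up CSV held in memory where the color sits in column 2 (in the census file it is column 8). The rows below are invented for illustration.

```python
import csv
import io
from collections import Counter

# Made-up sample data; the last row has an empty color entry.
sample_csv = io.StringIO(
    'X,Y,Color\n'
    '-73.95,40.79,Gray\n'
    '-73.96,40.78,Cinnamon\n'
    '-73.97,40.77,Gray\n'
    '-73.98,40.76,\n'
)
reader = csv.reader(sample_csv)
next(reader)  # Skip the header row.

# Counter tallies how many times each color appears.
color_counts = Counter(row[2] for row in reader)
total = sum(color_counts.values())

# most_common lists the colors from most to least frequent.
for color, count in color_counts.most_common():
    label = color if color else 'Unidentified color'
    print(f'{label}: {count} ({100 * count / total:.2f}%)')
```

A Counter behaves like a dictionary, so color_counts['Gray'] works just as it did with squirrel_color_dictionary.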
If we use numbers, we'll need to convert strings to numbers with float(row[i]) or int(row[i]).
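One thing to watch for is that float and int raise a ValueError when given a string that isn't a number, such as the empty string we saw for some colors. Here is a minimal sketch of one way to handle that. The helper to_float and the row sample_row are hypothetical names and data made up for this illustration.

```python
def to_float(text):
    """Return text as a float, or None if it cannot be converted."""
    try:
        return float(text)
    except ValueError:
        return None

# Made-up row of strings, like one produced by csv.reader.
sample_row = ['-73.95', '40.79', '', '3']

x = to_float(sample_row[0])        # A valid number.
missing = to_float(sample_row[2])  # An empty entry, so we get None.
count = int(sample_row[3])         # int works the same way for whole numbers.

print(x, missing, count)
```

Whether to skip such entries, replace them, or stop with an error depends on the analysis; returning None is just one choice.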
Here we'll demonstrate how to plot squirrel positions.
f = open(filename, 'r')
reader = csv.reader(f)
header = next(reader)
import matplotlib.pyplot as plt
Below we load the latitude and longitude positions of squirrels into lists.
xs = []
ys = []
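for row in reader:
    # Convert the position strings to numbers before storing them.
    x = float(row[0])
    xs.append(x)
    y = float(row[1])
    ys.append(y)
# We are done reading, so close the file.
f.close()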
Now we plot this data.
plt.axes().set_aspect('equal')
plt.plot(xs, ys, '.')
plt.show()