Pandas Tutorials

Pandas Tutorial 3 - Daily Births.

Pandas Basics: Time Series exercises

The third tutorial focuses on time series. The questions are based on a data set of the total number of female births recorded each day in California, USA, during 1959. It is a basic time series with only two columns: the date (in "yyyy-mm-dd" format, as the sample below shows) and the number of births. There are 365 records in total.

The data set can be found here.

Note: At PandasZoo we use single quotes for our answers.

This is a sample of what the data set looks like:

         Date  Births
0  1959-01-01      35
1  1959-01-02      32
2  1959-01-03      30
3  1959-01-04      31
4  1959-01-05      44


Question 1

Import the Pandas module.

Hint: We put in an example answer that you should try typing in.





Question 2

Read in the daily births data set.

Assume I already ran the following code:

url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv'

When you read in the dataframe, call it df. Use the url object that was already created.

Hint: We refer to Pandas as pd. You can find the official documentation for read_csv here.

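Hint #2: Make sure to include the following in your read_csv call:

error_bad_lines=False

Note: error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0. On those versions, use on_bad_lines='skip' instead.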
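As a rough sketch of the read_csv call, the block below parses the five sample rows shown earlier from an in-memory string, so it runs without network access. The csv_text variable is a stand-in introduced only for this illustration; with a connection you would pass the url object instead. It uses on_bad_lines='skip', which is the spelling newer pandas versions (1.3 and later) accept in place of error_bad_lines=False.

```python
import io

import pandas as pd

# Stand-in for the remote file: the five sample rows shown above, as CSV text.
# With network access you would pass `url` to read_csv instead.
csv_text = '''Date,Births
1959-01-01,35
1959-01-02,32
1959-01-03,30
1959-01-04,31
1959-01-05,44
'''

# on_bad_lines='skip' (pandas 1.3+) drops malformed rows instead of raising.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines='skip')
print(df.head())
```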





Question 3

Use the head function to look at the df DataFrame.





Question 4

Plot the 'Births' column from the df DataFrame.





Question 5

See what the index looks like for df.
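When no index column is specified, pandas assigns a default integer index, which matches the 0, 1, 2, ... labels in the sample above. A small sketch, using a hand-built DataFrame with the five sample rows as a stand-in for the full data set:

```python
import pandas as pd

# Stand-in DataFrame built from the five sample rows shown earlier.
df = pd.DataFrame({
    'Date': ['1959-01-01', '1959-01-02', '1959-01-03', '1959-01-04', '1959-01-05'],
    'Births': [35, 32, 30, 31, 44],
})

# The index attribute shows the row labels; here it is a default RangeIndex.
print(df.index)
```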





Question 6

Plot the range 0:30 for 'Births' in the df DataFrame. In other words, plot the first 31 entries of the 'Births' column (with label-based slicing, 0:30 includes both endpoints).
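Whether 0:30 yields 30 or 31 rows depends on how you slice. The sketch below contrasts label-based and position-based slicing on synthetic values (made up for this illustration, not the real data set). Calling .plot() on either result would draw it, assuming matplotlib is installed.

```python
import pandas as pd

# Synthetic stand-in: 40 made-up values, not the real data set.
df = pd.DataFrame({'Births': range(40)})

# Label-based slicing with .loc includes both endpoints: labels 0 through 30.
by_label = df.loc[0:30, 'Births']

# Position-based slicing with .iloc excludes the end: positions 0 through 29.
by_position = df['Births'].iloc[0:30]

print(len(by_label), len(by_position))
```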





Question 7

Let's plot the rolling mean of 'Births' using the rolling function.

Hint: Use a minimum of 10 periods and a window of 60, and set center to False.





Question 8

Let's plot the rolling standard deviation of 'Births' using the rolling function.

Hint: Use a minimum of 10 periods and a window of 60, and set center to False.
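The rolling statistics from Questions 7 and 8 can be sketched as follows. The values are synthetic (made up for this illustration), and the variable names rolling_mean and rolling_std are not part of the exercise. Calling .plot() on either result would draw it, assuming matplotlib is installed.

```python
import pandas as pd

# Synthetic stand-in: 100 made-up daily counts, not the real data set.
df = pd.DataFrame({'Births': [30 + (i % 7) for i in range(100)]})

# A trailing 60-row window that starts producing values once 10 rows exist.
window = df['Births'].rolling(window=60, min_periods=10, center=False)

rolling_mean = window.mean()
rolling_std = window.std()

# The first 9 results are NaN because fewer than 10 observations are available.
print(rolling_mean.isna().sum())
print(rolling_mean.tail())
```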





Question 9

Let's create a column called 'birthday' using an IF statement. When the date is July 2nd, 1959, make the 'birthday' column 'TRUE'.

Hint: A good website on IF statements using Pandas can be found here

Hint: This requires two lines. Make the first one the 'TRUE' part of the statement.





Question 10

Let's complete the 'birthday' column using an IF statement. When the date is not July 2nd 1959, make the 'birthday' column 'FALSE'.

Hint: A good website on IF statements using Pandas can be found here

Hint: This requires two lines. Make the second one the 'FALSE' part of the statement.
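One common way to write the two-line IF statement from Questions 9 and 10 is conditional assignment with .loc. The sketch below assumes the 'Date' column holds "yyyy-mm-dd" strings, as in the sample; the birth counts are made up for this illustration.

```python
import pandas as pd

# Three rows around the target date; the counts are made-up values.
df = pd.DataFrame({
    'Date': ['1959-07-01', '1959-07-02', '1959-07-03'],
    'Births': [40, 41, 42],
})

# Line 1: rows matching the date get 'TRUE' (this also creates the column).
df.loc[df['Date'] == '1959-07-02', 'birthday'] = 'TRUE'

# Line 2: all remaining rows get 'FALSE'.
df.loc[df['Date'] != '1959-07-02', 'birthday'] = 'FALSE'

print(df)
```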





Question 11

Let's randomly sample 50 rows from df and call it dfb.
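A minimal sketch of random sampling with DataFrame.sample, on synthetic values rather than the real data set. The random_state argument is an addition for reproducibility and is not required by the exercise.

```python
import pandas as pd

# Synthetic stand-in: 365 made-up values, one per day of the year.
df = pd.DataFrame({'Births': range(365)})

# Draw 50 rows without replacement; random_state makes the draw repeatable.
dfb = df.sample(n=50, random_state=0)

print(dfb.shape)
```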





Question 12

Let's pull the 10 days with the largest number of births from df.
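One way to do this is DataFrame.nlargest, sketched here on synthetic values (not the real data set). The name top10 is only for illustration.

```python
import pandas as pd

# Synthetic stand-in: 365 made-up values, not the real data set.
df = pd.DataFrame({'Births': range(365)})

# Keep the 10 rows with the highest 'Births', sorted from largest to smallest.
top10 = df.nlargest(10, 'Births')

print(top10)
```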





Question 13

Use the pivot function to have a column for every date. Let 'birthday' be the index and 'Births' be the value.
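The reshaping can be sketched on a three-row stand-in with made-up counts. The result has one row per 'birthday' value and one column per date, with NaN wherever a date does not belong to that row. The name wide is only for illustration.

```python
import pandas as pd

# Three-row stand-in with made-up counts and a completed 'birthday' column.
df = pd.DataFrame({
    'Date': ['1959-07-01', '1959-07-02', '1959-07-03'],
    'Births': [40, 41, 42],
    'birthday': ['FALSE', 'TRUE', 'FALSE'],
})

# One row per 'birthday' value, one column per date, cells filled from 'Births'.
wide = df.pivot(index='birthday', columns='Date', values='Births')

print(wide)
```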