Pandas Tutorials

## Pandas Tutorial 3 - Daily Births

### Pandas Basics: Time Series exercises

The third tutorial focuses on time series. Its questions are based on a time series dataset of the total number of female births recorded in California, USA, during 1959. It is a basic time series dataset with only two fields: the date (in "yyyy-mm-dd" format, as in the sample further down) and the number of births. There are 365 records in total.

The data set can be found here.

Note: At PandasZoo we use single quotes for our answers.

This is a sample of what the data set looks like:

| | Date | Births |
|---|---|---|
| 0 | 1959-01-01 | 35 |
| 1 | 1959-01-02 | 32 |
| 2 | 1959-01-03 | 30 |
| 3 | 1959-01-04 | 31 |
| 4 | 1959-01-05 | 44 |

## Question 1

Import the Pandas module.

Hint: We put in an example answer that you should try typing in.

## Question 2

Read in the daily births data set.

Assume the following code has already been run:

` url = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv' `

When you read in the dataframe, call it df. Use the url object that was already created.

Hint: We refer to Pandas as pd. You can find the official documentation for read_csv here.

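Hint #2: Make sure to include the following in your read_csv function:

` error_bad_lines=False `

Note: error_bad_lines was deprecated in pandas 1.3 and removed in pandas 2.0. On those versions, use ` on_bad_lines='skip' ` instead.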
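As a rough sketch of the read step, the block below parses CSV text with read_csv and skips malformed lines. To stay runnable offline it reads an in-memory buffer holding the five sample rows shown earlier; that buffer (`sample_csv`) is an assumption made for illustration, and with network access you would pass `url` instead. The try/except is also an assumption about your environment: newer pandas releases accept `on_bad_lines='skip'`, while older ones only accept `error_bad_lines=False`.

```python
import io
import pandas as pd

# Hypothetical stand-in for the remote file: the five sample rows as CSV text.
# With network access, pass `url` to read_csv instead of this buffer.
sample_csv = io.StringIO(
    'Date,Births\n'
    '1959-01-01,35\n'
    '1959-01-02,32\n'
    '1959-01-03,30\n'
    '1959-01-04,31\n'
    '1959-01-05,44\n'
)

try:
    # pandas 1.3 and later: skip malformed lines instead of raising.
    df = pd.read_csv(sample_csv, on_bad_lines='skip')
except TypeError:
    # Older pandas does not know on_bad_lines; fall back to the legacy flag.
    sample_csv.seek(0)
    df = pd.read_csv(sample_csv, error_bad_lines=False)
```

Without `parse_dates`, the 'Date' column is read as plain strings, which is enough for the questions that follow.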

## Question 3

Use the head function to look at the df DataFrame.

## Question 4

Plot the 'Births' column from the df DataFrame.

## Question 5

See what the index looks like for df.
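A minimal sketch of inspecting the index, assuming df was read without an index column so that pandas assigned its default integer index. The DataFrame built here is a made-up stand-in with 365 rows, not the real data.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# The default index is a RangeIndex that counts rows from 0.
idx = df.index
print(idx)
```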

## Question 6

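Plot the range 0:30 for 'Births' in the df DataFrame. In other words, plot the first 31 entries of the 'Births' column. Note that label-based slicing with loc includes both endpoints, so 0:30 covers 31 rows, whereas positional slicing with iloc stops before position 30.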
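The difference between label-based and positional slicing matters for this question, so here is a small sketch of both on a made-up stand-in for df (the values are hypothetical). It only selects the data; plotting is left as a final step.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Label-based slice: both endpoints are included, so this yields 31 rows.
by_label = df.loc[0:30, 'Births']

# Position-based slice: the stop position is excluded, so this yields 30 rows.
by_position = df['Births'].iloc[0:30]
```

Calling `.plot()` on either Series draws it, provided matplotlib is installed.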

## Question 7

Let's plot the rolling mean of 'Births' using the rolling function.

Hint: Use a window of 60, a minimum of 10 periods, and set center to False.
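One way to compute the rolling mean with the settings from the hint is sketched below. The DataFrame is a made-up stand-in for df, so the numbers themselves are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Trailing 60-row window; a value is produced once at least 10 rows are available.
rolling_mean = df['Births'].rolling(window=60, min_periods=10, center=False).mean()
```

The first nine entries are NaN because fewer than 10 observations are available. Calling `rolling_mean.plot()` draws the result, provided matplotlib is installed.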

## Question 8

Let's plot the rolling standard deviation of 'Births' using the rolling function.

Hint: Use a window of 60, a minimum of 10 periods, and set center to False.
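The rolling standard deviation follows the same pattern, swapping the aggregation. This is a sketch on a made-up stand-in for df, so the values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Same trailing window as the rolling mean, aggregated with the sample standard deviation.
rolling_std = df['Births'].rolling(window=60, min_periods=10, center=False).std()
```

As with the mean, `rolling_std.plot()` draws the result when matplotlib is installed.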

## Question 9

Let's create a column called 'birthday' using an IF statement. When the date is July 2nd, 1959, make the 'birthday' column 'TRUE'.

Hint: A good website on IF statements using Pandas can be found here

Hint: This requires two lines. Make the first one the 'TRUE' part of the statement.

## Question 10

Let's complete the 'birthday' column using an IF statement. When the date is not July 2nd 1959, make the 'birthday' column 'FALSE'.

Hint: A good website on IF statements using Pandas can be found here

Hint: This requires two lines. Make the second one the 'FALSE' part of the statement.
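The two-line pattern from Questions 9 and 10 can be sketched with boolean masks and loc. This assumes the 'Date' column holds strings in "yyyy-mm-dd" form, as in the sample; the DataFrame is a made-up stand-in for df, and this is one possible approach rather than the only accepted answer.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Line 1: rows matching the date get 'TRUE'; the new column is NaN everywhere else.
df.loc[df['Date'] == '1959-07-02', 'birthday'] = 'TRUE'

# Line 2: every remaining row gets 'FALSE'.
df.loc[df['Date'] != '1959-07-02', 'birthday'] = 'FALSE'
```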

## Question 11

Let's randomly sample 50 rows from df and call it dfb.
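A minimal sketch of the sampling step on a made-up stand-in for df. The `random_state` argument is an addition for reproducibility and is not part of the question; drop it to get a different sample on each run.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Draw 50 rows without replacement; random_state fixes the draw (an assumption, optional).
dfb = df.sample(n=50, random_state=0)
```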

## Question 12

Let's pull the 10 days with the largest number of births from df.
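One way to pull the largest values is nlargest, sketched here on a made-up stand-in for df (the counts are hypothetical).

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# The 10 rows with the highest 'Births', sorted from largest to smallest.
top10 = df.nlargest(10, 'Births')
```

An equivalent route is sorting with `sort_values('Births', ascending=False)` and taking `head(10)`.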

## Question 13

Use the pivot function to have a column for every date. Let 'birthday' be the index and 'Births' be the value.
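The reshaping can be sketched as follows, assuming the 'birthday' column already holds 'TRUE' and 'FALSE' strings. The stand-in DataFrame and the way 'birthday' is built here are hypothetical shortcuts to keep the example self-contained.

```python
import pandas as pd

# Hypothetical stand-in for df: 365 daily rows with made-up birth counts.
df = pd.DataFrame({
    'Date': pd.date_range('1959-01-01', periods=365, freq='D').strftime('%Y-%m-%d'),
    'Births': [30 + (i * 7) % 23 for i in range(365)],
})

# Shortcut for the 'birthday' column from the earlier questions.
df['birthday'] = (df['Date'] == '1959-07-02').map({True: 'TRUE', False: 'FALSE'})

# One row per 'birthday' value, one column per date, cells filled with 'Births'.
wide = df.pivot(index='birthday', columns='Date', values='Births')
```

Each date appears under only one 'birthday' value, so the other row holds NaN for that date.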