Pandas Tutorials

Pandas Tutorial 2 - UCI Cycling data set.

Pandas Basics: Multiple data sets exercises

The second tutorial is using two UCI cycing data sets. They are two CSVs.

The questions we use for this tutorial are based on two different years of UCI Men's Cyclo-Cross dataset, which can be found here and clicking individual rankings.

Note: At PandasZoo we use single quotes for our answers.

This is a sample of what the data set looks like:

Rank UCI ID Name Nationality Team Code Age Points year type
0 1 10007946203 VAN DER POEL Mathieu NETHERLANDS CCU 24 2600 2019 cyclocross
1 2 10007585986 VAN AERT Wout BELGIUM NaN 25 2452 2019 cyclocross
2 3 10007586087 AERTS Toon BELGIUM TFL 26 2415 2019 cyclocross
3 4 10007155651 VANTHOURENHOUT Michael BELGIUM MNG 26 1610 2019 cyclocross
4 5 10006118660 VAN DER HAAR Lars NETHERLANDS TFL 28 1431 2019 cyclocross

Question 1

Import the Pandas module.

Hint: We put in an example answer that you should try typing in.

Question 2

Read in the uci_cyclocross_rankings_historicals.csv data set.

When you read in the dataframe, call it cyclo. No need to include a path to the file.

Hint: We refer to Pandas as pd. You can find official documentation for read_csv here

Question 3

Use the head function to look at the cyclo DataFrame.

Question 4

Create a dataframe called cyclo_2018 which subsets cyclo for only rows where the 'year' column is 2018.

Question 5

Similar to question 4, create a dataframe called cyclo_2019 which subsets cyclo for only rows where the 'year' column is 2019.

Question 6

Use a left merge to merge cyclo_2018 with cyclo_2019. In other words, merge cyclo_2018 to cyclo_2019 (left side) and call the dataframe cyclo_both

Use 'UCI ID' as the variable to merge on.

Hint: we refer to Pandas as pd and put 'how' before 'on' in the merge function.

Question 7

Let's append cyclo_2018 with cyclo_2019 into a dataframe called cyclo_append

Hint: Append cyclo_2019 to cyclo_2018 dataframe

Question 8

Let's use the dataframe cyclo_both from eariler to only view riders whom have Points_x greater than Points_y using the head function.

In other words lets preview the data using the head function to see what riders earned more points in 2019 versus 2018.