Pandas Tutorials

Pandas Tutorial 4 - Wine Quality.

Pandas Basics: Filtering and plotting

The fourth tutorial reviews and introduces functions that are useful for an analyst and have not been covered yet. The questions are based on a wine quality data set.

Note: At PandasZoo we use single quotes for our answers.

This is a sample of what the data set looks like:

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality
0            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5
1            7.8              0.88         0.00             2.6      0.098                 25.0                  67.0   0.9968  3.20       0.68      9.8        5
2            7.8              0.76         0.04             2.3      0.092                 15.0                  54.0   0.9970  3.26       0.65      9.8        5
3           11.2              0.28         0.56             1.9      0.075                 17.0                  60.0   0.9980  3.16       0.58      9.8        6
4            7.4              0.70         0.00             1.9      0.076                 11.0                  34.0   0.9978  3.51       0.56      9.4        5

Question 1

Import the Pandas module.

Hint: We put in an example answer that you should try typing in.

Question 2

Read in the wine quality data set. The file is called winequality-red.csv; there is no need for a full path.

When you read in the data and make a DataFrame object, call it wine.

Hint: We refer to Pandas as pd. The official pandas documentation for read_csv describes all of its options.
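
A minimal sketch of Questions 1 to 3, assuming the file is comma-separated. So that it runs without the real file, it passes the five sample rows shown above to read_csv through io.StringIO; with the real file you would pass 'winequality-red.csv' instead. The csv_text variable is illustrative only. If your copy of the file uses semicolons as delimiters, add sep=';'.

```python
import io

import pandas as pd

# Stand-in for winequality-red.csv: the five sample rows shown above,
# written as comma-separated text (an assumption about the file's format).
csv_text = """fixed acidity,volatile acidity,citric acid,residual sugar,chlorides,free sulfur dioxide,total sulfur dioxide,density,pH,sulphates,alcohol,quality
7.4,0.70,0.00,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
7.8,0.88,0.00,2.6,0.098,25.0,67.0,0.9968,3.20,0.68,9.8,5
7.8,0.76,0.04,2.3,0.092,15.0,54.0,0.9970,3.26,0.65,9.8,5
11.2,0.28,0.56,1.9,0.075,17.0,60.0,0.9980,3.16,0.58,9.8,6
7.4,0.70,0.00,1.9,0.076,11.0,34.0,0.9978,3.51,0.56,9.4,5
"""

# read_csv accepts any file-like object, so StringIO stands in for the file.
# With the real file: wine = pd.read_csv('winequality-red.csv')
wine = pd.read_csv(io.StringIO(csv_text))

# Question 3: head() returns the first five rows by default.
print(wine.head())
```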

Question 3

Use the head function to look at the wine DataFrame.

Question 4

Sort the wine DataFrame by alcohol in descending order (ascending=False). Overwrite the wine DataFrame with the sorted result.
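
A sketch of the sort, using a small stand-in DataFrame built from the pH, alcohol and quality values of the sample rows above rather than the full file:

```python
import pandas as pd

# Stand-in built from the sample rows shown above (only a few columns).
wine = pd.DataFrame({
    'pH': [3.51, 3.20, 3.26, 3.16, 3.51],
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
    'quality': [5, 5, 5, 6, 5],
})

# sort_values returns a new DataFrame; assigning it back to wine
# overwrites the original unsorted version.
wine = wine.sort_values(by='alcohol', ascending=False)
print(wine.head())
```

sort_values does not change wine in place by default, which is why the result is assigned back. The original row labels travel with their rows, so after sorting the index is no longer in order.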

Question 5

Dang, there is wine out there that is 14% alcohol or higher! Let's make a column flagging any wine with alcohol greater than or equal to 14% and call the column strong_wine.

Make the new column hold True/False values, so there is no need for any if logic.

Hint: the alcohol data is already multiplied by 100 (a value of 9.4 means 9.4%), so compare against 14 rather than 0.14.
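
A sketch using only the alcohol values from the sample rows above. Comparing a Series with a number produces a boolean Series, which can be assigned directly as a new column:

```python
import pandas as pd

# Alcohol values from the five sample rows shown above.
wine = pd.DataFrame({'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4]})

# The comparison yields True/False for every row; no if logic is needed.
wine['strong_wine'] = wine['alcohol'] >= 14
print(wine)
```

None of the five sample rows reaches 14%, so every value in strong_wine is False here.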

Question 6

Let's take a look at wines that are True for the 'strong_wine' column and have a pH of 3.68. Use the head function to see these.
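
A sketch of the combined filter. The values below are hypothetical, invented purely for illustration (they are not rows from winequality-red.csv), and chosen so that exactly one row meets both conditions:

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
wine = pd.DataFrame({
    'alcohol': [14.0, 9.4, 14.2, 12.5],
    'pH': [3.68, 3.68, 3.30, 3.68],
})
wine['strong_wine'] = wine['alcohol'] >= 14

# Combine two boolean masks with &; the pH comparison needs parentheses
# because & binds more tightly than ==.
strong_ph = wine[wine['strong_wine'] & (wine['pH'] == 3.68)]
print(strong_ph.head())
```

In this toy frame the stored pH values are the literal 3.68, so exact equality matches.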

Question 7

Let's use regex filtering to look at only the pH column.

Use the head function so we don't need to save an object.
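
A sketch of regex column filtering with DataFrame.filter, on a stand-in frame holding three columns from the sample rows above. The pattern '^pH$' is anchored so that only a column named exactly pH matches:

```python
import pandas as pd

# Stand-in built from the sample rows shown above (three columns only).
wine = pd.DataFrame({
    'pH': [3.51, 3.20, 3.26, 3.16, 3.51],
    'sulphates': [0.56, 0.68, 0.65, 0.58, 0.56],
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
})

# filter(regex=...) keeps the columns whose names match the pattern;
# chaining head() displays the result without saving a new object.
print(wine.filter(regex='^pH$').head())
```

The match is case-sensitive, so the unanchored pattern 'pH' would also select only pH here: sulphates contains a lowercase "ph", not "pH".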

Question 8

Let's use seaborn to plot quality on the x axis and alcohol on the y axis. Make a linear fit using fit_reg=True.

lmplot is the function we want to use.

Hint: Assume I ran the following code already:

import seaborn as sns
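
A sketch of the plot on a stand-in frame holding the quality and alcohol values from the sample rows above. The matplotlib.use('Agg') line only lets the sketch run without a display; drop it in a notebook, where the figure appears automatically.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; remove in a notebook

import pandas as pd
import seaborn as sns

# Stand-in built from the sample rows shown above.
wine = pd.DataFrame({
    'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4],
    'quality': [5, 5, 5, 6, 5],
})

# lmplot draws a scatter plot; fit_reg=True adds a fitted linear
# regression line with a confidence band. It returns a FacetGrid.
g = sns.lmplot(data=wine, x='quality', y='alcohol', fit_reg=True)
```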