PandasZoo

Pandas Tutorials

Pandas Tutorial 4 - Wine Quality.

Pandas Basics: Filtering and plotting

The fourth tutorial reviews functions that are useful for an analyst and that have not already been covered. The questions are based on a wine quality data set.

Note: At PandasZoo we use single quotes for our answers.

This is a sample of what the data set looks like:

	fixed acidity	volatile acidity	citric acid	residual sugar	chlorides	free sulfur dioxide	total sulfur dioxide	density	pH	sulphates	alcohol	quality
0	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5
1	7.8	0.88	0.00	2.6	0.098	25.0	67.0	0.9968	3.20	0.68	9.8	5
2	7.8	0.76	0.04	2.3	0.092	15.0	54.0	0.9970	3.26	0.65	9.8	5
3	11.2	0.28	0.56	1.9	0.075	17.0	60.0	0.9980	3.16	0.58	9.8	6
4	7.4	0.70	0.00	1.9	0.076	11.0	34.0	0.9978	3.51	0.56	9.4	5

Question 1

Import the Pandas module.

Hint: We put in an example answer that you should try typing in.

Question 2

Read in the wine quality data set. The file is called winequality-red.csv, and there is no need for a full path.

When you read in the data and make a DataFrame object, call it wine.

Hint: We refer to Pandas as pd. The official pandas documentation describes read_csv.
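
As a rough sketch of Questions 1 and 2, the snippet below imports Pandas and reads CSV text into a DataFrame called wine. The in-memory text is a hypothetical stand-in for winequality-red.csv, built from a few columns of the sample above and assumed to be comma-separated. With the real file you would pass the file name instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for winequality-red.csv (a few columns from the sample rows).
csv_text = '''fixed acidity,volatile acidity,pH,alcohol,quality
7.4,0.70,3.51,9.4,5
7.8,0.88,3.20,9.8,5
11.2,0.28,3.16,9.8,6
'''

# With the real file this would be: wine = pd.read_csv('winequality-red.csv')
wine = pd.read_csv(io.StringIO(csv_text))
print(wine.shape)
```

If your copy of the file separates values with semicolons instead of commas, passing sep=';' to read_csv should handle it.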

Question 3

Use the head function to look at the wine DataFrame.
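
A minimal sketch of head, using a small made-up DataFrame in place of the real wine data (the values are illustrative only):

```python
import pandas as pd

# Tiny stand-in for the wine DataFrame; values are illustrative, not the real file.
wine = pd.DataFrame({'alcohol': [9.4, 9.8, 9.8, 9.8, 9.4, 10.5, 14.0],
                     'quality': [5, 5, 5, 6, 5, 7, 6]})

# head() returns the first 5 rows by default; head(3) would return the first 3.
preview = wine.head()
print(preview)
```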

Question 4

Sort the wine DataFrame by alcohol in descending order (ascending=False). Overwrite the wine DataFrame with the sorted result.
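
One possible way to do this is sort_values, shown here on a small made-up DataFrame standing in for the real wine data:

```python
import pandas as pd

# Illustrative stand-in for the wine DataFrame.
wine = pd.DataFrame({'alcohol': [9.4, 14.0, 9.8, 12.5],
                     'quality': [5, 6, 5, 7]})

# sort_values returns a new DataFrame, so assign it back to overwrite wine.
wine = wine.sort_values('alcohol', ascending=False)
print(wine)
```

Note that the original row labels travel with the rows, so the index is no longer in order after sorting.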

Question 5

Dang, there is wine out there that is 14% alcohol or higher. Let's make a column that flags any wine with alcohol greater than or equal to 14% and call the column strong_wine.

Make the new column hold True/False values, so there is no need for any if logic.

Hint: the data is already multiplied by 100, so 14% appears as 14 rather than 0.14.
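
A comparison on a column already produces a True/False Series, which is why no if logic is needed. A minimal sketch on made-up values:

```python
import pandas as pd

# Illustrative stand-in for the wine DataFrame; alcohol is stored as a percentage.
wine = pd.DataFrame({'alcohol': [9.4, 14.0, 13.9, 14.9]})

# The comparison is evaluated row by row and yields a boolean column.
wine['strong_wine'] = wine['alcohol'] >= 14
print(wine)
```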

Question 6

Let's take a look at wines that are True for the 'strong_wine' column and have a pH of 3.68. Use the head function to see these.
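
Two conditions can be combined with & inside the square brackets. The sketch below uses made-up rows and builds strong_wine itself so that it runs on its own:

```python
import pandas as pd

# Illustrative stand-in for the wine DataFrame.
wine = pd.DataFrame({'alcohol': [14.0, 14.0, 9.4, 14.9],
                     'pH': [3.68, 3.50, 3.68, 3.68]})
wine['strong_wine'] = wine['alcohol'] >= 14

# Wrap each condition in parentheses: & binds more tightly than ==.
matches = wine[(wine['strong_wine']) & (wine['pH'] == 3.68)]
print(matches.head())
```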

Question 7

Let's use regex filtering to look at only the pH column.

Use the head function so we don't need to save an object.
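
DataFrame.filter accepts a regex argument that is matched against the column names. A small sketch on made-up data, assuming the pattern 'pH' is what the question intends:

```python
import pandas as pd

# Illustrative stand-in for the wine DataFrame.
wine = pd.DataFrame({'pH': [3.51, 3.20, 3.26],
                     'sulphates': [0.56, 0.68, 0.65],
                     'alcohol': [9.4, 9.8, 9.8]})

# The match is case-sensitive, so 'pH' does not pick up 'sulphates'.
ph_only = wine.filter(regex='pH')
print(ph_only.head())
```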

Question 8

Let's use seaborn to plot quality on the x axis and alcohol on the y axis. Make a linear fit using fit_reg=True.

lmplot is the function we want to use.

Hint: Assume I ran the following code already:

import seaborn as sns
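
A rough sketch of the lmplot call on made-up data. The matplotlib backend line is an addition that lets the snippet run without a display; it is not needed in a notebook.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend (assumption: no display available)

import pandas as pd
import seaborn as sns

# Illustrative stand-in for the wine DataFrame.
wine = pd.DataFrame({'quality': [5, 5, 6, 6, 7, 7, 8],
                     'alcohol': [9.4, 9.8, 10.5, 11.0, 11.8, 12.5, 13.0]})

# lmplot draws a scatter plot and, with fit_reg=True, overlays a linear fit.
g = sns.lmplot(data=wine, x='quality', y='alcohol', fit_reg=True)
```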
