Python Data Visualisation using Seaborn

Seaborn is a Python data visualisation library built on top of matplotlib, a Python 2D plotting library. It provides a high-level interface for drawing attractive and informative statistical graphics, and it integrates closely with pandas data structures.

Seaborn was developed with a number of objectives in mind:

  • A dataset-oriented API for examining relationships between multiple variables
  • Specialised support for using categorical variables to show observations or aggregate statistics
  • Options for visualising univariate or bivariate distributions and for comparing them between subsets of data
  • Automatic estimation and plotting of linear regression models for different kinds of dependent variables
  • Convenient views onto the overall structure of complex datasets
  • High-level abstractions for structuring multi-plot grids that let you easily build complex visualisations
  • Concise control over matplotlib figure styling with several built-in themes
  • Tools for choosing colour palettes that faithfully reveal patterns in your data

In this post, we will explore different datasets to demonstrate Seaborn’s powerful graphical capabilities.

Installing and getting started

You can install the latest version of Seaborn using pip (pip install seaborn) or conda (conda install seaborn).

We will start by exploring the Iris dataset.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

We imported seaborn, which is the library we will be using to produce the plots. It’s important to note that seaborn uses matplotlib behind the scenes to draw plots. A lot can be accomplished with only seaborn functions, but further customisation will require the use of matplotlib directly.

We also applied the default seaborn theme, scaling and colour palette using sns.set(). This will affect how all matplotlib plots look, even if they were not made with seaborn.
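
One way to see this global effect is to inspect matplotlib's rcParams before and after the call. The following is a minimal sketch rather than part of the original walkthrough: it assumes the default seaborn style (which switches the axes grid on) and selects the Agg backend only so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcdefaults()                          # start from matplotlib's built-in defaults
grid_before = plt.rcParams["axes.grid"]   # matplotlib default: grid switched off

sns.set()                                 # apply seaborn's default theme globally
grid_after = plt.rcParams["axes.grid"]    # the default seaborn style turns the grid on

# A plain matplotlib plot created from here on picks up the seaborn look too
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])

print(grid_before, grid_after)
```

If the theme is not wanted for later figures, sns.reset_orig() should restore the earlier matplotlib settings.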

iris = sns.load_dataset('iris')
iris.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

print(iris.shape)
(150, 5)

The dataset contains 3 classes of 50 instances each, where each class refers to a species of iris plant (Iris setosa, Iris virginica and Iris versicolor). Four features were measured for each sample: the length and the width of the sepals and petals, in centimetres.
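
The class balance described here can be checked with pandas. A small sketch follows; the class_counts helper and the six-row stand-in frame are hypothetical and exist only to keep the example self-contained, so pass in the real iris frame when working through this post.

```python
import pandas as pd


def class_counts(df, column="species"):
    """Return the number of rows per class, sorted by class name."""
    return df[column].value_counts().sort_index()


# Hypothetical stand-in that reuses two of the Iris column names
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

counts = class_counts(sample)
print(counts)
```

Called on the full dataset, class_counts(iris) should report 50 rows for each of the three species, matching the description above.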

Visualising dataset structure

To get a broad view of the data we will use seaborn’s pairplot function, which shows all pairwise relationships and the marginal distributions.

sns.pairplot(data=iris, hue='species')
Figure 1: Pairplot

Seaborn also allows us to fit linear regression models to the scatter plots.

sns.pairplot(iris, kind='reg')
Figure 2: Pairplot with regression
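
The same kind of fit can be drawn for a single pair of variables with sns.regplot. The sketch below is illustrative only: it uses synthetic data (an assumption, so that it runs without downloading anything) and calls np.polyfit to obtain the coefficients of an ordinary least-squares line, because regplot draws the fit but does not return its coefficients.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data: a line with slope 2 and intercept 1, plus Gaussian noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
df = pd.DataFrame({"x": x,
                   "y": 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)})

# Scatter plot with a fitted regression line and confidence band
fig, ax = plt.subplots()
sns.regplot(data=df, x="x", y="y", ax=ax)

# Least-squares coefficients of a straight-line fit to the same data
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```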

It is also possible to show a subset of variables or plot different variables on the rows and columns. Suppose we just want to visualise the sepal length and sepal width:

sns.pairplot(iris, vars=['sepal_width','sepal_length'], hue='species', height=3)
Figure 3: Pairplot with select variables

The x_vars and y_vars parameters let you choose different variables for the columns and the rows:

sns.pairplot(iris, x_vars=['sepal_width', 'sepal_length'], y_vars=['petal_width', 'petal_length'], hue='species', height=3)
Figure 4: Pairplot with custom rows and columns
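
To make the layout concrete, the sketch below inspects the grid object that pairplot returns. The demo frame is random stand-in data (an assumption; only the column names mirror the Iris dataset), so the plot itself carries no meaning and only the grid structure is of interest.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import numpy as np
import pandas as pd
import seaborn as sns

# Random stand-in data with Iris-like column names (not real measurements)
rng = np.random.default_rng(1)
n = 30
demo = pd.DataFrame({
    "sepal_width": rng.normal(3.0, 0.4, n),
    "sepal_length": rng.normal(5.8, 0.8, n),
    "petal_width": rng.normal(1.2, 0.7, n),
    "petal_length": rng.normal(3.8, 1.7, n),
    "species": np.repeat(["setosa", "versicolor", "virginica"], n // 3),
})

g = sns.pairplot(demo,
                 x_vars=["sepal_width", "sepal_length"],
                 y_vars=["petal_width", "petal_length"],
                 hue="species", height=3)

# One row of axes per y variable, one column per x variable
print(g.axes.shape)
print(g.axes[0, 0].get_ylabel(), g.axes[-1, 0].get_xlabel())
```

The returned object is a seaborn PairGrid, and its axes attribute is a 2D array of ordinary matplotlib axes, so each panel can be adjusted individually.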

Visualising statistical relationships

Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualisation can be a core component of this process because, when data is visualised properly, the human visual system can see trends and patterns that indicate a relationship.

Probably the best-known representation of the relationship between two variables is the scatter plot. To demonstrate this, we will look at a dataset that records the tips restaurant staff receive, along with details such as the total bill, the day and the party size:

tips = sns.load_dataset('tips')
tips.head()

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

print(tips.shape)
(244, 7)

sns.scatterplot(data=tips, x='total_bill', y='tip')
Figure 5: Scatter plot of tip amount and total bill
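
As noted earlier, finer customisation goes through matplotlib. sns.scatterplot returns the matplotlib Axes it drew on, so labels and titles can be set in the usual way. The sketch below uses only the five rows printed above so that it runs without downloading the dataset; the label and title text are our own choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# The first five rows of the tips data, as shown by tips.head()
demo = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61],
})

fig, ax = plt.subplots(figsize=(6, 4))
ax = sns.scatterplot(data=demo, x="total_bill", y="tip", ax=ax)

# Plain matplotlib calls on the returned Axes
ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")
ax.set_title("Tip amount against total bill")
fig.tight_layout()
```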
