Seaborn is a Python data visualisation library based on matplotlib, a Python 2D plotting library. It provides a high-level interface for drawing attractive and informative statistical graphics, and it is closely integrated with pandas data structures.
Seaborn was developed with a number of objectives in mind:
- A dataset-oriented API for examining relationships between multiple variables
- Specialised support for using categorical variables to show observations or aggregate statistics
- Options for visualising univariate or bivariate distributions and for comparing them between subsets of data
- Automatic estimation and plotting of linear regression models for different kinds of dependent variables
- Convenient views onto the overall structure of complex datasets
- High-level abstractions for structuring multi-plot grids that let you easily build complex visualisations
- Concise control over matplotlib figure styling with several built-in themes
- Tools for choosing colour palettes that faithfully reveal patterns in your data
In this post, we will explore different datasets to demonstrate Seaborn’s powerful graphical capabilities.
Installing and getting started
You can install the latest version of Seaborn using pip (pip install seaborn) or conda (conda install seaborn).
We will start by exploring the Iris dataset.
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
We imported seaborn, which is the library we will be using to produce the plots. It’s important to note that seaborn uses matplotlib behind the scenes to draw plots. A lot can be accomplished with only seaborn functions, but further customisation will require the use of matplotlib directly.
We also applied the default seaborn theme, scaling and colour palette using sns.set(). This will affect how all matplotlib plots look, even if they were not made with seaborn.
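As a small illustration of that last point, the sketch below draws a plot with matplotlib alone after calling sns.set(). The sample values are made up for the example, and the rcParams check is just one way of confirming that the theme changed matplotlib's global settings.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply the default seaborn theme; this updates matplotlib's global rcParams.
sns.set()

# A plain matplotlib plot: no seaborn plotting function is involved.
# The values are arbitrary illustration data.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_xlabel('x')
ax.set_ylabel('x squared')

# The default seaborn style switches the axes grid on for all new figures.
grid_enabled = plt.rcParams['axes.grid']
print(grid_enabled)
```

If you later want the matplotlib defaults back, sns.reset_orig() restores the original settings.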
iris = sns.load_dataset('iris')
iris.head()
The dataset contains 3 classes of 50 instances each, where each class refers to a species of iris plant (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres.
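A quick way to confirm this structure is to inspect the DataFrame directly. The following is a minimal sketch. Note that sns.load_dataset fetches the data from seaborn's online example repository, so it needs an internet connection the first time it runs.

```python
import seaborn as sns

# Load the example dataset as a pandas DataFrame.
iris = sns.load_dataset('iris')

# Number of rows and columns: four measurements plus the species label.
print(iris.shape)

# Column names as stored in the DataFrame.
print(iris.columns.tolist())

# Number of samples recorded for each species.
species_counts = iris['species'].value_counts()
print(species_counts)
```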
Visualising dataset structure
To get a broad view of the data we will use seaborn’s pairplot function, which shows all pairwise relationships and the marginal distributions.
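A minimal sketch of such a call is shown here. Colouring the points by species with hue='species' and the height value are choices made for this example, not requirements of pairplot.

```python
import seaborn as sns

sns.set()
iris = sns.load_dataset('iris')

# One panel per pair of numeric columns; the diagonal shows each
# variable's marginal distribution. hue colours the points by species.
grid = sns.pairplot(iris, hue='species', height=2.5)
```

pairplot returns a PairGrid object, so the individual matplotlib axes remain available through grid.axes if you need to customise them further.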
Seaborn also allows us to fit linear regression models to the scatter plots.
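One way to do this, sketched below, is to pass kind='reg' to pairplot, which overlays a fitted regression line on every off-diagonal scatter plot. Fitting a separate line per species via hue='species' is an assumption made for this example.

```python
import seaborn as sns

sns.set()
iris = sns.load_dataset('iris')

# kind='reg' draws a scatter plot plus a linear regression fit
# in each off-diagonal panel; with hue set, one fit per species.
grid = sns.pairplot(iris, kind='reg', hue='species', height=2.5)
```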
It is also possible to show a subset of variables or plot different variables on the rows and columns. For example, to visualise only the sepal length and sepal width:
sns.pairplot(iris, vars=['sepal_width','sepal_length'], hue='species', height=3)
Seaborn also gives you control over the variables in the rows and columns:
sns.pairplot(iris, x_vars=['sepal_width', 'sepal_length'], y_vars=['petal_width', 'petal_length'], hue='species', height=3)
Visualising statistical relationships
Statistical analysis is a process of understanding how variables in a dataset relate to each other and how those relationships depend on other variables. Visualisation can be a core component of this process because, when data is visualised properly, the human visual system can see trends and patterns that indicate a relationship.
Probably the best-known representation of the relationship between two variables is the scatterplot. To demonstrate this, we will look at a dataset that records the tips restaurant staff receive, together with details of each bill such as its total and the time of day:
tips = sns.load_dataset('tips')
tips.head()
sns.scatterplot(data=tips, x='total_bill', y='tip')
We can also group by time and see the relationship between the total bill and the tip amount at different times (Lunch and Dinner).
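A minimal sketch of this grouping uses the hue parameter of scatterplot, which assigns a different colour to each value of the time column:

```python
import seaborn as sns

sns.set()
tips = sns.load_dataset('tips')

# hue='time' colours each point by whether the meal was Lunch or Dinner
# and adds a legend for the two groups.
ax = sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time')
```

The call returns the matplotlib Axes it drew on, so titles, labels and limits can be adjusted with the usual matplotlib methods.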