Exploring Data Visualisation using Matplotlib and Seaborn

In this blog post we will explore data visualisation using Matplotlib and Seaborn.

Before we start, let us briefly introduce Matplotlib and Seaborn.

Matplotlib was introduced by John Hunter in 2002. It is the most widely used visualisation library in Python, and many other plotting libraries, including Seaborn, are built on top of it.


The library itself is huge, with approximately 70,000 lines of code, and is still under active development. It is typically used together with NumPy, the numerical computing library. It provides an interface, "pyplot", which is designed to resemble that of MATLAB.


We can plot almost anything with Matplotlib, but non-basic plots can be very complex to implement. It is therefore advisable to use a higher-level tool when creating complex graphics.


Coming to Seaborn: it is a library for creating statistical graphics in Python. It is built on top of Matplotlib and integrates closely with pandas data structures. It is best seen as a complement to Matplotlib rather than a replacement: Seaborn handles common statistical plots with less code, while Matplotlib remains available for fine-grained control. Its default styles are more polished, and its plots are easy to customise with colour palettes.


The aim of Seaborn is to provide high-level commands to create a variety of plot types that are useful for statistical data exploration, and even some statistical model fitting. It has many built-in complex plots.


First we will see how to plot the same graphs using both Matplotlib and Seaborn. This will help us compare the two.

We will use datasets available in the Seaborn library to plot the graphs.





Scatterplot


For this kind of plot we will use the Penguins dataset, which is already available in Seaborn. The dataset contains details about three species of penguins, namely Adelie, Chinstrap and Gentoo.
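The snippets in this post refer to a DataFrame named df and compare its species column against 1, 2 and 3, but the loading step is not shown. The following is a minimal sketch of one possible setup, not necessarily the one used here. It assumes that the species names were mapped to integer codes (1 = Adelie, 2 = Chinstrap, 3 = Gentoo, inferred from the legend labels used later). The helper name encode_species and the sample rows are made up for illustration.

```python
import pandas as pd

# Assumed mapping, inferred from the legend labels used later in this post.
SPECIES_CODES = {"Adelie": 1, "Chinstrap": 2, "Gentoo": 3}

def encode_species(frame):
    """Return a copy of frame with species names replaced by integer codes."""
    out = frame.copy()
    out["species"] = out["species"].map(SPECIES_CODES)
    return out

# Tiny stand-in with made-up values. In practice one would start from
# sns.load_dataset("penguins").dropna(), which downloads the real data.
raw = pd.DataFrame({
    "species": ["Adelie", "Chinstrap", "Gentoo"],
    "bill_length_mm": [39.0, 48.5, 47.0],
    "bill_depth_mm": [18.5, 18.0, 15.0],
})

df = encode_species(raw)
print(df["species"].tolist())
```

The Matplotlib snippets also assume import matplotlib.pyplot as plt, and the Seaborn ones import seaborn as sns.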


Matplotlib code:

plt.figure(figsize=(14,7))
plt.scatter('bill_length_mm', 'bill_depth_mm', data=df,c='species',cmap='Set2')
plt.xlabel('Bill length', fontsize='large')
plt.ylabel('Bill depth', fontsize='large');

We have plotted the bill length against the bill depth. The bill is the penguin's beak; its shape and size vary from species to species. In the above graph we cannot tell which points belong to which species, because plt.scatter() does not add a legend automatically when the plot is made in this manner.
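As an aside, recent Matplotlib versions can still build a legend from a single scatter call, through the legend_elements() method of the object that scatter returns. The sketch below uses a made-up stand-in DataFrame (demo) with species assumed to be encoded as 1, 2, 3, and the label order is an assumption that follows that encoding.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows; species is assumed to be encoded as 1, 2, 3.
demo = pd.DataFrame({
    "bill_length_mm": [39.0, 38.0, 48.5, 50.0, 47.0, 46.0],
    "bill_depth_mm": [18.5, 18.0, 18.0, 19.0, 15.0, 14.5],
    "species": [1, 1, 2, 2, 3, 3],
})

fig, ax = plt.subplots(figsize=(14, 7))
points = ax.scatter("bill_length_mm", "bill_depth_mm", data=demo,
                    c="species", cmap="Set2")

# legend_elements() returns one handle per distinct value of c
handles, _ = points.legend_elements()
ax.legend(handles, ["Adelie", "Chinstrap", "Gentoo"], title="Species")
```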

Let us now plot the same graph along with the legend.


Matplotlib code:


plt.rcParams['figure.figsize'] = [15, 10]

fontdict={'fontsize': 18,
          'weight' : 'bold',
         'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fontdicty={'fontsize': 16,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

Adelie = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], marker='o', color='skyblue')
Chinstrap = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], marker='o', color='yellowgreen')
Gentoo = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], marker='o', color='darkgray')

plt.legend(handles=(Adelie,Chinstrap,Gentoo),
           labels=('Adelie','Chinstrap','Gentoo'),
           title="Species", title_fontsize=16,
           scatterpoints=1,
           bbox_to_anchor=(1, 0.7), loc=2, borderaxespad=1.,
           ncol=1,
           fontsize=14)
plt.title('Penguins', fontdict=fontdict, color="black")
plt.xlabel("Bill length (mm)", fontdict=fontdictx)
plt.ylabel("Bill depth (mm)", fontdict=fontdicty);

Let's discuss a few points in the above code:

  • plt.rcParams['figure.figsize'] = [15, 10] controls the default size of every figure created afterwards. Here it corresponds to a plot 15 inches wide and 10 inches tall (width, height).

  • fontdict is a dictionary of text properties that can be passed as an argument when labelling the plot: fontdict for the title, fontdictx for the x-axis label and fontdicty for the y-axis label.

  • There are now three plt.scatter() calls, one for each of the three species. This is reflected in the data argument, which is subsetted to a single species in each call. The marker and color arguments specify that an 'o' represents each data point and set the colour of that marker.
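The three near-identical scatter calls can also be written as a loop that passes label= to each call, so that the legend collects its entries by itself. This is only a sketch of that alternative: the demo DataFrame holds made-up values, and the 1/2/3 species encoding is an assumption carried over from the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows; species is assumed to be encoded as 1, 2, 3.
demo = pd.DataFrame({
    "bill_length_mm": [39.0, 38.0, 48.5, 50.0, 47.0, 46.0],
    "bill_depth_mm": [18.5, 18.0, 18.0, 19.0, 15.0, 14.5],
    "species": [1, 1, 2, 2, 3, 3],
})

# (code, name, colour) for each species, matching the colours used above
styles = [(1, "Adelie", "skyblue"),
          (2, "Chinstrap", "yellowgreen"),
          (3, "Gentoo", "darkgray")]

fig, ax = plt.subplots(figsize=(15, 10))
for code, name, colour in styles:
    subset = demo[demo["species"] == code]
    # label= is what the legend picks up for this group of points
    ax.scatter(subset["bill_length_mm"], subset["bill_depth_mm"],
               marker="o", color=colour, label=name)

legend = ax.legend(title="Species", title_fontsize=16, fontsize=14,
                   bbox_to_anchor=(1, 0.7), loc=2)
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")
```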

We will now do the same thing using Seaborn.

Seaborn code:

plt.figure(figsize=(14,7))

fontdict={'fontsize': 18,
          'weight' : 'bold',
         'horizontalalignment': 'center'}

sns.set_context('talk', font_scale=0.9)
sns.set_style('ticks')

sns.scatterplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df, 
                style='species',palette="rocket", legend='full')

plt.legend(scatterpoints=1,bbox_to_anchor=(1, 0.7), loc=2, borderaxespad=1.,
           ncol=1,fontsize=14)

plt.xlabel('Bill Length (mm)', fontsize=16, fontweight='bold')
plt.ylabel('Bill Depth (mm)', fontsize=16, fontweight='bold')
plt.title('Penguins', fontdict=fontdict, color="black",
         position=(0.5,1));

A few points to discuss:

  • The argument to sns.set_style() must be one of 'white', 'dark', 'whitegrid', 'darkgrid' or 'ticks'. It controls the look of the plot area, such as the background colour, the grid and the presence of ticks.

  • The argument to sns.set_context() must be one of 'paper', 'notebook', 'talk' or 'poster'. It scales plot elements such as fonts and line widths according to where the plot will be viewed. For example, 'poster' enlarges text and markers, while 'talk' produces larger labels and thicker lines than the default 'notebook'.
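To see what the contexts actually change, one can inspect the parameters each of them would apply. The sketch below uses sns.plotting_context(), which returns those parameters as a dictionary without changing any global settings, and compares only the base font size. The exact numbers depend on the Seaborn version, so none are quoted here.

```python
import seaborn as sns

# plotting_context(name) returns the rc parameters that context would set,
# without modifying the global state.
font_sizes = {name: sns.plotting_context(name)["font.size"]
              for name in ("paper", "notebook", "talk", "poster")}

for name, size in font_sizes.items():
    print(f"{name:>8}: base font size {size}")
```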


We can see that with Seaborn we needed fewer lines of code to produce an attractive graph with a legend.


We will now try our hand at making subplots to represent each species using a different graph in the same plot.


Matplotlib code:

# Set the defaults before creating the figure so that they apply to it
plt.rcParams['figure.figsize'] = [15,10]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

fontdict={'fontsize': 25,
          'weight' : 'bold'}

fontdicty={'fontsize': 18,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

plt.subplots_adjust(wspace=0.2, hspace=0.5)

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.01))

#subplot 1
ax1 = fig.add_subplot(221)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], c="skyblue")
ax1.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax1.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax1.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

ax2 = fig.add_subplot(222)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], c="yellowgreen")
ax2.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")
ax2.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));


ax3 = fig.add_subplot(223)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

Here we have created subplots representing each species. But the graphs don’t help us to make a comparison at first glance, because each graph has its own x-axis range. Let’s make the x-axis uniform by sharing it between the subplots.
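A compact way to get a shared x-axis is to let plt.subplots() create all the axes at once with sharex=True. The sketch below illustrates that option with made-up bill measurements (the samples dictionary is hypothetical). It is an alternative to the add_subplot approach, not a copy of it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up (bill length, bill depth) values for each species
samples = {
    "Adelie": ([36.0, 39.0, 41.0], [17.5, 18.5, 19.0]),
    "Chinstrap": ([46.0, 48.5, 52.0], [17.0, 18.0, 19.5]),
    "Gentoo": ([45.0, 47.0, 50.0], [14.0, 15.0, 16.0]),
}

# sharex=True links the x-axis limits of all three subplots
fig, axes = plt.subplots(nrows=3, sharex=True, figsize=(12, 12))
for ax, (name, (lengths, depths)) in zip(axes, samples.items()):
    ax.scatter(lengths, depths)
    ax.set_title(name)
    ax.set_ylabel("Bill depth (mm)")

# Only the bottom subplot needs an x-axis label
axes[-1].set_xlabel("Bill length (mm)")
```

Because the limits are linked, a species with shorter bills appears further to the left, which makes the three panels directly comparable.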


Matplotlib code:

# Set the defaults before creating the figure so that they apply to it
plt.rcParams['figure.figsize'] = [12,12]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

plt.subplots_adjust(hspace=0.60)


fontdicty={'fontsize': 20,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 20,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.0))

# ax2 is created first because the other subplots share its x-axis
ax2 = fig.add_subplot(412)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==2], c="yellowgreen")
ax2.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")
ax2.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(-0.3,0.3))


ax1 = fig.add_subplot(411, sharex=ax2)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==1], c="skyblue")
ax1.set_title('Adelie', fontdict=fontdict, color="skyblue")


ax3 = fig.add_subplot(413, sharex=ax2)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx);

Let’s change the shape of the markers in the above graph to make it look more customised.


Matplotlib code:

# Set the defaults before creating the figure so that they apply to it
plt.rcParams['figure.figsize'] = [15,10]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

fontdict={'fontsize': 25,
          'weight' : 'bold'}

fontdicty={'fontsize': 18,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

plt.subplots_adjust(wspace=0.2, hspace=0.5)

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.01))

#subplot 1
ax1 = fig.add_subplot(221)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], c="skyblue",marker='x')
ax1.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax1.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax1.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

ax2 = fig.add_subplot(222)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], c="yellowgreen",marker='^')
ax2.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")
ax2.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));


ax3 = fig.add_subplot(223)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], c="darkgray",marker='*')
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

We will create the same plot using Seaborn as well.


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')

g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='sex', data=df,palette="rocket",
                legend='full',col='species', col_wrap=2, 
                height=4, aspect=1.6, sizes=(800,800))

g.fig.suptitle('Penguins',position=(0.5,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);


Notice that here the points in each species subplot are further divided by colour into two classes, male and female. Again, Seaborn produces a richer graph with only a few lines of code.
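One detail worth knowing about relplot: it is a figure-level function that creates its own figure, and the size of each subplot is set through its height and aspect arguments (width = height × aspect) rather than through the figure.figsize setting. The sketch below shows the faceting on a made-up stand-in DataFrame (demo), with species names kept as strings for readability.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import pandas as pd
import seaborn as sns

# Made-up stand-in rows, two penguins per species
demo = pd.DataFrame({
    "bill_length_mm": [39.0, 38.0, 48.5, 50.0, 47.0, 46.0],
    "bill_depth_mm": [18.5, 18.0, 18.0, 19.0, 15.0, 14.5],
    "species": ["Adelie", "Adelie", "Chinstrap", "Chinstrap", "Gentoo", "Gentoo"],
    "sex": ["Male", "Female"] * 3,
})

# col= creates one subplot per species, col_wrap=2 starts a new row after
# two subplots, and hue= colours the points by sex within each subplot.
g = sns.relplot(data=demo, x="bill_length_mm", y="bill_depth_mm",
                hue="sex", col="species", col_wrap=2,
                height=4, aspect=1.6)
g.set_axis_labels("Bill length (mm)", "Bill depth (mm)")

print(len(g.axes.flat))  # number of subplots created
```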

We can also add different markers for each species in the above graph. Let’s do that.


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')
g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df,palette="rocket",
                col='species', col_wrap=4, legend='full',
                height=6, aspect=0.5, style='species', sizes=(800,1000))

g.fig.suptitle('Penguins' ,position=(0.4,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);

In a similar fashion, we can make the subplots share the same y-axis instead of the x-axis. The following plots show this.
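The mechanism is the sharey argument of add_subplot: the first subplot is created normally, and every later one is told to share its y-axis. The sketch below shows this with made-up measurements (the samples dictionary is hypothetical) placed side by side in a single row.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# Made-up (bill length, bill depth) values for each species
samples = {
    "Adelie": ([36.0, 39.0, 41.0], [17.5, 18.5, 19.0]),
    "Chinstrap": ([46.0, 48.5, 52.0], [17.0, 18.0, 19.5]),
    "Gentoo": ([45.0, 47.0, 50.0], [14.0, 15.0, 16.0]),
}

fig = plt.figure(figsize=(12, 4))
axes = []
first = None  # the subplot whose y-axis the others will share
for position, (name, (lengths, depths)) in enumerate(samples.items(), start=1):
    ax = fig.add_subplot(1, 3, position, sharey=first)
    if first is None:
        first = ax
    ax.scatter(lengths, depths)
    ax.set_title(name)
    ax.set_xlabel("Bill length (mm)")
    axes.append(ax)

# Only the leftmost subplot needs a y-axis label
axes[0].set_ylabel("Bill depth (mm)")
```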


Matplotlib code:

# Set the defaults before creating the figure so that they apply to it
plt.rcParams['figure.figsize'] = [12,12]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

plt.subplots_adjust(hspace=0.60)


fontdicty={'fontsize': 20,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 20,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.0))

# ax2 is created first because the other subplots share its y-axis
ax2 = fig.add_subplot(141)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==1], c="skyblue")
ax2.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax2.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(-0.3,0.5))


ax1 = fig.add_subplot(142, sharey=ax2)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==2], c="yellowgreen")
ax1.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")


ax3 = fig.add_subplot(143, sharey=ax2)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx,position=(-0.7,0));


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')
g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df,palette="rocket",
                col='species', col_wrap=4, legend='full',
                height=6, aspect=0.5, style='species', sizes=(800,1000))

g.fig.suptitle('Penguins' ,position=(0.4,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);


So, this is how you can create subplots. The same can be done with other kinds of graphs, such as line graphs and histograms. Let us now try our hand at different kinds of graphs for visualisation.


Line plot


For plotting this kind of graph we will create some random data using NumPy.


Code for creating data:

import numpy as np

# Seeded generator so that the same data is produced on every run
rng = np.random.RandomState(0)
x = np.linspace(0, 10, 8)
# Cumulative sum down the rows: each of the 8 columns becomes one line to plot
y = np.cumsum(rng.randn(8, 8), 0)
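To make the shape of this data concrete, the sketch below rebuilds it step by step. The intermediate name steps is introduced only for illustration. Each column of y is a random walk, that is, a running total of random steps, and a call such as plt.plot(x, y) draws one line per column.

```python
import numpy as np

rng = np.random.RandomState(0)   # fixed seed, so the numbers are reproducible
steps = rng.randn(8, 8)          # 8 rows of random steps for 8 separate lines
y = np.cumsum(steps, axis=0)     # running total down each column

x = np.linspace(0, 10, 8)        # 8 evenly spaced x positions from 0 to 10

print(x.shape, y.shape)          # (8,) (8, 8)
```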

Matplotlib code: