Exploring Data Visualisation using Matplotlib and Seaborn

In this blog post we will explore data visualisation using Matplotlib and Seaborn.

Before we start, let us briefly discuss Matplotlib and Seaborn.

Matplotlib was introduced by John Hunter in 2002. It is the most widely used visualisation library in Python, and many other plotting libraries, Seaborn among them, are built on top of it.


The library itself is huge, with approximately 70,000 lines of code, and is still under active development. It is typically used together with NumPy, the numerical mathematics extension for Python. It contains an interface, "pyplot", which is designed to resemble that of MATLAB.


We can plot almost anything with Matplotlib, but non-basic plots can be very complex to implement. It is therefore advisable to use higher-level tools when creating complex graphics.


Coming to Seaborn: it is a library for creating statistical graphics in Python. It is built on top of Matplotlib and integrates closely with pandas data structures. Rather than replacing Matplotlib, it offers a higher-level interface to it, so common statistical plots need far less code, while anything Seaborn does not cover can still be adjusted with Matplotlib calls. Its default styles are pleasant to look at and easy to customise with colour palettes.


The aim of Seaborn is to provide high-level commands to create a variety of plot types that are useful for statistical data exploration, and even some statistical model fitting. It has many built-in complex plots.


First we will see how to plot the same graphs using Matplotlib and Seaborn. This will help us compare the two.

We will use datasets available in the Seaborn library to plot the graphs.
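The code in this post refers to a DataFrame named df and filters it with expressions such as df['species']==1, so the species column appears to have been converted to integer codes beforehand. That preparation step is not shown. The following is a minimal sketch of what it might look like; the helper name encode_species and the mapping Adelie = 1, Chinstrap = 2, Gentoo = 3 are assumptions inferred from the legend labels used later, and the small sample frame is made up.

```python
import pandas as pd

# Assumed mapping, inferred from the order of the legend labels used later on.
SPECIES_CODES = {"Adelie": 1, "Chinstrap": 2, "Gentoo": 3}


def encode_species(penguins: pd.DataFrame) -> pd.DataFrame:
    """Return a copy without missing bill measurements and with species as integer codes."""
    out = penguins.dropna(subset=["bill_length_mm", "bill_depth_mm"]).copy()
    out["species"] = out["species"].map(SPECIES_CODES)
    return out


# Tiny hand-made frame standing in for the real data, just to show the effect.
sample = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "bill_length_mm": [39.1, 46.1, 46.5, None],
    "bill_depth_mm": [18.7, 13.2, 17.9, None],
})
encoded = encode_species(sample)
print(encoded)
```

With Seaborn installed and network access for the first download, df = encode_species(sns.load_dataset("penguins")) should give a frame shaped like the one used in the rest of the post.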





Scatterplot


For this kind of plot we will use the penguins dataset, which is already available in Seaborn. The dataset contains details about three species of penguins, namely Adelie, Chinstrap and Gentoo.


Matplotlib code:

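import matplotlib.pyplot as plt
import seaborn as sns

# Assumes df is the penguins DataFrame with 'species' already encoded as 1, 2, 3
plt.figure(figsize=(14,7))
plt.scatter('bill_length_mm', 'bill_depth_mm', data=df, c='species', cmap='Set2')
plt.xlabel('Bill length', fontsize='large')
plt.ylabel('Bill depth', fontsize='large');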

We have plotted the bill length against the bill depth. Bill refers to the beak of a penguin; bills come in various shapes and sizes and vary from species to species. In the above graph we cannot tell which data points belong to which species, because a single scatter call like this one does not add a legend on its own.

Let us now plot the same graph along with the legend.


Matplotlib code:


plt.rcParams['figure.figsize'] = [15, 10]

fontdict={'fontsize': 18,
          'weight' : 'bold',
         'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fontdicty={'fontsize': 16,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

Adelie = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], marker='o', color='skyblue')
Chinstrap = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], marker='o', color='yellowgreen')
Gentoo = plt.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], marker='o', color='darkgray')

plt.legend(handles=(Adelie,Chinstrap,Gentoo),
           labels=('Adelie','Chinstrap','Gentoo'),
           title="Species", title_fontsize=16,
           scatterpoints=1,
           bbox_to_anchor=(1, 0.7), loc=2, borderaxespad=1.,
           ncol=1,
           fontsize=14)
plt.title('Penguins', fontdict=fontdict, color="black")
plt.xlabel("Bill length (mm)", fontdict=fontdictx)
plt.ylabel("Bill depth (mm)", fontdict=fontdicty);

Let's discuss a few points in the above code:

  • plt.rcParams['figure.figsize'] = [15, 10] controls the size of the entire figure. This corresponds to a 15 × 10 inch (width × height) figure.

  • fontdict is a dictionary of text properties that can be passed as an argument when labelling the plot: fontdict is used for the title, fontdictx for the x-axis label and fontdicty for the y-axis label.

  • There are now three plt.scatter() calls, one for each of the three species. This is also visible in the data argument, where the DataFrame is subsetted to a single species. The marker and color arguments select an 'o' to represent each data point and the colour of that marker.
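As an alternative to one scatter call per species, Matplotlib 3.1 and later can derive legend handles from a single scatter call through the legend_elements() method of the returned collection. The sketch below uses a few made-up rows (the toy frame is not the real dataset) and assumes the same 1, 2, 3 species coding as above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the penguins data; species is already coded 1, 2, 3.
toy = pd.DataFrame({
    "bill_length_mm": [39.1, 40.3, 46.5, 50.0, 46.1, 47.8],
    "bill_depth_mm": [18.7, 18.0, 17.9, 19.5, 13.2, 15.0],
    "species": [1, 1, 2, 2, 3, 3],
})

fig, ax = plt.subplots(figsize=(7, 4))
points = ax.scatter("bill_length_mm", "bill_depth_mm", data=toy, c="species", cmap="Set2")

# legend_elements() builds one handle per distinct colour value in the scatter.
handles, _ = points.legend_elements()
ax.legend(handles, ["Adelie", "Chinstrap", "Gentoo"], title="Species")
ax.set_xlabel("Bill length (mm)")
ax.set_ylabel("Bill depth (mm)")
```

The explicit per-species calls remain useful when each group needs its own marker, which a single scatter call cannot provide.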

We will now do the same thing using Seaborn.

Seaborn code:

plt.figure(figsize=(14,7))

fontdict={'fontsize': 18,
          'weight' : 'bold',
         'horizontalalignment': 'center'}

sns.set_context('talk', font_scale=0.9)
sns.set_style('ticks')

sns.scatterplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df, 
                style='species',palette="rocket", legend='full')

plt.legend(scatterpoints=1,bbox_to_anchor=(1, 0.7), loc=2, borderaxespad=1.,
           ncol=1,fontsize=14)

plt.xlabel('Bill Length (mm)', fontsize=16, fontweight='bold')
plt.ylabel('Bill Depth (mm)', fontsize=16, fontweight='bold')
plt.title('Penguins', fontdict=fontdict, color="black",
         position=(0.5,1));

A few points to discuss:

  • The argument to sns.set_style() must be one of 'white', 'dark', 'whitegrid', 'darkgrid' or 'ticks'. It controls the look of the plot area, such as the background colour, the grid and the presence of ticks.

  • The argument to sns.set_context() must be one of 'paper', 'notebook', 'talk' or 'poster'. It scales plot elements such as fonts and line widths according to where the plot will be read. For example, 'poster' produces enlarged text and lines, and 'talk' produces larger elements than the default 'notebook'.
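To see what these two settings actually change, Seaborn can return the underlying parameters without applying them. The following is a small sketch using sns.plotting_context() and sns.axes_style(); which keys to print is just an illustrative choice.

```python
import seaborn as sns

# plotting_context() returns the rc parameters a context would set, without applying them.
for name in ["paper", "notebook", "talk", "poster"]:
    ctx = sns.plotting_context(name)
    print(name, ctx["font.size"], ctx["lines.linewidth"])

# axes_style() does the same for styles; here we check which ones draw a grid.
for name in ["white", "dark", "whitegrid", "darkgrid", "ticks"]:
    print(name, sns.axes_style(name)["axes.grid"])
```

The font size and line width should grow from 'paper' to 'poster', and only the two grid styles should report a grid.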


We can see that with Seaborn we needed fewer lines of code to produce a beautiful graph with a legend.


We will now try our hand at making subplots, so that each species gets its own graph within the same figure.


Matplotlib code:

# Set the defaults first: rcParams only affect figures created afterwards
plt.rcParams['figure.figsize'] = [15,10]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

fontdict={'fontsize': 25,
          'weight' : 'bold'}

fontdicty={'fontsize': 18,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

plt.subplots_adjust(wspace=0.2, hspace=0.5)

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.01))

#subplot 1
ax1 = fig.add_subplot(221)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], c="skyblue")
ax1.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax1.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax1.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

ax2 = fig.add_subplot(222)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], c="yellowgreen")
ax2.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")
ax2.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));


ax3 = fig.add_subplot(223)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

Here we have created subplots representing each species. But the graphs don’t help us make a comparison at first glance, because each graph has a different x-axis range. Let’s make it uniform.
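A compact way to give subplots a common x-axis is the sharex argument of plt.subplots. The following is a minimal sketch with made-up numbers instead of the penguin data; the panels dictionary and its values are purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Made-up bill measurements with deliberately different x ranges per panel.
panels = {
    "Adelie": ([34, 38, 42], [17, 18, 19]),
    "Chinstrap": ([44, 50, 56], [17, 18, 20]),
    "Gentoo": ([42, 48, 58], [13, 15, 17]),
}

# sharex=True ties all x-axes together, so every panel shows the same range.
fig, axes = plt.subplots(nrows=3, ncols=1, sharex=True, figsize=(6, 8))
for ax, (name, (lengths, depths)) in zip(axes, panels.items()):
    ax.scatter(lengths, depths)
    ax.set_title(name)
axes[-1].set_xlabel("Bill length (mm)")
```

The fuller example that follows achieves the same effect with the sharex argument of fig.add_subplot, which allows more control over where each panel goes.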


Matplotlib code:

# Set the defaults first: rcParams only affect figures created afterwards
plt.rcParams['figure.figsize'] = [12,12]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

plt.subplots_adjust(hspace=0.60)


fontdicty={'fontsize': 20,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 20,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.0))

# Title font; defined here so that this example does not rely on the previous one
fontdict={'fontsize': 25,
          'weight' : 'bold'}

#ax2 is created first, without sharex, because the other plots share its x-axis
ax2 = fig.add_subplot(412)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==1], c="skyblue")
ax2.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax2.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(-0.3,0.3))


ax1 = fig.add_subplot(411, sharex=ax2)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==2], c="yellowgreen")
ax1.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")


ax3 = fig.add_subplot(413, sharex=ax2)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx);

Let’s change the shape of the markers in the earlier grid of subplots to customise it further.


Matplotlib code:

# Set the defaults first: rcParams only affect figures created afterwards
plt.rcParams['figure.figsize'] = [15,10]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

fontdict={'fontsize': 25,
          'weight' : 'bold'}

fontdicty={'fontsize': 18,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 18,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

plt.subplots_adjust(wspace=0.2, hspace=0.5)

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.01))

#subplot 1
ax1 = fig.add_subplot(221)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==1], c="skyblue",marker='x')
ax1.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax1.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax1.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

ax2 = fig.add_subplot(222)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==2], c="yellowgreen",marker='^')
ax2.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")
ax2.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));


ax3 = fig.add_subplot(223)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df[df['species']==3], c="darkgray",marker='*')
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(0,0.5))
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx, position=(0.5,0));

We will create the same plot using Seaborn as well.


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')

g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='sex', data=df,palette="rocket",
                legend='full',col='species', col_wrap=2, 
                height=4, aspect=1.6, sizes=(800,800))

g.fig.suptitle('Penguins',position=(0.5,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);


Notice that here the subplots representing the species are further divided into two classes, male and female, through the hue parameter. Again Seaborn produces a richer graph with only a few lines of code.

We can also add different markers for each species in the above graph. Let’s do that.


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')
g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df,palette="rocket",
                col='species', col_wrap=4, legend='full',
                height=6, aspect=0.5, style='species', sizes=(800,1000))

g.fig.suptitle('Penguins' ,position=(0.4,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);

In a similar fashion, we can make the subplots share the same y-axis instead of the same x-axis, as the following plots show.


Matplotlib code:

# Set the defaults first: rcParams only affect figures created afterwards
plt.rcParams['figure.figsize'] = [12,12]
plt.rcParams["font.weight"] = "bold"

fig = plt.figure()

plt.subplots_adjust(hspace=0.60)


fontdicty={'fontsize': 20,
          'weight' : 'bold',
          'verticalalignment': 'baseline',
          'horizontalalignment': 'center'}

fontdictx={'fontsize': 20,
          'weight' : 'bold',
          'horizontalalignment': 'center'}

fig.suptitle('Penguins', fontsize=25,fontweight="bold", color="black", 
             position=(0.5,1.0))

# Title font; defined here so that this example does not rely on an earlier one
fontdict={'fontsize': 25,
          'weight' : 'bold'}

#ax2 is created first, without sharey, because the other plots share its y-axis
ax2 = fig.add_subplot(141)
ax2.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==1], c="skyblue")
ax2.set_title('Adelie', fontdict=fontdict, color="skyblue")
ax2.set_ylabel("Bill depth (mm)", fontdict=fontdicty, position=(-0.3,0.5))


ax1 = fig.add_subplot(142, sharey=ax2)
ax1.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==2], c="yellowgreen")
ax1.set_title('Chinstrap', fontdict=fontdict, color="yellowgreen")


ax3 = fig.add_subplot(143, sharey=ax2)
ax3.scatter('bill_length_mm', 'bill_depth_mm', data=df.loc[df['species']==3], c="darkgray")
ax3.set_title('Gentoo', fontdict=fontdict, color="darkgray")
ax3.set_xlabel("Bill Length (mm)", fontdict=fontdictx,position=(-0.7,0));


Seaborn code:

sns.set(rc={'figure.figsize':(20,20)}) 
sns.set_context('talk', font_scale=1) 
sns.set_style('ticks')
g = sns.relplot(x='bill_length_mm', y='bill_depth_mm', hue='species', data=df,palette="rocket",
                col='species', col_wrap=4, legend='full',
                height=6, aspect=0.5, style='species', sizes=(800,1000))

g.fig.suptitle('Penguins' ,position=(0.4,1.05), fontweight='bold', size=20)
g.set_xlabels("Bill Length (mm)",fontweight='bold', size=15)
g.set_ylabels("Bill Depth (mm)",fontweight='bold', size=15);


So, this is how you can create subplots. The same can be done with other kinds of graphs as well, such as line graphs and histograms. Let us now try our hand at different kinds of graphs for visualisation.


Line plot


For plotting this kind of graph we will create some random data using NumPy's random number generator.


Code for creating data:

import numpy as np

rng = np.random.RandomState(0)  # fixed seed so the data is reproducible
x = np.linspace(0, 10, 8)  # 8 evenly spaced x values
y = np.cumsum(rng.randn(8, 8), 0)  # 8 random-walk series, one per column

Matplotlib code:

plt.figure(figsize=(14,7))
plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

Below is the same Matplotlib code, this time with Seaborn overriding Matplotlib’s default parameters to produce a more pleasing graph.


Seaborn code:

sns.set(rc={'figure.figsize':(14,7)})
sns.set_context('talk', font_scale=0.9)
sns.set_style('darkgrid')

plt.plot(x, y)
plt.legend('ABCDEF', ncol=2, loc='upper left');

To enhance the graph we could include markers this way:


Code:

sns.set(rc={'figure.figsize':(14,7)})
sns.set_context('talk', font_scale=0.9)
sns.set_style('darkgrid')

plt.plot(x, y, marker='o')
plt.legend('ABCDEF', ncol=2, loc='upper left');

To make each line distinct we can use a different marker for each line in the following way.


Code:

sns.set(rc={'figure.figsize':(14,7)})
sns.set_context('talk', font_scale=0.9)
sns.set_style('darkgrid')
# Each column of y is one series; transposing lets us pick a series as L[0], L[1], ...
L = y.T

plt.plot(x, L[0], marker='o',label='A')
plt.plot(x, L[1], marker='^',label='B')
plt.plot(x, L[2], marker='s',label='C')
plt.plot(x, L[3], marker='D',label='D')
plt.plot(x, L[4], marker='*',label='E')
plt.plot(x, L[5], marker='+',label='F')
plt.legend(ncol=2,loc='lower left');

Notice that we have now altered the position of the legend in the graph.
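The six near-identical plot calls can also be written as a loop. The following is a sketch that recreates the same x and y as above and pairs each series with a marker and a label; the names markers and labels are introduced here only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(0)
x = np.linspace(0, 10, 8)
y = np.cumsum(rng.randn(8, 8), 0)

markers = ["o", "^", "s", "D", "*", "+"]
labels = "ABCDEF"

fig, ax = plt.subplots(figsize=(14, 7))
# y.T yields one series per iteration; zip stops after the six markers.
for series, marker, label in zip(y.T, markers, labels):
    ax.plot(x, series, marker=marker, label=label)
ax.legend(ncol=2, loc="lower left")
```

Adding a seventh line then only requires one more entry in markers and labels.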


Bar graphs


For these graphs we will use the 'titanic' dataset available in the Seaborn library. The dataset contains details such as age, sex, class, fare, embark_town and whether or not each person aboard the Titanic survived.


Let's begin. We will plot a graph showing the count of 'survival' or 'no survival' for the different passenger classes. To do so, we first collect these counts into a pandas DataFrame.


Code for dataset:

import pandas as pd

df2 = sns.load_dataset("titanic")
df2.head()

# Count survivors (1) and non-survivors (0) within each passenger class
First= df2[df2['class']=='First']['survived'].value_counts()
Second= df2[df2['class']=='Second']['survived'].value_counts()
Third=df2[df2['class']=='Third']['survived'].value_counts()
df3 = pd.DataFrame([First,Second,Third])
df3.index=['First','Second','Third']

Matplotlib code:

df3.plot(kind='bar',figsize=(14,7),title='Titanic survival on basis of class',cmap='Set2')
plt.show()

Note that here we have two separately coloured bars to represent the 'survived' column of the data. A value of '0' represents 'not survived' and a value of '1' represents 'survived'. We can clearly see that most of the people who did not survive were from the 'Third' class.
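The count table behind this plot can also be built in one step with pandas.crosstab. The following is a minimal sketch using a handful of made-up passengers (the toy frame is not the real dataset) to show the shape of the result.

```python
import pandas as pd

# A few made-up passengers standing in for the Titanic data.
toy = pd.DataFrame({
    "class": ["First", "First", "Second", "Third", "Third", "Third"],
    "survived": [1, 0, 1, 0, 0, 1],
})

# Rows are classes, columns are the survived values 0 and 1, cells are counts.
counts = pd.crosstab(toy["class"], toy["survived"])
print(counts)
```

Calling counts.plot(kind='bar') on such a table draws the same kind of grouped bar chart, and combinations that never occur are filled with 0 rather than left missing.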


Coming back to the graph: we can make it look more attractive by changing the default parameters.


We can change the orientation of the x and y tick labels, and we can add annotations to make the graph easier to read.


Code:

ax= df3.plot(kind='bar',figsize=(14,7), title='Titanic survival on basis of class',cmap='Set2')
plt.xticks(rotation=20)
plt.yticks(rotation=20)

for i in ax.patches:
    # get_x pulls left or right; get_height pushes up or down
    ax.text(i.get_x()+0.07, i.get_height()+5, \
            str(round((i.get_height()), 2)), fontsize=11, color='steelblue')
plt.show()
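The annotation loop above positions each label by hand. Matplotlib 3.4 and later also offers Axes.bar_label, which places the labels automatically. The following is a sketch with made-up counts laid out like df3 (one row per class, one column per outcome); the numbers are illustrative only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts in the same layout as df3.
counts = pd.DataFrame({0: [8, 10, 37], 1: [14, 9, 12]},
                      index=["First", "Second", "Third"])

ax = counts.plot(kind="bar", figsize=(7, 4))
# Each column becomes one BarContainer; bar_label writes each bar's height above it.
for container in ax.containers:
    ax.bar_label(container, padding=3, fontsize=11, color="steelblue")
```

The manual ax.text approach is still the one to use when the labels need custom offsets or content.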

Going further, we can add a pattern to the bars to make them look more stylish. This is done with the 'hatch' parameter. We will also tilt the annotations to set this graph apart from the previous one.


Code:

ax= df3.plot(kind='bar',figsize=(14,7), title='Titanic survival on basis of class',cmap='Set2',hatch='|',edgecolor='aliceblue')
plt.xticks(rotation=20)

for i in ax.patches:
    # get_x pulls left or right; get_height pushes up or down
    ax.text(i.get_x()+0.07, i.get_height()+5, \
            str(round((i.get_height()), 2)), fontsize=11, color='steelblue',
                rotation=45)
plt.show()

We can go further still and use a different hatch pattern for each of the two bar series, as shown below.


Code:

ax= df3.plot(kind='bar',figsize=(14,7), title='Titanic survival on basis of class',cmap='Set2',hatch='O',edgecolor='aliceblue')
plt.xticks(rotation=20)
plt.yticks(rotation=20)

bars = ax.patches
patterns = ['/', '.']  # set hatch patterns in the correct order
hatches = []  # list for hatches in the order of the bars
for h in patterns:  # loop over patterns to create bar-ordered hatches
    for i in range(int(len(bars) / len(patterns))):
        hatches.append(h)
for bar, hatch in zip(bars, hatches):  # loop over bars and hatches to set hatches in correct order
    bar.set_hatch(hatch)

for i in ax.patches:
    # get_x pulls left or right; get_height pushes up or down
    ax.text(i.get_x()+0.07, i.get_height()+5, \
            str(round((i.get_height()), 2)), fontsize=11, color='steelblue')
# regenerate the legend explicitly, otherwise the new hatches will not show up in it
ax.legend()
plt.show()

You can also give each bar its own hatch pattern by putting as many hatches as there are bars in the 'patterns' list. Go ahead and give it a try.
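When there is one pattern per bar, the nested loop is no longer needed and the patterns can be paired with the bars directly. The following is a sketch with made-up counts laid out like df3; the six patterns are an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up counts in the same layout as df3: three classes, two outcomes, six bars.
counts = pd.DataFrame({0: [8, 10, 37], 1: [14, 9, 12]},
                      index=["First", "Second", "Third"])
ax = counts.plot(kind="bar", figsize=(7, 4), edgecolor="aliceblue")

# One pattern per individual bar, in the order the bars appear in ax.patches.
patterns = ["/", ".", "x", "o", "*", "+"]
for bar, hatch in zip(ax.patches, patterns):
    bar.set_hatch(hatch)
ax.legend()
```

Note that ax.patches lists all bars of the first column before those of the second, so the first three patterns go to one outcome and the last three to the other.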


In addition, we can draw this bar graph in a horizontal orientation. The only change is that we set the parameter 'kind' to 'barh' instead of 'bar' when plotting.


Code:

ax= df3.plot(kind='barh',figsize=(14,7), title='Titanic survival on basis of class',cmap='Set2',hatch='O',edgecolor='aliceblue')
plt.xticks(rotation=20)
plt.yticks(rotation=20)

bars = ax