Complex scatter plots in Python [PART III] – Inserting labels into elements and defining more than one legend

Hey scientist! How is it going?

Do you remember our scatter plot based on IBGE [1] data? In the last post of this series we’ll make it even better, inserting labels indicating the states and defining two legends related to the regions and the population. Now the plot will be awesome! Let’s do this!

In the first two posts of this series ([2], [3]) we generated the following plot. The comments in the code, as well as the previous posts, explain what each part does, OK?

[Figure: preliminary scatter plot (scatter_preliminaryII)]

The code necessary for generating this plot follows:

# importing necessary packages.
import matplotlib.pyplot as plt
import pandas as pd

# reading the third sheet (index 2) of data_ibge.xls.
# note: older pandas versions called this argument 'sheetname'.
data_brazil = pd.read_excel('data_ibge.xls', sheet_name=2)

# color palette 5-class Dark2, from ColorBrewer2: http://colorbrewer2.org/
colors = ['#1b9e77',
          '#d95f02',
          '#7570b3',
          '#e7298a',
          '#66a61e']

# attribute_color() points to the color correspondent to each region.
def attribute_color(region):
    colors = {
        'North':'#1b9e77',
        'Northeast':'#d95f02',
        'Southeast':'#7570b3',
        'South':'#e7298a',
        'Central-West':'#66a61e'
    }
    return colors.get(region, 'black')

# creating the color vector.
color_region = list()
qty_states = len(data_brazil['Region'])

for state in range(qty_states):
    color_region.append(attribute_color(data_brazil['Region'][state]))

# generating the plot.
plt.scatter(x = data_brazil['LifeExpec'],
            y = data_brazil['GDPperCapita'],
            s = data_brazil['PopX1000'],
            c = color_region,
            alpha = 0.6)

plt.title('Brazilian development in 2013, according to each state', fontsize=22)
plt.xlabel('Life expectancy (years)', fontsize=22)
plt.ylabel('GDP per capita (R$)', fontsize=22)
plt.grid(True)
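
As a side note, the loop that fills color_region can be written more compactly. The sketch below is only an illustration: REGION_COLORS and regions_column are hypothetical names, and regions_column is a plain list standing in for data_brazil['Region'] so the example runs without the spreadsheet.

```python
# Region -> color lookup, same palette as in attribute_color().
REGION_COLORS = {
    'North': '#1b9e77',
    'Northeast': '#d95f02',
    'Southeast': '#7570b3',
    'South': '#e7298a',
    'Central-West': '#66a61e',
}

# Hypothetical stand-in for data_brazil['Region']; not the real table.
regions_column = ['North', 'Southeast', 'South', 'Unknown']

# dict.get() falls back to 'black' for any region missing from the lookup.
color_region = [REGION_COLORS.get(region, 'black') for region in regions_column]
print(color_region)
```

With the real table, iterating over data_brazil['Region'] in the same comprehension should give the same vector as the loop.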

We’ll continue working on this code. Download data_ibge.xls [4] if you need it.

First we’ll insert the abbreviation of each state in its respective circle. Let’s use the function text() from matplotlib [5] for that. Its arguments are the text coordinates in the plot (x, y) and a string (s), which is the text to be displayed.

We’ll take advantage of the coordinates and abbreviations in data_brazil. Therefore:

  • For the first state, x = data_brazil['LifeExpec'][0], y = data_brazil['GDPperCapita'][0], and s = data_brazil['UF'][0];
  • For the second state, x = data_brazil['LifeExpec'][1], y = data_brazil['GDPperCapita'][1], and s = data_brazil['UF'][1], and so on.

Let’s put it all in a for loop, so we don’t need to repeat the instructions several times.

for state in range(len(data_brazil['UF'])):
    plt.text(x = data_brazil['LifeExpec'][state],
             y = data_brazil['GDPperCapita'][state],
             s = data_brazil['UF'][state],
             fontsize=16)

And done! Quick, right?
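
By default, text() places the lower-left corner of the label at (x, y). If you prefer the abbreviation centered on its circle, text() also accepts the alignment keywords ha and va. Here is a minimal sketch of that variation; the three data points and the abbreviations are made-up placeholders, not IBGE values, and the Agg backend is chosen only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt

# Made-up sample values standing in for three rows of data_brazil.
life_expec = [71.0, 74.5, 77.2]
gdp_per_capita = [12000.0, 21000.0, 35000.0]
uf = ["AA", "BB", "CC"]  # placeholder abbreviations, not real data

fig, ax = plt.subplots()
ax.scatter(life_expec, gdp_per_capita, s=300, alpha=0.6)

# zip() walks the three columns together, one state per iteration.
for x, y, label in zip(life_expec, gdp_per_capita, uf):
    ax.text(x, y, label, fontsize=16, ha="center", va="center")

# The labels attached to the axes, in the order they were added.
labels_drawn = [t.get_text() for t in ax.texts]
```

Using zip() also avoids indexing each column by position, which is handy if the table index does not start at zero.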

Now we will define the legends. The idea is to use 2D line objects with the colors we defined as legend entries, since the automatic legend won’t work in our example. For the first legend we define a vector indicating the regions, in the order they appear in our table:

regions = ['North',
           'Northeast',
           'Southeast',
           'South',
           'Central-West']

Then we define the objects, based on [6]. Here we create a list and add elements with the colors from our vector colors. We need the package matplotlib.lines to create these elements. After creating the list, we pass legend1_line2d and regions to the legend() function to generate the legend. Besides those arguments, we use several others; try modifying them to see what they do!

import matplotlib.lines as mlines

legend1_line2d = list()
for step in range(len(colors)):
    legend1_line2d.append(mlines.Line2D([0], [0],
                                        linestyle='none',
                                        marker='o',
                                        alpha=0.6,
                                        markersize=15,
                                        markerfacecolor=colors[step]))

legend1 = plt.legend(legend1_line2d,
                     regions,
                     numpoints=1,
                     fontsize=22,
                     loc='best',
                     shadow=True)
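
These Line2D objects are never drawn on the plot itself; they only act as stand-ins ("proxy" handles) that tell legend() what to show next to each label. The sketch below is one possible compact way to build them, assuming the same colors and regions vectors; the name handles is hypothetical, and the Agg backend is set only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt
import matplotlib.lines as mlines

colors = ['#1b9e77', '#d95f02', '#7570b3', '#e7298a', '#66a61e']
regions = ['North', 'Northeast', 'Southeast', 'South', 'Central-West']

# One marker-only Line2D per color; these proxy objects are never added
# to the axes, they only serve as legend handles.
handles = [mlines.Line2D([0], [0], linestyle='none', marker='o',
                         alpha=0.6, markersize=15, markerfacecolor=color)
           for color in colors]

fig, ax = plt.subplots()
legend1 = ax.legend(handles, regions, numpoints=1, loc='best')

# Read the labels back from the legend to check the pairing.
legend_labels = [text.get_text() for text in legend1.get_texts()]
```

The handles and labels are matched by position, so colors and regions must stay in the same order.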

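Let’s create the second legend, which shows what the circle sizes mean. Two details deserve attention. First, scatter() interprets s as an area (in points squared), while markersize is a length in points, so we take the square root of each reference size. Second, a new call to legend() replaces the previous legend, so we add legend1 back to the plot with add_artist(). The remaining options are commented in the code, so you can understand them better: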

import numpy as np

# one gray circle per reference size (100, 1000 and 10000, in the units of PopX1000).
legend2_line2d = list()
for size in (100, 1000, 10000):
    legend2_line2d.append(mlines.Line2D([0], [0],
                                        linestyle='none',
                                        marker='o',
                                        alpha=0.6,
                                        markersize=np.sqrt(size),
                                        markerfacecolor='#D3D3D3'))

legend2 = plt.legend(legend2_line2d,
                     ['1', '10', '100'],
                     title='Population (in 100,000)',
                     numpoints=1,
                     fontsize=20,
                     loc='upper left',
                     frameon=False,  # no edges
                     labelspacing=3, # increase spacing between labels
                     handlelength=5, # increase spacing between objects and text
                     borderpad=4     # increase the margins of the legend
                    )
plt.gca().add_artist(legend1)

plt.setp(legend2.get_title(), fontsize=22)  # increasing the legend title font
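
The add_artist() call is what keeps both legends visible. Here is a reduced sketch of that mechanism with one entry per legend; circle() is a hypothetical helper written only for this example, and the Agg backend is set so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for this sketch
import matplotlib.pyplot as plt
import matplotlib.lines as mlines


def circle(size, color):
    """Marker-only Line2D used as a legend handle (hypothetical helper)."""
    return mlines.Line2D([0], [0], linestyle='none', marker='o',
                         markersize=size, markerfacecolor=color)


fig, ax = plt.subplots()

# First legend: becomes the legend currently attached to the axes.
legend1 = ax.legend([circle(10, '#1b9e77')], ['North'], loc='lower right')

# Second call: replaces legend1 as the current legend of the axes.
legend2 = ax.legend([circle(10, '#D3D3D3')], ['1'], loc='upper left')

# Re-attach the first legend as an ordinary artist so both are drawn.
ax.add_artist(legend1)

current_is_second = ax.get_legend() is legend2
first_still_attached = legend1 in ax.get_children()
```

Without the add_artist() line, only the second legend would appear in the figure.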

Finally, we use plt.show() to show the plot. Check the result:

[Figure: final scatter plot (scatter_finalENUS.jpg)]

In this post series we employed several Python elements, from the basics to some advanced ones. Now we can look at the plot and get more information than from the table alone, can’t we? The complete code for this plot is available at [7]! Check it out!

Did you like the result? What would you do differently? Do you have ideas for data to study? Tell us in the comments!

Thanks, scientist! Gigaregards, see you next time!


Did you like this post? Please comment and share with your friends!
Want to download Programando Ciência codes? Go to our GitHub!
Make a donation for Programando Ciência!
Like us also on Facebook: www.facebook.com/programandociencia
I’m on Twitter! Follow me if you can! @alexdesiqueira
