# Complex scatter plots in Python [PART II] – Defining colors, labels and title

Hey scientist! How is it going?
In the second post of this series we’ll improve our preliminary scatter plot, built from IBGE [1] data, by defining a color for each region and adding labels and a title. That makes our plot more informative. Let’s do this!

As a reminder, in our first post [2] we used the following code to create our preliminary plot:

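```python
# importing necessary packages.
import matplotlib.pyplot as plt
import pandas as pd

# reading the data (assumed call with default arguments; see the first post for the exact one).
data_brazil = pd.read_excel('data_ibge.xls')

# generating the plot.
plt.scatter(x = data_brazil['LifeExpec'],
            y = data_brazil['GDPperCapita'],
            s = data_brazil['PopX1000'])
plt.show()
```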

With these commands we imported Pandas [3] and matplotlib [4], read the file data_ibge.xls [5], and generated the following plot:

The axes of this plot show life expectancy and GDP per capita, and the size of each circle represents the population of the state. But we still can’t tell which state is which, or which region each one belongs to.

To put more information on our plot, let’s add labels and a title to it. We’ll also define a color for each region, so each state is drawn in the color of its region.

First we’ll create a list with the hexadecimal values of the 5-class Dark2 color palette, from ColorBrewer2 [6]. This list has five colors, one for each Brazilian region:

```python
colors = ['#1b9e77',
          '#d95f02',
          '#7570b3',
          '#e7298a',
          '#66a61e']
```

Now we define a function, attribute_color(), which holds a dictionary mapping each region to its color. When the function receives a region that isn’t defined in the dictionary, attribute_color() returns the color black.

```python
def attribute_color(region):
    colors = {
        'North':'#1b9e77',
        'Northeast':'#d95f02',
        'Southeast':'#7570b3',
        'South':'#e7298a',
        'Central-West':'#66a61e'
    }
    return colors.get(region, 'black')
```
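The `colors` list defined earlier and the dictionary inside attribute_color() hold the same five values, so the lookup could also be built from the list. Here is a minimal sketch of that idea, assuming the list follows the order North, Northeast, Southeast, South, Central-West (the order used in the dictionary). The names `regions` and `region_colors` are introduced here only for illustration:

```python
# Five hex values from the 5-class Dark2 palette, as in the list above.
colors = ['#1b9e77', '#d95f02', '#7570b3', '#e7298a', '#66a61e']

# Assumed order of the regions, matching the dictionary in attribute_color().
regions = ['North', 'Northeast', 'Southeast', 'South', 'Central-West']

# Pair each region with its color.
region_colors = dict(zip(regions, colors))


def attribute_color(region):
    # Regions missing from the dictionary fall back to black.
    return region_colors.get(region, 'black')


print(attribute_color('South'))     # a region present in the dictionary
print(attribute_color('Atlantis'))  # a name that is not a region
```

The first call prints `#e7298a` and the second prints `black`, since dict.get() returns its second argument when the key is missing.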

Then we create the color vector, a list which receives the color of each state according to its region. The number of states is given by len(data_brazil['Region']), and the list receives each value using append().

```python
color_region = list()
qty_states = len(data_brazil['Region'])

for state in range(qty_states):
    color_region.append(attribute_color(data_brazil['Region'][state]))
```
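The same list can also be built without an explicit loop. The sketch below uses pandas’ Series.map() with a dictionary. Regions missing from the dictionary become NaN, so fillna('black') reproduces the fallback of attribute_color(). The small DataFrame is made-up sample data standing in for data_brazil, and `sample` and `region_colors` are names chosen just for this example:

```python
import pandas as pd

# Made-up stand-in for data_brazil: only the 'Region' column matters here.
sample = pd.DataFrame({'Region': ['North', 'South', 'Central-West', 'Unknown']})

# Same region-to-color mapping used inside attribute_color().
region_colors = {
    'North': '#1b9e77',
    'Northeast': '#d95f02',
    'Southeast': '#7570b3',
    'South': '#e7298a',
    'Central-West': '#66a61e',
}

# map() looks up each region; unknown regions become NaN and are then set to black.
color_region = sample['Region'].map(region_colors).fillna('black').tolist()
print(color_region)
```

Unlike data_brazil['Region'][state], which looks rows up by index label, this version does not depend on the DataFrame having the default 0, 1, 2, ... index.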

Using print() we can see the color vector:

```python
print(color_region)

['#1b9e77', '#1b9e77', '#1b9e77', '#1b9e77', '#1b9e77', '#1b9e77', '#1b9e77', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#d95f02', '#7570b3', '#7570b3', '#7570b3', '#7570b3', '#e7298a', '#e7298a', '#e7298a', '#66a61e', '#66a61e', '#66a61e', '#66a61e']
```

Now we’ll use color_region as an argument in matplotlib’s scatter(). We’ll also pass the argument alpha = 0.6, which makes the circles partially transparent, so overlapping circles remain visible:

```python
plt.scatter(x = data_brazil['LifeExpec'],
            y = data_brazil['GDPperCapita'],
            s = data_brazil['PopX1000'],
            c = color_region,
            alpha = 0.6)
```
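To see what alpha = 0.6 means for a single color, matplotlib’s matplotlib.colors.to_rgba() converts a hex string into red, green, blue and alpha components between 0 and 1. This is only an illustration of the idea; scatter() does not require this conversion:

```python
from matplotlib.colors import to_rgba

# Convert the color used for the North region to RGBA with 60% opacity.
red, green, blue, alpha = to_rgba('#1b9e77', alpha=0.6)

# The hex pairs 1b, 9e and 77, scaled back to integers in the 0-255 range.
print(round(red * 255), round(green * 255), round(blue * 255))

# The opacity: 0 is fully transparent, 1 is fully opaque.
print(alpha)
```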

Then we add the title, labels for the X and Y axes, and a grid:

```python
plt.title('Brazilian development in 2013, according to each state', fontsize=22)
plt.xlabel('Life expectancy (years)', fontsize=22)
# raw string, so the backslash before $ is not read as a Python escape sequence.
plt.ylabel(r'GDP per capita (R\$)', fontsize=22)
plt.grid(True)
```

The argument fontsize=22 sets the font size, in points, of the title and the axis labels.
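Putting the steps together, here is a sketch that runs on its own. The three rows are made-up numbers for illustration, NOT the IBGE data, and the names `sample`, `region_colors` and the output file `sample_scatter.png` are assumptions of this example. It uses the non-interactive Agg backend and savefig() so it also runs on machines without a display:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Made-up values for three fictional states (not the IBGE data).
sample = pd.DataFrame({
    'Region': ['North', 'Southeast', 'South'],
    'LifeExpec': [71.0, 76.5, 77.0],
    'GDPperCapita': [15000.0, 35000.0, 30000.0],
    'PopX1000': [800, 4000, 1100],
})

# Region-to-color mapping, with black as the fallback.
region_colors = {'North': '#1b9e77', 'Southeast': '#7570b3', 'South': '#e7298a'}
color_region = [region_colors.get(region, 'black') for region in sample['Region']]

# Scatter plot: circle size from population, color from region.
plt.scatter(x=sample['LifeExpec'],
            y=sample['GDPperCapita'],
            s=sample['PopX1000'],
            c=color_region,
            alpha=0.6)

# Title, axis labels and grid.
plt.title('Made-up sample data', fontsize=22)
plt.xlabel('Life expectancy (years)', fontsize=22)
plt.ylabel(r'GDP per capita (R\$)', fontsize=22)
plt.grid(True)

# Write the figure to a file instead of opening a window.
plt.savefig('sample_scatter.png')
```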

We use plt.show() to present the plot. Check the resulting picture:

Our plot is a lot better than the initial one! Now, for instance, we can see that states from the same region tend to be close to each other.

We still can’t tell which state is which, or estimate the number of people in each one… we also need to know which color represents each region. We’ll solve this next week, with more labels and legends! Stay with us!

Thanks scientist! Gigaregards, see you next time!

Did you like this post? Please comment and share with your friends!