Interpolating functions on your data using Python [PART 1]

Hey scientist! How is it going?
So, you have that naughty dataset and you want to estimate its behavior at some points that aren’t there? This post is just for you!
Today we’ll see how to estimate new points in a set using the existing ones. This process is called interpolation. Let’s do this!

To show the interpolation of a function between the points, we’ll keep using the csv file with the series of Yahoo prices, which we saw in this post. Keep cool: we’ll show how to get it at the beginning of this week’s code.

We split all the code into three parts, OK? Let’s see them:

from scipy.interpolate import interp1d
from urllib.request import urlopen

import matplotlib.pyplot as plt
import numpy as np

url = ''
yahoo_csv = urlopen(url)
data_yahoo = np.genfromtxt(yahoo_csv, delimiter=',', dtype=None)

# Checking out the file header.
data_yahoo[0:4, :]

First we import the functions and packages we’ll use. We also set the URL of the csv file, open it using urlopen(), and assign its contents to data_yahoo using the genfromtxt() function. Note that urlopen() takes the URL we’d like to access as its argument. We also pass two parameters to genfromtxt():

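  • delimiter=',' indicates that the delimiter between fields is the comma.
  • dtype=None tells genfromtxt() to infer the type of each column from its contents instead of forcing one. Since the header row is text, every field is read as a string; we’ll convert the prices to floats later.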
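If you’d like to try these parameters without downloading anything, here is a minimal sketch. It assumes a tiny in-memory table in the same layout as the Yahoo file (csv_text is a made-up stand-in for the real download). The encoding='utf-8' argument is also an assumption, used so that the fields come back as regular strings. The skip_header and usecols shortcut at the end is an alternative that the rest of this post doesn’t use.

```python
import io

import numpy as np

# Hypothetical stand-in for the downloaded file: a header plus three rows.
csv_text = (
    "Date,Open,High,Low,Close,Volume,Adj Close\n"
    "2015-10-16,33.639999,33.860001,33.16,33.369999,12209300,33.369999\n"
    "2015-10-15,32.419998,33.490002,32.400002,33.48,19370900,33.48\n"
    "2015-10-14,32.279999,32.490002,31.77,32.09,11277400,32.09\n"
)

# dtype=None lets genfromtxt() infer the column types. The header row is
# text, so every field ends up as a string.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=None,
                    encoding='utf-8')
print(raw.shape)

# Alternative (not used in the post): skip the header and read only the
# seventh column, which gives the floats directly.
direct_prices = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                              skip_header=1, usecols=6)
print(direct_prices)
```

Both routes should end up with the same ‘Adj Close’ numbers; the first one just keeps the whole table around as text.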

After that, we check the first four rows of the file (the header plus three days of data), using data_yahoo[0:4, :]. This is the result:

array([[b'Date', b'Open', b'High', b'Low', b'Close', b'Volume', b'Adj Close'],
       [b'2015-10-16', b'33.639999', b'33.860001', b'33.16', b'33.369999', b'12209300', b'33.369999'],
       [b'2015-10-15', b'32.419998', b'33.490002', b'32.400002', b'33.48', b'19370900', b'33.48'],
       [b'2015-10-14', b'32.279999', b'32.490002', b'31.77', b'32.09', b'11277400', b'32.09']], 

Note that the first line contains the variable names, and the first column contains the dates. We’ll work on the column ‘Adj Close’.

adj_close = np.array(data_yahoo[1:, 6:].flatten(), dtype=float)
plt.plot(adj_close, 'r')
plt.ylabel('Adjusted Close Price')

In this piece of code we assign the column ‘Adj Close’, without the first line (data_yahoo[1:, 6:]), to adj_close. Remember that Python starts indexes at zero! The first line is 0; the seventh column is 6. The slice 6: keeps the result two-dimensional (a single column), so we call the flatten() method to get a one-dimensional vector, and we tell NumPy that the data are floats (dtype=float).
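To see what the slicing and flatten() do to the shapes, here is a small sketch. The table below is a hypothetical three-day piece in the layout of the Yahoo file, not the real download, and the variable names are just for illustration.

```python
import numpy as np

# Hypothetical piece of the table: header row plus three days, all as text.
table = np.array([
    ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Adj Close'],
    ['2015-10-16', '33.639999', '33.860001', '33.16', '33.369999',
     '12209300', '33.369999'],
    ['2015-10-15', '32.419998', '33.490002', '32.400002', '33.48',
     '19370900', '33.48'],
    ['2015-10-14', '32.279999', '32.490002', '31.77', '32.09',
     '11277400', '32.09'],
])

# Rows from 1 on (no header), columns from 6 on: still 2-D, one column wide.
column_2d = table[1:, 6:]
print(column_2d.shape)

# flatten() turns it into a 1-D vector; dtype=float converts the text.
prices = np.array(column_2d.flatten(), dtype=float)
print(prices.shape, prices)

# Indexing with 6 instead of 6: gives a 1-D result right away.
prices_direct = table[1:, 6].astype(float)
```

Using the integer index 6 is a shortcut that gives the same numbers; the post keeps the 6: slice plus flatten().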

We plotted the data after cleaning it. The plot has a label on the Y axis. The resulting plot is this one:

We made this plot using Pandas in this post. Check it out!

# Using the vector size to create the X axis: one position per day,
# from 0 to end - 1, matching the indexes of adj_close.
end = np.shape(adj_close)[0]
adj_x = np.linspace(0, end - 1, end, endpoint=True)

# Interpolating points in the entire function.
interp_linear = interp1d(adj_x, adj_close, kind='linear')
interp_adjclose = interp_linear(adj_x)

# Plotting the interpolation.
plt.plot(adj_x, adj_close, 'ro', adj_x, interp_adjclose, 'k--')
plt.ylabel('Adjusted Close Price')

Now it’s time to interpolate the data! We use interp1d, from scipy.interpolate. It returns a function, built from our data, that we can evaluate at any point inside the range of the original X values. Its argument ‘kind’ specifies the interpolation type used. The valid arguments include ‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’. The default is ‘linear’.
The arguments ‘slinear’, ‘quadratic’ and ‘cubic’ refer to interpolation using a first, second or third order spline. If we pass an integer instead, it sets the order of the spline that will be used.
For more info, check the SciPy interp1d documentation.
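A tiny sketch may help to see what the ‘kind’ argument changes. The four points below are made up for illustration and have nothing to do with the Yahoo series.

```python
import numpy as np
from scipy.interpolate import interp1d

# Four made-up points, just to compare the interpolation types.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, 4.0, 10.0])

f_linear = interp1d(x, y)                  # kind='linear' is the default
f_nearest = interp1d(x, y, kind='nearest')
f_zero = interp1d(x, y, kind='zero')
f_cubic = interp1d(x, y, kind='cubic')
f_order3 = interp1d(x, y, kind=3)          # an integer sets the spline order

print(float(f_linear(2.5)))    # halfway between 4 and 10 -> 7.0
print(float(f_nearest(0.4)))   # closest known point is x=0 -> 0.0
print(float(f_zero(2.9)))      # holds the value from x=2 -> 4.0
print(float(f_cubic(1.5)))     # a smooth curve through all four points

# By default, asking for a point outside [0, 3] raises a ValueError.
try:
    f_linear(3.5)
    out_of_range_error = False
except ValueError:
    out_of_range_error = True
```

Every one of these functions goes through the original points; they only differ in what happens between them.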

The next plot presents the original data (as red circles) and the interpolation (as a black dashed line):

Looks nice! Let’s see the effect in a smaller interval:

# Let's try in a smaller interval.
# We'll get the first ten days.
adj_piece = adj_close[:10]

# Let's create an X axis from 0 to 10, with 50 points:
axis_x = np.linspace(0, 10)

# We'll use interp_linear with axis_x values.
interp_piece_lin = interp_linear(axis_x)
# Plotting the interpolation.
plt.plot(axis_x, interp_piece_lin, 'k+', adj_piece, 'ro')
plt.ylabel('Adjusted Close Price - Linear interpolation')

# Trying now the cubic interpolation (kind='cubic').
interp_cubic = interp1d(adj_x, adj_close, kind='cubic')
interp_piece_cub = interp_cubic(axis_x)
# Plotting the interpolation.
plt.plot(axis_x, interp_piece_cub, 'k+', adj_piece, 'ro')
plt.ylabel('Adjusted Close Price - Cubic interpolation')

We took ten days from the original interval, created an axis_x with 50 elements between 0 and 10 and evaluated the interpolation functions at those points. After that we plotted the results, using linear and cubic interpolations. The linear interpolation is this one:

In turn, the cubic interpolation is this one:

Awesome! We interpolated functions on our data using only one SciPy function!
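If you want to compare the two interpolations with numbers instead of plots, here is a minimal sketch. It assumes ten made-up prices (not the real Yahoo values) at days 0 to 9, and it uses np.interp only as a cross-check for the linear case.

```python
import numpy as np
from scipy.interpolate import interp1d

# Ten made-up prices, one per day (not the real Yahoo series).
days = np.arange(10)
prices = np.array([33.37, 33.48, 32.09, 32.52, 32.86,
                   33.14, 32.52, 31.83, 30.85, 30.66])

# np.linspace() returns 50 points by default.
dense_x = np.linspace(0, 9)

linear_values = interp1d(days, prices, kind='linear')(dense_x)
cubic_values = interp1d(days, prices, kind='cubic')(dense_x)

# Linear interpolation draws straight lines, so it never leaves the
# range of the data; the cubic spline may overshoot a little.
print(linear_values.min(), linear_values.max())
print(cubic_values.min(), cubic_values.max())

# Largest difference between the two curves on this axis.
max_gap = np.max(np.abs(linear_values - cubic_values))
print(max_gap)
```

Both curves agree at the ten original days; the gap appears only between them, where the spline bends and the linear interpolation doesn’t.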
We saw tons of info in this post! We used urlopen(), opened a csv file using NumPy, separated data from our file, used two types of interpolation and plotted several figures. Whew! Try using other slices from the dataset in the interpolation! Also try other arguments to interpolate; for example, kind='slinear' or kind=5. Change the plot arguments too! Next week we’ll talk more about interpolation, OK?
Thanks scientist! Gigaregards!

Like this? Please comment and share with your friends!
Want to download Programando Ciência codes? Go to our GitHub!
Make a donation for Programando Ciência!
I’m on Twitter! Follow me if you can! @alexdesiqueira