
Linear Regression Using Numpy


A few posts ago, we saw how to use the function numpy.linalg.lstsq(...) to solve an over-determined system. This time, we'll use it to estimate the parameters of a regression line.

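A linear regression line has the form w1x + w2 = y, and it is the line that minimizes the sum of the squared vertical distances from each data point to the line. So, given n pairs of data (xi, yi), the parameters we are looking for are the w1 and w2 that minimize the error

$$E = \sum_{i=1}^{n} \bigl(y_i - (w_1 x_i + w_2)\bigr)^2$$

and we can compute the parameter vector w = (w1, w2)T as the least-squares solution of the following over-determined system

$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$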
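As a brief sketch of why a least-squares solution is the one that minimizes the error, let X denote the n×2 matrix whose i-th row is (xi, 1) and y the vector of observed values (the symbol X is introduced here only for illustration). The error is then the squared norm of the residual, and setting its gradient to zero gives the so-called normal equations:

```latex
E(w) = \lVert X w - y \rVert^2
\qquad\Longrightarrow\qquad
\nabla E(w) = 2\, X^T (X w - y) = 0
\qquad\Longrightarrow\qquad
X^T X \, w = X^T y
```

When at least two of the xi differ, X^T X is invertible and the solution is unique. A least-squares routine such as numpy.linalg.lstsq returns a vector w satisfying these equations, without requiring us to form X^T X ourselves.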

Let's use numpy to compute the regression line:
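from numpy import arange, array, ones, linalg
from pylab import plot, show

xi = arange(0, 9)
# first row: the x values; second row: ones, for the intercept term
A = array([xi, ones(9)])
# noisy, roughly linear observations
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]
# least-squares solution of A.T w = y; rcond=None selects NumPy's default cutoff
w = linalg.lstsq(A.T, y, rcond=None)[0]  # w[0] is the slope, w[1] the intercept

# plotting the line
line = w[0] * xi + w[1]  # regression line
plot(xi, line, 'r-', xi, y, 'o')  # fitted line in red, data points as circles
show()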
We can see the result in the plot below.
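As a quick sanity check on the fit, the following sketch recomputes the parameters from the same data and compares them with numpy.polyfit, which fits a degree-1 polynomial by least squares. It is an illustration under stated assumptions rather than part of the original post: the helper squared_error is a hypothetical name introduced here, and the use of polyfit is simply an independent cross-check.

```python
import numpy as np

# Same data as in the article
xi = np.arange(0, 9)
y = np.array([19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24])

# Design matrix whose rows are (x_i, 1)
X = np.column_stack([xi, np.ones(len(xi))])

# Least-squares fit: w[0] is the slope, w[1] the intercept
w = np.linalg.lstsq(X, y, rcond=None)[0]


def squared_error(params):
    """Sum of squared vertical distances (hypothetical helper for illustration)."""
    return float(np.sum((X @ params - y) ** 2))


# Cross-check: polyfit with degree 1 returns (slope, intercept) in that order
w_polyfit = np.polyfit(xi, y, 1)

print("lstsq parameters:  ", w)
print("polyfit parameters:", w_polyfit)
print("error at the fit:  ", squared_error(w))

# Nudging the slope away from the fitted value should only increase the error
print("error after nudge: ", squared_error(w + np.array([0.01, 0.0])))
```

Both routines should agree up to floating-point rounding, and any perturbation of the fitted parameters should give a larger error, which is what minimizing the sum of squares means.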




Published at DZone with permission of Giuseppe Vettigli, DZone MVB.

