Linear Regression Using Numpy

Giuseppe Vettigli

Mar. 26, 12 · Interview


A few posts ago, we saw how to use the function numpy.linalg.lstsq(...) to solve an over-determined system. This time, we'll use it to estimate the parameters of a regression line.
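As a quick refresher, numpy.linalg.lstsq returns a 4-tuple (solution, sums of squared residuals, rank, singular values). That is why the regression code in this post keeps only element [0]. The toy system below has three equations and two unknowns, with values chosen only for illustration. Passing rcond=None selects the machine-precision cutoff and avoids the FutureWarning that NumPy 1.x releases emit when rcond is omitted.

```python
import numpy as np

# Three equations, two unknowns: an over-determined system M @ x ≈ b
# (illustrative values, not taken from the article)
M = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# lstsq returns (solution, residual sums, rank, singular values)
x, residuals, rank, sv = np.linalg.lstsq(M, b, rcond=None)

print("solution:", x)                     # approximately [0.6667, 0.5]
print("sum of squared residuals:", residuals)  # approximately [0.1667], i.e. 1/6
print("rank:", rank)                      # 2
```

The residual array is non-empty here because M has full column rank and more rows than columns. Otherwise lstsq returns an empty array in that slot.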

A linear regression line has the form w₁x + w₂ = y, and it is the line that minimizes the sum of the squared vertical distances from each data point to the line. So, given n pairs of data (x_i, y_i), the parameters we are looking for are the values of w₁ and w₂ that minimize the error

$$E(w_1, w_2) = \sum_{i=1}^{n} \bigl(y_i - (w_1 x_i + w_2)\bigr)^2$$

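and we can compute the parameter vector w = (w₁, w₂)^T as the least-squares solution of the following over-determined system:

$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$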
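The least-squares solution w can also be written in closed form. Write X for the n×2 matrix whose i-th row is (x_i, 1); the name X is introduced here only for illustration. Then w satisfies the normal equations XᵀX w = Xᵀy. For a single predictor this reduces to slope = Sxy/Sxx and intercept = ȳ − slope·x̄. The sketch below uses the post's sample data and checks that all three routes give the same parameters.

```python
import numpy as np

# Sample data from the article: x = 0..8 and a roughly linear y
x = np.arange(0, 9, dtype=float)
y = np.array([19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24])

# Design matrix X: row i is (x_i, 1), so X @ (w1, w2) = w1*x_i + w2
X = np.column_stack([x, np.ones_like(x)])

# 1) Least-squares solution via lstsq
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# 2) Same solution from the normal equations (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# 3) Closed form for one predictor: slope = Sxy / Sxx, intercept = ybar - slope * xbar
x_mean, y_mean = x.mean(), y.mean()
slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print(w_lstsq, w_normal, (slope, intercept))  # each approximately (0.7167, 19.1889)
```

In practice lstsq, which is SVD-based, is usually preferred over forming XᵀX explicitly, because the normal equations square the condition number of the problem.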

Let's use numpy to compute the regression line:

import numpy as np
import matplotlib.pyplot as plt

xi = np.arange(0, 9)
A = np.array([xi, np.ones(9)])  # 2 x 9; A.T is the 9 x 2 matrix with rows (x_i, 1)
# roughly linear sample data
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]
w = np.linalg.lstsq(A.T, y, rcond=None)[0]  # obtaining the parameters (w1, w2)

# plotting the line
line = w[0]*xi + w[1]  # regression line
plt.plot(xi, line, 'r-', xi, y, 'o')  # red line for the fit, circles for the data
plt.show()

We can see the result in the plot below.
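As a cross-check, NumPy's polyfit with degree 1 performs the same least-squares fit and returns the coefficients highest degree first, that is [w₁, w₂]. The sketch below also computes the coefficient of determination R², one common measure of how well a line fits. This is an illustrative addition, not part of the original post.

```python
import numpy as np

# Same sample data as in the post
x = np.arange(0, 9)
y = np.array([19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24])

# Degree-1 polynomial fit; coefficients come back highest degree first: [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
predicted = slope * x + intercept
ss_res = np.sum((y - predicted) ** 2)  # this is the error E(w1, w2) at the optimum
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

print(f"slope={slope:.4f}, intercept={intercept:.4f}, R^2={r_squared:.4f}")
```

With this data R² comes out at roughly 0.91, so the fitted line accounts for most of the variation in y.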

You can find more about data fitting using numpy in the following posts:


Published at DZone with permission of Giuseppe Vettigli, DZone MVB. See the original article here.

