Linear Regression Using Numpy

A few posts ago, we saw how to use the function numpy.linalg.lstsq(...) to solve an over-determined system. This time, we'll use it to estimate the parameters of a regression line.

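A linear regression line has the form $w_1 x + w_2 = y$, and it is the line that minimizes the sum of the squared vertical distances from each data point to the line. So, given n pairs of data $(x_i, y_i)$, the parameters we are looking for are $w_1$ and $w_2$, which minimize the error

$$E = \sum_{i=1}^{n} \left(y_i - (w_1 x_i + w_2)\right)^2$$

and we can compute the parameter vector $w = (w_1, w_2)^T$ as the least-squares solution of the following over-determined system

$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_n & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$$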
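
To make the error and the over-determined system concrete, here is a minimal sketch that builds the matrix for three data points and evaluates the squared error for two candidate parameter vectors. The data points and the helper name `squared_error` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical data points, chosen only for illustration
x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])

# Each row of A is (x_i, 1), so that A @ w equals w1*x_i + w2
A = np.column_stack([x, np.ones_like(x)])

def squared_error(w):
    """Sum of squared residuals for the line w[0]*x + w[1]."""
    residuals = y - A @ w
    return float(residuals @ residuals)

# The line y = 2x + 1 passes through all three points
exact_fit_error = squared_error(np.array([2.0, 1.0]))
# The line y = x + 1 misses two of them
worse_fit_error = squared_error(np.array([1.0, 1.0]))
```

Here the first candidate gives an error of 0 and the second gives 5, so a least-squares solver would prefer the first.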

Let's use numpy to compute the regression line:
from numpy import arange, array, ones, linalg
from pylab import plot, show

xi = arange(0, 9)
A = array([xi, ones(9)])  # rows: the x values and a constant 1 for the intercept
# noisy, roughly linear sequence
y = [19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24]
w = linalg.lstsq(A.T, y, rcond=None)[0]  # obtaining the parameters

# plotting the line
line = w[0]*xi + w[1]  # regression line
plot(xi, line, 'r-', xi, y, 'o')  # fitted line in red, data as points
show()
We can see the result in the plot below.

[Figure: the regression line plotted over xi]
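
As a sanity check, the fitted parameters can be compared against two other routes to the same answer. The sketch below is not part of the original post: it assumes the same data as above and cross-checks `lstsq` against the normal equations and against `numpy.polyfit` with degree 1.

```python
import numpy as np

xi = np.arange(0, 9)
y = np.array([19, 20, 20.5, 21.5, 22, 23, 23, 25.5, 24])

# Least-squares solution of the over-determined system A w = y
A = np.column_stack([xi, np.ones(len(xi))])
w = np.linalg.lstsq(A, y, rcond=None)[0]

# Same fit via the normal equations (A^T A) w = A^T y
w_normal = np.linalg.solve(A.T @ A, A.T @ y)

# Same fit via polyfit; degree 1 returns (slope, intercept)
w_polyfit = np.polyfit(xi, y, 1)

slope, intercept = w
print(slope, intercept)
```

All three approaches agree, with a slope of about 0.717 and an intercept of about 19.19. The normal equations are shown only for comparison; `lstsq` is generally the more numerically robust choice when the matrix is ill-conditioned.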

Published at DZone with permission of Giuseppe Vettigli, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
