# Intuitive Linear Regression for Machine Learning

### Linear regression fitted with gradient descent is an iterative algorithm, like many algorithms in machine learning. It is simple, but it is very useful.

In this article, we will go through the intuition of linear regression and a straightforward implementation of the algorithm. This article is adapted from this booklet, in which you can find the mathematics behind the algorithm as well as a detailed explanation and implementation details.

Linear regression is a simple yet useful learning algorithm that can be viewed as either a statistical or an optimization problem. For simple regression, there is an optimal analytical solution; for high-dimensional problems, computing it becomes expensive, so an iterative method is usually preferred. Regression fits a function to a data set: we choose a representative function and fit it to our data. Learning consists of finding the best possible values of the function parameters (for a squared-error cost, that optimum is global). Linearity refers to the model being linear in its parameters, which is why we can fit either a straight line or a polynomial function (polynomial regression).
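For the one-variable case, the analytical solution can be sketched in a few lines. This is a minimal illustration rather than part of the article's implementation; the function name `fit_simple` and the sample points are assumptions chosen for the example.

```python
def fit_simple(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Illustrative points that lie exactly on y = 1 + 2x.
intercept, slope = fit_simple([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(intercept, slope)
```

No iteration is needed here because the best slope and intercept can be computed directly from the means of the data.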

Let's go through an example. Given a set of house sizes and house prices, the data set consists of pairs `(x, y)`, where `x` is the house size and `y` is the house price. The task is to estimate the house price given a house size. If we regard the house size and the target price as continuous, we can model the situation as a regression problem. Because we want to fit a linear function to the data, that is, approximate the output as a linear function of the input, this can be modeled as a linear regression problem. The data set has holes, meaning that it does not contain the size and price of every single house, so the task is to estimate the target price for an arbitrary house size.

The model pipeline, as a supervised learning problem, is training set > learning algorithm > hypothesis. The training set is fed into the learning algorithm, which outputs a function, conventionally called the hypothesis and denoted by `h`. `h` is a function mapping from input to output, or from features to output. The first task when solving this problem is to decide what representation of `h` should be used. For linear regression, we use a linear function. For polynomial regression, we use a polynomial function.
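The two representations of `h` can be sketched as plain functions. The names `linear_hypothesis` and `polynomial_hypothesis` and the parameter values are hypothetical, picked only to show the shape of each function.

```python
def linear_hypothesis(x, params):
    """h(x) = params[0] + params[1] * x"""
    intercept, slope = params
    return intercept + slope * x


def polynomial_hypothesis(x, params):
    """h(x) = params[0] + params[1] * x + params[2] * x**2 + ..."""
    return sum(c * x ** i for i, c in enumerate(params))


print(linear_hypothesis(3.0, [1.0, 2.0]))           # 1 + 2*3 = 7.0
print(polynomial_hypothesis(2.0, [1.0, 0.0, 3.0]))  # 1 + 0*2 + 3*4 = 13.0
```

Both functions are linear in `params`, which is what allows the same fitting procedure to handle either one.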

The main idea is, given a set of input vectors and output labels, to minimize the average difference (the error) between the correct output labels and the predicted outputs. One way to do this is to use the mean squared error (MSE) as the measure of the error. We need to minimize the MSE so that we find a good straight line or polynomial fit for our data set. We do this by using the gradient descent algorithm.
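As a small sketch of the error measure, the MSE averages the squared differences between targets and predictions. The function name and the sample values below are assumptions made for illustration.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between predictions and targets."""
    squared = [(p - t) ** 2 for t, p in zip(targets, predictions)]
    return sum(squared) / len(targets)


targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.5, 2.0]
# Squared errors are 0, 0.25 and 1, so the mean is 1.25 / 3.
mse = mean_squared_error(targets, predictions)
print(mse)
```

Squaring keeps positive and negative errors from cancelling each other and penalizes large misses more heavily.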

The algorithm is iterative, as many others are in machine learning. At each iteration, we measure the error and adjust the parameters to reduce it, and we carry on until the error falls below a small predefined threshold or we reach a predefined number of iterations.
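The loop described above might look like the following sketch for a straight-line fit. It assumes the MSE as the error measure, and the function name, default step size, threshold, and sample points are all illustrative choices rather than values from the article.

```python
def gradient_descent(xs, ys, learning_rate=0.1, threshold=1e-10,
                     max_iterations=10000):
    """Fit y = intercept + slope * x by batch gradient descent on the MSE."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    mse = float("inf")
    for _ in range(max_iterations):
        # Measure the error with the current parameters.
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        mse = sum(e ** 2 for e in errors) / n
        if mse < threshold:
            break
        # Gradient of the MSE with respect to each parameter.
        grad_intercept = 2.0 * sum(errors) / n
        grad_slope = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step against the gradient to reduce the error.
        intercept -= learning_rate * grad_intercept
        slope -= learning_rate * grad_slope
    return intercept, slope, mse


# Illustrative points that lie exactly on y = 1 + 2x.
fitted_intercept, fitted_slope, final_mse = gradient_descent(
    [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(fitted_intercept, fitted_slope, final_mse)
```

The step size matters: if it is too large for the scale of the inputs, the updates overshoot and the error grows instead of shrinking.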

In the case of linear regression, parameters are either the slope and y-intercept for a straight line or the polynomial coefficients if we are using a polynomial function.

Here is a straightforward implementation of the algorithm (dataset from *Consumer's Digest*):

```
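from operator import mul


def process_training_set(examples):
    """Prepend the bias term: each input x becomes the feature vector (1, x)."""
    return {(1, k): v for k, v in examples.items()}


def hypothesis(x, params):
    """Linear hypothesis: dot product of the feature vector and the parameters."""
    return sum(map(mul, x, params))


def estimate(x, params):
    """Predict the output for a single raw input x."""
    return hypothesis((1, x), params)


def delta(training_set, params, previous_cost, learning_rate, tolerance=1e-9):
    """Run one batch gradient descent step, updating params in place.

    Returns (converged, cost), where cost is half the mean squared error
    measured before the update.
    """
    m = len(training_set)
    cost = 0.0
    gradient = [0.0] * len(params)
    for x, y in training_set.items():
        error = hypothesis(x, params) - y
        cost += error ** 2
        for j in range(len(params)):
            gradient[j] += error * x[j]
    cost /= 2.0 * m
    for j in range(len(params)):
        params[j] -= learning_rate * gradient[j] / m
    # A comparison with NaN is always False, so the first call never stops.
    return abs(previous_cost - cost) < tolerance, cost


def train(training_set, params, learning_rate=0.001, delta=delta,
          max_iterations=100000):
    # The inputs are not scaled, so the step must stay small (0.1 diverges here).
    converged, c = delta(training_set, params, float('nan'), learning_rate)
    iterations = 1
    while not converged and iterations < max_iterations:
        converged, c = delta(training_set, params, c, learning_rate)
        iterations += 1
    print('params:', params, 'cost-function:', c)
    return params, c


def estimate_set(data_set, params):
    """Predict the output for every raw input in data_set."""
    return {x: estimate(x, params) for x in data_set}


def estimation_mse(reference, estimates):
    """Mean squared error between the reference values and the estimates."""
    total = sum((estimates[k] - v) ** 2 for k, v in reference.items())
    return total / len(reference)


training_set = {
    12.39999962: 11.19999981,
    14.30000019: 12.5,
    14.5: 12.69999981,
    14.89999962: 13.10000038,
    16.10000038: 14.10000038,
    16.89999962: 14.80000019,
    16.5: 14.39999962,
    15.39999962: 13.39999962,
    17: 14.89999962,
    17.89999962: 15.60000038,
    18.79999924: 16.39999962,
    20.29999924: 17.70000076,
    22.39999962: 19.60000038,
    19.39999962: 16.89999962,
}

test_set = {
    15.5: 14,
    16.70000076: 14.60000038,
    17.29999924: 15.10000038,
    18.39999962: 16.10000038,
    19.20000076: 16.79999924,
    17.39999962: 15.19999981,
    19.5: 17,
    19.70000076: 17.20000076,
    21.20000076: 18.60000038,
}

if __name__ == '__main__':
    # Example run, added for illustration: fit on the training set,
    # then score the fitted line on the test set.
    params, cost = train(process_training_set(training_set), [0.0, 0.0])
    print('test MSE:', estimation_mse(test_set, estimate_set(test_set, params)))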
```

Published at DZone with permission of Kareem Alkaseer. See the original article here.
