# Classification From Scratch, Part 8 of 8: Linear Discrimination

We continue our series on building regression and classification models from scratch in R by looking at the Bayes (naive) classifier and linear discriminant analysis.


This is the eighth post of our series on classification from scratch. The latest one was on SVM, and today, I want to get back to some very old stuff, which is also a linear separation of the space: Fisher’s linear discriminant analysis.

## Bayes (Naive) Classifier

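Consider the following naive classification rule:

$$m^\star(x) = \underset{y}{\operatorname{argmax}} \left\{ \mathbb{P}[Y=y \mid X=x] \right\}$$

or, by Bayes' rule (the denominator $\mathbb{P}[X=x]$ does not depend on $y$),

$$m^\star(x) = \underset{y}{\operatorname{argmax}} \left\{ \mathbb{P}[X=x \mid Y=y] \cdot \mathbb{P}[Y=y] \right\}$$

where, for a continuous $X$, $\mathbb{P}[X=x \mid Y=y]$ is read as the conditional density at $x$. In the case where $y$ takes two values, that will be standard $\{0,1\}$ here, one can rewrite the latter as

$$m^\star(x) = \begin{cases} 1 & \text{if } \mathbb{P}[Y=1 \mid X=x] > \dfrac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

and the set

$$\left\{ x \text{ such that } \mathbb{P}[Y=1 \mid X=x] = \frac{1}{2} \right\}$$

is called the decision boundary.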

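Assume that

$$X \mid Y=0 \sim \mathcal{N}(\mu_0, \Sigma_0)$$

and

$$X \mid Y=1 \sim \mathcal{N}(\mu_1, \Sigma_1)$$

then explicit expressions can be derived:

$$m^\star(x) = \begin{cases} 1 & \text{if } r_1^2 < r_0^2 + 2\log\dfrac{\mathbb{P}[Y=1]}{\mathbb{P}[Y=0]} + \log\dfrac{|\Sigma_0|}{|\Sigma_1|} \\ 0 & \text{otherwise} \end{cases}$$

where $r_y^2$ is the (squared) Mahalanobis distance

$$r_y^2 = (x-\mu_y)^\top \Sigma_y^{-1} (x-\mu_y)$$

Let $\delta_y$ be defined as

$$\delta_y(x) = -\frac{1}{2}\log|\Sigma_y| - \frac{1}{2}(x-\mu_y)^\top \Sigma_y^{-1} (x-\mu_y) + \log \mathbb{P}[Y=y]$$

The decision boundary of this classifier is

$$\{x \text{ such that } \delta_0(x) = \delta_1(x)\}$$

which is quadratic in $x$. This is the quadratic discriminant analysis. This can be visualized as below.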

The decision boundary is here:
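The rule above can also be checked numerically. Here is a minimal sketch, assuming made-up parameters (`mu0`, `mu1`, `Sigma0`, `Sigma1` and `p1` are illustrative values, not taken from this post), showing that comparing the scores δ0 and δ1 gives the same labels as the rule written with the Mahalanobis distances. It relies on base R's `mahalanobis()`, which returns squared distances.

```r
# Illustrative parameters (assumptions, not taken from the article)
mu0    <- c(0, 0)
mu1    <- c(2, 1)
Sigma0 <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)
Sigma1 <- matrix(c(2.0, -0.5, -0.5, 0.5), 2, 2)
p1     <- 0.5                      # P[Y = 1]; P[Y = 0] = 1 - p1

# Score delta_y(x) = -1/2 log|Sigma_y| - 1/2 r_y^2 + log P[Y = y]
delta <- function(x, mu, Sigma, prior) {
  r2 <- mahalanobis(x, center = mu, cov = Sigma)  # squared Mahalanobis distance
  -0.5 * log(det(Sigma)) - 0.5 * r2 + log(prior)
}

# Rule written with the scores: predict 1 when delta_1(x) > delta_0(x)
classify_delta <- function(x) {
  as.integer(delta(x, mu1, Sigma1, p1) > delta(x, mu0, Sigma0, 1 - p1))
}

# Same rule written with the distances r_0^2 and r_1^2
classify_r2 <- function(x) {
  r0 <- mahalanobis(x, mu0, Sigma0)
  r1 <- mahalanobis(x, mu1, Sigma1)
  as.integer(r1 < r0 + 2 * log(p1 / (1 - p1)) + log(det(Sigma0) / det(Sigma1)))
}

# A few test points, one per row
pts <- rbind(c(0, 0), c(2, 1), c(1, 0.5), c(-1, 2), c(3, -1))
cbind(pts, delta = classify_delta(pts), r2 = classify_r2(pts))
```

The two columns of labels should coincide, since the second rule is only a rearrangement of δ1(x) > δ0(x).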

But that can’t be the linear discriminant analysis, right? I mean, the frontier is not linear… Actually, in Fisher’s seminal paper, it was assumed that $\Sigma_0=\Sigma_1$.

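In that case, actually, writing $\Sigma$ for the common variance matrix, the terms $\log|\Sigma|$ and $x^\top\Sigma^{-1}x$ are the same for both classes and cancel in the comparison, so one can use

$$\delta_y(x) = x^\top \Sigma^{-1}\mu_y - \frac{1}{2}\mu_y^\top \Sigma^{-1}\mu_y + \log \mathbb{P}[Y=y]$$

and the decision frontier is now linear in $x$. This is the linear discriminant analysis. This can be visualized as below: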

Here the two samples have the same variance matrix and the frontier is:
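To see the cancellation at work, here is a small sketch under assumed parameters (`mu0`, `mu1`, `Sigma` and `p1` are illustrative values, not from this post). It compares the difference δ1 − δ0 computed with the full Gaussian scores, with the linear scores, and with an explicit affine function of x.

```r
# Illustrative parameters (assumptions, not taken from the article)
mu0   <- c(0, 0)
mu1   <- c(2, 1)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)   # common variance matrix
p1    <- 0.4                                    # P[Y = 1]

Sinv <- solve(Sigma)

# Linear score: x' S^-1 mu_y - 1/2 mu_y' S^-1 mu_y + log P[Y = y]
delta_lin <- function(x, mu, prior) {
  drop(x %*% Sinv %*% mu) - 0.5 * drop(t(mu) %*% Sinv %*% mu) + log(prior)
}

# Full Gaussian score, with the quadratic term kept
delta_quad <- function(x, mu, prior) {
  -0.5 * log(det(Sigma)) - 0.5 * mahalanobis(x, mu, Sigma) + log(prior)
}

# A few test points, one per row
pts <- rbind(c(0, 0), c(2, 1), c(1, 0.5), c(-1, 2), c(3, -1))

# The difference delta_1 - delta_0 is the same with both forms
d_lin  <- delta_lin(pts, mu1, p1)  - delta_lin(pts, mu0, 1 - p1)
d_quad <- delta_quad(pts, mu1, p1) - delta_quad(pts, mu0, 1 - p1)

# and it is an affine function of x: omega' x + const
omega    <- drop(Sinv %*% (mu1 - mu0))
const    <- -0.5 * sum((mu1 + mu0) * omega) + log(p1 / (1 - p1))
d_affine <- drop(pts %*% omega) + const

cbind(d_lin, d_quad, d_affine)
```

The three columns should agree up to rounding, which is another way of saying that the frontier δ0(x) = δ1(x) is a straight line here.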

## Link With the Logistic Regression

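Assume as previously that

$$X \mid Y=0 \sim \mathcal{N}(\mu_0, \Sigma)$$

and

$$X \mid Y=1 \sim \mathcal{N}(\mu_1, \Sigma)$$

then

$$\log\frac{\mathbb{P}[Y=1 \mid X=x]}{\mathbb{P}[Y=0 \mid X=x]}$$

is equal to

$$x^\top \Sigma^{-1}[\mu_1-\mu_0] - \frac{1}{2}[\mu_1+\mu_0]^\top \Sigma^{-1}[\mu_1-\mu_0] + \log\frac{\mathbb{P}[Y=1]}{\mathbb{P}[Y=0]}$$

which is linear in $x$:

$$\log\frac{\mathbb{P}[Y=1 \mid X=x]}{\mathbb{P}[Y=0 \mid X=x]} = \beta_0 + x^\top \beta$$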

Hence, when each group has a Gaussian distribution with an identical variance matrix, LDA and logistic regression lead to the same linear form of classification rule.
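As a rough illustration, one can simulate such a model and compare the slope fitted by logistic regression with the theoretical Σ⁻¹[μ1−μ0]. The sketch below is only that: the parameters, the sample size `n` and the helper `rgauss()` are assumptions introduced for the example, not part of this post.

```r
set.seed(1)                         # arbitrary seed, for reproducibility
n     <- 2000                       # observations per group (assumption)
mu0   <- c(0, 0)
mu1   <- c(1, 0)
Sigma <- matrix(c(1.0, 0.5, 0.5, 1.0), 2, 2)

# Hypothetical helper: draw N(mu, Sigma) samples via the Cholesky factor
rgauss <- function(n, mu, Sigma) {
  z <- matrix(rnorm(n * length(mu)), n)
  sweep(z %*% chol(Sigma), 2, mu, "+")
}

X   <- rbind(rgauss(n, mu0, Sigma), rgauss(n, mu1, Sigma))
sim <- data.frame(x1 = X[, 1], x2 = X[, 2], y = rep(0:1, each = n))

# Theoretical slope from the Gaussian model
slope_theory <- drop(solve(Sigma) %*% (mu1 - mu0))

# Slope estimated by logistic regression
fit_glm   <- glm(y ~ x1 + x2, data = sim, family = binomial)
slope_glm <- coef(fit_glm)[c("x1", "x2")]

rbind(theory = slope_theory, logistic = slope_glm)
```

The two rows should be close but not identical: the logistic fit estimates the slope from the conditional likelihood of y, while the LDA expression plugs in means and a variance matrix.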

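Observe furthermore that the slope is proportional to $\Sigma^{-1}[\mu_1-\mu_0]$, as stated in Fisher’s article. But to obtain such a relationship, he observed that the ratio of between and within variances (in the two groups) was

$$\frac{\text{variance between}}{\text{variance within}} = \frac{[\omega^\top\mu_1 - \omega^\top\mu_0]^2}{\omega^\top\Sigma_1\omega + \omega^\top\Sigma_0\omega}$$

which is maximal when $\omega$ is proportional to $\Sigma^{-1}[\mu_1-\mu_0]$, when $\Sigma_0=\Sigma_1=\Sigma$.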
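A quick numerical check of that maximization, as a sketch with assumed parameters (`mu0`, `mu1` and `Sigma` are illustrative values, and `fisher_ratio()` is a name introduced here): no random direction should beat ω proportional to Σ⁻¹[μ1−μ0].

```r
# Illustrative parameters (assumptions, not taken from the article)
mu0   <- c(0, 0)
mu1   <- c(2, 1)
Sigma <- matrix(c(1.0, 0.3, 0.3, 1.0), 2, 2)   # Sigma_0 = Sigma_1 = Sigma

# Ratio of between to within variance along the direction omega
fisher_ratio <- function(omega, mu0, mu1, Sigma0, Sigma1) {
  between <- sum(omega * (mu1 - mu0))^2
  within  <- drop(t(omega) %*% (Sigma0 + Sigma1) %*% omega)
  between / within
}

# Candidate maximizer: omega proportional to Sigma^-1 (mu1 - mu0)
omega_star <- drop(solve(Sigma) %*% (mu1 - mu0))
best <- fisher_ratio(omega_star, mu0, mu1, Sigma, Sigma)

# Compare against many random directions
set.seed(1)
random_dirs <- matrix(rnorm(2 * 1000), ncol = 2)
ratios <- apply(random_dirs, 1, fisher_ratio,
                mu0 = mu0, mu1 = mu1, Sigma0 = Sigma, Sigma1 = Sigma)

c(best = best, max_random = max(ratios))

# The ratio is scale-invariant: rescaling omega does not change it
rescaled <- fisher_ratio(10 * omega_star, mu0, mu1, Sigma, Sigma)
all.equal(best, rescaled)
```

Because the ratio does not depend on the length of ω, only the direction is identified, which is why ω is only defined up to a proportionality constant.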

## Homebrew Linear Discriminant Analysis

To compute the vector ω:

m0 = apply(myocarde[myocarde$PRONO=="0",1:7],2,mean)
m1 = apply(myocarde[myocarde$PRONO=="1",1:7],2,mean)
Sigma = var(myocarde[,1:7])
omega = solve(Sigma)%*%(m1-m0)
omega
[,1]
FRCAR -0.012909708542
INCAR  1.088582058796
INSYS -0.019390084344
PRDIA -0.025817110020
PAPUL  0.020441287970
PVENT -0.038298291091
REPUL -0.001371677757

For the constant b, in the equation $\omega^\top x - b = 0$ (that is, $\omega^\top x = b$), if the two classes are equiprobable, use

b = (t(m1)%*%solve(Sigma)%*%m1-t(m0)%*%solve(Sigma)%*%m0)/2

## Application (on the Small Dataset)

In order to visualize what’s going on, consider the small dataset, with only two covariates:

x = c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
y = c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
z = c(1,1,1,1,1,0,0,1,0,0)
df = data.frame(x1=x,x2=y,y=as.factor(z))
m0 = apply(df[df$y=="0",1:2],2,mean)
m1 = apply(df[df$y=="1",1:2],2,mean)
Sigma = var(df[,1:2])
omega = solve(Sigma)%*%(m1-m0)
omega
[,1]
x1 -2.640613174
x2  4.858705676

Using R regular functions, we get:

library(MASS)
fit_lda = lda(y ~x1+x2 , data=df)
fit_lda

Coefficients of linear discriminants:
LD1
x1 -2.588389554
x2  4.762614663

which is proportional to the vector we got with our own code (the ratio is about 1.02 for both coordinates), so the direction is the same. For the constant, use:

b = (t(m1)%*%solve(Sigma)%*%m1-t(m0)%*%solve(Sigma)%*%m0)/2
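With ω and b in hand, the classification rule for equiprobable classes is simply to predict 1 when ω'x > b. Here is a minimal, self-contained sketch on the small dataset; `predict_homebrew()` is a hypothetical helper name introduced for the example, and `colMeans()` stands in for the `apply(..., 2, mean)` calls.

```r
# Small dataset from the article
x  <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y  <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
z  <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
df <- data.frame(x1 = x, x2 = y, y = as.factor(z))

# Group means, variance matrix, direction omega and constant b
m0    <- colMeans(df[df$y == "0", 1:2])
m1    <- colMeans(df[df$y == "1", 1:2])
Sigma <- var(df[, 1:2])
omega <- drop(solve(Sigma) %*% (m1 - m0))
b     <- drop(t(m1) %*% solve(Sigma) %*% m1 -
              t(m0) %*% solve(Sigma) %*% m0) / 2

# Hypothetical helper: predict 1 when omega' x > b (equiprobable classes)
predict_homebrew <- function(x1, x2) {
  as.integer(omega[1] * x1 + omega[2] * x2 > b)
}

# Confusion table on the training points
pred <- predict_homebrew(df$x1, df$x2)
table(observed = df$y, predicted = pred)
```

The table counts how many training points fall on each side of the line, which is a quick sanity check before plotting anything.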

If we plot it, we get the red straight line

plot(df$x1,df$x2,pch=c(1,19)[1+(df$y=="1")])
abline(a=b/omega[2],b=-omega[1]/omega[2],col="red")

As we can see (with the blue points), our red line intersects the middle of the segment of the two barycenters:

points(m0["x1"],m0["x2"],pch=4)
points(m1["x1"],m1["x2"],pch=4)
segments(m0["x1"],m0["x2"],m1["x1"],m1["x2"],col="blue")
points(.5*m0["x1"]+.5*m1["x1"],.5*m0["x2"]+.5*m1["x2"],col="blue",pch=19)

Of course, we can also use an R function:

predlda = function(x,y) predict(fit_lda, data.frame(x1=x,x2=y))$class==1
vu = seq(0,1,length=101)  # evaluation grid; vu is not defined in this post, so this range is an assumption
vv=outer(vu,vu,predlda)
contour(vu,vu,vv,add=TRUE,lwd=2,levels = .5)
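Two of the observations above can be checked without a plot. This sketch, self-contained on the small dataset, is an addition and not the author's code: it compares ω computed from the total variance matrix (as in this post) with ω computed from the pooled within-group variance matrix (a swapped-in estimator, which is what the theory with a common Σ suggests), and it verifies that the midpoint of the two barycenters lies on the line ω'x = b.

```r
# Small dataset from the article
x  <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y  <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
z  <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
df <- data.frame(x1 = x, x2 = y, y = as.factor(z))

m0 <- colMeans(df[df$y == "0", 1:2])
m1 <- colMeans(df[df$y == "1", 1:2])
n0 <- sum(df$y == "0")
n1 <- sum(df$y == "1")

# Total variance (used in the article) and pooled within-group variance
Sigma_total  <- var(df[, 1:2])
Sigma_within <- ((n0 - 1) * var(df[df$y == "0", 1:2]) +
                 (n1 - 1) * var(df[df$y == "1", 1:2])) / (n0 + n1 - 2)

omega_total  <- drop(solve(Sigma_total)  %*% (m1 - m0))
omega_within <- drop(solve(Sigma_within) %*% (m1 - m0))

# Proportional vectors have a constant coordinate-wise ratio
ratio <- omega_total / omega_within
ratio

# The line omega' x = b goes through the midpoint of the two barycenters
b   <- drop(t(m1) %*% solve(Sigma_total) %*% m1 -
            t(m0) %*% solve(Sigma_total) %*% m0) / 2
mid <- (m0 + m1) / 2
c(lhs = sum(omega_total * mid), rhs = b)
```

With two groups, the total variance is the within-group variance plus a rank-one term along m1 − m0, so both choices give the same direction and differ only in scale. That is consistent with the homebrew ω being proportional to, but not equal to, the coefficients reported by `lda()`.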

One can also consider the quadratic discriminant analysis since it might be difficult to argue that $\Sigma_0=\Sigma_1$.

fit_qda = qda(y ~x1+x2 , data=df)

The separation curve is here:

plot(df$x1,df$x2,pch=19,
col=c("blue","red")[1+(df$y=="1")])
predqda=function(x,y) predict(fit_qda, data.frame(x1=x,x2=y))$class==1
vv=outer(vu,vu,predqda)
contour(vu,vu,vv,add=TRUE,lwd=2,levels = .5)
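In the spirit of this series, the quadratic rule can also be written by hand. The following is a sketch rather than a reimplementation of `qda()`: it assumes one mean and one variance matrix per class, with priors taken as the class proportions, and the names `fit_class()` and `delta()` are introduced here for illustration.

```r
# Small dataset from the article
x  <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y  <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
z  <- c(1, 1, 1, 1, 1, 0, 0, 1, 0, 0)
df <- data.frame(x1 = x, x2 = y, y = as.factor(z))

# Estimate, for one class, the mean, the variance matrix and the prior
fit_class <- function(level) {
  sub <- as.matrix(df[df$y == level, 1:2])
  list(mu = colMeans(sub), Sigma = var(sub), prior = nrow(sub) / nrow(df))
}
g0 <- fit_class("0")
g1 <- fit_class("1")

# Quadratic score delta_y(x), evaluated for every row of x
delta <- function(x, g) {
  -0.5 * log(det(g$Sigma)) - 0.5 * mahalanobis(x, g$mu, g$Sigma) + log(g$prior)
}

# Predict 1 when delta_1(x) > delta_0(x), on the training points
X    <- as.matrix(df[, 1:2])
pred <- as.integer(delta(X, g1) > delta(X, g0))
table(observed = df$y, predicted = pred)
```

With only four and six points per class, each variance matrix is estimated from very little data, so the resulting curve should be read as an illustration more than as a reliable frontier.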


Published at DZone with permission of
