Over a million developers have joined DZone.

An Update on Boosting with Splines

DZone's Guide to

An Update on Boosting with Splines

· Big Data Zone
Free Resource

Free O'Reilly eBook: Learn how to architect always-on apps that scale. Brought to you by Mesosphere DC/OS–the premier platform for containers and big data.

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically linear by parts and continuous regression functions). I was using

> library(splines)
> fit=lm(y~bs(x,degree=1,df=3),data=df)

The problem with that spline function is that knots seem to be fixed. The iterative boosting algorithm is

  • start with some regression model 
  • compute the residuals, including some shrinkage parameter,

then the strategy is to model those residuals

  • at step , consider regression 
  • update the residuals 

and to loop. Then set

I thought that boosting would work well if at step , it was possible to change the knots. But the output

was quite disappointing: boosting does not improve the prediction here. And it looks like knots don’t change. Actually, if we select the ‘best‘ knots, the output is much better. The dataset is still

> n=300
> set.seed(1)
> u=sort(runif(n)*2*pi)
> y=sin(u)+rnorm(n)/4
> df=data.frame(x=u,y=y)

For an optimal choice of knot locations, we can use

> library(freeknotsplines)
> xy.freekt=freelsgen(df$x, df$y, degree = 1, 
+ numknot = 2, 555)

The code of the previous post can simply be updated

> v=.05
> library(splines)
> xy.freekt=freelsgen(df$x, df$y, degree = 1, 
+ numknot = 2, 555)
> fit=lm(y~bs(x,degree=1,knots=
+ xy.freekt@optknot),data=df)
> yp=predict(fit,newdata=df)
> df$yr=df$y - v*yp
> YP=v*yp
>  for(t in 1:200){
+    xy.freekt=freelsgen(df$x, df$yr, degree = 1,
+    numknot = 2, 555)
+ fit=lm(yr~bs(x,degree=1,knots=
+     xy.freekt@optknot),data=df)
+    yp=predict(fit,newdata=df)
+    df$yr=df$yr - v*yp
+    YP=cbind(YP,v*yp)
+  }
>  nd=data.frame(x=seq(0,2*pi,by=.01))
>  viz=function(M){
+    if(M==1)  y=YP[,1]
+    if(M>1)   y=apply(YP[,1:M],1,sum)
+    plot(df$x,df$y,ylab="",xlab="")
+    lines(df$x,y,type="l",col="red",lwd=3)
+    fit=lm(y~bs(x,degree=1,df=3),data=df)
+    yp=predict(fit,newdata=nd)
+    lines(nd$x,yp,type="l",col="blue",lwd=3)
+    lines(nd$x,sin(nd$x),lty=2)}

>  viz(100)

I like that graph. I had the intuition that using (simple) splines would be possible, and indeed, we get a very smooth prediction.

Easily deploy & scale your data pipelines in clicks. Run Spark, Kafka, Cassandra + more on shared infrastructure and blow away your data silos. Learn how with Mesosphere DC/OS.

spline ,bigdata ,big data ,boosting algorithms ,computer science

Opinions expressed by DZone contributors are their own.


Dev Resources & Solutions Straight to Your Inbox

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.


{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}