Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

DZone's Guide to

# Convex Regression Model

### In this post, a data scientist walks us through a bit of complex math and the corresponding R code that we need to make our models.

· Big Data Zone ·
Free Resource

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

This morning during the lecture on nonlinear regression, I mentioned (very) briefly the case of convex regression. Since I forgot to mention the codes in R, I will publish them here. Assume that yi=m(xi)+εi where m:RdR is some convex function.

Then m is convex if and only if x1,x2∈Rdt∈[0,1],

Hidreth (1954) proved that if

then θ⋆=(m⋆(x1),⋯,m⋆(xn)) is unique.

Let y=θ+ε, then

where

I.e. θ is the projection of \mathbf{y}y onto the (closed) convex cone \mathcal{K}K. The projection theorem gives existence and unicity.

For convenience, in the application, we will consider the real-valued case, m:RR, i.e. yi=m(xi)+εi. Assume that observations are ordered x1≤x2≤⋯≤xn. Here

Hence, quadratic program with n−2 linear constraints.

m is a piecewise linear function (interpolation of consecutive pairs (xi,θi⋆)).

If m is differentiable, m is convex if

More generally, if m is convex, then there exists ξxRn such that

ξx is a subgradient of m at x. And then

Hence, θ is solution of

and ξ1,⋯,ξnRn. Now, to do it for real, use cobs package for constrained (b)splines regression,

library(cobs)

To get a convex regression, use

plot(cars)
x = cars$speed y = cars$dist
rc = conreg(x,y,convex=TRUE)
lines(rc, col = 2)

Here we can get the values of the knots

rc

Call:  conreg(x = x, y = y, convex = TRUE)
Convex regression: From 19 separated x-values, using 5 inner knots,
7,    8,    9,   20,   23.
RSS =  1356; R^2 = 0.8766;
needed (5,0) iterations

and actually, if we use them in a linear-spline regression, we get the same output here

reg = lm(dist~bs(speed,degree=1,knots=c(4,7,8,9,,20,23,25)),data=cars)
u = seq(4,25,by=.1)
v = predict(reg,newdata=data.frame(speed=u))
lines(u,v,col="green")

Let us add vertical lines for the knots

abline(v=c(4,7,8,9,20,23,25),col="grey",lty=2)

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

Topics:
big data ,convex regression model ,data modeling

Comment (0)

Save
{{ articles[0].views | formatCount}} Views

Opinions expressed by DZone contributors are their own.