How to Regularize Intercept in GLM
Learn about the parameters that help you regularize the intercept of an H2O GLM model.
Sometimes, you may want to emulate hierarchical modeling to achieve your objective. To do this, you can use the beta_constraints parameter, as shown below:
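import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()
iris = h2o.import_file("http://h2o-public-test-data.s3.amazonaws.com/smalldata/iris/iris_wheader.csv")
# Constrain only the intercept: bounds [-1000, 1000], pulled toward beta_given=3 with rho=30
bc = h2o.H2OFrame([("Intercept",-1000,1000,3,30)], column_names=["names","lower_bounds","upper_bounds","beta_given","rho"])
glm = H2OGeneralizedLinearEstimator(family="gaussian", beta_constraints=bc, standardize=False)
# The model must be trained before coef() is available; the output implies sepal_len as the response
glm.train(y="sepal_len", training_frame=iris)
glm.coef()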
The output will look like this:
{u'Intercept': 3.000933645168297,
u'class.Iris-setosa': 0.0,
u'class.Iris-versicolor': 0.0,
u'class.Iris-virginica': 0.0,
u'petal_len': 0.4423526962303227,
u'petal_wid': 0.0,
u'sepal_wid': 0.37712042938039897}
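One way to see why the intercept ends up so close to the beta_given value of 3 is a one-parameter toy problem. The sketch below is not H2O's solver: it assumes an intercept-only least-squares loss scaled as 1/(2n) plus the penalty rho * (b - beta_given)^2, for which the minimizer has a closed form. The function name and the sample values are made up for illustration.

```python
def penalized_intercept(y, beta_given, rho):
    """Minimize (1/(2n)) * sum((y_i - b)^2) + rho * (b - beta_given)^2 over b.

    Setting the derivative with respect to b to zero gives
    b = (mean(y) + 2 * rho * beta_given) / (1 + 2 * rho).
    """
    mean_y = sum(y) / len(y)
    return (mean_y + 2.0 * rho * beta_given) / (1.0 + 2.0 * rho)


# Hypothetical response values, not taken from the iris data set.
y = [5.1, 4.9, 6.3, 5.8]

# rho = 0 switches the penalty off, so the result is the plain mean of y.
unpenalized = penalized_intercept(y, beta_given=3.0, rho=0.0)
# rho = 30 drags the solution most of the way toward beta_given = 3.
pulled = penalized_intercept(y, beta_given=3.0, rho=30.0)
print(unpenalized, pulled)
```

Under these assumptions, a larger rho moves the solution closer to beta_given, while the wide bounds of -1000 and 1000 never become active.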
There’s more information in this GLM booklet, but the short version is to create a new constraints frame with the columns names, lower_bounds, upper_bounds, beta_given, and rho, with one row per feature you want to constrain. You can use “Intercept” as a keyword to constrain the intercept.
names: (mandatory) coefficient names
lower_bounds: (optional) coefficient lower bounds, must be less than or equal to upper_bounds
upper_bounds: (optional) coefficient upper bounds, must be greater than or equal to lower_bounds
beta_given: (optional) specifies the given solution in the proximal operator interface
rho: (mandatory if beta_given is specified, otherwise ignored) specifies per-column L2 penalties on the distance from the given solution
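The column rules above can be checked before a frame is ever sent to H2O. The following is a minimal sketch in plain Python; `Constraint` and `check_constraint` are hypothetical helpers written for this illustration and are not part of the H2O API.

```python
from typing import List, NamedTuple, Optional


class Constraint(NamedTuple):
    """One row of a constraints frame (hypothetical helper, not an H2O class)."""
    name: str
    lower_bound: Optional[float] = None
    upper_bound: Optional[float] = None
    beta_given: Optional[float] = None
    rho: Optional[float] = None


def check_constraint(c: Constraint) -> List[str]:
    """Return a list of rule violations for one row; an empty list means OK."""
    problems = []
    if not c.name:
        problems.append("names is mandatory")
    if (c.lower_bound is not None and c.upper_bound is not None
            and c.lower_bound > c.upper_bound):
        problems.append("lower_bounds must be <= upper_bounds")
    if c.beta_given is not None and c.rho is None:
        problems.append("rho is mandatory when beta_given is specified")
    return problems


# The intercept row used in the example earlier in this article.
intercept_row = Constraint("Intercept", -1000, 1000, 3, 30)
print(check_constraint(intercept_row))
```

A row that passes every check could then be turned into the tuple form that the constraints frame expects.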
What’s happening is that an L2 penalty is applied to the distance between each coefficient and its given value. The proximal penalty is computed as Σ rho*(x-x’)^2, where x is the coefficient and x’ is the given value. You can confirm this by setting rho to whatever lambda would be (with beta_given at 0) and setting lambda to 0. This gives the same result as setting lambda to that value. You can use beta constraints to assign per-feature regularization strength, but only for the L2 penalty. The math is explained here:
sum_i rho[i] * L2norm2(beta[i]-betagiven[i])
So if you set beta_given to zero and, say, all rho values except the one for the intercept to 1e-5, the result is equivalent to running without beta constraints, just with alpha = 0 and lambda = 1e-5.
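The equivalence can be illustrated numerically. The sketch below evaluates the penalty formula as written above and compares it with a plain ridge-style term lambda * sum(beta^2). Both function names are hypothetical, and any constant factor that H2O's own objective may apply to the ridge term is deliberately left out, so this only shows that the two penalties have the same form.

```python
import math


def proximal_penalty(beta, beta_given, rho):
    """sum_i rho[i] * (beta[i] - beta_given[i])^2, as in the formula above."""
    return sum(r * (b - g) ** 2 for b, g, r in zip(beta, beta_given, rho))


def ridge_penalty(beta, lam):
    """Plain L2 penalty lam * sum_i beta[i]^2 (constant factors omitted)."""
    return lam * sum(b * b for b in beta)


# Non-intercept coefficients copied from the output shown earlier
# (petal_len, petal_wid, sepal_wid).
beta = [0.4423526962303227, 0.0, 0.37712042938039897]

# beta_given = 0 and a uniform rho of 1e-5 on every non-intercept coefficient.
prox = proximal_penalty(beta, beta_given=[0.0] * len(beta), rho=[1e-5] * len(beta))
ridge = ridge_penalty(beta, lam=1e-5)
print(math.isclose(prox, ridge))
```

Giving different rho values to individual coefficients is what turns this into per-feature regularization strength.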
Published at DZone with permission of Avkash Chauhan, DZone MVB. See the original article here.