
Regression Analysis Is Easy With Scala and Smile


Smile is a statistical machine intelligence and learning engine that makes regression analysis easier. Learn exactly how it works!


When we think about regression in machine learning, what usually comes to mind are two techniques: linear and logistic regression. These are the most popular forms of regression.

But the truth is, there are many other regression techniques, and all have a valid use in machine learning depending on the situation.

In this blog, we are going to discuss regression analysis, its most frequently used types, and their implementation in Smile.

What Is Regression Analysis?

Regression analysis is a predictive modeling technique used for estimating the relationships between one dependent variable (called the target) and one or more independent variables (called predictors). Unlike classification, the output variable in regression analysis takes continuous values.

The analysis includes understanding how the value of the target typically changes when one of the predictors is varied while the other predictors are held fixed.

Regression analysis provides two main benefits:

  1. It reveals the relevant relationships between the target and the predictors.
  2. It quantifies the strength of impact of multiple predictors on a target.
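To make the idea concrete, here is a minimal sketch of the simplest case: fitting a straight line to one predictor with closed-form least squares. This is plain Scala with toy data of my own, not Smile's API.

```scala
object LinearFitSketch {
  // Closed-form ordinary least squares for a single predictor:
  //   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
  def fit(x: Array[Double], y: Array[Double]): (Double, Double) = {
    val meanX = x.sum / x.length
    val meanY = y.sum / y.length
    val slope = x.zip(y).map { case (xi, yi) => (xi - meanX) * (yi - meanY) }.sum /
      x.map(xi => (xi - meanX) * (xi - meanX)).sum
    val intercept = meanY - slope * meanX
    (slope, intercept)
  }

  def main(args: Array[String]): Unit = {
    // Toy data with a roughly linear relationship.
    val x = Array(1.0, 2.0, 3.0, 4.0, 5.0)
    val y = Array(2.1, 4.0, 6.2, 7.9, 10.1)
    val (slope, intercept) = fit(x, y)
    println(f"fitted line: y = $intercept%.2f + $slope%.2f * x")
  }
}
```

The fitted slope tells you how much the target changes per unit change in the predictor, which is exactly the "strength of impact" benefit described above.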

Types of Regression

There are various types of regression techniques, but most are distinguished by the following three criteria:

  1. The number of predictors.
  2. Type of target.
  3. The shape of the regression line.

Based on these criteria, the following are the most frequently used techniques:

  1. Linear regression: One of the most widely used techniques. In this technique, the target is continuous, the predictors can be continuous or discrete, and the regression line is linear.
  2. Logistic regression: Estimates the probability of an event (success or failure). We should use this technique when the target is in binary form, i.e. 0/1, true/false, etc.
  3. Polynomial regression: Used when the power of a predictor is more than 1.
  4. Ridge regression: Used when there is a need to alleviate multicollinearity among the predictors. When predictors are highly correlated, the regression coefficient of any one predictor depends on which other predictors are included in the model and which are excluded. Ridge regression adds a small bias factor to the coefficients in order to curb this problem.
  5. Lasso regression: Similar in aim to ridge regression, but ridge regression can't zero out regression coefficients, so you end up including either all the predictors in the model or none of them. Lasso penalizes the absolute values of the coefficients instead of their squares, which lets it drive some coefficients exactly to zero. Hence, in contrast to ridge, lasso performs both parameter shrinkage and automatic variable selection. It is capable of reducing variability and improving the accuracy of linear regression models.
  6. Elastic net regression: A hybrid of lasso and ridge regression. Elastic net is a regularized regression method that linearly combines the L1 and L2 penalties of the lasso and ridge methods. Elastic net is useful when there are multiple predictors that are highly correlated.
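The ridge-versus-lasso distinction is easiest to see in the one-predictor case, where both have closed-form solutions. The sketch below (plain Scala, assuming a single centered predictor and centered response; lambda scaling is illustrative) shows that ridge shrinks the coefficient but never reaches zero, while lasso's soft-thresholding can zero it out entirely:

```scala
object ShrinkageSketch {
  // One centered predictor, centered response:
  //   OLS:   beta = sum(x*y) / sum(x*x)
  //   Ridge: beta = sum(x*y) / (sum(x*x) + lambda)            -- shrinks, never exactly zero
  //   Lasso: beta = softThreshold(sum(x*y), lambda) / sum(x*x) -- can hit exactly zero
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (u, v) => u * v }.sum

  def olsBeta(x: Array[Double], y: Array[Double]): Double =
    dot(x, y) / dot(x, x)

  def ridgeBeta(x: Array[Double], y: Array[Double], lambda: Double): Double =
    dot(x, y) / (dot(x, x) + lambda)

  def lassoBeta(x: Array[Double], y: Array[Double], lambda: Double): Double = {
    val xy = dot(x, y)
    // Soft-thresholding: shrink |xy| by lambda, clamping at zero.
    math.signum(xy) * math.max(math.abs(xy) - lambda, 0.0) / dot(x, x)
  }
}
```

With x = (-1, 0, 1) and y = (-2, 0, 2), the OLS coefficient is 2.0; ridge with lambda = 2 shrinks it to 1.0, and lasso with lambda = 4 drives it exactly to 0.0, automatically dropping the predictor.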

Fascinating, isn’t it? I bet a lot of you want to implement it. Scala lovers, this is for you!

So, smile... because Smile is here! Smile's regression algorithms live in the package smile.regression, and they all implement the interface Regression, which has a single method, predict, that applies the model to an instance.
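To illustrate the shape of that contract, here is a minimal sketch in plain Scala. This is not Smile's actual source, just an interface with a single predict method and the simplest possible implementation:

```scala
// Illustration only: mirrors the shape of a regression interface with a
// single predict method, as Smile's Regression interface provides.
trait SimpleRegression {
  def predict(instance: Array[Double]): Double
}

// The simplest possible "model": always predict the training-set mean.
class MeanModel(y: Array[Double]) extends SimpleRegression {
  private val mean = y.sum / y.length
  def predict(instance: Array[Double]): Double = mean
}
```

Every fitted Smile regression model, whatever the algorithm behind it, is used the same way: hand predict an instance, get back a Double.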

Now, let’s talk Scala.

SBT dependency to be added:

libraryDependencies += "com.github.haifengl" % "smile-scala_2.12" % "1.3.1"

There is a trait smile.regression.Operators that provides methods for all of the regression techniques. Some of the methods are:

def ols(x: Array[Array[Double]], y: Array[Double], method: String = "qr"): OLS
def ridge(x: Array[Array[Double]], y: Array[Double], lambda: Double): RidgeRegression
def lasso(x: Array[Array[Double]], y: Array[Double], lambda: Double, tol: Double = 1E-3, maxIter: Int = 5000): LASSO

It’s really simple to use these methods. Here is a code snippet to showcase the easy use of Smile.

import smile.regression.Operators

object SmileExample extends App with Operators {
  val x = Array(
    Array(234.289,      235.6,        159.0,    107.608, 1947,   60.323),
    Array(259.426,      232.5,        145.6,    108.632, 1948,   61.122),
    Array(258.054,      368.2,        161.6,    109.773, 1949,   60.171),
    Array(284.599,      335.1,        165.0,    110.929, 1950,   61.187),
    Array(328.975,      209.9,        309.9,    112.075, 1951,   63.221),
    Array(346.999,      193.2,        359.4,    113.270, 1952,   63.639),
    Array(365.385,      187.0,        354.7,    115.094, 1953,   64.989),
    Array(363.112,      357.8,        335.0,    116.219, 1954,   63.761),
    Array(397.469,      290.4,        304.8,    117.388, 1955,   66.019),
    Array(419.180,      282.2,        285.7,    118.734, 1956,   67.857),
    Array(442.769,      293.6,        279.8,    120.445, 1957,   68.169),
    Array(444.546,      468.1,        263.7,    121.950, 1958,   66.513),
    Array(482.704,      381.3,        255.2,    123.366, 1959,   68.655),
    Array(502.601,      393.1,        251.4,    125.368, 1960,   69.564),
    Array(518.173,      480.6,        257.2,    127.852, 1961,   69.331),
    Array(554.894,      400.7,        282.7,    130.081, 1962,   70.551))

  val y = Array(83.0,  88.5,  88.2,  89.5,  96.2,  98.1,  99.0, 100.0, 101.2,
    104.6, 108.4, 110.8, 112.6, 114.2, 115.7, 116.9)

  val maxIterations = 1000
  val tolerance = 1E-3
  println(ridge(x, y, 0.0057))
  println(lasso(x, y, 0.0057, tolerance, maxIterations))
}

Here's an explanation:

  1. x holds the explanatory variables (the predictors).
  2. y holds the response values (the target).
  3. 0.0057 is the regularization parameter.
  4. tolerance is the tolerance for stopping the iterations.
  5. maxIterations is the maximum number of iterations.

And with this, we finish our dive into regression techniques for machine learning using Scala. I hope this encourages you to explore the Smile library for machine learning in Scala further.



Published at DZone with permission of Anmol Mehta. See the original article here.

Opinions expressed by DZone contributors are their own.
