
MachineX: Naive Bayes Classifier With KSAI


In this article, we use the KSAI library to build a Naive Bayes model, and explore what KSAI is and how it works.


This article was first published on the Knoldus blog.

Naive Bayes is a simple technique for constructing classifiers: models that assign class labels to problem instances, represented as vectors of feature values, where the class labels are drawn from a finite set. The Naive Bayes classifier is a straightforward yet powerful algorithm for classification tasks. Even on data sets with millions of records and many attributes, the Naive Bayes approach is worth trying.
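To make the decision rule concrete, here is a toy, self-contained sketch in plain Scala (not KSAI itself, and the priors and likelihoods are hypothetical numbers): the classifier picks the class c that maximizes P(c) times the product of the per-feature likelihoods P(feature | c).

```scala
// Toy Naive Bayes decision rule: choose the class c that maximizes
// P(c) * product over observed words of P(word | c).
val prior = Map("pos" -> 0.5, "neg" -> 0.5)

// Hypothetical per-class word likelihoods, e.g. estimated from training counts
val likelihood = Map(
  "pos" -> Map("outstanding" -> 0.08, "awful" -> 0.01),
  "neg" -> Map("outstanding" -> 0.01, "awful" -> 0.09)
)

val review = Seq("outstanding", "awful", "outstanding")

// Score each class: prior times the product of word likelihoods
val scores = prior.map { case (c, p) =>
  c -> review.foldLeft(p)((acc, w) => acc * likelihood(c)(w))
}

val predicted = scores.maxBy(_._2)._1
println(predicted) // → pos
```

Note that in practice the products of small probabilities underflow, so real implementations (KSAI included) work with sums of log-probabilities instead.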

In many of my previous articles, I have written about the Naive Bayes classifier: what it is and how it works. Today we will use the KSAI library to build our Naive Bayes model. But before that, let's explore: what is KSAI?

What Is KSAI?

KSAI is an open-source machine learning library that contains various algorithms for classification, regression, clustering, and more. It is an attempt to build machine learning algorithms in Scala. The Breeze library, which is also built on Scala, is used for the mathematical operations.

KSAI mainly uses Scala's built-in case classes, Futures, and other language features. It also uses Akka in some places to do things asynchronously. The test cases are a good starting point for exploring the library. Right now, it may not be easy to use because of limited documentation and an unclear API, but the committers will improve both in the near future.

How to Use It

You can add the KSAI library to your project with one of the dependency declarations below.

  1. For sbt project add the following dependency to build.sbt.
    libraryDependencies += "io.github.knolduslabs" %% "ksai" % "0.0.4"
  2. For the Maven project, use the following dependency in pom.xml.
    <dependency>
        <groupId>io.github.knolduslabs</groupId>
        <artifactId>ksai_2.12</artifactId>
        <version>0.0.4</version>
    </dependency>
  3. For Gradle Groovy DSL, Gradle Kotlin DSL, Apache Ivy, Groovy Grape, and other build tools, the corresponding dependency declarations can be found on Maven Central.

Using KSAI for Naive Bayes Classifier

The KSAI Naive Bayes classifier supports three models:

  • General/Gaussian: used for classification with continuous features; it assumes that features follow a normal distribution.
  • Multinomial: used for discrete counts, such as word frequencies in text.
  • Bernoulli: useful when your feature vectors are binary (i.e. zeros and ones), indicating presence or absence of a feature.
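To see how the three models view the same data, here is a small, self-contained sketch in plain Scala (no KSAI calls; the vocabulary and review are made up): the multinomial model consumes raw counts, the Bernoulli model only cares about presence or absence, and the Gaussian model consumes continuous values.

```scala
val vocabulary = Array("outstanding", "awful", "boring")
val review = Seq("awful", "boring", "awful")

// Multinomial view: raw counts of each vocabulary word in the review
val counts = vocabulary.map(w => review.count(_ == w).toDouble)
// counts: Array(0.0, 2.0, 1.0)

// Bernoulli view: 1.0 if the word occurs at all, else 0.0
val binary = counts.map(c => if (c > 0) 1.0 else 0.0)
// binary: Array(0.0, 1.0, 1.0)

// Gaussian view: arbitrary continuous features, e.g. average word length
val continuous = Array(review.map(_.length).sum.toDouble / review.size)
```

Which view you pick determines which of the three models is appropriate; for the word-count features used later in this article, the multinomial model is the natural fit.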

For better understanding, let's take an example and try to build something using KSAI's Naive Bayes Classifier algorithm.

In this example, I'll be using the data file movie.txt to demonstrate the application. This file is included in the GitHub repository linked below, so you can play around with it as well.

As the name suggests, movie.txt contains movie review data, where each record is labeled neg (negative review) or pos (positive review). We first convert these labels into numeric values: 1 for each positive record and 0 for each negative record.

import scala.io.Source

// Read the labeled movie reviews, one review per line
val resource = Source.fromFile("src/test/resources/movie.txt").getLines().toArray

val movieX = new Array[Array[Double]](2000) // numeric feature vectors
val movieY = new Array[Int](2000)           // class labels: 1 = pos, 0 = neg

val x = new Array[Array[String]](2000)
resource.indices.foreach { itr =>
  val value = resource(itr)
  val words = value.trim.split(" ")
  if (words(0).equalsIgnoreCase("pos")) {
    movieY(itr) = 1
  } else if (words(0).equalsIgnoreCase("neg")) {
    movieY(itr) = 0
  } else println("Invalid class label: " + words(0))
  x(itr) = words
}

In our example, we will use a fixed feature set, and the algorithm will make its predictions based on these features:

val feature: Array[String] = Array(
  "outstanding", "wonderfully", "wasted", "lame", "awful", "poorly",
  "ridiculous", "waste", "worst", "bland", "unfunny", "stupid", "dull",
  "fantastic", "laughable", "mess", "pointless", "terrific", "memorable",
  "superb", "boring", "badly", "subtle", "terrible", "excellent",
  "perfectly", "masterpiece", "realistic", "flaws")

Now, using these features, we convert the whole data set into numeric values so that we can apply the Naive Bayes classifier to it.

// Map each feature word to its index in the feature vector
val (featureMap, _) = feature.foldLeft((Map.empty[String, Int], 0)) {
  case ((map, k), string) if !map.keySet.contains(string) => (map + (string -> k), k + 1)
  case (tuple, _) => tuple
}

// Bag-of-words: count how often each feature word occurs in a record
def featureVector(words: Array[String]): Array[Double] = {
  val bag = new Array[Double](feature.length)
  words.foreach { word =>
    featureMap.get(word).foreach { f => bag(f) = bag(f) + 1 }
  }
  bag
}

x.indices.foreach { itr =>
  movieX(itr) = featureVector(x(itr))
}
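For intuition about what this bag-of-words conversion produces, here is a tiny, self-contained example (the three-word vocabulary and the sample record are hypothetical):

```scala
// A small vocabulary and its word -> index map
val vocab = Array("outstanding", "waste", "dull")
val vocabIndex = vocab.zipWithIndex.toMap

// Count occurrences of each vocabulary word in a record
def bagOfWords(words: Array[String]): Array[Double] = {
  val bag = new Array[Double](vocab.length)
  words.foreach(w => vocabIndex.get(w).foreach(i => bag(i) += 1))
  bag
}

val vec = bagOfWords(Array("neg", "what", "a", "waste", "dull", "waste"))
// vec: Array(0.0, 2.0, 1.0) — "waste" twice, "dull" once, words
// outside the vocabulary are ignored
```

Note that `vocab.zipWithIndex.toMap` is a compact equivalent of the `foldLeft` used above when the feature list has no duplicates.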

Our data set is ready. Now we slice the data, using one part to train the algorithm and the remaining part to make predictions and check the algorithm's accuracy.

var total = 0
var error = 0
var success = 0

val startTime = new java.util.Date().getTime // for logging time only
val crossValidation = CrossValidation(movieX.length, 10)
(0 until 10).foreach { itr =>
  val trainX = LOOCV.slice(movieX, crossValidation.train(itr)).toArray
  val trainY = LOOCV.slice(movieY, crossValidation.train(itr)).toArray

  val naiveBayes = NaiveBayes(model = MULTINOMIAL, classCount = 2, independentVariablesCount = feature.length)
  naiveBayes.learn(trainX, trainY)

  val testX = LOOCV.slice(movieX, crossValidation.test(itr)).toArray
  val testY = LOOCV.slice(movieY, crossValidation.test(itr)).toArray

  testX.indices.foreach { j =>
    val label = naiveBayes.predict(testX(j))
    if (label != -1) {
      total = total + 1
      if (testY(j) != label) {
        error = error + 1
      } else {
        success = success + 1
      }
    }
  }
}

info(s"Time taken: ${new java.util.Date().getTime - startTime} millis")
info(s"Multinomial error is $error and success is $success of total $total")
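Under the hood, KSAI's CrossValidation and LOOCV helpers are doing standard k-fold splitting: the data is partitioned into k folds, and each fold in turn serves as the test set while the rest is used for training. A minimal sketch of that idea in plain Scala (the names here are hypothetical, not KSAI's API):

```scala
// Minimal k-fold split: partition indices 0 until n into k folds;
// for fold i, that fold is the test set and the others form the train set.
def kFold(n: Int, k: Int): Seq[(Seq[Int], Seq[Int])] = {
  val folds = (0 until n).groupBy(_ % k).toSeq.sortBy(_._1).map(_._2)
  folds.indices.map { i =>
    val test = folds(i)
    val train = folds.indices.filter(_ != i).flatMap(folds)
    (train, test)
  }
}

val splits = kFold(10, 5)
// Every index appears in exactly one test set across the 5 splits,
// and each split trains on the remaining 8 of the 10 indices.
```

Averaging the error over all folds, as the loop above does with `error` and `total`, gives a more reliable accuracy estimate than a single train/test split.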

Here is the link to the sample code. Please explore it for more details.

That's it for this article. You can find many more interesting algorithms in KSAI right here.

Thanks for reading!



Topics:
artificial intelligence, ksai, naive bayes classifier, machinex, machine learning library, tutorial

Published at DZone with permission of

