MachineX: KNN Algorithm Using KSAI

Check out this brief article, which goes over the KNN algorithm using KSAI. Also explore the code.

This article was first published on the Knoldus blog.

Classification is a well-known area of Machine Learning. The K-Nearest Neighbor (KNN) algorithm is a simple algorithm that stores all available cases and classifies new cases based on their similarity to the stored ones. KNN has long been used in pattern recognition as a non-parametric technique. In this algorithm, a case is classified by a majority vote of its neighbors. If K=1, the case is assigned directly to the class of its nearest neighbor. Similarly, if K=2, the class of the new case is decided on the basis of its two nearest neighbors.
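The majority vote described above can be sketched in a few lines of plain Scala. This is only an illustration, not KSAI's implementation: the names `KnnSketch`, `euclidean`, and `classify` are hypothetical, Euclidean distance is assumed as the similarity measure, and ties between classes are broken arbitrarily.

```scala
object KnnSketch {
  // Straight-line (Euclidean) distance between two points of equal dimension.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Classify `query` by majority vote among its k nearest stored cases.
  def classify(
      data: Array[Array[Double]],
      labels: Array[Int],
      query: Array[Double],
      k: Int
  ): Int = {
    require(data.length == labels.length, "one label per stored case")
    require(k >= 1 && k <= data.length, "k must be between 1 and the number of cases")
    val nearestLabels = data.indices
      .sortBy(i => euclidean(data(i), query)) // closest cases first
      .take(k)
      .map(i => labels(i))
    // The label with the most votes wins; ties are broken arbitrarily.
    nearestLabels.groupBy(identity).maxBy(_._2.size)._1
  }

  def main(args: Array[String]): Unit = {
    val data = Array(
      Array(1.0, 1.0), Array(1.2, 0.8), Array(0.9, 1.1), // class 0
      Array(5.0, 5.0), Array(5.2, 4.9), Array(4.8, 5.1)  // class 1
    )
    val labels = Array(0, 0, 0, 1, 1, 1)
    println(classify(data, labels, Array(1.1, 1.0), 3)) // prints 0
    println(classify(data, labels, Array(4.9, 5.0), 3)) // prints 1
  }
}
```

Note that there is no separate training step in this sketch: "learning" amounts to keeping the cases around, and all the work happens at prediction time.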

Now the question arises: how do we find the nearest neighbors? The answer is by calculating the distance between the data points (cases). There are many ways to do that. Here are some of the popular methods for finding the distance between two data points:

  1. Euclidean Distance
  2. Manhattan Distance
  3. Minkowski Distance

We can choose any of these methods based on the use case. It is also important to know that all of the above distance measures apply only to continuous variables.
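As a rough illustration, the three distance measures listed above can be written in plain Scala as follows. The object and function names are hypothetical and are not part of KSAI. Minkowski distance with order p generalizes the other two: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.

```scala
object Distances {
  // Euclidean: square root of the summed squared differences.
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Manhattan: sum of the absolute differences along each axis.
  def manhattan(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => math.abs(x - y) }.sum

  // Minkowski of order p (p >= 1): p-th root of the summed |difference|^p.
  def minkowski(a: Array[Double], b: Array[Double], p: Double): Double = {
    require(p >= 1.0, "p must be at least 1")
    math.pow(a.zip(b).map { case (x, y) => math.pow(math.abs(x - y), p) }.sum, 1.0 / p)
  }

  def main(args: Array[String]): Unit = {
    val a = Array(0.0, 0.0)
    val b = Array(3.0, 4.0)
    println(euclidean(a, b))      // 5.0
    println(manhattan(a, b))      // 7.0
    println(minkowski(a, b, 1.0)) // same value as Manhattan
    println(minkowski(a, b, 2.0)) // approximately the Euclidean value
  }
}
```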

The next important point is choosing an optimal value for K, which is best done by analyzing the data. A larger K tends to be more robust because it reduces the effect of noise, but there is no guarantee that it is more accurate. Cross-validation is another way to find a good K: candidate values are validated against data that was not used for training. Usually, a value between 3 and 10 is found to work well for most data sets.

In this article, we will be using KSAI, a Machine Learning library written in Scala, to train our model and make predictions based on that training.
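Before turning to KSAI, the cross-validation idea can be sketched in plain Scala. The example below assumes leave-one-out cross-validation (each case is predicted from all the other cases) and a tiny made-up data set with one deliberately mislabeled point; all names in it are hypothetical, and it is a minimal sketch rather than a recommended procedure.

```scala
object ChooseK {
  def euclidean(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Majority vote among the k nearest stored cases (ties broken arbitrarily).
  def classify(data: Array[Array[Double]], labels: Array[Int],
               query: Array[Double], k: Int): Int =
    data.indices
      .sortBy(i => euclidean(data(i), query))
      .take(k)
      .map(i => labels(i))
      .groupBy(identity)
      .maxBy(_._2.size)
      ._1

  // Leave-one-out error count: predict each case from all the others.
  def looErrors(data: Array[Array[Double]], labels: Array[Int], k: Int): Int =
    data.indices.count { i =>
      val keep = data.indices.filter(_ != i)
      val trainX = keep.map(j => data(j)).toArray
      val trainY = keep.map(j => labels(j)).toArray
      classify(trainX, trainY, data(i), k) != labels(i)
    }

  // Made-up sample: two clusters plus one noisy point labeled 1 inside cluster 0.
  val sampleData: Array[Array[Double]] = Array(
    Array(1.0, 1.0), Array(1.2, 0.8), Array(0.9, 1.1), Array(1.1, 1.3),
    Array(5.0, 5.0), Array(5.2, 4.9), Array(4.8, 5.1), Array(5.1, 5.3),
    Array(1.05, 1.05)
  )
  val sampleLabels: Array[Int] = Array(0, 0, 0, 0, 1, 1, 1, 1, 1)

  def main(args: Array[String]): Unit =
    for (k <- Seq(1, 3, 5))
      println(s"K = $k -> ${looErrors(sampleData, sampleLabels, k)} errors")
}
```

On this sample, K = 1 lets the single noisy point mislead two of its neighbors, while K = 3 and K = 5 outvote it. That is the noise-reduction effect described above, although a real data set may behave differently.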

How to Use KSAI for KNN Algorithm

KSAI is an open source Machine Learning library that contains various algorithms for classification, regression, clustering, and more. It is an attempt to build Machine Learning algorithms in Scala. The Breeze library, which is also built on Scala, is used for the mathematical functionality.

KSAI mainly uses Scala's built-in case classes, Future, and some other cool features. It also uses Akka in some places to do things asynchronously. To start exploring the library, the test cases are a good place to begin. Right now the library might not be that easy to use, given its limited documentation and unclear API; however, the committers will update them in the near future.

Here is how we can use KSAI for the KNN algorithm.

1. Adding the library to the project:

You can add this library to your sbt project using the following line:

libraryDependencies += "io.github.knolduslabs.ksai" %% "ksai" % "0.0.2"

Once the library has been added, refresh your project and compile it. After compiling, you should be able to access the classes required for the KNN algorithm.

Here is a sample code block that trains the KNN algorithm and then uses the same data set to validate the results:

// ARFF, ARFFParser and KNN come from KSAI; their imports are omitted here.
val arffFile: String = getClass.getResource("/sampledata.arff").getPath
val arff: ARFF[String] = ARFFParser.parse(arffFile)

val data: Array[Array[Double]] = arff.data.toArray
val results: Array[Int] = arff.getNumericTargets.toArray

// KNN with K = 3
val knn3: KNN = KNN.learn(data, results, 3)

// Count the cases in the same data set that the model misclassifies
var error = 0
data.indices.foreach { i =>
  val result = knn3.predict(data(i))
  if (result != results(i)) {
    error = error + 1
  }
}
println("\n\nKNN with K = 3 ======>  ERROR: " + error)

Here, "sampledata.arff" is the data file in ARFF format. Calling "ARFFParser.parse(arffFile)" parses the file and generates the data in a form the algorithm understands. Once the data is available as Array[Array[Double]], you can use it to train your algorithm.

Using the following line, you can train your algorithm:

val knn3: KNN = KNN.learn(data, results, 3)

Here, 3 is the value of K; you can choose it according to your requirements by checking the number of errors after validation.

You can tweak the data and the value of K to find the best settings for your requirements.

To find more interesting algorithms from KSAI, please visit the following link:

KSAI: A Machine Learning library

I hope you enjoyed the post. Maybe in our next post, we will go deeper into the algorithm.

Published at DZone with permission of
