Concept Learning: Find-S Implementation With Scala

The Find-S algorithm is a basic algorithm of machine learning. Read on to see how to implement it with one of the most scalable and concurrent languages: Scala.

In our previous article, we discussed the basic theory of concept learning by highlighting the Find-S algorithm, which is one of the basic algorithms of machine learning. In this blog, we are going to discuss how we can implement the Find-S algorithm with one of the most scalable and concurrent programming languages: Scala.

In general, machine learning algorithms can be implemented in several ways. The basic objective of this article, though, is to provide an understanding of the Find-S algorithm. A few things have been added to make the algorithm usable, and a few others could still be improved in terms of performance and implementation. Currently, the algorithm works on a Scala Map so that features stay dynamic. One important improvement would be a mapping step that converts string values into integers or longs to improve performance. Let's go through the sample application, which uses the Find-S algorithm to predict a simple scenario.
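The string-to-integer mapping mentioned above is not part of the sample application. As a rough sketch of what it could look like, the following assumes that all feature values are strings; the names FeatureEncoder, buildDictionary, and encode are hypothetical and do not come from the repository.

```scala
// Hypothetical helper, not part of the original repository.
object FeatureEncoder {

  // Build one dictionary per feature: every distinct string value of a feature
  // gets an Int code, assigned in order of first appearance.
  def buildDictionary(samples: List[Map[String, String]]): Map[String, Map[String, Int]] =
    samples
      .flatMap(_.toList)   // all (feature, value) pairs
      .groupBy(_._1)       // group the pairs by feature name
      .map { case (feature, pairs) =>
        feature -> pairs.map(_._2).distinct.zipWithIndex.toMap
      }

  // Encode one sample. Values that were not seen while building the
  // dictionary are mapped to -1 (an assumption made for this sketch).
  def encode(sample: Map[String, String],
             dict: Map[String, Map[String, Int]]): Map[String, Int] =
    sample.map { case (feature, value) =>
      feature -> dict.get(feature).flatMap(_.get(value)).getOrElse(-1)
    }
}
```

Comparing Int codes instead of strings is the kind of performance improvement the paragraph above has in mind; the dictionary would have to be stored alongside the hypothesis so that new samples are encoded the same way.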

As we discussed in our previous blog, from the implementation perspective, there are three major parts of the Find-S algorithm:

  1. Training
  2. Validation
  3. Prediction/classification

To achieve this, we can divide the entire process into multiple modules:

  1. Learning: Contains classes/traits related to machine learning algorithms.
  2. Common: Contains common classes used by different modules.
  3. Persistence: Contains classes and traits for hypothesis persistence logic.
  4. Examples: Contains sample applications to test the algorithms.

Here are some of the important classes that are required to implement this algorithm. We have tried to make some of them more generic to maintain a pattern for other concept learning algorithms as well. Here is a brief description of them:

  • Model: An abstraction of any concept learning algorithm.

  • Trainer: Responsible for making an algorithm learn the concept.

  • Examples: Sample applications to test the learning and prediction of an algorithm.

Apart from this, there are a few helper classes that support learning and prediction, for example by reading from and writing to a file, as well as JSON parsers that parse JSON into Map[String, Any].

Trainer

The trainer is one of the most important parts of the Find-S algorithm. The basic responsibilities of this class are:

  • Accept/read training data to process

  • Distribute it into two parts:

    • a. Training data

    • b. Validation data

  • For any machine learning algorithm, it is good practice to hold back part of the sample data to validate the trained model. A common rule of thumb is to use more than 60% of the data for training and the rest for validation; some setups use up to 90% for training.

  • Pass the training data to the Find-S model to learn from.

  • Validate the trained model with the validation data.

Here is a trait that defines a Trainer:

trait Trainer {

  //Train the algorithm
  def train: Boolean

  //Read data from file
  protected def read: List[Map[String, Any]]

  //Separate data into two parts: training data and validation data
  protected def separate(data: List[Map[String, Any]]): (List[Map[String, Any]], List[Map[String, Any]])

  //pass training data to algorithm
  protected def training(sample: List[Map[String, Any]]): Boolean

  //Validate final hypothesis
  protected def validate(validationData: List[Map[String, Any]]): Boolean

}
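As an illustration of the separate step, here is a minimal sketch of a ratio-based split. It is an assumption about how such a split could work, not the repository's implementation; SplitSketch is a hypothetical name, and trainingRatio is taken to be the fraction of samples used for training.

```scala
// Hypothetical sketch of a ratio-based split, not the repository's code.
object SplitSketch {
  type Sample = Map[String, Any]

  // trainingRatio is the fraction (0.0 to 1.0) of samples used for training.
  // Returns (training data, validation data).
  def separate(data: List[Sample], trainingRatio: Double): (List[Sample], List[Sample]) = {
    require(trainingRatio >= 0.0 && trainingRatio <= 1.0, "trainingRatio must be in [0, 1]")
    val trainingSize = math.round(data.size * trainingRatio).toInt
    data.splitAt(trainingSize)
  }
}
```

This sketch keeps the input order. If the samples are sorted in some way, shuffling them first (for example with scala.util.Random.shuffle) would give a more representative validation set.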

Model

We have tried to make a standard definition of a model for concept learning algorithms to make things easy to understand and implement in terms of machine learning:

/**
  * Model to be trained
  */
trait Model {
  val resKey: String
  def training(sample: scala.collection.immutable.Map[String, Any]): Boolean
  def getHypothesis: Any
  def predict(dataObject: scala.collection.immutable.Map[String, Any]): Boolean
  def persist: Boolean
  def trained: Boolean
}
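To make the Model contract more concrete, here is a minimal, self-contained sketch of the Find-S idea itself. It is not the repository's FindS class: the names FindSSketch, generalize, learn, and Wildcard are hypothetical, and the sketch assumes that the result key holds a Boolean (true for positive samples).

```scala
// Hypothetical sketch of the core Find-S steps, not the repository's FindS class.
object FindSSketch {
  type Sample = Map[String, Any]

  // "?" means "any value is acceptable for this feature".
  val Wildcard: Any = "?"

  // Generalize the hypothesis just enough to cover one positive sample.
  def generalize(hypothesis: Option[Sample], positive: Sample): Sample =
    hypothesis match {
      // No hypothesis yet: the first positive sample is the most specific one.
      case None => positive
      case Some(h) =>
        h.map { case (feature, value) =>
          if (positive.get(feature).contains(value)) feature -> value // still consistent
          else feature -> Wildcard                                    // relax this feature
        }
    }

  // Find-S looks only at positive samples; negative samples are ignored.
  def learn(samples: List[Sample], resKey: String): Option[Sample] =
    samples
      .filter(_.get(resKey).contains(true)) // assumption: Boolean result values
      .map(_ - resKey)                      // drop the label, keep the features
      .foldLeft(Option.empty[Sample])((h, s) => Some(generalize(h, s)))

  // A sample is classified as positive if it matches every non-wildcard feature.
  def predict(hypothesis: Sample, sample: Sample): Boolean =
    hypothesis.forall { case (feature, value) =>
      value == Wildcard || sample.get(feature).contains(value)
    }
}
```

For the classic EnjoySport data with features sky, airtemp, humidity, wind, water, and forecast, three positive samples that differ in humidity, water, and forecast lead to a hypothesis that fixes sky, airtemp, and wind and holds "?" for the other three features.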

Examples

Here is an example to find a target hypothesis using the Find-S algorithm. This example is divided into multiple parts, as follows.

Training Data Generation

As we know, concept learning works on past experiences, so we need to have training data ready for the learning process. This step involves generating training data for a simple example. To test the application, you can create your own data. The application currently generates test data as Map[String, Any].
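If you want to create your own data, the following sketch shows one way to generate labeled samples as Map[String, Any]. It is only an illustration: TrainingDataSketch, its feature domains, and the hidden target concept are made up for this example, and unlike the repository's generator it returns the samples directly instead of a file path.

```scala
import scala.util.Random

// Hypothetical generator, not the repository's ConceptLearningTrainingDataGenerator.
object TrainingDataSketch {
  type Sample = Map[String, Any]

  // Possible values per feature (assumed domains for this example).
  val domains: Map[String, Vector[String]] = Map(
    "sky"      -> Vector("Sunny", "Rainy"),
    "airtemp"  -> Vector("Warm", "Cold"),
    "humidity" -> Vector("Normal", "High"),
    "wind"     -> Vector("Strong", "Weak"),
    "water"    -> Vector("Warm", "Cool"),
    "forecast" -> Vector("Same", "Change")
  )

  // Made-up target concept used to label samples: sky = Sunny and airtemp = Warm.
  def label(features: Map[String, String]): Boolean =
    features("sky") == "Sunny" && features("airtemp") == "Warm"

  // Generate n random samples; the label is stored under the key "result".
  def randomSamples(n: Int, seed: Long): List[Sample] = {
    val rnd = new Random(seed)
    List.fill(n) {
      val features = domains.map { case (name, values) =>
        name -> values(rnd.nextInt(values.size))
      }
      (features: Sample) + ("result" -> label(features))
    }
  }
}
```

Because the labels come from a known concept, the hypothesis learned from such data can be checked against that concept.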

Trainer Initialization

This task involves creating a trainer with a model (Find-S) and some basic configuration, such as the training ratio: the share of the samples used for training rather than validation, represented as a Double in the range 0 to 1, where 0 means 0% and 1 means 100%.

The trainer is fully responsible for making the model learn the concept from the training samples, but we need to trigger that process by calling the trainer's train function. Once triggered, the trainer divides the samples into two parts, training data and validation data, based on the training ratio, and passes the training samples to the model synchronously.

Trained Model

After finishing the training process, we can use trained models to make predictions and can analyze the final hypothesis (or hypothesis set) using the getHypothesis function.

Testing

To test the trained model, we can pass a sample object into the model using the predict function and compare the actual output with the expected output for verification.

/**
  * Find-S example
  */
object FindSExample extends App with LogHelper {

  /** ******************************
    * TRAINING DATA GENERATION
    * *******************************/
  val trainingDataFilePath = ConceptLearningTrainingDataGenerator
                             .randomTrainingData

  /** ******************************
    * TRAINER INITIALIZATION
    * *******************************/
  val path = "/tmp/find_s"
  val jsonHelper = new FileHelper {}.reset(path)
  val trainer = new FindSTrainer {
    val trainingSampleFilePath = trainingDataFilePath
    val model: Model = new FindS("result", path)
    override val trainingRatio = 1.0
  }

  /** ******************************
    * TRAINING
    * *******************************/
  if (!trainer.model.trained) {
    trainer.train
  } else {
    info("Model is trained, skipping training process")
  }

  /** ******************************
    * TRAINED MODEL
    * *******************************/
  val trainedModel = trainer.model

  info(s"***Hypothesis: ${trainedModel.getHypothesis}")

  /** ***********************************
    * TESTING
    * ***********************************/
  val testDataObject = Map("sky" -> "Sunny", "airtemp" -> "Cool",
    "humidity" -> "Warm", "wind" -> "Weak",
    "water" -> "Cool", "forecast" -> "Change")
  info(s"***Testing new positive data object: $testDataObject")
  val status = trainedModel.predict(testDataObject)
  if (status) {
    info("***THE DATA OBJECT IS ... : +POSITIVE")
  } else {
    info("***THE DATA OBJECT IS ... : -NEGATIVE")
  }
}

The accuracy of the model depends heavily on the training data. The standard Find-S algorithm does not detect errors in the training data. This implementation, however, throws exceptions if anything goes wrong during learning or prediction, so that the user of the algorithm can tell when there is an error in the training data.
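The exact checks and exception types used by the application are not shown in this article. As one possible illustration, the sketch below rejects training data in which a negative sample is covered by the learned hypothesis. All names here (ConsistencyCheckSketch, InconsistentTrainingDataException, covers, validate) are hypothetical, and the result key is again assumed to hold a Boolean.

```scala
// Hypothetical consistency check, not the repository's error handling.
object ConsistencyCheckSketch {
  type Sample = Map[String, Any]

  final case class InconsistentTrainingDataException(sample: Sample)
    extends RuntimeException(s"Negative sample is covered by the hypothesis: $sample")

  // True if the sample matches every non-wildcard ("?") feature of the hypothesis.
  def covers(hypothesis: Sample, sample: Sample): Boolean =
    hypothesis.forall { case (feature, value) =>
      value == "?" || sample.get(feature).contains(value)
    }

  // Throws if any negative sample would be classified as positive.
  def validate(hypothesis: Sample, data: List[Sample], resKey: String): Unit =
    data
      .filter(_.get(resKey).contains(false)) // assumption: Boolean result values
      .find(covers(hypothesis, _))
      .foreach(s => throw InconsistentTrainingDataException(s))
}
```

A failure here means either that the data contains a labeling error or that the target concept cannot be expressed as a conjunction of feature values.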

Running the Sample Application

To test the application and find the code, you can clone the Git repo here.

The application is designed to work with different features and values dynamically. For now, you can test it using training data in a JSON file.

The current implementation lets you use the algorithm in your own test applications. Here are a few points regarding the sample test application:

  1. For now, the application can read data only from a JSON file.

  2. We have to provide a result key to identify the result among the set of keys.

  3. The Find-S algorithm can store the target concept in a file and reuse it the next time the algorithm is initialized.

  4. After cloning the Git repo, you can run the sample application using the following command: sbt "project examples" run.

  5. To understand how the application works, you can play with the test cases using the following command: sbt test.
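Point 3 above, storing the target concept and reusing it later, can be sketched as follows. This is only an illustration under stated assumptions: HypothesisStoreSketch is a hypothetical name, the hypothesis is assumed to contain only string values, and the plain key=value file format is chosen for simplicity and is not necessarily what the repository uses.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

// Hypothetical persistence helper, not the repository's persistence module.
object HypothesisStoreSketch {

  // Write the hypothesis as one "feature=value" line per feature.
  def persist(hypothesis: Map[String, String], file: Path): Unit = {
    val content = hypothesis.map { case (k, v) => s"$k=$v" }.mkString("\n")
    Files.write(file, content.getBytes(UTF_8))
    ()
  }

  // Read the hypothesis back; None means nothing has been stored yet.
  def load(file: Path): Option[Map[String, String]] =
    if (!Files.exists(file)) None
    else {
      val content = new String(Files.readAllBytes(file), UTF_8)
      Some(
        content.split("\n").toList
          .filter(_.nonEmpty)
          .map(_.split("=", 2))                      // split at the first "=" only
          .collect { case Array(k, v) => k -> v }    // skip malformed lines
          .toMap
      )
    }
}
```

With a helper like this, a model could call load when it is initialized and skip training if a stored hypothesis is found.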

Limitations of the Sample Application

This article was created to show the concept behind the Find-S Algorithm.

  • There are a few things that can be improved, like the mapping of feature values from string to double.
  • To find a conjunctive concept, we just compare the values of the features.
  • The application is Linux-based for now.
  • You can find the Find-S hypothesis inside the /tmp folder.

This article was first published on the Knoldus blog.

Published at DZone with permission of Girish Bharti, DZone MVB. See the original article here.
