Predictive Services Development With Java and Weka

In this article, we briefly describe how to create predictive services with Java and the Weka library (Waikato Environment for Knowledge Analysis, developed at the University of Waikato).

By Aleksei Chaika · May. 03, 23 · Tutorial
We will develop a Java application and consider a real-life example: predicting real estate prices in Seattle from three parameters (area, number of bedrooms, and distance from the center). The application will learn from real estate prices, and we will analyze its predictions.

The city of Seattle was chosen randomly for this article.

Brief Introduction to Machine Learning in Java

If you have never heard of machine learning, or are simply intimidated by it, please don't be :)

Machine learning is a rapidly growing field of artificial intelligence (AI) that involves the development of algorithms and models that enable computers to learn and make predictions or decisions based on data. Machine learning could be used for image and object recognition, natural language processing (NLP), customer relationship management, recommendation systems, predictive maintenance, and much more.

Indeed, it is not that easy to develop everything from scratch, but Java provides a rich ecosystem of libraries and tools that can be leveraged for machine learning tasks, such as data preprocessing, feature engineering, model training, and evaluation. 

Some examples of tools that help developers implement machine learning algorithms and build predictive models are Deeplearning4j (DL4J, Deep Learning for Java), MLlib (Apache Spark), Smile (Statistical Machine Intelligence and Learning Engine), Encog, JOONE (Java Object Oriented Neural Engine), and Weka (Waikato Environment for Knowledge Analysis).

Development of Predictive Application

In this article, we will develop an application responsible for predicting real estate prices in Seattle depending on parameters (area, distance from the center, and bedrooms).

We will use a machine learning library called Weka (Waikato Environment for Knowledge Analysis, The University of Waikato), which provides a wide range of machine learning algorithms and tools for data mining, feature selection, and model evaluation. Weka is open-source software that is widely used in both academic and industry settings for a variety of machine learning tasks.

1. Application Dependencies

First, to start using Weka in a Java application, we have to add it to the classpath. You can find the required artifact in the Maven repository. Add the dependency to your build file:

  • using Maven
XML
 
<dependencies>
    <!-- https://mvnrepository.com/artifact/nz.ac.waikato.cms.weka/weka-stable -->
    <dependency>
        <groupId>nz.ac.waikato.cms.weka</groupId>
        <artifactId>weka-stable</artifactId>
        <version>3.8.6</version>
    </dependency>
</dependencies>


  • using Gradle
Groovy
 
// https://mvnrepository.com/artifact/nz.ac.waikato.cms.weka/weka-stable
implementation group: 'nz.ac.waikato.cms.weka', name: 'weka-stable', version: '3.8.6'


2. Initial Dataset

Machine learning applications are mostly designed to autonomously learn from data and improve their performance over time.

But in this article, we will simply provide the application with an initial dataset, i.e., some real estate prices in Seattle. To do that, let's create an ARFF file and collect some data there.

% All comments have to start with '%' character

% Relation name
@relation prices

% Fields order
@attribute area numeric     % area, sqft
@attribute Bedrooms numeric % bedrooms, bd
@attribute distance numeric % distance to the center, mi
@attribute price numeric    % price, $

% Declaring examples
@data
% Area, sqft    Bedrooms    Distance, mi    Price, $
3150            4           1.2             1349000
2290            3           1.2             1050000
2940            6           3.0             1850000
1107            2           3.2             729950
1122            2           6.6             599950
1350            2           6.9             598000
1390            3           12.2            449950
1660            5           12.3            619950
1540            4           12.9            1024900

The file can be placed in the application's resources folder or in any other location the application can read.

The dataset contains only nine records, which is enough for this article but not for a real-world application. To achieve high accuracy, much more data should be provided, automated learning techniques should be used, and data should be collected continuously.

However, it's important to carefully evaluate and validate the results obtained from automated processes to ensure the reliability and accuracy of the machine learning models in real-world applications.
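One simple way to validate a model on a dataset this small is leave-one-out cross-validation: train on all records but one, predict the held-out record, and repeat for every record. The sketch below is plain Java and is not taken from the article's code; the class and method names are hypothetical, and it fits a single-feature line (for example, price against distance) rather than the full model. Weka offers its own cross-validation through `Evaluation.crossValidateModel`, which would be the natural choice inside the application.

```java
// Illustrative sketch only: leave-one-out validation for a one-feature line fit.
// Class and method names are hypothetical and not part of Weka.
final class LeaveOneOutSketch {

    /** Fits y = slope * x + intercept by least squares, ignoring the record at index skip. */
    static double[] fitLine(double[] x, double[] y, int skip) {
        double n = 0, sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < x.length; i++) {
            if (i == skip) continue;
            n++;
            sumX += x[i];
            sumY += y[i];
            sumXX += x[i] * x[i];
            sumXY += x[i] * y[i];
        }
        double denominator = n * sumXX - sumX * sumX;
        if (denominator == 0) {
            throw new IllegalArgumentException("All remaining x values are identical");
        }
        double slope = (n * sumXY - sumX * sumY) / denominator;
        double intercept = (sumY - slope * sumX) / n;
        return new double[] {slope, intercept};
    }

    /** Root mean squared error over predictions made for each held-out record. */
    static double leaveOneOutRmse(double[] x, double[] y) {
        double sumSquaredError = 0;
        for (int i = 0; i < x.length; i++) {
            double[] line = fitLine(x, y, i);          // train without record i
            double error = line[0] * x[i] + line[1] - y[i]; // test on record i
            sumSquaredError += error * error;
        }
        return Math.sqrt(sumSquaredError / x.length);
    }
}
```

Because every prediction is made for a record the model has not seen, this error is usually a more honest estimate than an error measured on the training data itself.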

3. Java Code

First, we have to train our application. In this article, we do it by passing it an ARFF file with the initial dataset. Let's load this data and set the class index to the price attribute:

Java
 
import weka.core.Instances;
import weka.core.converters.ConverterUtils;

// Load Seattle real estate prices from the ARFF file
ConverterUtils.DataSource source = new ConverterUtils.DataSource("prices.arff");
Instances data = source.getDataSet();
// Use the last attribute (price) as the class attribute to predict
data.setClassIndex(data.numAttributes() - 1);


Second, we have to create a classifier. Let's use a linear regression classifier in this article:

Java
 
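import weka.classifiers.Classifier;
import weka.classifiers.functions.LinearRegression;
import weka.core.DenseInstance;
import weka.core.Instance;

// Create a classifier based on linear regression
Classifier classifier = new LinearRegression();

// Train the classifier on the dataset
classifier.buildClassifier(data);

// Create an Instance that will hold the attributes of a property to price
Instance instance = new DenseInstance(data.numAttributes());
instance.setDataset(data);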
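
To make it clearer what a linear regression model learns, here is a minimal plain-Java sketch of ordinary least squares: it finds one weight per feature plus an intercept so that the weighted sum is as close as possible to the known prices. This is not Weka's implementation, and the class and method names are hypothetical. Weka's `LinearRegression` adds steps such as attribute selection and a ridge parameter, so its coefficients may differ from what this sketch produces.

```java
// Illustrative sketch only: ordinary least squares via the normal equations.
// Not Weka's implementation; class and method names are hypothetical.
final class LeastSquaresSketch {

    /** Returns weights w so that y ~ w[0]*x[0] + ... + w[n-2]*x[n-2] + w[n-1] (intercept last). */
    static double[] fit(double[][] x, double[] y) {
        int n = x[0].length + 1;              // coefficients, including the intercept
        double[][] a = new double[n][n + 1];  // augmented matrix [X'X | X'y]
        for (int r = 0; r < x.length; r++) {
            double[] row = new double[n];
            System.arraycopy(x[r], 0, row, 0, n - 1);
            row[n - 1] = 1.0;                 // constant term for the intercept
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) a[i][j] += row[i] * row[j];
                a[i][n] += row[i] * y[r];
            }
        }
        // Gauss-Jordan elimination with partial pivoting
        for (int col = 0; col < n; col++) {
            int pivot = col;
            for (int r = col + 1; r < n; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("Features are linearly dependent");
            }
            for (int r = 0; r < n; r++) {
                if (r == col) continue;
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= n; c++) a[r][c] -= factor * a[col][c];
            }
        }
        double[] w = new double[n];
        for (int i = 0; i < n; i++) w[i] = a[i][n] / a[i][i];
        return w;
    }

    /** Applies the fitted weights to one set of feature values. */
    static double predict(double[] w, double[] features) {
        double sum = w[w.length - 1];         // start from the intercept
        for (int i = 0; i < features.length; i++) sum += w[i] * features[i];
        return sum;
    }
}
```

In terms of this article, each row of `x` would hold the area, bedrooms, and distance of one property, and `y` would hold the prices.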


And now, we can write the code that predicts a price from the area, number of bedrooms, and distance to the center:

Java
 
import weka.classifiers.Evaluation;

public void predictPrice(double area, int bedrooms, double milesAway) throws Exception {
    // Fill in the attributes of the property (indices follow the ARFF attribute order)
    instance.setValue(0, area);
    instance.setValue(1, bedrooms);
    instance.setValue(2, milesAway);

    // Predict the price
    double predictedPrice = classifier.classifyInstance(instance);
    System.out.println("Predicted price: " + predictedPrice);

    // Evaluate the model on the training data. For a numeric class such as price,
    // errorRate() returns the root mean squared error (RMSE), not a misclassification rate.
    Evaluation eval = new Evaluation(data);
    eval.evaluateModel(classifier, data);
    System.out.println("Calculation error rate: " + eval.errorRate());
}
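
For a numeric class attribute such as price, the value returned by `errorRate()` is a root mean squared error (RMSE) expressed in the same unit as the price. The plain-Java sketch below shows how such a number is computed from actual and predicted values. The class and method names are hypothetical, and the code only illustrates the formula, not Weka's internals.

```java
// Illustrative sketch only: root mean squared error between actual and predicted values.
// Class and method names are hypothetical and not part of Weka.
final class RmseSketch {

    static double rmse(double[] actual, double[] predicted) {
        if (actual.length == 0 || actual.length != predicted.length) {
            throw new IllegalArgumentException("Arrays must be non-empty and of equal length");
        }
        double sumSquaredError = 0;
        for (int i = 0; i < actual.length; i++) {
            double error = predicted[i] - actual[i];
            sumSquaredError += error * error;   // large misses are penalized more heavily
        }
        return Math.sqrt(sumSquaredError / actual.length);
    }
}
```

Since the errors are squared before averaging, a single badly mispriced property can dominate the result.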


The code mentioned in this article can be found on GitHub.

4. Running Application

We are now ready to launch the application.

It reads the file with the initial dataset to learn prices in Seattle, trains a linear regression model, and is then ready to predict prices from the given parameters (area, number of bedrooms, and distance from the center).

The application is asked to predict prices for several objects, and the output is:

-- Predicting price for [area - 2000.0 sqft, bedrooms - 4, miles away - 1.0 mi]
Predicted price: 1367914.915677933
...
-- Predicting price for [area - 2000.0 sqft, bedrooms - 4, miles away - 10.0 mi]
Predicted price: 857712.4223102874
Calculation error rate: 167943.05185960032


Analyzing Results

Prediction results are collected in the table below:

Input                                              Output
Area, sqft    Bedrooms, bd    Miles away, mi       Predicted price, $
2000          4               1                    1,367,915
2000          4               2                    1,311,226
2000          4               3                    1,254,537
1000          3               5                    905,812
1500          2               7                    557,087
2000          4               10                   857,712

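The predicted prices are close to those on the Seattle real estate market, so the application's predictions look reasonably accurate.

The application reported a calculation error rate of $167,943, which is about 12-20% of the predicted prices for the 2,000 sqft properties above, and this also seems to be a good result. Note that for a numeric attribute like price, this value is a root mean squared error, and here it is measured on the same records that were used for training.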

But in some cases, predictions may differ from real prices much more. We didn't provide the application with enough data, and there is no direct relationship between the price and the parameters we passed (area, bedrooms, and distance to the center). Real price estimation is much more complicated; we didn't consider properties such as the distance to a school and its rating, neighbors, the condition of the property, and so on.

In addition to that, the application didn't learn prices far away from Seattle at all. The farthest object from the center in the dataset is located 12.9 miles away. Let's take a look at predictions for real estate properties located 25 and 30 miles away from Seattle:

Input                                              Output
Area, sqft    Bedrooms, bd    Miles away, mi       Predicted price, $
2500          4               25                   7,375
2500          4               30                   -276,070
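
The negative price follows directly from the linear model. With area and bedrooms fixed, the predictions above drop by the same amount for every extra mile: from 1 to 2 miles the price falls by 1,367,915 - 1,311,226 = 56,689 dollars, and from 25 to 30 miles it falls by 283,445 dollars, which is again 56,689 dollars per mile. The small plain-Java sketch below, with hypothetical class and method names, computes this per-mile change and the distance at which the straight line reaches zero.

```java
// Illustrative sketch only: why a linear model eventually predicts negative prices.
// Class and method names are hypothetical.
final class ExtrapolationSketch {

    /** Change in predicted price per extra mile, from two predictions that differ only in distance. */
    static double slopePerMile(double miles1, double price1, double miles2, double price2) {
        return (price2 - price1) / (miles2 - miles1);
    }

    /** Distance at which a line with the given slope, passing through (miles, price), reaches zero. */
    static double zeroCrossingMiles(double miles, double price, double slopePerMile) {
        return miles - price / slopePerMile;
    }
}
```

Calling `slopePerMile(25, 7375, 30, -276070)` gives the per-mile change for the 2,500 sqft property, and passing that slope to `zeroCrossingMiles(25, 7375, slope)` shows that the predicted price reaches zero only slightly beyond 25 miles.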

We provided the application with such a small and non-diverse dataset that it effectively learned that there is no life outside of Seattle at all: because the fitted model is linear, the price keeps falling by the same amount per mile and eventually turns negative. It is crucial to have accurate and relevant data when training ML applications.
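One inexpensive safeguard against this kind of extrapolation is to check whether a request lies inside the range of values the model was trained on and to warn or refuse otherwise. The sketch below is plain Java with hypothetical names and is not part of the article's code; it records the minimum and maximum of a training feature, here the distances from the dataset, and tests a new value against them.

```java
// Illustrative sketch only: flag queries that fall outside the training range of a feature.
// Class and method names are hypothetical.
final class TrainingRangeSketch {

    /** Returns {min, max} of the given training values. */
    static double[] minMax(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("No training values given");
        }
        double min = values[0];
        double max = values[0];
        for (double value : values) {
            min = Math.min(min, value);
            max = Math.max(max, value);
        }
        return new double[] {min, max};
    }

    /** True if the value lies within the observed training range (inclusive). */
    static boolean isWithinRange(double value, double[] minMax) {
        return value >= minMax[0] && value <= minMax[1];
    }
}
```

With the distances from the dataset (1.2 to 12.9 miles), a query for 10 miles passes the check, while queries for 25 or 30 miles do not and could be rejected or reported with a warning.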

But results within Seattle itself are still accurate and correspond to the real estate market.

Conclusion

In conclusion, using Java and libraries like Weka offers a powerful and flexible approach to developing machine learning applications.

In this article, we developed an application whose price predictions within Seattle are fairly accurate and correspond to the real estate market. However, the application did not learn prices outside of Seattle, so its results there were wrong.

That's why it is important to highlight that successful machine learning applications require careful consideration of data quality, feature selection (we didn't consider school ratings or the condition of a property), and model evaluation to ensure reliable and accurate results.
