DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Redis-Based Tomcat Session Management
  • Java for AI
  • Java Z Garbage Collector (ZGC): Revolutionizing Memory Management
  • Dynamic Data Processing Using Serverless Java With Quarkus on AWS Lambda by Enabling SnapStart (Part 2)

Trending

  • Code Reviews: Building an AI-Powered GitHub Integration
  • Agile’s Quarter-Century Crisis
  • Developers Beware: Slopsquatting and Vibe Coding Can Increase Risk of AI-Powered Attacks
  • Event Driven Architecture (EDA) - Optimizer or Complicator
  1. DZone
  2. Data Engineering
  3. Big Data
  4. How to Execute Distributed MapReduce on Java Over Data Stored in Redis

How to Execute Distributed MapReduce on Java Over Data Stored in Redis

What is MapReduce and why is it helpful?

By 
Nikita Koksharov user avatar
Nikita Koksharov
·
Nov. 21, 18 · Tutorial
Likes (4)
Comment
Save
Tweet
Share
15.4K Views

Join the DZone community and get the full member experience.

Join For Free

MapReduce is a framework that programmers today can use to write applications that are able to process massive quantities of data using a modern distributed data processing approach. This processing approach is very popular among organizations today. As it allows the processing of data in parallel on large commodity hardware clusters, using MapReduce can significantly speed up data processing. In this post, we will take a look at how to execute MapReduce over data stored in Redis using Redisson - Redis based In-Memory Data Grid for Java.

What Is MapReduce?

MapReduce is a program model for distributed computing that could be implemented in Java. The algorithm contains two key tasks, which are known as Map and Reduce.

The purpose of the Map task is to convert a dataset into another dataset where elements are broken down into key/value pairs known as tuples. TheReduce task combines those data tuples into a small set of tuples, using the output from a map as its input.

The distributed computing means splitting a task into several separate processes, which can then be carried out in parallel on large commodity hardware clusters. Once MapReduce has broken down the various elements of the large dataset into tuples and then further reduced them into a smaller set, the data that remains can be processed in parallel, which can significantly speed up the processing that needs to be carried out on the data.

When Do You Need to Process Redis Data With MapReduce?

There are many situations in which it is helpful to use MapReduce to process Redis data. In general, what they all have in common is that the amount of data that you need to process is very large.

As a simple example, you can consider a situation in which you have monthly energy consumption figures for a very large number of organizations. Now imagine that you need to process this data to generate results such as the year of maximum usage, year of minimum usage, and so on, for each organization. While writing an algorithm to perform this kind of processing is not difficult for an experienced programmer, many such algorithms will take a long time to execute if they have to run over vast amounts of data.

As a solution to the problem of long processing times, you can use MapReduce to reduce the overall size of the dataset and, therefore, make it much faster to process. This reduction in processing time could be very important for many organizations, as it frees up the hardware so that it is available to use for other computing tasks.

There are many more examples of situations in which using distributed MapReduce on data that is stored in Redis using Redisson can be an extremely helpful thing to do. For example, using MapReduce can be particularly useful if you need to quickly, reliably, and accurately calculate a word count for a very large file or collection of files.

Examples of Executing Distributed MapReduce on Java Over Data Stored in Redis

Here is an example of how to use MapReduce to create an efficient algorithm that produces an accurate word count. This might seem like a very simple task, but using MapReduce is very important to cut down the processing time for very large blocks of text or large collections of files.

Take a look at the following code to see how this algorithm uses MapReduce provided by Redisson to take text data and process it to reliably produce an accurate word count.

Step 1

Create a Redisson config:

// from JSON

Config config = Config.fromJSON(...)

// from YAML

Config config = Config.fromYAML(...)

// or dynamically

Config config = new Config();

…


Step 2

Create a Redisson instance:

RedissonClient redisson = Redisson.create(config);


Step 3

Define the Mapper object. This is applied to each Map entry and split value by space to separate words:

public class WordMapper implements RMapper<String, String, String, Integer> {

    @Override
    public void map(String key, String value, RCollector<String, Integer> collector) {
            String[] words = value.split("[^a-zA-Z]");
            for (String word : words) {
                collector.emit(word, 1);
            }
        }
    }

}


Step 4

Define theReducer object. This calculates the total sum for each word.

public class WordReducer implements RReducer<String, Integer> {

     @Override
     public Integer reduce(String reducedKey, Iterator<Integer> iter) {
         int sum = 0;
         while (iter.hasNext()) {
            Integer i = (Integer) iter.next();
            sum += i;
         }
         return sum;
     }
}


Step 5

Define the Collator object (optional). This counts the total amount of words.

public class WordCollator implements RCollator<String, Integer, Integer> {

     @Override
     public Integer collate(Map<String, Integer> resultMap) {
        int result = 0;
        for (Integer count : resultMap.values()) {
            result += count;
        }

        return result;
     }
}


Step 6

Here is how to run it all together:

    RMap<String, String> map = redisson.getMap("wordsMap");
    map.put("line1", "Alice was beginning to get very tired");
    map.put("line2", "of sitting by her sister on the bank and");
    map.put("line3", "of having nothing to do once or twice she");
    map.put("line4", "had peeped into the book her sister was reading");
    map.put("line5", "but it had no pictures or conversations in it");
    map.put("line6", "and what is the use of a book");
    map.put("line7", "thought Alice without pictures or conversation");

    RMapReduce<String, String, String, Integer> mapReduce
             = map.<String, Integer>mapReduce()
                  .mapper(new WordMapper())
                  .reducer(new WordReducer());

    // count occurrences of words
    Map<String, Integer> mapToNumber = mapReduce.execute();

    // count total words amount
    Integer totalWordsAmount = mapReduce.execute(new WordCollator());


 MapReduce is also available for collection type objects, including Set, SetCache, List, SortedSet, ScoredSortedSet, Queue, BlockingQueue, Deque,, BlockingDeque, PriorityQueue, and PriorityDeque.

How to Use Redisson to Execute MapReduce on Java Over Data Stored in Redis

Redisson is a state-of-the-art Redis client that opens up near infinite possibilities for programming and data processing using Java. A wide range of companies, from the biggest enterprises to the smallest startups, use Redisson to empower their Java applications with Redis.

As a highly sophisticated Redis client, Redisson provides a distributed implementation of services, objects, collections, locks, and synchronizers. It supports a range of Redis configurations, including a single, cluster, sentinel, or master and slave configuration.

Using MapReduce is an excellent option if you are already using Redisson to store large amounts of data in Redis. Redisson provides a Java-based MapReduce programming model that can make it easy to process a very large amount of data stored in Redis.

Data processing Redis (company) Java (programming language) MapReduce hadoop

Opinions expressed by DZone contributors are their own.

Related

  • Redis-Based Tomcat Session Management
  • Java for AI
  • Java Z Garbage Collector (ZGC): Revolutionizing Memory Management
  • Dynamic Data Processing Using Serverless Java With Quarkus on AWS Lambda by Enabling SnapStart (Part 2)

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!