
Just In Time MapReduce with OSGi


A couple of years ago I was thrown into a team lead role where I was responsible for distributing the workload across a number of developers. During the first few months I found that as the workload increased, so did my problems in scaling the team. It took me a few more months to figure out that my team-leading technique needed to be turned upside down.

Instead of pushing the work out, I found it was much better for me to set up work queues and have the developers pick up tasks from the queue. With this very minor change in technique I was able to double the team’s size without taking stress leave.

I’ve taken this concept and attempted to implement it on OSGi, borrowing heavily from the world-famous MapReduce research made available by the ingeniously generous folk over at Google.
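To make the pull model concrete, here is a minimal plain-Java sketch of workers picking tasks off a shared queue. It is only an illustration under my own assumptions: it uses ordinary threads rather than OSGi bundles, and the class name PullQueueDemo, the method runAll and the task names are hypothetical, not taken from the demo code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical illustration: workers pull tasks from a shared queue
// instead of a coordinator pushing work out to them.
class PullQueueDemo {

    /** Runs every task using workerCount threads that all pull from one queue. */
    static List<String> runAll(List<String> tasks, int workerCount) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(tasks);
        Queue<String> done = new ConcurrentLinkedQueue<>();
        List<Thread> workers = new ArrayList<>();

        for (int i = 0; i < workerCount; i++) {
            String name = "worker-" + i;
            Thread worker = new Thread(() -> {
                String task;
                // poll() returns null once the queue is drained, which ends this worker.
                while ((task = queue.poll()) != null) {
                    done.add(name + " finished " + task);
                }
            });
            workers.add(worker);
            worker.start();
        }

        // Wait for all workers so the result list is complete.
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return new ArrayList<>(done);
    }

    public static void main(String[] args) {
        // Which worker handles which task varies from run to run.
        runAll(List.of("page-1", "page-2", "page-3", "page-4"), 2)
                .forEach(System.out::println);
    }
}
```

The point of the design is that nobody has to decide up front who gets which task: a faster worker simply comes back to the queue sooner, and adding workers needs no change to the code that fills the queue.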

If you’d like to get my demo running or want to know what the most popular starting letter is on your favorite web pages, you can always follow the steps below. The steps should take around 5 – 10 minutes to complete.

Prerequisites

1) Install Eclipse 3.4 (http://www.eclipse.org/)

2) Install the Rich Ajax Platform (RAP) (http://www.eclipse.org/rap/gettingstarted.php)

Running the Demo

Download the mapreduce.zip example source code from:

https://sourceforge.net/project/showfiles.php?group_id=228168&package_id=303652

Open 2 instances of Eclipse and create 2 new workspaces, e.g. Node1 and Node2.

In both instances configure RAP as the target platform.

Click Window -> Preferences -> Plug-in Development -> Target Platform, then select the RAP target platform location, which is at [ECLIPSE_HOME]/configuration/org.eclipse.rap.target-1.1.1/eclipse

Import the demo code into both instances of Eclipse.

Click File -> Import -> Existing Projects into Workspace -> Select archive file, then click Browse and select the mapreduce.zip file you downloaded earlier.

In both instances of Eclipse, expand galang.research.rap.hello, double-click plugin.xml, then click launch RAP application.

In either Eclipse instance, click Add URL content to memory.

To add a few more pages, paste each of the following URLs and click Add URL content to memory after each one.

http://cnn.com

http://slashdot.com

http://engadget.com

http://smh.com.au

Once you’ve added the above URLs, click run map reduce. If both instances of Eclipse show the map reduce output, you’ve set up the demo as expected.

What’s going on under the covers?

A wise/lazy man once said a picture says a thousand words, so below is a sequence diagram of what is going on under the hood. You can also step through the code if diagrams aren’t your thing.
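If you'd rather read code than diagrams, the job the demo runs (finding the most popular starting letter) can be sketched roughly as a map step per page and a reduce step that merges the partial counts. This is a simplified, single-process sketch and not the demo's own source: the class FirstLetterCount and its method names are hypothetical, and it treats anything other than the letters a to z as a word separator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a first-letter count expressed as map and reduce steps.
class FirstLetterCount {

    /** Map step: count the starting letter of each word in one page's text. */
    static Map<Character, Integer> map(String pageText) {
        Map<Character, Integer> counts = new HashMap<>();
        // Assumption: only a-z count as letters; everything else separates words.
        for (String word : pageText.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!word.isEmpty()) {
                counts.merge(word.charAt(0), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Reduce step: merge the per-page counts into one overall total. */
    static Map<Character, Integer> reduce(List<Map<Character, Integer>> partials) {
        Map<Character, Integer> total = new HashMap<>();
        for (Map<Character, Integer> partial : partials) {
            partial.forEach((letter, n) -> total.merge(letter, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        // Each page could be mapped on a different node; here both run locally.
        Map<Character, Integer> total = reduce(List.of(
                map("The tiny tiger ate"),
                map("tree apple")));
        System.out.println(total);
    }
}
```

Because each map call only needs its own page, the pages can be handed to whichever node picks them up first, and only the small per-page count tables have to travel back for the reduce.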

Can you use this algorithm for anything else?

I think there are more uses for this algorithm than just counting the starting characters of words on web pages. You could use it to run a lot of SQL queries in parallel and then reduce the output. If you want to do more with it, I’d recommend playing around with MyMapper.java and MyReducer.java.
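As a rough sketch of the parallel-queries idea, the snippet below runs a batch of independent tasks on a thread pool and reduces their results by summing them. Everything here is an assumption made for illustration: the class ParallelQuerySketch and method runAndSum are hypothetical, and each Callable is a stand-in for a real query (for example a JDBC call returning a row count), which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: run independent "queries" in parallel, then reduce by summing.
class ParallelQuerySketch {

    /** Runs each query on a fixed-size thread pool and sums the results. */
    static long runAndSum(List<Callable<Long>> queries, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            long total = 0;
            // invokeAll blocks until every query has completed.
            for (Future<Long> result : pool.invokeAll(queries)) {
                total += result.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while running queries", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a query failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-ins for real queries; each just returns a fixed count.
        List<Callable<Long>> queries = new ArrayList<>();
        queries.add(() -> 10L);
        queries.add(() -> 20L);
        queries.add(() -> 12L);
        System.out.println(runAndSum(queries, 2));
    }
}
```

Summing is just one possible reduce step. Anything that combines partial results without caring about their order, such as a maximum or a merged count table, would slot into the same shape.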

I’d love to throw this code on a large number of nodes to see how quickly I can get it to run with gigabytes of data. If you’re a big iron or grid/cloud vendor with a few hundred nodes to spare, you are more than welcome to drop me a line :) (glenn.galang@gmail.com).

 From:

http://ggalangblog.blogspot.com/
