
Distributed MapReduce with Sharded MongoDB and SpringData

By Comsysto Gmbh · Dec. 12, 12 · Interview

In MongoDB you have several options when it comes to aggregating data. There is the built-in JavaScript MapReduce framework, the new aggregation framework in v2.2+, and the Hadoop connector for really heavy lifting.

When using the integrated MapReduce framework, you have to be aware of the caveat that MongoDB’s JS interpreter is single-threaded. This means that regardless of how many cores your server has, only one gets utilized. That’s a bit of a bummer, because vertical scaling might not decrease execution times significantly. So to really bring down execution times and make use of MapReduce’s parallel computing abilities, you have to scale out and shard. This brings two main advantages: you now have more “threads”, and the dataset each of them has to deal with shrinks in proportion to the number of shards.
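The scale-out idea can be illustrated with a small in-memory sketch. It does not talk to MongoDB at all: the class `ShardedMapReduceSketch`, its methods, and the round-robin split are hypothetical names and assumptions chosen for illustration. Each "shard" runs a single-threaded map and reduce over its own slice of the data, and a final reduce merges the partial results.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical in-memory model of sharded map-reduce; not MongoDB code.
class ShardedMapReduceSketch {

    // Splits the documents round-robin into the given number of "shards".
    static List<List<String>> shard(List<String> docs, int shards) {
        List<List<String>> parts = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            parts.add(new ArrayList<>());
        }
        for (int i = 0; i < docs.size(); i++) {
            parts.get(i % shards).add(docs.get(i));
        }
        return parts;
    }

    // Map and reduce on one shard: emit (doc, 1), then sum per key.
    // This part is single-threaded, like the JS interpreter on one node.
    static Map<String, Integer> mapReduceOnShard(List<String> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (String doc : docs) {
            counts.merge(doc, 1, Integer::sum);
        }
        return counts;
    }

    // Runs every shard on its own thread, then merges the partial results.
    static Map<String, Integer> run(List<String> docs, int shards) {
        ExecutorService pool = Executors.newFixedThreadPool(shards);
        try {
            List<Future<Map<String, Integer>>> partials = new ArrayList<>();
            for (List<String> part : shard(docs, shards)) {
                partials.add(pool.submit(() -> mapReduceOnShard(part)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> partial : partials) {
                // Final reduce: combine the per-shard counts.
                partial.get().forEach((k, v) -> total.merge(k, v, Integer::sum));
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "a", "c", "a", "b");
        System.out.println(run(docs, 3));
    }
}
```

With more shards, each worker sees fewer documents, which is the effect described above. The merge step only works this simply because the reduce function (a sum) is associative and commutative.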

The latest stable version of MongoDB today is 2.0.6, which we initially used for our queries. With this version, queries that we had successfully issued against a single MongoDB instance failed on the sharded setup.
It seems that we were hitting an issue similar or identical to https://jira.mongodb.org/browse/server-5536 .

As the issue states that it’s fixed in 2.1.2, we switched to the latest nightly build (2.1.2-pre), which worked fine.

Unfortunately, after switching to 2.1.2 we were confronted with https://jira.springsource.org/browse/datamongo-378

The pragmatic, albeit not beautiful, solution was a hack that reimplements the following interfaces: MongoOperations and ApplicationContextAware. It basically works around the type cast to Integer:

((Number) counts.get("input")).intValue(),
((Number) counts.get("emit")).intValue(),
((Number) counts.get("output")).intValue()
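A minimal, self-contained sketch of why casting through `Number` helps. It assumes the counters arrive as `Long` values rather than `Integer` on the newer server; the class `MapReduceCountsDemo`, the helper `readCount`, and the sample numbers are hypothetical and not part of Spring Data MongoDB.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; it only mimics the shape of the raw counts map.
class MapReduceCountsDemo {

    // Reads a counter whether it is stored as Integer, Long, or Double.
    static int readCount(Map<String, Object> counts, String key) {
        return ((Number) counts.get(key)).intValue();
    }

    public static void main(String[] args) {
        Map<String, Object> counts = new HashMap<>();
        // Assumption: the counters are reported as Long values.
        counts.put("input", 1000L);
        counts.put("emit", 1000L);
        counts.put("output", 42L);

        try {
            // A direct cast to Integer fails at runtime for a Long value.
            int broken = (Integer) counts.get("input");
            System.out.println(broken);
        } catch (ClassCastException e) {
            System.out.println("direct Integer cast failed");
        }

        // Going through Number works for any numeric wrapper type.
        System.out.println(readCount(counts, "input"));
    }
}
```

Note that `intValue()` silently truncates values larger than `Integer.MAX_VALUE`, so this is only safe as long as the counters stay within the `int` range.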

All of this resulted in the following performance improvements:

[Figure: performance chart]

These numbers show that parallelism can actually result in a massive performance increase. MongoDB brings excellent out-of-the-box capabilities to simplify sharding and replication.


Published at DZone with permission of Comsysto Gmbh, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
