Collocation: The First Rule of Distributed Programming

by Dmitriy Setrakyan · Jan. 17, 2011 · Java Zone · Interview

I am pleased to announce that we recently released GridGain 3.0.4. Among other things, the last couple of releases have focused on convenient and effective collocation of computations and data, and on grouping data that is usually accessed together onto the same nodes. Sending computations exactly to the nodes where the accessed data resides is one of the key components of better scalability. Without collocation, nodes fetch data from other nodes only briefly, often just to perform a quick computation, and then discard it almost immediately. This creates unnecessary data traffic, a.k.a. data noise, and can at times bring a server to its knees.

In my previous blog post I showed how to collocate computations and data through the direct API, using the GridCache.mapKeyToNode(..) method. We have also added analogous methods to the Grid API, so that data affinity can be determined even on nodes that do not cache any data themselves. In the latest 3.0.4 release, we have also added a very convenient way to achieve collocation via the @GridCacheAffinityMapped annotation.
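
Finding affinity on a node that caches nothing works naturally if the key-to-node mapping is a deterministic function of the key and the current set of nodes; that is an assumption of this sketch, not a description of GridGain internals. The sketch below uses rendezvous (highest-random-weight) hashing as a swapped-in technique, and the class `AffinitySketch` and its `mapKeyToNode` method are hypothetical names that only mirror the purpose of `GridCache.mapKeyToNode(..)`; they are not GridGain's API or its actual affinity algorithm.

```java
import java.util.List;

// Hypothetical sketch: a pure key-to-node mapping using rendezvous hashing.
// This is NOT GridGain's affinity function, only an illustration of the idea.
final class AffinitySketch {

    // Every node "scores" the key; the node with the highest score owns it.
    // Ties go to the node that appears first in the list.
    static String mapKeyToNode(Object key, List<String> nodeIds) {
        String owner = null;
        long best = Long.MIN_VALUE;
        for (String nodeId : nodeIds) {
            long score = mix(31L * nodeId.hashCode() + key.hashCode());
            if (owner == null || score > best) {
                best = score;
                owner = nodeId;
            }
        }
        return owner;
    }

    // SplitMix64 finalizer: spreads the bits of the combined hash.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xbf58476d1ce4e5b9L;
        z = (z ^ (z >>> 27)) * 0x94d049bb133111ebL;
        return z ^ (z >>> 31);
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("node-A", "node-B", "node-C");
        // Any process that knows the node list computes the same owner,
        // whether or not it caches any data itself.
        System.out.println("myCompanyId -> " + mapKeyToNode("myCompanyId", nodes));
    }
}
```

A property of rendezvous hashing worth noting: removing a node only moves the keys that node owned, while every other key keeps its owner.
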

Say you have two types of objects, Person and Company, and multiple persons can work for the same company. You will generally want to access Person objects together with the Company they work for. To do that in a scalable fashion, you may want to ensure that all people working for the same company are cached on the same node. That way you can send computations to that node and access multiple people from the same company locally. Here is how it can be done in GridGain:

    public class PersonKey {
        // Person ID used to identify a person.
        private String personId;

        // Company ID which will be used for data affinity.
        @GridCacheAffinityMapped
        private String companyId;

        public PersonKey(String personId, String companyId) {
            this.personId = personId;
            this.companyId = companyId;
        }
        ...
    }
    ...
    // Instantiate person keys with the same company ID.
    Object personKey1 = new PersonKey("myPersonId1", "myCompanyId");
    Object personKey2 = new PersonKey("myPersonId2", "myCompanyId");

    // Both the company and the person objects will be cached on the same node.
    cache.put("myCompanyId", new Company(..));
    cache.put(personKey1, new Person(..));
    cache.put(personKey2, new Person(..));

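
To make the annotation's effect concrete, here is a minimal sketch of how an affinity-key annotation could be resolved with plain Java reflection. The annotation `@AffinityKeyMapped` and the class `AffinityKeyResolver` are hypothetical stand-ins; this is not how GridGain itself processes `@GridCacheAffinityMapped`. The rule assumed here: if a key has an annotated field, that field's value becomes the affinity key; otherwise the key itself is used. Under that rule `PersonKey("myPersonId1", "myCompanyId")` and the plain string `"myCompanyId"` resolve to the same value, which is consistent with the example above storing the Company under the key "myCompanyId".

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical stand-in for @GridCacheAffinityMapped (not GridGain's annotation).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface AffinityKeyMapped {}

// Simplified key class mirroring the PersonKey example.
final class PersonKey {
    private final String personId;

    @AffinityKeyMapped
    private final String companyId;

    PersonKey(String personId, String companyId) {
        this.personId = personId;
        this.companyId = companyId;
    }
}

final class AffinityKeyResolver {
    // Returns the value of the first field annotated with @AffinityKeyMapped,
    // or the key itself when no field is annotated (e.g. a plain String key).
    static Object affinityKey(Object key) {
        for (Field f : key.getClass().getDeclaredFields()) {
            if (f.isAnnotationPresent(AffinityKeyMapped.class)) {
                try {
                    f.setAccessible(true);
                    return f.get(key);
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return key;
    }

    public static void main(String[] args) {
        Object personKey = new PersonKey("myPersonId1", "myCompanyId");
        // Both keys resolve to the same affinity value, so any key-to-node
        // mapping applied to that value places them on the same node.
        System.out.println(affinityKey(personKey));     // myCompanyId
        System.out.println(affinityKey("myCompanyId")); // myCompanyId
    }
}
```

The design choice illustrated is that the cache never needs to know about `Person` or `Company` specifically: the annotation marks which part of a composite key drives placement, and everything else about the key (such as `personId`) still distinguishes entries.
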
Now, if you want to perform a computation that involves multiple people working for the same company, all you have to do is send a grid job to the node where those people are cached. Here is how you would send a computation to the node that caches all people for the company with ID "myCompanyId":

    G.grid().run(GridClosureCallMode.BALANCE, new Runnable() {
        // This annotation specifies that the computation should be routed
        // precisely to the node where all objects with the affinity key
        // 'myCompanyId' are cached.
        @GridCacheAffinityMapped
        private String companyId = "myCompanyId";

        @Override public void run() {
            // Some computation logic here.
            ...
        }
    });

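
The routing step can be modeled with a small single-process simulation. Everything here (`ToyGrid`, the hash-modulo placement, the "remote fetch" counter) is a hypothetical model rather than GridGain's API or its affinity function: each simulated node is a `HashMap`, a job runs on the node that owns the company ID (mirroring the annotated `companyId` field in the job above), and every read served by a different node is counted as a remote fetch. Comparing placement by company ID with placement by person ID shows where data noise comes from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy single-process model of compute-to-data routing (hypothetical, not GridGain's API).
final class ToyGrid {
    private final List<Map<String, String>> nodes = new ArrayList<>(); // per node: personId -> name
    private final boolean collocate;
    int remoteFetches = 0;

    ToyGrid(int nodeCount, boolean collocate) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new HashMap<>());
        this.collocate = collocate;
    }

    // Key-to-node mapping: simple hash modulo the number of nodes.
    int nodeFor(String affinityKey) {
        return Math.floorMod(affinityKey.hashCode(), nodes.size());
    }

    // With collocation the company ID is the affinity key; otherwise the person ID is.
    int ownerOf(String personId, String companyId) {
        return nodeFor(collocate ? companyId : personId);
    }

    void put(String personId, String companyId, String name) {
        nodes.get(ownerOf(personId, companyId)).put(personId, name);
    }

    // Reads a person from the point of view of execNode, counting non-local reads.
    String get(int execNode, String personId, String companyId) {
        int owner = ownerOf(personId, companyId);
        if (owner != execNode) remoteFetches++;
        return nodes.get(owner).get(personId);
    }

    // "Routes" the job to the node owning companyId, then reads every person there.
    List<String> namesForCompany(String companyId, List<String> personIds) {
        int execNode = nodeFor(companyId);
        List<String> names = new ArrayList<>();
        for (String id : personIds) names.add(get(execNode, id, companyId));
        return names;
    }

    public static void main(String[] args) {
        for (boolean collocate : new boolean[] {true, false}) {
            ToyGrid grid = new ToyGrid(3, collocate);
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < 10; i++) {
                ids.add("p" + i);
                grid.put("p" + i, "acme", "Person " + i);
            }
            grid.namesForCompany("acme", ids);
            System.out.println("collocate=" + collocate + " remoteFetches=" + grid.remoteFetches);
        }
    }
}
```

With `collocate=true` every read is local, so the counter stays at zero; with placement by person ID, ten people spread over three nodes means most reads cross the simulated network even though the job was sent to the company's node.
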
Once all your data is properly collocated within the data grid and your computations are routed to the nodes where that data is cached, all cache operations become LOCAL, which yields the best performance and scalability without any data noise. This goes in line with the first rule of distributed programming: DO NOT DISTRIBUTE.

From http://gridgain.blogspot.com/2011/01/collocation-first-rule-of-distributed.html

Opinions expressed by DZone contributors are their own.
