DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones AWS Cloud
by AWS Developer Relations
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Partner Zones
AWS Cloud
by AWS Developer Relations
11 Monitoring and Observability Tools for 2023
Learn more
  1. DZone
  2. Data Engineering
  3. Databases
  4. Blind Sharding with RavenDB

Blind Sharding with RavenDB

Oren Eini user avatar by
Oren Eini
·
Mar. 20, 12 · Interview
Like (0)
Save
Tweet
Share
4.62K Views

Join the DZone community and get the full member experience.

Join For Free

from the get go, ravendb was designed with sharding in mind. we had a sharding client inside ravendb when we shipped, and it made for some really cool demos.

it also wasn’t really popular, we didn’t implement some things for sharding. we always intended to, but we had other things to do and no one was asking for it much.

that was strange. i decided that we needed to do two major things.

  • first, to make sure that the experience for writing in a sharded environment was as close as we could get to the one you get with a non sharded environment.
  • second, we had to make it simple to use sharding.

before our changes, in order to use sharding you had to do the following:

  • setup multiple ravendb server.
  • create a list of those servers urls.
  • implement ishardstrategy, which exposes
    • ishardaccesstrategy – determine how we call to the servers.
    • ishardselectionstrategy – determine how we select which server a new instance will go to, and what server an existing instance belongs on.
    • ishardresolutionstrategy – determine which servers we should query when we are querying for data (allow to optimize which servers we are actually hitting for particular queries)

all in all, you would need to write a minimum of 3 classes, and have to write some sharding code that can be… somewhat tricky.

oh, it works, and it is a great design. it is also complex , and it makes it harder to use sharding.

instead, we now have the following scenario:

image

as you can see, here we have three different servers, each running in a different port. let us see what we need to do to get us working with this from the client code:

image

first, we need to define the servers (and their names), then we create a shard strategy and use that to create a sharded document store. once that is done, we are home free, and can do pretty much whatever we want:

string asianid, middleeasternid, americanid;

using (var session = documentstore.opensession())
{
    var asian = new company { name = "company 1", region = "asia" };
    session.store(asian);
    var middleeastern = new company { name = "company 2", region = "middle-east" };
    session.store(middleeastern);
    var american = new company { name = "company 3", region = "america" };
    session.store(american);

    asianid = asian.id;
    americanid = american.id;
    middleeasternid = middleeastern.id;

    session.store(new invoice { companyid = american.id, amount = 3 });
    session.store(new invoice { companyid = asian.id, amount = 5 });
    session.store(new invoice { companyid = middleeastern.id, amount = 12 });
    session.savechanges();

}

using (var session = documentstore.opensession())
{
    session.query<company>()
        .where(x => x.region == "america")
        .tolist();

    session.load<company>(middleeasternid);

    session.query<invoice>()
        .where(x => x.companyid == asianid)
        .tolist();
}

what you see here is the code that saves both companies and invoices, and does this over multiple servers. let us see the log output for this code:

image

you can see a few interesting things here:

  • the first four requests are to manage the document ids (hilos). by default, we use the first server as the one that will store all the hilo information.
  • next (request #5) we are saving two documents, note that the shard id is now part of the document id.
  • request 6 and 7 here are actually queries, we returned 1 results for the first query, and none for the second.

let us look at another shard now:

image

this is much shorter, since we don’t have the hilo requests. the first request is there to store two documents, and then we see two queries, both of which return no results.

and the last shard:

image

here we again don’t see the hilo requests (since they are all on the first server). we do see putting of the two docs, and request #2 is a query that returns no results.

request #3 is interesting, because we did not see that anywhere else. since we did a load by id, and since by default we store the shard id in the document id, we were able to optimize this operation and go directly to the relevant shard, bypassing the need to query anything other server.

the last request is a query, for which we have a result.

so what did we have so far?

we were able to easily configure ravendb to use 3 ways sharding in a few lines of code. it automatically distributed writes and reads for us, and when it could, it optimized the data access so it would only access the relevant shards. writes are distributed on a round robin basis, so it is pretty fair. and reads are optimized on whatever we can figure out a minimal number of shards to query. for example, when we do a load by id, we can figure out what the shard id is, and query that server directly, rather than all of them.

pretty cool, if you ask me.

now, you might have noticed that i called this post blind sharding. the reason this is called this name is that this is pretty much the lowest rung in the sharding ladder. it is good, it split your data and it tries to optimize things, but it isn’t the best solution. i’ll discuss a better solution in my next post.

Database

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Top 11 Git Commands That Every Developer Should Know
  • 10 Easy Steps To Start Using Git and GitHub
  • How To Best Use Java Records as DTOs in Spring Boot 3
  • Cloud Performance Engineering

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: