DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • Distributed SQL: An Alternative to Database Sharding
  • What Is Sharding?
  • Inside the Elastic Shard
  • How to Optimize Elasticsearch for Better Search Performance

Trending

  • A Continuous Testing Approach to Performance
  • SAP Business One vs. NetSuite: Comparison and Contrast of ERP Platforms
  • Understanding Europe's Cyber Resilience Act and What It Means for You
  • Edge Data Platforms, Real-Time Services, and Modern Data Trends
  1. DZone
  2. Data Engineering
  3. Databases
  4. Enabling Shards for Existing Database

Enabling Shards for Existing Database

A question came up in the mailing list, how do we enable sharding for an existing database?I’ll deal with data migration in this scenario at a later post.

Oren Eini user avatar by
Oren Eini
·
May. 22, 15 · Interview
Like (1)
Save
Tweet
Share
3.55K Views

Join the DZone community and get the full member experience.

Join For Free

a question came up in the mailing list, how do we enable sharding for an existing database. i’ll deal with data migration in this scenario at a later post.

the scenario is that we have a very successful application, and we start to feel the need to move the data to multiple shards. currently all the data is sitting in the rvn1 server. we want to add rvn2 and rvn3 to the mix. for this post, we’ll assume that we have the notion of customers and invoices.

previously, we access the database using a simple document store:

var documentstore = new documentstore
{
url = "http://rvn1:8080",
defaultdatabase = "shop"
};

now, we want to move to a sharded environment, so we want to write it like this. existing data is going to stay where it is at, and new data will be sharded according to geographical location.

var shards = new dictionary<string, idocumentstore>
{
{"origin", new documentstore {url = "http://rvn1:8080", defaultdatabase = "shop"}},//existing data
{"me", new documentstore {url = "http://rvn2:8080", defaultdatabase = "shop_me"}},
{"us", new documentstore {url = "http://rvn3:8080", defaultdatabase = "shop_us"}},
};

var shardstrategy = new shardstrategy(shards)
.shardingon<customer>(c => c.region)
.shardingon<invoice> (i => i.customer);

var documentstore = new shardeddocumentstore(shardstrategy).initialize();

this wouldn’t actually work. we are going to have to do a bit more. to start with, what happens when we don’t have a 1:1 match between region and shard? that is when the translator become relevant:

.shardingon<customer>(c => c.region, region =>
{
    switch (region)
    {
        case "middle east":
            return "me";
        case "usa":
        case "united states":
        case "us":
            return "us";
        default:
            return "origin";
    }
})


we basically say that we map several values into a single region. but that isn’t enough. newly saved documents are going to have the shard prefix, so saving a new customer and invoice in the us shard will show up as:

image

but existing data doesn’t have this (created without sharding).

image

so we need to take some extra effort to let ravendb know about them. we do this using the following two functions:

 func<string, string> potentialshardtoshardid = val =>
 {
     var start = val.indexof('/');
     if (start == -1)
         return val;
     var potentialshardid = val.substring(0, start);
     if (shards.containskey(potentialshardid))
         return potentialshardid;
     // this is probably an old id, let us use it.
     return "origin";

 };
 func<string, string> regiontoshardid = region =>
 {
     switch (region)
     {
         case "middle east":
             return "me";
         case "usa":
         case "united states":
         case "us":
             return "us";
         default:
             return "origin";
     }
 };


we can then register our sharding configuration so:

  var shardstrategy = new shardstrategy(shards)
      .shardingon<customer, string>(c => c.region, potentialshardtoshardid, regiontoshardid)
      .shardingon<invoice, string>(x => x.customer, potentialshardtoshardid, regiontoshardid); 


that takes care of handling both new and old ids, and let ravendb understand how to query things in an optimal fashion. for example, a query on all invoices for ‘customers/1’ will only hit the rvn1 server.

however, we aren’t done yet. new customers that don’t belong to the middle east or usa will still go to the old server, and we don’t want any modification to the id there. we can tell ravendb how to handle it like so:

var defaultmodifydocumentid = shardstrategy.modifydocumentid;
shardstrategy.modifydocumentid = (convention, shardid, documentid) =>
{
    if(shardid == "origin")
        return documentid;

    return defaultmodifydocumentid(convention, shardid, documentid);
};


that is almost the end. there is one final issue that we need to deal with, and that is the old documents, before we used sharding, don’t have the required sharding metadata. we can fix that using a store listener. so we have:

 var documentstore = new shardeddocumentstore(shardstrategy);
 documentstore.registerlistener(new addshardidtometadatastorelistener());
 documentstore.initialize();


where the listener looks like this:

 public class addshardidtometadatastorelistener : idocumentstorelistener
 {
     public bool beforestore(string key, object entityinstance, ravenjobject metadata, ravenjobject original)
     {
         if (metadata.containskey(constants.ravenshardid) == false)
         {
             metadata[constants.ravenshardid] = "origin";// the default shard id
         }
         return false;
     }

     public void afterstore(string key, object entityinstance, ravenjobject metadata)
     {
     }
 }


and that is it. i know that there seems to be quite a lot going on in here, but it basically can be broken down to three actions that we take:

  • modify the existing metadata to add the sharding server id via the listener.
  • modify the document id convention so documents on the old server won’t have a designation (optional).
  • modify the sharding configuration so we’ll understand that documents without a shard prefix actually belong to the origin shard.

and that is pretty much it.

Database Shard (database architecture)

Published at DZone with permission of Oren Eini, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Related

  • Distributed SQL: An Alternative to Database Sharding
  • What Is Sharding?
  • Inside the Elastic Shard
  • How to Optimize Elasticsearch for Better Search Performance

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: