Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Split, Master and Global Tables

DZone's Guide to

Split, Master and Global Tables

· Cloud Zone
Free Resource

See how the beta release of Kubernetes on DC/OS 1.10 delivers the most robust platform for building & operating data-intensive, containerized apps. Register now for tech preview.

One of the most important ScaleBase concepts is its data-splitting techniques. They are transparent to the application, but can be configured. Because they have a major effect on application performance, I feel it is important to explain the differences between these table types.

Our example data will be an employee table, with the following values (those of you with an Oracle Database experience will recognize these values from the test HR schema that comes with the Oracle database).

ID First name Last name
100 Steven King
101 Neena Kochhar
102 Lex De Haan
103 Alexander Hunold
104 Bruce Ernst
105 David Austin
106 Valli Pataballa


Also, for our example, we will consider the following ScaleBase installation (without any high availability options), which contains four databases – A,B,C and D.

 

Split Tables

A split table is a table whose data is distributed across several database servers. The table definition itself resides on all the database servers, but the data is divided between them, based on a split algorithm that can be configured (hash, list or range are currently supported – we will explain these options in a future blog). This option should be used for large tables or tables that have many write operations on them. Since each database contains a portion of the data, write operations (including index updating) are much faster.

So, in our example, if the employees table was configured to be a split table, we would have receive the following data distribution:

Database A Database B Database C Database D
ID First name
100 Steven
101 Neena

 

ID First name
102 Lex
103 Alexander

 

ID First name
104 Bruce
105 David

 

ID First name
106 Valli

 


Global Tables

A global table means that the table configuration exists on all database servers, and each database server contains the exact data as all the other database servers. This option should be used for smaller tables, and is typically usually used in join operations with the split tables.

In our example, had we configured the employees table to be a global table, we would have received the following data distribution:

Database A Database B Database C Database D
ID First name
100 Steven
101 Neena
102 Lex
103 Alexander
104 Bruce
105 David
106 Valli

 

ID First name
100 Steven
101 Neena
102 Lex
103 Alexander
104 Bruce
105 David
106 Valli

 

ID First name
100 Steven
101 Neena
102 Lex
103 Alexander
104 Bruce
105 David
106 Valli

 

ID First name
100 Steven
101 Neena
102 Lex
103 Alexander
104 Bruce
105 David
106 Valli

 


Master Tables

Master tables are tables that only exist on one database (for the sake of this discussion, we’ll assume this is database A). This option is useful for data that is not used in joins. In this way, updating and selecting it are much faster.

Database A Database B Database C Database D
ID First name Last name
100 Steven King
101 Neena Kochhar
102 Lex De Haan
103 Alexander Hunold
104 Bruce Ernst
105 David Austin
106 Valli Pataballa

 

     


Summary

There are, of course, implications for each configuration option. Master tables cannot take part in joins with other tables, for example, without causing a performance impact. Global tables require XA transactions in order to ensure consistency between all nodes, and massive writes to them may cause lock situations that can hurt performance. Split tables are great, but you cannot split all tables, since complex queries are slow to execute.

We will be offering an offline analysis tool shortly that will assist in determining how to configure your tables. So, stay tuned to our site and Facebook page.

I hope this short post helped to explain the different options of our database configuration. Post back if you are interested in more information.

New Mesosphere DC/OS 1.10: Production-proven reliability, security & scalability for fast-data, modern apps. Register now for a live demo.

Topics:

Published at DZone with permission of Liran Zelkha. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}