How I Handled the Scalability of the SQL Database
Learn more about the most common strategies to improve the ability of the database to process an increasing number of queries per second.
Join the DZone community and get the full member experience.Join For Free
In this article, I’ll talk about what I learned trying to increase the operational limits of the Inspector SQL database.
Before talking about read-replicas or sharded-data, it might be helpful to introduce the problem to solve and the most common strategies to improve the ability of the database to process an increasing number of queries per second, opnenig up new growth margins for the application.
Why So Many Performance Issues Are Caused by the Database?
We often forget that each request or process performed by the application is not “atomic”. When we write a piece of code we should keep in mind that that piece of code will be executed by 10 other requests at the same time. If it’s slow, it’s likely to affect other requests running in parallel, and at the end the whole system.
Database is a shared resource used by all processes and all servers behind your application. So, be careful with thinking “it’s okay if this piece of code isn’t optimized“. Even just one poorly designed access against the database can hurt the performance of the whole system.
What Problem Does Scaling the Database Solve
Before proceeding further, we should clarify a difference between vertical vs horizontal scaling because both are useful but suited to solve different problems.
If you deal with large datasets, import/export processes, data aggregation algorithms, and similar problems you probably can get more benefits scaling your database vertically.
Resource-intensive tasks usually don’t have great benefits from multiple instances. When a long running query is executed against an instance, that instance must have enough resources to execute the query without hurt the performance of other tasks.
For this type of database operations you should correctly size the RAM and CPU of the machines and take care of the performance of the SQL tasks you write.
You can learn more on how to tune your SQL queries to be able to run very heavy queries with better performance on this article: How to accelerate application performance with smart SQL queries.
Horizontal scaling instead aims to solves the opposite problem: allow the database to performs a big big number of small queries per second.
It is simply a matter of finding a way to run the database on multiple instances. Just like a load balancer that distribute the traffic between application servers. But, when you take a closer look to new database architectures, and see what’s actually working and what’s not, the fundamental problems with relational databases start to become more clear.
Achieving Scalability Is a Huge Challenge for Relational Databases
Relational databases are originally designed to run on a single server in order to maintain the integrity of the table mappings and avoid the typical problems of distributed computing.
To handle these concerns, relational database vendors have come out with a whole assortment of improvements, using more complex architectures like “master-slave” or “sharded-data”.
Below we are going to discuss “read-replicas” that is the Inspector implementation and it is the most used strategy to horizontally scale a relational database.
What Does Read-Replicas Mean?
A read replica is a copy of the primary instance that reflects changes to the primary in almost real time. You can basically use a read replica to do the following:
- Offload read requests from the primary instance;
- Perform a regional migration or failover to another instance for disaster recovery purposes.
We use the read replica to distribute the load against two instances instead of one. While it looks cool, this setup still has a weakness. The second instance is readonly.
If we can write on the second instance, she too would have to synchronize the data back to the first one (the master), but two-way synchronization is not allowed in relational datanases to avoid data corruption.
We can offload only the “read” queries from the primary instance, so it is the right solution if your application is “read” intensive. If your application is write intensive this architecture couldn’t be enough.
- It is very fast as doesn’t impose any restrictions on read performance.
- You can split read and write requests to different servers. For example, all analytics queries can be made on Slave nodes.
- It is not very reliable because of asynchronous replication. It means that some committed on master transactions maybe not available on slave if the master fails.
- Write requests can hardly be scaled. The only option to scale write requests is to increase compute capacity (RAM and CPU) of the Master node.
Master-Master replication can enable two way synchronization allowing us to write on both instance, but consider that even the biggest cloud computing providers doesn’t provide this option in their managed SQL database offer. It is probably a tunnel in which it is better not to enter.
Rely on Your Cloud Provider
In order to keep the infrastructure management flexible and cost friendly, we migrated our MySQL instance to Google Cloud SQL that offer the possibility to create/delete read replicas with a few clicks. This helps us focus our effort on the application development instead of SQL server issues.
All the most known cloud providers offer their managed SQL database with the ability to easily add read-replicas to your master instance like Amazon RDS or DigitalOcean managed databases.
It’s all about being able to better understand the architecture and how your application can get Aas many advantages as possible of it.
One of the code design constraints should be to avoid database write operations as much as possible to stay away from the most important limit to scalablity of a SQL database.
Hope this article was helpful to bring new ideas to imporve your application.
If you want learn more about how to scale up your application, join in Scalable Applications the international community of professional developers to share strategies and experiences building scalable systems.
Published at DZone with permission of Valerio Barbera. See the original article here.
Opinions expressed by DZone contributors are their own.