This blog post discusses how you can protect your e-commerce database from a high traffic disaster.
Databases power today’s e-commerce. Whether it’s listing items on your site, contacting your distributor for inventory, tracking shipments, payments, or customer data, your database must be up, running, tuned and available for your business to be successful.
There is no time that this is more important than high-volume traffic days. There are specific events that occur throughout the year (such as Black Friday, Cyber Monday, or Singles Day) that you know are going to put extra strain on your database environment. But these are the specific times that your database can’t go down — these are the days that can make or break your year!
So what can you do to guarantee that your database environment is up to the challenge of handling high traffic events? Are there ways of preparing for this type of traffic?
Yes, there are! In this blog post, we’ll look at some of the factors that can help prepare your database environment to handle large amounts of traffic.
Synchronous vs. Asynchronous Applications
Before moving to strategies, we need to discuss the difference between synchronous and asynchronous applications.
In most web-based applications, user input starts a number of requests for resources. Once the server answers the requests, no communication stops until the next input. This type of communication between a client and server is called synchronous communication.
Restricted application updates limit synchronous communication. Even synchronous applications designed to automatically refresh application server information at regular intervals have consistent periods of delay between data refreshes. While, usually, such delays aren’t an issue, some applications (for example, stock-trading applications) rely on continuously updated information to provide their users optimum functionality and usability.
Web 2.0-based applications address this issue by using asynchronous communication. Asynchronous applications deliver continuously updated data to users. Asynchronous applications separate client requests from application updates, so multiple asynchronous communications between the client and server can occur simultaneously or in parallel.
The strategy you use to scale the two types of applications to meet growing user and traffic demands will differ.
Scaling a Synchronous/Latency-Sensitive Application
When it comes to synchronous applications, you really have only one option for scaling performance: sharding. With sharding, the tables are divided and distributed across multiple servers, which reduces the total number of rows in each table. This consequently reduces index size and generally improves search performance.
A shard can also be located on its own hardware, with different shards added to different machines. This database distribution over a large multiple of machines spreads the load out, also improving performance. Sharding allows you to scale read and write performance when latency is important.
Generally speaking, it is better to avoid synchronous applications when possible – they limit your scalability options.
Scaling an Asynchronous Application
When it comes to scaling asynchronous applications, we have many more options than with synchronous applications. You should try and use asynchronous applications whenever possible:
- Secondary/Slave hosts: Replication can be used to add more hardware for read traffic. Replication usually employs a master/slave relationship between a designated “original” server and copies of the server. The master logs and then distributes the updates to the slaves. This setup allows you to distribute the read load across more than one machine.
- Caching: Database caching (tables, data, and models — caching summaries of data) improves scalability by distributing the query workload from expensive (overhead-wise) backend processes to multiple cheaper ones. It allows more flexibility for data processing: for example premium user data can be cached, while regular user data isn’t.
Caching also improves data availability by providing applications that don’t depend on backend services continued service. It also allows for improved data access speeds by localizing the data and avoiding roundtrip queries. There are some specific caching strategies you can use:
- Pre-emptive caching: Ordinarily, an object gets cached the first time it is requested (or if cached data isn’t timely enough). Preemptive caching instead generates cached versions before an application requests them. Typically this is done by a cron process.
- Hit/miss caching: A cache hit occurs when an application or software requests data. First, the central processing unit (CPU) looks for the data in its closest memory location, which is usually the primary cache. If the requested data is found in the cache, it is considered a cache hit. Cache miss occurs within cache memory access modes and methods. For each new request, the processor searched the primary cache to find that data. If the data is not found, it is considered a cache miss. A cache hit serves data more quickly, as the data can be retrieved by reading the cache memory. The cache hit also can be in disk caches where the requested data is stored and accessed by the first query. A cache miss slows down the overall process because after a cache miss, the central processing unit (CPU) will look for a higher level cache, such as random access memory (RAM) for that data. Further, a new entry is created and copied into cache before it can be accessed by the processor.
- Client-side caching: Client-side caching allows server data to be copied and cached on the client computer. Client side caching reduces load times by several factors.
- Queuing updates: Queues are used to order queries (and other database functions) in a timely fashion. There are queues for asynchronously sending notifications like email and SMS in most websites. E-commerce sites have queues for storing, processing and dispatching orders. How your database handles queues can affect your performance:
- Batching: Batch processing can be used for efficient bulk database updates and automated transaction processing, as opposed to interactive online transaction processing (OLTP) applications.
- Fan-out updates: Fan-out duplicates data in the database. When data is duplicated, it eliminates slow joins and increases read performance.
Efficient Usage of Data at Scale
As you scale up in terms of database workload, you need to be able to avoid bad queries or patterns from your applications.
- Moving expensive queries out of the user request path. Even if your database server uses powerful hardware, its performance can be negatively affected by a handful of expensive queries. Even a single bad query can cause serious performance issues for your database. Make sure to use monitoring tools to track down the queries that are taking up the most resources.
- Using caching to offload database traffic. Cache data away from the database using something like memcached. This is usually done at the application layer and is highly effective.
- Counters and in-memory stores. Use memory counters to monitor performance hits: pages/sec, faults/sec, available bytes, total server, target server memory, etc. Percona’s new in-memory storage engine for MongoDB also can help.
- Connection Pooling. A connection pool made up of cached database connections, remembered so that the connections can be reused for future requests to the database. Connection pools can improve the performance of executing commands on a database.
Scaling Out (Horizontal) Tricks
Scaling horizontally means adding more nodes to a system, such as adding a new server to a database environment to a distributed software application. For example, scaling out from one Web server to three.
- Pre-sharding data for flexibility: Pre-sharding the database across the server instances allows you to have the entire environment resources available at the start of the event, rather than having to rebalance during peak event traffic.
- Using “kill switches” to control traffic: The idea of a kill switch is a single point where you can stop the flow of data to a particular node. Strategically set up kill switches allow you to stop a destructive workload that begins to impact the entire environment.
- Limiting graph structures: By limiting the size or complexity of graph structures in the database, you will simplify data lookups and data size.
Scaling With Hardware (Vertical Scaling)
Another option to handle the increased traffic load is adding more hardware to your environment: more servers, more CPUs, more memory, etc. This, of course, can be expensive. One option here is to pre-configure your testing environment to become part of the production environment if necessary. Another is to pre-configure more Database-as-a-Service (DaaS) instances for the event (if you are a using cloud-based services).
Whichever method, be sure you verify and test your extra servers and environment before your drop-dead date.
Testing Performance and Capacity
As always, in any situation where your environment is going to be stressed beyond usual limits, testing under real-world conditions is a key factor. This includes not only testing for raw traffic levels, but also the actual workloads that your database will experience, with the same volume and variety of requests.
Knowing Your Application and Questions to Ask at Development Time
Finally, it’s important that you understand what applications will be used and querying the database. This sort of common sense idea is often overlooked, especially when teams (such as the development team and the database/operations team) get siloed and don’t communicate.
Get to know who is developing the applications that are using the database, and how they are doing it. As an example, a while back I had the opportunity to speak with a team of developers, mostly to just understand what they were doing. In the process of whiteboarding the app with them, we discovered a simple query issue that – now that we were aware of it – took little effort to fix. These sorts of interactions, early in the process, can save a great deal of headache down the line.