Selecting an AWS EC2 Instance for KeyDB
In this article, we discuss how to choose the right AWS EC2 instance to maximize your KeyDB performance.
When searching for information on what instance type to use, the answer is typically, "it depends." This article narrows down the choice with both general data on EC2 instance types and an in-depth analysis of KeyDB performance on each type.
The focus of this article will be using KeyDB as an in-memory database. There will be future articles on using KeyDB with FLASH storage, as well as with some of KeyDB-Pro's features, such as FLASH persistence and advanced querying, where many more cores can be taken advantage of under query-intensive workloads (O(n) heavy operations, such as KEYS). When AWS Graviton2 instances are available, we will publish those results too.
Let's start with a simple breakdown of EC2 instances to give you a general idea before digging into the detailed analysis.
For an in-memory database, the first question is which instance types will optimize cost, as memory (RAM) is often the main constraint. At an initial glance, you can see that x1 instances are priced the best along with r5 and t instances. However, going with the cheapest type may not be the best or the right choice for you. The other factors going into the decision are the storage capacity and performance you are looking for. Please note all costs recorded in this article are for on-demand instances.
Performance is usually a big factor in avoiding high latency under heavy load. KeyDB is a fast, multithreaded database that can scale vertically before scaling horizontally. The chart below shows, for several instance types, the first (lowest cost) instance that is saturated at peak performance with the Memtier benchmark. It can be seen that C5, R5, and M5 instances are able to get the best performance out of the set. These instances have faster CPUs and are able to get 1 million ops/sec each.
As you can see, there are pros and cons of each instance type, depending on what you are looking for. To get a better understanding of each, we will take a look at some more data in order to make generalizations on each type described later in this report.
The charts below show many of these instances. They compare the available memory (GB) of each instance, the benchmarked ops/sec, the price per GB of RAM, and the cost normalized per 1,000 ops/sec as a cost-per-performance comparison. These charts can make it much easier to select and refine the best instance for your use case.
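The two normalized metrics in these charts are simple ratios, so you can reproduce them for your own shortlist. The sketch below is a minimal illustration, assuming hourly on-demand pricing as the cost basis; the names `InstanceResult`, `usd_per_gb_hour`, and `usd_per_1000_ops_hour` are hypothetical, and the figures in the example are placeholders, not prices or results from this article.

```python
from dataclasses import dataclass


@dataclass
class InstanceResult:
    """One benchmarked instance. All values are supplied by you."""
    name: str
    memory_gb: float    # RAM available on the instance
    hourly_usd: float   # on-demand price per hour
    ops_per_sec: float  # peak throughput from your own benchmark run

    def usd_per_gb_hour(self) -> float:
        # Cost of memory: hourly price divided by available RAM.
        return self.hourly_usd / self.memory_gb

    def usd_per_1000_ops_hour(self) -> float:
        # Cost of performance: hourly price per 1,000 ops/sec of throughput.
        return self.hourly_usd / (self.ops_per_sec / 1000.0)


def rank_by(results, metric):
    """Return results sorted from cheapest to most expensive for a metric."""
    return sorted(results, key=metric)


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    candidates = [
        InstanceResult("example-a", memory_gb=64, hourly_usd=1.0, ops_per_sec=1_000_000),
        InstanceResult("example-b", memory_gb=256, hourly_usd=2.0, ops_per_sec=500_000),
    ]
    for r in rank_by(candidates, InstanceResult.usd_per_gb_hour):
        print(r.name, r.usd_per_gb_hour(), r.usd_per_1000_ops_hour())
```

Ranking the same list by each metric in turn makes the trade-off visible: an instance that is cheapest per GB is often not the cheapest per unit of throughput.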
Please note that t2 and t3 instances are shaded differently because they are burstable instances, and the numbers represent peak load, not sustained load. If CPU credits are used up, the instances may be throttled down drastically. So, understanding your usage during selection is critical.
Because KeyDB is multithreaded, most of our users tend not to run more than one instance (except for high availability). This reduces the complexity of the setup for those not wanting to cluster.
The first trend you may notice is that performance becomes saturated above .4xlarge machines. So, if performance becomes a bottleneck for your application, you will shard at or above this machine size, depending on how much memory you require for your database. This ensures you scale the instance vertically first. Prior to clustering, you can also create a read-replica or active-replica instance to handle some traffic and provide high availability.
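As a rough way to reason about when sharding becomes necessary, the sketch below estimates how many nodes a workload needs from its memory and throughput requirements. It is a back-of-the-envelope model, not a KeyDB tool: the function name `shards_needed` and the `headroom` parameter (spare capacity reserved on each node) are assumptions introduced here for illustration.

```python
import math


def shards_needed(dataset_gb, target_ops, node_memory_gb, node_ops, headroom=0.25):
    """Estimate the node count for a dataset size and throughput target.

    headroom is the fraction of each node's memory and throughput kept
    in reserve (an assumption; pick a value that suits your workload).
    """
    if not 0 <= headroom < 1:
        raise ValueError("headroom must be in the range [0, 1)")
    usable_memory = node_memory_gb * (1 - headroom)
    usable_ops = node_ops * (1 - headroom)
    by_memory = math.ceil(dataset_gb / usable_memory)
    by_ops = math.ceil(target_ops / usable_ops)
    # Whichever resource runs out first dictates the shard count.
    return max(1, by_memory, by_ops)
```

If the result is 1, a single vertically scaled instance (plus a replica for availability) is enough; a larger result suggests either a bigger instance or a cluster.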
For KeyDB, each instance type has different benefits and drawbacks depending on your setup and use case. We will outline the benefits as they relate to KeyDB. For more on types you can read up here. Hopefully, this summary helps refine the decision.
M5 instances are great for most scenarios. They have a 3.1 GHz Intel Xeon® Platinum 8175 processor that enables KeyDB to get over 1 million ops/sec. Memory is available up to 384GB. If you are looking for performance and don't have a memory requirement as high as the r5 instances, this is a good route to go.
R5 instances offer the best combination of performance and memory available. R5 memory is among the cheapest per GB, and the instances use the same processor as the m5, enabling over 1 million ops/sec. For large datasets requiring high performance, the r5 instances are one of the best options.
C5 compute optimized instances use a 3.5GHz Intel Xeon Scalable processor, which is very fast. However, the performance is similar to the m5/r5 processors, getting just over 1 million ops/sec. In terms of performance, the c5 instances deliver reasonably cost-effective ops/sec. However, if you are limited by memory in any way, the c5 instances are the priciest in $/GB. Most use cases will not benefit from c5 instances over m5/r5.
T2 and T3 instances are potentially great options for small, fluctuating use cases. The t3 instances did get better performance than the t2 instances in our testing, as the processors were slightly faster. The main thing to note about t2 and t3 instances is that they are burstable. This means they can perform well for short bursts, but don't expect that performance continuously. Running above the baseline uses up CPU credits, and once they run out, your CPU is throttled down, which can create a high-latency experience for your users. Some instances may be throttled down to as low as 5% of capacity, so it's important to understand your usage to see if it's the right fit. For burstable stats and descriptions check out this doc.
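To get a feel for how long a burst can last, the sketch below models the credit balance using the AWS definition that one CPU credit equals one vCPU running at 100% for one minute. It is a simplified estimate that assumes constant utilization and an instance that throttles when credits run out (it ignores unlimited mode). The function name and all numbers in the usage example are illustrative assumptions; take the real earn rate and starting balance for your instance size from the AWS documentation.

```python
def minutes_until_throttled(start_credits, credits_per_hour, vcpus, utilization):
    """Estimate minutes of sustained load before CPU credits are exhausted.

    start_credits    -- credit balance when the load begins
    credits_per_hour -- credits the instance earns per hour
    vcpus            -- number of vCPUs on the instance
    utilization      -- average CPU utilization per vCPU, from 0.0 to 1.0

    Returns None when credits are earned at least as fast as they are
    spent, meaning the load can be sustained indefinitely.
    """
    earned_per_minute = credits_per_hour / 60.0
    spent_per_minute = vcpus * utilization  # 1 credit = 1 vCPU-minute at 100%
    net_drain = spent_per_minute - earned_per_minute
    if net_drain <= 0:
        return None
    return start_credits / net_drain


if __name__ == "__main__":
    # Illustrative values only: 2 vCPUs pinned at 100% with 60 credits banked.
    print(minutes_until_throttled(60, 24, 2, 1.0))
```

Running the estimate against your expected peak and average utilization is a quick check on whether a burstable instance fits before you commit to one.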
A1 Arm instances are one of Amazon's more affordable options, using AWS Graviton processors. They are better suited to smaller databases that run under sustained loads, unlike the burstable t3/t2 instances. A1 instances provide the lowest cost per ops/sec (the most operations processed per dollar spent); however, the cost per GB of memory is relatively high, and the 2.3GHz processor does not reach as many ops/sec as other instance types. For cases already running Arm applications and workloads, KeyDB on Arm is a great fit.
X1 instances have the lowest cost per GB of memory. However, the 2.3GHz Intel Xeon E7 8880 v3 processor caps out at 620,000 ops/sec, so it doesn't reach quite as high a benchmark as the m5/r5 instances. Still, for cases that require a lot of in-memory storage and are not close to the performance limits of the instance, the X1 family might be the right option for you.
Storage optimized instances, such as the i3 family, are especially good for features such as KeyDB on FLASH. We didn't cover this instance type in much detail, as the comparison of FLASH across instance types will be another article in itself. The i3 has a 2.3GHz Intel Xeon E5-2686 v4 processor and maxes out around 400,000 ops/sec. Costs are not comparable to other instances unless you are taking advantage of the SSD storage. The i3en instances use an Intel® Xeon® Scalable (Skylake) or Xeon® Platinum 8175M processor and can get as much as 845,000 ops/sec. Hence, paying attention to the instance specifiers is important when selecting them.
When selecting EC2 instances, there are a few other factors to consider. I do not intend to go into detail on these. However, it helps to be aware of your options:
- You may notice suffixes on the instances, such as d, n, a, or e (e.g., m5d.4xlarge).
  - 'n' means network optimized (higher bandwidth). The processor may be different, so take a look.
  - 'd' means it has additional NVMe-based SSD storage. This is physically connected to the host server and provides block-level storage that is coupled to the lifetime of the instance.
  - 'a' means it has an AMD processor.
- You can compare processor types and stats here.
- Keep the processor used in mind, as it can result in different performance. It is always recommended to run some sort of benchmarking when selecting instances.
- EBS optimized instances can ensure dedicated throughput between EC2 and EBS volumes, which can have benefits in some cases for backing up and saving. S3 is also a cheap storage option, and support for it is built into KeyDB.
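The naming convention described in the list above can be decoded mechanically. The sketch below splits a type name such as m5d.4xlarge into family, generation, suffixes, and size, and looks up only the suffix meanings given in this article. The function `parse_instance_type` and the regular expression are illustrative assumptions, not an AWS API, and the pattern deliberately handles only the simple family/generation/suffix/size form.

```python
import re

# Suffix meanings as described in this article; other letters are left unexplained.
SUFFIX_MEANINGS = {
    "n": "network optimized (higher bandwidth)",
    "d": "additional NVMe-based SSD storage",
    "a": "AMD processor",
}

# family letters, generation digits, optional suffix letters, then ".size"
_PATTERN = re.compile(r"^([a-z]+)(\d+)([a-z]*)\.([a-z0-9]+)$")


def parse_instance_type(name):
    """Split an EC2 instance type name into its parts."""
    match = _PATTERN.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type format: {name!r}")
    family, generation, suffixes, size = match.groups()
    return {
        "family": family,
        "generation": int(generation),
        "suffixes": {s: SUFFIX_MEANINGS.get(s, "not covered here") for s in suffixes},
        "size": size,
    }


if __name__ == "__main__":
    print(parse_instance_type("m5d.4xlarge"))
```

A helper like this is mainly useful for flagging instances whose suffix implies a different processor, which is the cue to rerun your benchmark.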
There is a lot to chew on when trying to refine your selection. Finding the right compromise between cost, memory, performance, and storage largely depends on the load and traffic on your database. However, once you have an idea, it becomes a bit easier to choose. Hopefully, this article provides some insight into making your selection without being an expert on EC2 instances. The AWS documentation is really good for improving your understanding.
As is evident from this article, there are just as many questions when setting up a low-cost, high-storage instance that relies heavily on FLASH options. KeyDB is releasing a new version of FLASH shortly and will be doing an evaluation on AWS instances.
The Memtier benchmarking instance was an m5.8xlarge instance. An instance of this size is required to max out traffic to KeyDB. The following command was used for testing in most cases.
The KeyDB test instance was typically run with the following command. Note that the number of threads will be truncated based on the available resources.
If you need authentication, pass the argument --authenticate=<yourpassword> to memtier and --requirepass <yourpassword> to KeyDB. If you're on a secure network, you can use --protected-mode no for KeyDB.
If your memtier instance is not large enough, it will be the bottleneck and will skew results. If you are going through a VPC, VPN, or load balancer, it's possible that you may have a bottleneck there. Otherwise, the numbers should be very similar to those posted.
Published at DZone with permission of Ben Schermel . See the original article here.