Benchmarking Instance Types for Amazon OpenSearch Workloads

A detailed performance comparison of Amazon OpenSearch's specialized OM2 and general-purpose M7g instances to help you optimize performance and cost.

Jatinder Singh

Sep. 22, 25 · Analysis

Choosing the optimal instance type for Amazon OpenSearch clusters is crucial for balancing performance and cost. With AWS offering both the OpenSearch-specialized OM2 instances and the newer general-purpose M7g instances, organizations face an important decision.

While OM2 instances are tailored for OpenSearch with high memory-to-vCPU ratios, M7g instances are built on AWS Graviton3 processors and promise enhanced overall performance. The best choice depends on your specific workload characteristics and requirements.

This article presents comprehensive benchmark comparisons between these instance types, providing DevOps teams and architects with actionable insights for making informed infrastructure decisions. We'll examine real-world performance metrics and cost implications to help you optimize your OpenSearch deployment.

Understanding Benchmark Testing in OpenSearch Optimization

Benchmark testing in OpenSearch is a systematic process of evaluating cluster performance under controlled conditions, measuring key metrics like query latency, throughput, and resource utilization. For distributed search engines like OpenSearch, benchmarking goes beyond simple performance testing — it's about understanding how your cluster behaves under specific workload patterns. It provides quantitative data for making informed decisions about infrastructure, configuration, and scaling strategies.

By simulating real-world workloads, teams can tune their OpenSearch deployments against measured behavior rather than assumptions.

The four essential pillars of OpenSearch benchmark testing are as follows:

Performance optimization: Focuses on measuring and improving query response times, throughput, and overall cluster efficiency. This helps teams validate configuration changes and understand the impact of different workload patterns.
Capacity planning: Enables teams to make data-driven decisions about cluster sizing, shard allocation, and scaling strategies. It helps predict resource requirements for future growth and ensures reliable performance during peak loads.
Cost management: Provides insights into resource utilization and helps optimize infrastructure spending. By understanding performance per dollar metrics, teams can make informed decisions about instance types and cluster configurations.
Bottleneck identification: Helps pinpoint performance constraints across CPU, memory, network, and storage. Early identification of bottlenecks allows teams to address issues before they impact production workloads.

Understanding these pillars is crucial for conducting meaningful benchmark tests that drive improvements in your OpenSearch deployment.

Benchmark Setup and Methodology

OpenSearch Benchmark, a tool provided by the OpenSearch Project, comprehensively gathers performance metrics from OpenSearch clusters, including indexing throughput and search latency. Whether you’re tracking overall cluster performance, informing upgrade decisions, or assessing the impact of workflow changes, this utility proves invaluable.

We compare the performance of two clusters: one powered by OpenSearch-specialized OM2 instances and the other by the newer general-purpose M7g instances. The dataset comprises HTTP server logs from the 1998 World Cup website and is commonly used for ingestion-heavy and search-intensive scenarios, which makes it well suited to comparing instance performance on such tasks. With the OpenSearch Benchmark tool, we run experiments that measure indexing throughput, search latency, and overall cluster efficiency. Our aim is to determine the most suitable configuration for our specific workload requirements.

You can install OpenSearch Benchmark directly on a host running Linux or macOS, or you can run OpenSearch Benchmark in a Docker container on any compatible host. OpenSearch Benchmark includes a set of workloads that you can use to benchmark your cluster performance. Workloads contain descriptions of one or more benchmarking scenarios that use a specific document corpus to perform a benchmark against your cluster. The document corpus contains indexes, data files, and operations invoked when the workload runs.

When assessing your cluster’s performance, it is recommended to use a workload similar to your cluster’s use cases, which can save you time and effort. Consider the following criteria to determine the best workload for benchmarking your cluster:

Use case: Selecting a workload that mirrors your cluster’s real-world use case is essential for accurate benchmarking. By simulating heavy search or indexing tasks typical for your cluster, you can pinpoint performance issues and optimize settings effectively. This approach makes sure benchmarking results closely match actual performance expectations, leading to more reliable optimization decisions tailored to your specific workload needs.
Data: Use a data structure similar to that of your production workloads. OpenSearch Benchmark provides example documents within each workload so that you can understand the mapping and compare it with your own data mapping and structure. Each workload also includes the files you need to compare data types and index mappings.
Query types: Understanding your query pattern is crucial for detecting the most frequent search query types within your cluster. Employing a similar query pattern for your benchmarking experiments is essential.

The OpenSearch benchmarking process consists of the following five key steps:

1. Environment Setup

Configure a testing environment that closely mirrors your production setup. Ensure hardware meets minimum requirements (e.g., CPU, RAM, SSD storage) and set up an OpenSearch cluster or domain for benchmarking. When you select an instance, you should also think about which workloads you want to run. As a general rule, make sure that the OpenSearch Benchmark host has enough free storage space to store the compressed data and the fully decompressed data corpus once OpenSearch Benchmark is installed.

Hardware requirements
- CPU: 8+ cores recommended
- RAM: 16GB minimum, 32GB+ recommended
- Storage: SSD with at least 3x the size of your test dataset – 500GB
Software requirements
- Python 3.8 or later. Check with: python3 --version
- pip. Check with: pip --version
- Git 1.9 or later. Check with: git --version
Installing on Linux
- After the required software is installed, install OpenSearch Benchmark with the following command: pip install opensearch-benchmark
- Verify the installation with the following command: opensearch-benchmark -h
- Refer to the documentation for installing the OpenSearch Benchmark with Docker.
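
The software and storage checks above can be scripted before a run. The following is a minimal sketch, not part of OpenSearch Benchmark itself: the function name preflight and the required_free_gb parameter are illustrative assumptions, and the free-space threshold should be set from the size of the workload you plan to run.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version from the software requirements


def preflight(path=".", required_free_gb=0.0):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        found = sys.version.split()[0]
        problems.append(f"Python 3.8 or later required, found {found}")
    # pip and git only need to be discoverable on PATH for this check.
    for tool in ("pip", "git"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    # Free space on the volume that will hold the data corpus.
    free_gb = shutil.disk_usage(path).free / 1e9
    if free_gb < required_free_gb:
        problems.append(
            f"only {free_gb:.1f} GB free at {path}, need {required_free_gb:.1f} GB"
        )
    return problems


if __name__ == "__main__":
    # 50 GB is a hypothetical threshold chosen for illustration only.
    for problem in preflight(".", required_free_gb=50.0):
        print("WARNING:", problem)
```

An empty result means the host passed every check; any other result lists what to fix before installing or running a workload.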

2. Select and Configure Workload

Choose a workload that matches your use case (e.g., http_logs, geonames). Workloads define datasets, queries, and operations to simulate real-world scenarios. Customize workload parameters if needed, such as target throughput or concurrency.

workload name	document count	compressed size	uncompressed size
http_logs	247,249,096	1.2 GB	31.1 GB

To see a list of default benchmark workloads, visit the opensearch-benchmark-workloads repository on GitHub.

3. Data Ingestion

Load the workload dataset into the target OpenSearch cluster. This step prepares the index and ensures the data is ready for benchmarking operations.

4. Run Benchmark Tests

Execute benchmark tests using OpenSearch Benchmark. Tests simulate operations like indexing, querying, and aggregations while collecting metrics such as latency, throughput, and system resource usage.

This example runs a benchmark with the http_logs workload and certificate verification disabled:

    opensearch-benchmark execute-test \
      --target-hosts=https://opensearch-cluster-dns-name:9200 \
      --pipeline=benchmark-only \
      --workload=http_logs \
      --client-options="basic_auth_user:*****,basic_auth_password:******,verify_certs:false"
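
When the benchmark is launched from a script, it can help to keep credentials out of the command text. The sketch below assembles the same command as an argument list, using only the flags shown above. The helper names build_command and run are illustrative, verify_certs is assumed to be the client option that controls certificate verification, and the sketch does not escape commas or colons inside option values.

```python
import subprocess


def build_command(host, workload, client_options):
    """Assemble the opensearch-benchmark argument list for a benchmark-only run."""
    # Client options are passed as comma-separated key:value pairs.
    options = ",".join(f"{key}:{value}" for key, value in client_options.items())
    return [
        "opensearch-benchmark",
        "execute-test",
        f"--target-hosts={host}",
        "--pipeline=benchmark-only",
        f"--workload={workload}",
        f"--client-options={options}",
    ]


def run(command):
    """Run the benchmark; raises CalledProcessError on a non-zero exit code."""
    # Passing a list (no shell) avoids shell expansion of special characters.
    return subprocess.run(command, check=True)
```

A caller might read the user name and password from environment variables (the variable names are up to you), pass them in the client_options dictionary together with "verify_certs": "false", and then call run(build_command(...)).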
  

5. Analyze Results

Review collected metrics to evaluate cluster performance. Use insights to identify bottlenecks, optimize configurations, or compare different setups for improvements. The OpenSearch Benchmark summary report provides metrics related to the performance of your cluster; how you compare and use those metrics depends on your use case.

OpenSearch Benchmark results are stored in memory or in external storage, and results can be found in the ~/.benchmark/benchmarks/test_executions/<test_execution_id> directory. Results are named in accordance with the test_execution_id of the most recent workload test.
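
To find the newest results directory without copying the ID by hand, a short helper can pick the most recently modified entry. This is a sketch under two assumptions: the results root sits under the current user's home directory, and the newest directory by modification time corresponds to the latest test. The name latest_test_execution is illustrative and not part of the tool.

```python
from pathlib import Path

# Assumed default location, under the current user's home directory.
DEFAULT_ROOT = Path.home() / ".benchmark" / "benchmarks" / "test_executions"


def latest_test_execution(root=DEFAULT_ROOT):
    """Return the most recently modified test-execution directory, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = [entry for entry in root.iterdir() if entry.is_dir()]
    # default=None covers a results root that exists but is still empty.
    return max(runs, key=lambda entry: entry.stat().st_mtime, default=None)
```

The directory name returned by this helper is the test_execution_id, which makes it easy to archive or compare runs from both clusters.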

Performance Benchmark Analysis: OM2 vs M7g for Amazon OpenSearch

We compared the performance of two OpenSearch Service configurations:

Configuration 1 – Cluster manager nodes plus two data nodes running on OpenSearch-specialized OM2 instances
Configuration 2 – Cluster manager nodes plus two data nodes running on the newer general-purpose M7g instances

In both configurations, we use the same number and type of cluster manager nodes: three c6g.xlarge. You can set up different configurations with the supported instance types in OpenSearch Service to run performance benchmarks.

The following table summarizes our OpenSearch Service configuration details.

COMPONENT	OM2 CLUSTER	M7G CLUSTER
CLUSTER MANAGER NODES
Instance Type	c6g.large	c6g.large
Count	3	3
DATA NODES
Instance Type	OM2.2xlarge	M7g.2xlarge
Count	2	2
vCPUs per node	8	8
Memory per node	32 GiB	32 GiB
Storage Configuration
Volume Type	gp3	gp3
Size	500 GB	500 GB
IOPS	3000	3000
OPENSEARCH CONFIGURATION
Version	2.19	2.19
Shards per index	5	5
Replicas	1	1
JVM Heap	8 GB	8 GB
MONITORING
CloudWatch Metrics	Enabled	Enabled
Metric Frequency	1 minute	1 minute
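
The shard settings in the table determine how much data each node holds. As a rough sketch, assuming the settings apply per index and shards are balanced evenly across data nodes, the layout works out as follows (shard_layout is an illustrative helper name):

```python
def shard_layout(primaries, replicas, data_nodes):
    """Return (total shard copies, average copies per data node) for one index."""
    total = primaries * (1 + replicas)
    return total, total / data_nodes


total, per_node = shard_layout(primaries=5, replicas=1, data_nodes=2)
print(f"{total} shard copies per index, {per_node:g} per data node on average")
```

With one replica and two data nodes, a primary and its replica are never placed on the same node, so each node ends up holding a copy of every shard. Both clusters therefore carry the same data per node, which keeps the comparison focused on the instance type.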

Now let’s examine the performance details between the two configurations.

Performance Benchmark Comparison

The http_logs dataset contains HTTP server logs from the 1998 World Cup website between April 30, 1998, and July 26, 1998. Each request consists of a timestamp field, client ID, object ID, size of the request, method, status, and more. The uncompressed size of the dataset is 31.1 GB with 247 million JSON documents. The amount of load sent to both domain configurations is identical. The following table displays the amount of time taken to run various aspects of an OpenSearch workload on our two configurations.

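Each row below pairs a metric with a typical use case. The % change column is the OM2 value relative to the M7g value, so a negative number means the OM2 value was lower.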

METRIC TYPE	METRIC	DESCRIPTION	USE CASE	M7G	OM2	%CHANGE	WINNER
Indexing Performance
Indexing Time	Primary shards	Total time for document indexing across primary shards	Log ingestion, Document processing	87.03 min	65.68 min	-24.54%	OM2 ✅
Flush Time	Primary shards	Time to persist indexed data to disk	Large batch updates, Data migrations	8.57 min	5.06 min	-41.03%	OM2 ✅
GC Time	Young Gen	Garbage collection time for recent objects	Memory-intensive operations	16.50 sec	7.29 sec	-55.83%	OM2 ✅
Bulk Index	p99 latency	Time for 99% of bulk index operations	ETL processes, Data imports	300.02 ms	773.71 ms	+157.87%	M7g ✅
Query Performance
Query Throughput	Mean	Queries processed per second	High-traffic search applications	16.33 ops/s	0.025 ops/s	-99.85%	M7g ✅
Match All	p99 latency	Response time for full index scans	System health checks, Analytics	34.25 ms	31.87 ms	-6.95%	OM2 ✅
Term Query	p99 latency	Exact match query response time	Product catalog search, User lookups	35.14 ms	29.32 ms	-16.56%	OM2 ✅
Range Query	p99 latency	Range-based query response time	Time-series data, Price filters	50.66 ms	33.46 ms	-33.95%	OM2 ✅
Hourly Aggregation	p99 latency	Hourly data grouping response time	Metrics dashboards, Usage reports	72.77 ms	49.46 ms	-32.02%	OM2 ✅
Multi-term Aggregation	p99 latency	Complex aggregation response time	Business analytics, Complex reporting	2468.37 ms	2200.92 ms	-10.83%	OM2 ✅
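
The % change and winner columns can be recomputed from the raw values. The sketch below does this for a few rows, with the OM2 value measured relative to the M7g value. The function names are illustrative, and because the table values are already rounded, results may differ from the table in the last digit.

```python
def percent_change(m7g, om2):
    """Change of the OM2 value relative to the M7g value, in percent."""
    return (om2 - m7g) / m7g * 100.0


def winner(m7g, om2, lower_is_better=True):
    """Name the instance type with the better value for one metric."""
    if m7g == om2:
        return "tie"
    om2_better = (om2 < m7g) if lower_is_better else (om2 > m7g)
    return "OM2" if om2_better else "M7g"


# (metric, M7g value, OM2 value, lower_is_better), values copied from the table
ROWS = [
    ("Indexing time (min)", 87.03, 65.68, True),
    ("Young Gen GC time (s)", 16.50, 7.29, True),
    ("Bulk index p99 latency (ms)", 300.02, 773.71, True),
    ("Mean query throughput (ops/s)", 16.33, 0.025, False),
    ("Range query p99 latency (ms)", 50.66, 33.46, True),
]

for name, m7g, om2, lower in ROWS:
    change = percent_change(m7g, om2)
    print(f"{name}: {change:+.2f}% -> {winner(m7g, om2, lower)}")
```

The lower_is_better flag matters: latency and time metrics improve as they fall, while throughput improves as it rises, so the sign of the % change alone does not identify the winner.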

The performance comparison between M7g and OM2 instances reveals distinct strengths for different use cases. OM2 delivered lower p99 latency for match-all, term, and range queries and for aggregations (roughly 7% to 34% lower), finished indexing and flushing faster, and spent less than half as much time in young-generation garbage collection. M7g, however, showed much lower p99 bulk-index latency (300.02 ms vs. 773.71 ms) and far higher mean query throughput.

This suggests using OM2 for production environments requiring consistent low-latency query performance, while M7g might be more suitable for development environments, batch processing, and cost-sensitive workloads where raw throughput is prioritized over query complexity.

Conclusion

Our benchmarking analysis of OM2 and M7g instances in OpenSearch clusters reveals clear performance patterns to guide infrastructure decisions. OM2 instances demonstrate superior performance in complex query operations, memory management, and consistent low-latency responses, making them well suited to production environments with demanding search and analytics workloads. M7g instances excel in bulk operations and high-throughput scenarios, offering a cost-effective solution for development environments and batch processing tasks.

The significant performance variations across metrics emphasize the importance of aligning instance selection with specific workload requirements. Organizations should carefully evaluate their use cases, considering factors like query complexity, throughput needs, and cost constraints, to choose the most suitable instance type or consider a hybrid approach for optimal performance.


Opinions expressed by DZone contributors are their own.
