Benchmarking Instance Types for Amazon OpenSearch Workloads

A detailed performance analysis between Amazon OpenSearch's specialized OM2 and general-purpose M7g instances to help you optimize performance and cost.

By Jatinder Singh · Sep. 22, 25 · Analysis

Choosing the optimal instance type for Amazon OpenSearch clusters is crucial for balancing performance and cost. With AWS offering both the OpenSearch-specialized OM2 instances and the newer general-purpose M7g instances, organizations face an important decision.

OM2 instances are tailored for OpenSearch with high memory-to-vCPU ratios, while M7g instances are built on the newer AWS Graviton3 processors and promise better general-purpose performance. The best choice depends on your specific workload characteristics and requirements.

This article presents comprehensive benchmark comparisons between these instance types, providing DevOps teams and architects with actionable insights for making informed infrastructure decisions. We'll examine real-world performance metrics and cost implications to help you optimize your OpenSearch deployment.

Understanding Benchmark Testing in OpenSearch Optimization

Benchmark testing in OpenSearch is a systematic process of evaluating cluster performance under controlled conditions, measuring key metrics like query latency, throughput, and resource utilization. For distributed search engines like OpenSearch, benchmarking goes beyond simple performance testing — it's about understanding how your cluster behaves under specific workload patterns. It provides quantitative data for making informed decisions about infrastructure, configuration, and scaling strategies. 

By simulating real-world workloads and measuring system behavior under controlled conditions, teams can optimize their OpenSearch deployments effectively.

The four essential pillars of OpenSearch benchmark testing are as follows:

  1. Performance optimization: Focuses on measuring and improving query response times, throughput, and overall cluster efficiency. This helps teams validate configuration changes and understand the impact of different workload patterns.
  2. Capacity planning: Enables teams to make data-driven decisions about cluster sizing, shard allocation, and scaling strategies. It helps predict resource requirements for future growth and ensures reliable performance during peak loads.
  3. Cost management: Provides insights into resource utilization and helps optimize infrastructure spending. By understanding performance per dollar metrics, teams can make informed decisions about instance types and cluster configurations.
  4. Bottleneck identification: Helps pinpoint performance constraints across CPU, memory, network, and storage. Early identification of bottlenecks allows teams to address issues before they impact production workloads.

Understanding these pillars is crucial for conducting meaningful benchmark tests that drive improvements in your OpenSearch deployment.
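
To make the "performance per dollar" idea from the cost management pillar concrete, here is a minimal sketch of one way to compute it. The function names and the example price are hypothetical and are not taken from the benchmark or from AWS pricing; substitute the real hourly rate for your instance type and Region.

```python
def run_cost(duration_minutes, hourly_price_per_node, node_count):
    """Cost of keeping `node_count` nodes busy for the duration of a run."""
    return duration_minutes / 60.0 * hourly_price_per_node * node_count


def docs_per_dollar(doc_count, duration_minutes, hourly_price_per_node, node_count):
    """Documents indexed per dollar spent on data nodes during the run."""
    return doc_count / run_cost(duration_minutes, hourly_price_per_node, node_count)


# HYPOTHETICAL price, for illustration only; look up the real hourly rate
# for your instance type and Region before drawing conclusions.
EXAMPLE_HOURLY_PRICE = 0.50

print(docs_per_dollar(1_000_000, 30, EXAMPLE_HOURLY_PRICE, 2))
```

Comparing this figure across instance types, rather than raw duration alone, shows whether a faster instance is also the cheaper one for a given job.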

Benchmark Setup and Methodology

OpenSearch Benchmark, a tool provided by the OpenSearch Project, comprehensively gathers performance metrics from OpenSearch clusters, including indexing throughput and search latency. Whether you’re tracking overall cluster performance, informing upgrade decisions, or assessing the impact of workflow changes, this utility proves invaluable.

We compare the performance of two clusters: one powered by OpenSearch-specialized OM2 instances and the other by the newer general-purpose M7g instances. The dataset comprises HTTP server logs from the 1998 World Cup website. It is commonly used for ingestion-heavy and search-intensive scenarios, which makes it a good fit for comparing instance performance in such tasks. With the OpenSearch Benchmark tool, we run experiments that measure indexing throughput, search latency, and overall cluster efficiency. Our aim is to determine the most suitable configuration for our specific workload requirements.

You can install OpenSearch Benchmark directly on a host running Linux or macOS, or you can run it in a Docker container on any compatible host. OpenSearch Benchmark includes a set of workloads that you can use to benchmark your cluster performance. A workload describes one or more benchmarking scenarios that use a specific document corpus to run a benchmark against your cluster. It bundles the indexes, data files, and operations invoked when the workload runs.

When assessing your cluster’s performance, it is recommended to use a workload similar to your cluster’s use cases, which can save you time and effort. Consider the following criteria to determine the best workload for benchmarking your cluster:

  • Use case: Selecting a workload that mirrors your cluster’s real-world use case is essential for accurate benchmarking. By simulating heavy search or indexing tasks typical for your cluster, you can pinpoint performance issues and optimize settings effectively. This approach makes sure benchmarking results closely match actual performance expectations, leading to more reliable optimization decisions tailored to your specific workload needs.
  • Data: Use a data structure similar to that of your production workloads. OpenSearch Benchmark provides example documents within each workload so that you can understand the mapping and compare it with your own data mapping and structure. Every benchmark workload ships with directories and files that you can use to compare data types and index mappings.
  • Query types: Understanding your query pattern is crucial for detecting the most frequent search query types within your cluster. Employing a similar query pattern for your benchmarking experiments is essential.

The OpenSearch Benchmarking Process follows a systematic workflow consisting of the following five key steps: 

1. Environment Setup

Configure a testing environment that closely mirrors your production setup. Ensure hardware meets minimum requirements (e.g., CPU, RAM, SSD storage) and set up an OpenSearch cluster or domain for benchmarking. When you select an instance, you should also think about which workloads you want to run. As a general rule, make sure that the OpenSearch Benchmark host has enough free storage space to store the compressed data and the fully decompressed data corpus once OpenSearch Benchmark is installed.

  • Hardware requirements
    • CPU: 8+ cores recommended
    • RAM: 16GB minimum, 32GB+ recommended
    • Storage: SSD with at least 3x the size of your test dataset (for example, 500 GB)
  • Software requirements
    • Python 3.8 or later (check with python3 --version)
    • Pip installed (check with pip --version)
    • Git 1.9 or later (check with git --version)
  • Installing on Linux
    • After the required software is installed, install OpenSearch Benchmark using the following command: pip install opensearch-benchmark
    • Verify the installation using the command below: opensearch-benchmark -h
    • Refer to the documentation for installing OpenSearch Benchmark with Docker.
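
The software requirements above can be checked with a short script before installing. This is a minimal sketch that uses only the Python standard library; the function name `check_prerequisites` is hypothetical, and the script only confirms that pip and git are on the PATH, not that Git is version 1.9 or later.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version listed in the software requirements


def check_prerequisites(required_tools=("pip", "git")):
    """Return a dict mapping each prerequisite to True (found) or False."""
    status = {"python": sys.version_info >= MIN_PYTHON}
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH.
        status[tool] = shutil.which(tool) is not None
    return status


for name, ok in check_prerequisites().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```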

2. Select and Configure Workload

Choose a workload that matches your use case (e.g., http_logs, geonames). Workloads define datasets, queries, and operations to simulate real-world scenarios. Customize workload parameters if needed, such as target throughput or concurrency.

| Workload name | Document count | Compressed size | Uncompressed size |
|---|---|---|---|
| http_logs | 247,249,096 | 1.2 GB | 31.1 GB |
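
The sizes listed above give a quick way to confirm that the benchmark host can hold both the compressed download and the decompressed corpus. The sketch below is one way to do that check with the standard library; the helper names are hypothetical, and it treats 1 GB as 10^9 bytes, which is an assumption about how the sizes are reported.

```python
import shutil


def required_space_gb(compressed_gb, uncompressed_gb, safety_factor=1.0):
    """Space needed to keep the compressed and decompressed corpus side by side."""
    return (compressed_gb + uncompressed_gb) * safety_factor


def has_enough_space(path, needed_gb):
    """True when the filesystem holding `path` has at least `needed_gb` free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= needed_gb


needed = required_space_gb(1.2, 31.1)  # http_logs sizes from the table
print(f"http_logs needs about {needed:.1f} GB; enough free: {has_enough_space('.', needed)}")
```

Pass a `safety_factor` above 1.0 if you want headroom for logs and result files on the same volume.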


To see a list of default benchmark workloads, visit the opensearch-benchmark-workloads repository on GitHub.

3. Data Ingestion

Load the workload dataset into the target OpenSearch cluster. This step prepares the index and ensures the data is ready for benchmarking operations.

4. Run Benchmark Tests

Execute benchmark tests using OpenSearch Benchmark. Tests simulate operations like indexing, querying, and aggregations while collecting metrics such as latency, throughput, and system resource usage.

This example runs a benchmark with the http_logs workload and with certificate verification disabled:

Shell
 
# Run http_logs against an existing cluster; verify_certs:false skips TLS certificate verification
opensearch-benchmark execute-test \
--target-hosts=https://opensearch-cluster-dns-name:9200 \
--pipeline=benchmark-only \
--workload=http_logs \
--client-options=basic_auth_user:*****,basic_auth_password:******,verify_certs:false
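
If you script your benchmark runs, it can help to assemble the command in code so that the same options are applied to every cluster under test. The following is a minimal sketch, not part of OpenSearch Benchmark itself: `build_benchmark_command` is a hypothetical helper, the `verify_certs` client option follows the name used in the OpenSearch Benchmark documentation, and the `bulk_indexing_clients` workload parameter in the usage comment is an assumption that you should confirm against the workload's README.

```python
def build_benchmark_command(target_host, workload, user, password,
                            verify_certs=True, workload_params=None,
                            test_mode=False):
    """Build the argument list for an `opensearch-benchmark execute-test` run."""
    client_options = {
        "basic_auth_user": user,
        "basic_auth_password": password,
        "verify_certs": str(verify_certs).lower(),
    }
    cmd = [
        "opensearch-benchmark", "execute-test",
        f"--target-hosts={target_host}",
        "--pipeline=benchmark-only",
        f"--workload={workload}",
        "--client-options=" + ",".join(f"{k}:{v}" for k, v in client_options.items()),
    ]
    if workload_params:
        # Workload parameters are passed as comma-separated key:value pairs.
        cmd.append("--workload-params=" + ",".join(
            f"{k}:{v}" for k, v in workload_params.items()))
    if test_mode:
        # Test mode runs a small subset of the workload as a quick smoke test.
        cmd.append("--test-mode")
    return cmd


# Usage (not executed here): read credentials from the environment rather than
# hard-coding them, then hand the list to subprocess.run(cmd, check=True).
# cmd = build_benchmark_command("https://opensearch-cluster-dns-name:9200",
#                               "http_logs", os.environ["OSB_USER"],
#                               os.environ["OSB_PASSWORD"], verify_certs=False,
#                               workload_params={"bulk_indexing_clients": 4})
```

Returning a list rather than a single string avoids shell quoting problems when a password contains special characters.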


5. Analyze Results

Review collected metrics to evaluate cluster performance. Use insights to identify bottlenecks, optimize configurations, or compare different setups for improvements. The OpenSearch Benchmark summary report provides metrics related to the performance of your cluster; how you compare and use those metrics depends on your use case. 

OpenSearch Benchmark results are stored in-memory or in external storage. When stored in-memory, results can be found in the ~/.benchmark/benchmarks/test_executions/<test_execution_id> directory. Results are named in accordance with the test_execution_id of the most recent workload test.
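
For side-by-side comparisons it is often easier to work with the summary as data rather than as printed text. The sketch below assumes the summary was exported as CSV (for example with the `--results-format=csv` and `--results-file` options) and that the file has `Metric`, `Task`, `Value`, and `Unit` columns; both the column layout and the sample rows are assumptions for illustration, so check them against a file produced by your own run.

```python
import csv
import io


def load_summary(csv_text):
    """Map (metric, task) -> (value, unit) from a summary report in CSV form."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            value = float(row["Value"])
        except (TypeError, ValueError):
            continue  # skip rows without a numeric value
        rows[(row["Metric"], row["Task"])] = (value, row["Unit"])
    return rows


# Illustrative sample in the assumed layout.
sample = """Metric,Task,Value,Unit
Cumulative indexing time of primary shards,,65.68,min
99th percentile latency,term,29.32,ms
"""

summary = load_summary(sample)
print(summary[("99th percentile latency", "term")])
```

Loading one such dictionary per cluster makes it straightforward to line up the same metric and task across configurations.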

Performance Benchmark Analysis: OM2 vs M7g for Amazon OpenSearch

We compared the performance of two OpenSearch Service configurations:

  • Configuration 1 – Cluster manager nodes and two data nodes of OpenSearch-specialized OM2 instances
  • Configuration 2 – Cluster manager nodes and two data nodes of the newer general-purpose M7g instances

In both configurations, we use the same number and type of cluster manager nodes: three c6g instances of the size listed in the table below. You can set up different configurations with the supported instance types in OpenSearch Service to run performance benchmarks.

The following table summarizes our OpenSearch Service configuration details.

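| Component | Setting | OM2 cluster | M7g cluster |
|---|---|---|---|
| Cluster manager nodes | Instance type | c6g.large | c6g.large |
| Cluster manager nodes | Count | 3 | 3 |
| Data nodes | Instance type | OM2.2xlarge | M7g.2xlarge |
| Data nodes | Count | 2 | 2 |
| Data nodes | vCPUs per node | 8 | 8 |
| Data nodes | Memory per node | 32 GiB | 32 GiB |
| Storage configuration | Volume type | gp3 | gp3 |
| Storage configuration | Size | 500 GB | 500 GB |
| Storage configuration | IOPS | 3000 | 3000 |
| OpenSearch configuration | Version | 2.19 | 2.19 |
| OpenSearch configuration | Shards per index | 5 | 5 |
| OpenSearch configuration | Replicas | 1 | 1 |
| OpenSearch configuration | JVM heap | 8 GB | 8 GB |
| Monitoring | CloudWatch metrics | Enabled | Enabled |
| Monitoring | Metric frequency | 1 minute | 1 minute |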


Now let’s examine the performance details between the two configurations.

Performance Benchmark Comparison

The http_logs dataset contains HTTP server logs from the 1998 World Cup website between April 30, 1998, and July 26, 1998. Each request consists of a timestamp field, client ID, object ID, size of the request, method, status, and more. The uncompressed size of the dataset is 31.1 GB with 247 million JSON documents. The amount of load sent to both domain configurations is identical. The following table displays the amount of time taken to run various aspects of an OpenSearch workload on our two configurations.

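The comparison below lists each metric with a typical use case. The % change column expresses the OM2 result relative to the M7g result, so a negative value means OM2 recorded the lower number.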

| Metric type | Metric | Description | Use case | M7g | OM2 | % change | Winner |
|---|---|---|---|---|---|---|---|
| Indexing Performance | | | | | | | |
| Indexing Time | Primary shards | Total time for document indexing across primary shards | Log ingestion, Document processing | 87.03 min | 65.68 min | -24.54% | OM2 |
| Flush Time | Primary shards | Time to persist indexed data to disk | Large batch updates, Data migrations | 8.57 min | 5.06 min | -41.03% | OM2 |
| GC Time | Young Gen | Garbage collection time for recent objects | Memory-intensive operations | 16.50 sec | 7.29 sec | -55.83% | OM2 |
| Query Performance | | | | | | | |
| Bulk Index | p99 latency | Time for 99% of bulk index operations | ETL processes, Data imports | 300.02 ms | 773.71 ms | +157.87% | M7g |
| Query Throughput | Mean | Queries processed per second | High-traffic search applications | 16.33 ops/s | 0.025 ops/s | -99.85% | M7g |
| Match All | p99 latency | Response time for full index scans | System health checks, Analytics | 34.25 ms | 31.87 ms | -6.95% | OM2 |
| Term Query | p99 latency | Exact match query response time | Product catalog search, User lookups | 35.14 ms | 29.32 ms | -16.56% | OM2 |
| Range Query | p99 latency | Range-based query response time | Time-series data, Price filters | 50.66 ms | 33.46 ms | -33.95% | OM2 |
| Hourly Aggregation | p99 latency | Hourly data grouping response time | Metrics dashboards, Usage reports | 72.77 ms | 49.46 ms | -32.02% | OM2 |
| Multi-term Aggregation | p99 latency | Complex aggregation response time | Business analytics, Complex reporting | 2468.37 ms | 2200.92 ms | -10.83% | OM2 |
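
The % change and winner columns can be reproduced from the two measured values. The sketch below assumes the change is computed as OM2 relative to M7g, which matches the published figures to within rounding; the function names are hypothetical, and the last digit can differ slightly from the table because the inputs shown there are already rounded.

```python
def percent_change(m7g_value, om2_value):
    """Change of the OM2 result relative to the M7g result, in percent."""
    return (om2_value - m7g_value) / m7g_value * 100.0


def winner(m7g_value, om2_value, higher_is_better=False):
    """Pick the better instance; latency and time metrics are lower-is-better."""
    if m7g_value == om2_value:
        return "tie"
    om2_wins = om2_value > m7g_value if higher_is_better else om2_value < m7g_value
    return "OM2" if om2_wins else "M7g"


# Range query p99 latency (ms) and mean query throughput (ops/s) from the table.
print(round(percent_change(50.66, 33.46), 2), winner(50.66, 33.46))
print(round(percent_change(16.33, 0.025), 2), winner(16.33, 0.025, higher_is_better=True))
```

Throughput is the one higher-is-better metric in the table, which is why a large negative change there still counts against OM2.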


The comparison reveals distinct strengths for different use cases. OM2 delivers lower p99 latency for term queries (-16.56%), range queries (-33.95%), and hourly aggregations (-32.02%), along with shorter total indexing and flush times and less time spent in young-generation garbage collection. M7g, however, shows lower p99 latency for bulk index operations (300.02 ms vs. 773.71 ms) and far higher mean query throughput (16.33 ops/s vs. 0.025 ops/s).

This suggests using OM2 for production environments requiring consistent low-latency query performance, while M7g might be more suitable for development environments, batch processing, and cost-sensitive workloads where raw throughput is prioritized over query complexity.

Conclusion

Our benchmarking analysis of OM2 and M7g instances in OpenSearch clusters reveals clear performance patterns to guide infrastructure decisions. OM2 instances perform better in complex query operations, memory management, and consistent low-latency responses, making them a strong fit for production environments with demanding search and analytics workloads. M7g instances perform better in bulk operations and high-throughput scenarios, offering a cost-effective option for development environments and batch processing tasks.

The significant performance variations across metrics emphasize the importance of aligning instance selection with specific workload requirements. Organizations should carefully evaluate their use cases, considering factors like query complexity, throughput needs, and cost constraints, to choose the most suitable instance type or consider a hybrid approach for optimal performance.
