Benchmarking Storage Performance (Latency, Throughput) Using Python

Benchmarking AWS S3 with Python reveals latency and throughput differences across storage classes, helping you balance speed, cost, and reliability.

Arjun Mullick

Aug. 26, 25 · Tutorial

Understanding the performance of your AWS S3 storage, specifically how quickly you can read and write data, is essential for both cost optimization and application speed. By running Python scripts that measure latency and throughput, you can compare different S3 storage classes, identify hidden bottlenecks, and make data-driven decisions about where and how to store your data.

This article breaks down the fundamentals of S3 benchmarking, provides working Python examples, and shows how to interpret the results even if you’re not a cloud infrastructure expert.

Introduction

Not all cloud storage is created equal. AWS S3 offers several storage classes, such as Standard, Intelligent-Tiering, and Glacier, that balance cost and performance differently. If your application needs to access data quickly, or if you’re storing large files, knowing how your storage performs can save you time and money. Benchmarking is the process of measuring how fast you can upload, download, and list files in S3. By doing this, you can choose the right storage class for your needs and spot performance issues before they impact your users.

Prerequisites

An AWS account with S3 access
Python 3.x installed
The boto3 library (pip install boto3)
AWS credentials configured (~/.aws/credentials)
Basic understanding of Python scripting

If you’re new to AWS or Python, work through a getting-started guide for each before continuing.

Why Benchmark Storage Performance?

In the era of elastic infrastructure, where storage is just an API call away, it's tempting to assume that all storage is created equal, especially in cloud environments like AWS. But that assumption can be expensive, slow, or both. Benchmarking storage performance isn't just a nice-to-have step for cloud architects or system engineers; it's an essential practice that underpins cost-efficiency, user experience, and operational resilience. In AWS S3, arguably the most widely used object storage service, benchmarking is the key to understanding the real-world performance differences between storage classes like S3 Standard, S3 Intelligent-Tiering, S3 Glacier, and more.

Cloud storage is easy to provision but hard to predict. AWS provides documentation on expected performance, but real-world behavior can vary significantly depending on:

AWS region
Object size
Access pattern
Request concurrency
Network conditions

Key Metrics

This section looks at what it means to benchmark storage, why it matters, and how it can influence the decisions you make around performance, cost, and architecture. Before getting into benchmarking strategies, it’s worth revisiting the two foundational performance metrics of any storage system: latency and throughput.

Latency measures how quickly a system can begin a file operation, be it upload, download, or deletion. One example is the time from when you request an object to when the first byte arrives.
Throughput, on the other hand, tells you how much data you can push through the system per second, usually measured in MB/s or GB/s.
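
To make the two definitions concrete, here is a minimal sketch that separates time to first byte from sustained throughput. It runs without AWS: fake_stream is a hypothetical stand-in for a download, and the helper names (throughput_mb_s, measure) are illustrative, not part of boto3.

```python
import time


def throughput_mb_s(num_bytes, seconds):
    """Convert a byte count and a duration into MB/s (1 MB = 1024 * 1024 bytes)."""
    if seconds <= 0:
        return 0.0
    return (num_bytes / (1024 * 1024)) / seconds


def measure(chunks):
    """Consume an iterable of byte chunks; return (time_to_first_byte_s, mb_per_s)."""
    start = time.perf_counter()
    first_byte_at = None
    total_bytes = 0
    for chunk in chunks:
        if first_byte_at is None:
            first_byte_at = time.perf_counter()
        total_bytes += len(chunk)
    end = time.perf_counter()
    if first_byte_at is None:  # nothing was received
        first_byte_at = end
    return first_byte_at - start, throughput_mb_s(total_bytes, end - start)


def fake_stream(delay_s=0.05, n_chunks=4, chunk_size=1024 * 1024):
    """Hypothetical stand-in for a download: waits, then yields 4 MB in chunks."""
    time.sleep(delay_s)
    for _ in range(n_chunks):
        yield b"x" * chunk_size


latency, throughput = measure(fake_stream())
print(f"Time to first byte: {latency:.3f}s, throughput: {throughput:.1f} MB/s")
```

Against a real bucket, the iterable would need to be a generator that issues the request itself and then yields the response body in chunks, so that the request time is included in the measurement.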

Together, these two metrics form the lens through which all storage performance should be viewed. A storage class might have high throughput but poor latency, which could be ideal for bulk analytics workloads but disastrous for real-time web applications. Here’s the simple truth: cloud providers give a rough estimate of storage performance, but real-world behavior can differ significantly based on region, request pattern, data size, concurrency, tooling, and network path. This is why benchmarking is not optional; it’s strategic.

AWS S3 Storage Class Overview

AWS offers a buffet of S3 storage classes, each with its own performance profile:

Storage Class	Optimized For	Retrieval Latency	Cost
S3 Standard	Frequent access	Low (milliseconds)	$$$
S3 IA (Infrequent Access)	Lower access frequency	Low (milliseconds)	$$
S3 Intelligent-Tiering	Dynamic optimization	Low to moderate	$$
S3 Glacier	Archival storage	Minutes to hours	$
S3 Glacier Deep Archive	Long-term archival	Hours	$ (very low)

On paper, these storage classes look cleanly separated, but in practice the boundaries blur. For example, S3 IA (Infrequent Access) might exhibit unpredictable latencies during retrieval bursts, and Glacier’s retrieval speed depends on whether you choose expedited, standard, or bulk retrieval. Benchmarking these classes using tools like boto3, s3-benchmark, or custom Python/Go scripts can reveal nuanced trade-offs.

A sample test setup might:

Upload 1,000 objects of varying sizes (10KB, 100KB, 10MB, 100MB).
Read those objects concurrently from different regions.
Record average and 95th percentile latency, and throughput over time.
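
The last step, summarizing recorded latencies, can be sketched in a few lines. This is an illustrative helper (summarize is a made-up name) that uses the nearest-rank method for the 95th percentile. Other percentile definitions interpolate between samples and give slightly different values.

```python
import statistics


def summarize(latencies_s):
    """Return mean, nearest-rank p95, and max for a collection of latency samples."""
    ordered = sorted(latencies_s)
    if not ordered:
        raise ValueError("need at least one sample")
    # Nearest-rank percentile: the smallest sample with at least 95% of samples
    # at or below it. Integer arithmetic avoids floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return {
        "mean": statistics.mean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }


# Illustrative samples in seconds: mostly fast requests with one slow outlier
samples = [0.08, 0.09, 0.10, 0.11, 0.09, 0.10, 0.12, 0.10, 0.09, 0.95]
stats = summarize(samples)
print(f"mean={stats['mean']:.3f}s p95={stats['p95']:.3f}s max={stats['max']:.3f}s")
```

Reporting the mean alongside a high percentile matters because a single slow request, like the outlier above, can be hidden by an average yet still be what users notice.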

This empirical evidence helps teams confidently decide if S3 IA is "good enough" for an analytics pipeline, or whether the retrieval delays from Glacier are tolerable in a compliance-driven backup strategy.

Detecting Slowdowns: Network, Region, or Storage Class?

Ever noticed an app’s file download suddenly feels sluggish, but CloudWatch metrics show no clear culprit? That’s where benchmarking shines as a diagnostic tool.

Performance degradation could stem from:

Cross-region traffic: A client in Oregon accessing data in eu-west-1 will naturally incur higher latency.
Throttling: S3 applies request rate limits per prefix. If you exceed them, S3 returns 503 Slow Down responses, and the resulting retries show up as slowdowns.
Storage class cold starts: S3 Standard-IA still serves objects in milliseconds, but objects archived in S3 Glacier or Glacier Deep Archive must be restored ("rehydrated") before they can be read, which adds minutes to hours.
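
To see throttling in benchmark output, it helps to record how many attempts each request needed. The sketch below is a generic retry wrapper with exponential backoff, not boto3's own retry logic (boto3 already retries internally, so throttling often shows up only as extra latency). The names call_with_backoff and flaky_request are hypothetical.

```python
import time


def call_with_backoff(fn, max_attempts=5, base_delay_s=0.1, sleep=time.sleep):
    """Call fn(), retrying on any exception with exponential backoff.

    Returns (result, attempts) so the benchmark can report retry counts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(), attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles after each failure: base, 2 * base, 4 * base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))


# Simulated request that is "throttled" twice before succeeding
state = {"calls": 0}


def flaky_request():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("simulated throttling")
    return "payload"


result, attempts = call_with_backoff(flaky_request, base_delay_s=0.01)
print(f"Succeeded after {attempts} attempt(s)")
```

A rising attempt count at a fixed request rate is a hint that the bottleneck is throttling rather than network distance or storage class.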

By benchmarking periodically or as part of CI/CD pipelines (especially before large-scale data ingestion or analytics jobs), you catch performance regressions early, before your users do.

Optimizing Cost vs. Performance

There’s always a tension between performance and cost in cloud storage. Benchmarking helps find the right balance.

Case Study: A media processing pipeline originally using S3 Standard for storing transcode jobs could switch to S3 Intelligent-Tiering, saving 30% without a significant latency increase, but only after testing confirmed object retrieval times under 100 ms.
For Machine Learning: Data scientists training models on petabyte-scale datasets might assume S3 Standard is the default. But if they benchmark read performance with parallel reads and find no degradation in IA class, the team might save thousands in monthly storage costs.

Here’s a minimal Python snippet using boto3 to measure upload/download latency and throughput:

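Python

import time
import boto3

s3 = boto3.client('s3')
bucket = 'your-test-bucket'  # replace with a bucket you own
key = 'test-object'
data = b'x' * (10 * 1024 * 1024)  # 10 MB payload
size_mb = len(data) / (1024 * 1024)

# Upload: time the full PUT request
start = time.perf_counter()
s3.put_object(Bucket=bucket, Key=key, Body=data)
upload_time = time.perf_counter() - start

# Download: get_object returns once the response headers arrive,
# so this first timing approximates time to first byte
start = time.perf_counter()
response = s3.get_object(Bucket=bucket, Key=key)
download_latency = time.perf_counter() - start

# Read the body so the second timing covers the full transfer
response['Body'].read()
download_time = time.perf_counter() - start

print(f"Upload time: {upload_time:.2f}s ({size_mb / upload_time:.1f} MB/s)")
print(f"Download latency: {download_latency:.2f}s")
print(f"Download time: {download_time:.2f}s ({size_mb / download_time:.1f} MB/s)")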

Extended Benchmark Example

You can benchmark upload, download, and list operations to get a more complete picture.

Python

import time
import boto3

s3 = boto3.client('s3')
bucket = 'your-bucket-name'  # replace with a bucket you own
filename = 'testfile.bin'
object_name = 'benchmark/testfile.bin'
size_mb = 10

# Create a 10 MB test file
with open(filename, 'wb') as f:
    f.write(b'0' * (size_mb * 1024 * 1024))

# Upload benchmark
start = time.perf_counter()
s3.upload_file(filename, bucket, object_name)
upload_time = time.perf_counter() - start
print(f'Upload time: {upload_time:.2f} seconds ({size_mb / upload_time:.1f} MB/s)')

# Download benchmark
start = time.perf_counter()
s3.download_file(bucket, object_name, 'downloaded_testfile.bin')
download_time = time.perf_counter() - start
print(f'Download time: {download_time:.2f} seconds ({size_mb / download_time:.1f} MB/s)')

# List operation latency
start = time.perf_counter()
s3.list_objects_v2(Bucket=bucket, Prefix='benchmark/')
list_latency = time.perf_counter() - start
print(f'List operation latency: {list_latency:.3f} seconds')

Advanced benchmarking can include:

Parallel uploads/downloads with asyncio or threading
Storage monitoring with AWS CloudWatch and X-Ray
Integrating with Grafana dashboards for visualization
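
As a sketch of the first item, the helper below runs a task across a thread pool and records both per-request latency and total wall time. It runs without AWS: fake_download is a hypothetical stand-in that sleeps instead of calling S3, and run_parallel is an illustrative name.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_parallel(task, items, max_workers=8):
    """Run task(item) on a thread pool; return (per_item_latencies, wall_time_s)."""

    def timed(item):
        start = time.perf_counter()
        task(item)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        latencies = list(pool.map(timed, items))
    wall_time = time.perf_counter() - wall_start
    return latencies, wall_time


def fake_download(key):
    """Hypothetical stand-in for an S3 download; sleeps to simulate network time."""
    time.sleep(0.05)


keys = [f"benchmark/obj-{i}" for i in range(8)]
latencies, wall_time = run_parallel(fake_download, keys)
print(f"{len(latencies)} requests, wall time {wall_time:.2f}s, "
      f"sum of latencies {sum(latencies):.2f}s")
```

With a real client, the task could be a function that calls get_object and reads the body for the given key. Comparing wall time against the sum of individual latencies shows how much the concurrency actually helped.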

Whichever approach you use, keep the basic trade-off in mind: S3 Standard is fast but more expensive, while S3 Glacier is cheap but much slower to retrieve from.

Measuring Latency for Small Operations

For many applications, the time it takes to list files or check if a file exists (metadata operations) is just as important as upload/download speed. Here’s how to measure that:

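Python

import time
import boto3

s3 = boto3.client('s3')
bucket = 'your-bucket-name'  # replace with a bucket you own

# Time a single list request for the benchmark/ prefix
start = time.perf_counter()
response = s3.list_objects_v2(Bucket=bucket, Prefix='benchmark/')
latency = time.perf_counter() - start
print(f'List operation latency: {latency:.3f} seconds')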

Interpreting the Results

Shorter upload and download times indicate better throughput, while lower latency means your application will feel faster and more responsive. If you observe high latency or reduced throughput, consider switching to a different AWS region that is geographically closer to your users, using a faster storage class, compressing files before upload to reduce size, or uploading data in larger batches rather than many small individual files. These adjustments can significantly improve performance and efficiency.
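
Before adding compression to a pipeline, it is worth checking how well your data actually compresses. The following sketch uses the standard gzip module on two synthetic payloads. Both payloads and the helper name compression_report are illustrative assumptions, not measurements from real workloads.

```python
import gzip
import os


def compression_report(data):
    """Return (original_bytes, compressed_bytes, ratio) for gzip-compressed data."""
    if not data:
        return 0, 0, 1.0
    compressed = gzip.compress(data)
    return len(data), len(compressed), len(compressed) / len(data)


# Repetitive, log-like text compresses very well
text_like = b"timestamp=2025-01-01 level=INFO msg=ok\n" * 10000
# Random bytes (similar to already-compressed media) barely compress at all
random_like = os.urandom(len(text_like))

for label, payload in [("text-like", text_like), ("random-like", random_like)]:
    original, compressed, ratio = compression_report(payload)
    print(f"{label}: {original} -> {compressed} bytes (ratio {ratio:.3f})")
```

If the ratio is close to 1, as it is for the random payload, compression adds CPU time without reducing transfer time, so it is better left off for that data.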

Comparing Storage Classes

You can repeat your tests with objects stored in different classes (e.g., Standard, Standard-IA, Glacier) to see how performance changes. Remember, some classes like Glacier are designed for archival and can take minutes or hours to retrieve data.
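
One way to repeat an upload test across classes is to pass the StorageClass parameter to put_object. The sketch below takes the client as an argument so it can be exercised with a stand-in. FakeS3Client is a hypothetical test double, and the key layout is an assumption for illustration.

```python
import time

# Classes that serve reads in milliseconds; archive classes need a restore first
STORAGE_CLASSES = ["STANDARD", "STANDARD_IA", "INTELLIGENT_TIERING"]


def benchmark_put_by_class(client, bucket, payload, classes=STORAGE_CLASSES):
    """Upload the same payload once per storage class; return {class: seconds}."""
    results = {}
    for storage_class in classes:
        key = f"benchmark/{storage_class.lower()}/test-object"
        start = time.perf_counter()
        client.put_object(
            Bucket=bucket, Key=key, Body=payload, StorageClass=storage_class
        )
        results[storage_class] = time.perf_counter() - start
    return results


class FakeS3Client:
    """Hypothetical stand-in that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def put_object(self, **kwargs):
        self.calls.append(kwargs)
        return {}


fake = FakeS3Client()
timings = benchmark_put_by_class(fake, "your-test-bucket", b"x" * 1024)
for storage_class, seconds in timings.items():
    print(f"{storage_class}: {seconds:.6f}s")
```

To run it for real, pass boto3.client('s3') and a bucket you own in place of the fake client. A single upload per class is only a starting point, so repeat each measurement several times before drawing conclusions.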

Best Practices

To maximize the efficiency and cost-effectiveness of your S3 storage, it's important to follow a few key best practices. Compressing data before uploading can significantly reduce the amount of data transferred over the network, which not only speeds up uploads and downloads but also lowers your storage and transfer costs. When dealing with large volumes of small files, batching them into larger archives—such as ZIP or TAR files—can improve throughput by reducing the number of API calls and associated overhead. Choosing the right AWS region is also critical; placing your data closer to where it is accessed minimizes latency and improves overall responsiveness. Lastly, as your datasets grow or your application's access patterns evolve, it is essential to continuously monitor storage performance. Regular benchmarking and monitoring help you detect bottlenecks early and ensure that your storage strategy continues to meet both performance and budget requirements.

Compress data before uploading to reduce transfer time and storage costs.
Batch small files into larger archives to improve throughput and reduce API call costs.
Use the right region to minimize latency.
Monitor performance regularly as your data grows or your access patterns change.
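
The batching advice can be sketched with the standard tarfile module, building the archive in memory so that many small files become a single upload. The function names bundle and unbundle and the sample file names are illustrative.

```python
import io
import tarfile


def bundle(files):
    """Pack a {name: bytes} mapping into one gzip-compressed tar archive (bytes)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def unbundle(archive):
    """Unpack an archive produced by bundle() back into a {name: bytes} mapping."""
    files = {}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:gz") as tar:
        for member in tar.getmembers():
            files[member.name] = tar.extractfile(member).read()
    return files


small_files = {f"logs/part-{i:04d}.txt": b"status=ok\n" * 20 for i in range(500)}
archive = bundle(small_files)
print(f"{len(small_files)} files -> one archive of {len(archive)} bytes")
```

The resulting bytes could be uploaded with a single put_object call instead of 500. The trade-off is that reading any one file later means downloading and unpacking the whole archive.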

Conclusion

Benchmarking your AWS S3 storage with Python scripts is a practical and effective way to uncover real-world performance characteristics that are often hidden beneath marketing claims or generic documentation. Whether you're working with high-throughput machine learning pipelines, media archives, or latency-sensitive applications, understanding the nuances of latency, throughput, and API behavior across storage classes helps you make informed, cost-efficient decisions. It's not just about speed; it's about choosing the right trade-offs for your workload.

With a relatively simple set of Python tools, you can systematically measure how your storage performs across different regions, file sizes, and usage patterns. These insights let you confidently select the appropriate S3 class, avoid hidden performance bottlenecks, and ensure your infrastructure scales reliably as your business grows. Benchmarking should be an ongoing part of your DevOps or data engineering lifecycle—not a one-time activity. In a world where storage is elastic and usage is unpredictable, proactive performance testing is key to avoiding surprises and maintaining both operational excellence and budget control.

Published at DZone with permission of Arjun Mullick. See the original article here.
