How I Made AWS CLI 300% Faster!

This developer explains why he needed to take AWS's CLI up a notch and the experimental way he did it.

Yeah yeah, it's "highly experimental" and all, but still, it's three times faster than simply running aws bla bla bla, the "plain" way.

And yes, it won't always be that fast, especially if you only run AWS CLI about once a fortnight. But it will certainly have a clear impact once you start batching up your AWS CLI calls; maybe routine account checks/cleanups, maybe extracting tons of CloudWatch Metrics records, or maybe a totally different, unheard-of use case.

Whatever it is, I guess it would be useful for the masses some day.

Plus, as one of the authors and maintainers of the world's first serverless IDE, I have certainly had several chances to put it to good use!

The Problem: Why AWS CLI Is "Too Slow" for Me

(Let's just call it "CLI", shall we?)

It actually has nothing to do with the CLI itself; rather, it's the fact that each CLI invocation is a completely new program execution cycle.

This means:

  • Python (and ultimately the OS) has to load the binaries, configs, boto API definitions and so forth;
  • the CLI has to initialize itself: load all supported command definitions, prepare parsers, generate API client classes and so forth.

But, as usual, the highest impact comes via the network I/O:

  • The CLI has to create an API client from scratch (the previous one was lost when the command execution completed).
  • Since the network connection to AWS is managed by the client, this means that each command creates (and then destroys) a fresh TCP connection to the AWS endpoint, which involves a DNS lookup as well (although later lookups may be served from the system cache).
  • Since AWS APIs almost always use SSL, every new connection results in a full SSL handshake (client hello, server hello, server cert, yadda, yadda, yadda).

Now, assume you have 20 CloudWatch Log Groups to be deleted. Since the Logs API does not offer a bulk deletion option, the cheapest way to do this would be to run a simple shell script — looping aws logs delete-log-group over all groups:

for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
    aws logs delete-log-group --log-group-name "$i"
done

This would run the CLI 20 times (21, to be precise, if you count the initial list API call), meaning that all of the above happens 21 times. Clearly a waste of time and resources, since we knew all along that every one of those runs would hit the same endpoint.

Try scaling this up to hundreds or thousands of batched operations and see where it takes you!
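For comparison only (this is not what awsr does): if leaving the CLI is an option, a single Python process using boto3 keeps one client, and therefore normally one pooled HTTPS connection, for every call in the loop. A minimal sketch, assuming boto3 is installed and credentials are configured; the function name `delete_all_log_groups` is hypothetical:

```python
# Hypothetical comparison, not awsr: one process, one boto3 "logs" client
# reused for every call, so the DNS lookup and TLS handshake are normally
# paid once per pooled connection instead of once per log group.


def delete_all_log_groups(logs_client):
    """Delete every log group visible to logs_client; return their names."""
    # List everything first, so deletions don't disturb the pagination.
    names = []
    paginator = logs_client.get_paginator("describe_log_groups")
    for page in paginator.paginate():
        for group in page.get("logGroups", []):
            names.append(group["logGroupName"])

    for name in names:
        print(name)  # progress output
        logs_client.delete_log_group(logGroupName=name)
    return names
```

Usage, assuming boto3: `delete_all_log_groups(boto3.client("logs"))`. Note that it deletes every log group the client can see in that region.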

And No, aws-shell Does Not Cut It.

Not yet, at least.

Leaving aside its nice and cozy REPL interface (interactive user prompt), handy autocompletion, syntax coloring and inline docs, aws-shell does not give you any performance advantage over aws-cli. Every command in the shell is executed in a new AWS CLI instance, with parsers, command hierarchies, API specs and, more importantly, API clients getting recreated for every command.

Skeptical? Peek at the aws-shell sources; or better still, fire up Wireshark (or tcpdump if you dare), run a few commands in the shell REPL, and see how each command initializes a fresh SSL channel from scratch.

The Proposal: What Can We Do?

Obviously, the CLI cannot do much about it. It's a simple program, and whatever improvements we make won't last until the next invocation: the OS would rudely wipe them and start the next CLI with a clean slate, unless we use some spooky (and rather discouraged) memory persistence magic to serialize and reload the CLI's state. Even then, the other OS-level stuff (network sockets, etc.) would be gone, and our effort would be pretty much fruitless.

If we are going to make any impactful changes, we need to make the CLI stateful, a long-running process.

The D(a)emon

In the OS world, this usually means setting up a daemon — a background process that waits for and processes events like user commands. (A popular example is MySQL, with its mysql-server daemon and mysql-client packages.)

In our case, we don't want a fully-fledged "managed" daemon, like a system service. For example, there's no point in starting our daemon before we actually start making our CLI calls; also, if our daemon dies, there's no point in restarting it right away, since we cannot recover the lost state anyway.

So we have a simple plan:

  • break the CLI into a "client" and a daemon
  • every time we run the CLI,
    • check for the presence of the daemon, and
    • spawn the daemon if it is not already running

This way, if the daemon dies, the next CLI invocation will auto-start it. Nothing to worry about, nothing to manage.
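One way to "check for the presence of the daemon" without scanning the process table (the awsr code in this article counts matching python processes with psutil) is an advisory lock on a file: the daemon holds it for its whole life, and the kernel drops it when the daemon exits or dies. A minimal sketch of that swapped-in technique, POSIX only; the lock path and the function names are hypothetical:

```python
import fcntl
import os
import tempfile

# Hypothetical lock-file location; awsr itself does not use one.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "awsr_daemon.lock")


def try_become_daemon(lock_path=LOCK_PATH):
    """Take the daemon lock without waiting.

    Returns the open lock file on success; the daemon must keep it open for
    as long as it runs. Returns None if another holder already has it.
    The kernel releases the lock when the holder exits or crashes, so there
    is no stale state to clean up (unlike a plain PID file).
    """
    lock_file = open(lock_path, "a")
    try:
        fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()  # somebody else is the daemon
        return None
    return lock_file


def daemon_is_running(lock_path=LOCK_PATH):
    """True if some other open file currently holds the daemon lock."""
    probe = try_become_daemon(lock_path)
    if probe is None:
        return True
    probe.close()  # closing the file releases the lock we just took
    return False
```

A client would call `daemon_is_running()` and spawn the daemon if it returns False; the spawned daemon should itself call `try_become_daemon()` and exit if it gets None, so two clients starting at the same moment still end up with a single daemon.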

Our Fast AWS CLI Daemon — It's All in A subprocess!

It is easy to handle the daemon spawn without having the trouble of maintaining a second program or script; simply use subprocess.Popen to launch another instance of the program, and instruct it to run the daemon's code path, rather than the client's.
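The self-spawning trick needs nothing beyond the standard library. A sketch under stated assumptions, not awsr's code: the helper names are hypothetical, the marker reuses the AWSR_DAEMON variable name, and `start_new_session` (POSIX) is an extra choice that puts the daemon in its own session.

```python
import os
import subprocess
import sys

MARKER = "AWSR_DAEMON"  # same variable name awsr uses


def spawn_daemon_copy(argv):
    """Start a detached second copy of this program in daemon mode.

    The parent's environment is copied (not replaced) and the marker added,
    so AWS_PROFILE, HOME, PATH and friends stay visible to the daemon.
    """
    env = dict(os.environ)
    env[MARKER] = "True"
    return subprocess.Popen(
        argv,
        stdin=subprocess.DEVNULL,   # the daemon never talks to the terminal
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        close_fds=True,
        start_new_session=True,     # POSIX: the terminal's Ctrl-C won't reach it
        env=env,
    )


def is_daemon_copy():
    """True in the copy started by spawn_daemon_copy()."""
    return os.environ.get(MARKER) == "True"
```

At startup the program would branch on `is_daemon_copy()`: run the daemon loop if it returns True, otherwise spawn a copy when no daemon is found and carry on as the client. Passing `sys.argv` unchanged assumes the script is executable with a shebang line; `[sys.executable] + sys.argv` avoids that assumption.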

Enough Talk; Show Me the Code!

Here you go:


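#!/usr/bin/env python3
# awsr: forwards each CLI invocation to a long-running daemon over two named pipes.
# The file must be executable: the daemon is spawned by re-running it.

import os
import sys
import tempfile
import psutil
import subprocess

rd = tempfile.gettempdir() + "/awsr_rd"  # requests: client -> daemon
wr = tempfile.gettempdir() + "/awsr_wr"  # responses: daemon -> client

def run_client():
    # send our arguments as one newline-terminated line; opening a FIFO for
    # writing blocks until the daemon opens it for reading
    with open(rd, "w") as out:
        out.write(" ".join(sys.argv) + "\n")

    # wait for the daemon's output; read() returns once the daemon closes its end
    with open(wr, "r") as inp:
        result = inp.read()
    sys.stdout.write(result)


def run_daemon():
    from awscli.clidriver import CLIOperationCaller, LOG, create_clidriver, HISTORY_RECORDER

    # monkey-patch awscli so that each operation caller keeps its API client
    # (and hence its open HTTPS connection) across invocations
    def patchedInit(self, session):
        self._session = session
        self._client = None

    def patchedInvoke(self, service_name, operation_name, parameters, parsed_globals):
        if self._client is None:
            LOG.debug("Creating new %s client" % service_name)
            # region/endpoint are fixed by the first call; later values are ignored
            self._client = self._session.create_client(
                service_name, region_name=parsed_globals.region,
                endpoint_url=parsed_globals.endpoint_url,
                verify=parsed_globals.verify_ssl)
        client = self._client

        response = self._make_client_call(
            client, operation_name, parameters, parsed_globals)
        self._display_response(operation_name, response, parsed_globals)
        return 0

    CLIOperationCaller.__init__ = patchedInit
    CLIOperationCaller.invoke = patchedInvoke

    driver = create_clidriver()
    while True:
        # one request per FIFO open; drop the trailing newline and argv[0]
        with open(rd, "r") as inp:
            args = inp.read()[:-1].split(" ")[1:]

        if len(args) > 0 and args[0] == "exit":
            open(wr, "w").close()  # give the waiting client an empty reply first
            sys.exit(0)

        # everything the CLI prints goes back to the waiting client
        sys.stdout = open(wr, "w")
        rc = driver.main(args)
        HISTORY_RECORDER.record("CLI_RC", rc, "CLI")
        sys.stdout.close()


if __name__ == "__main__":
    # create the two named pipes on first use
    if not os.access(rd, os.R_OK | os.W_OK):
        os.mkfifo(rd)
    if not os.access(wr, os.R_OK | os.W_OK):
        os.mkfifo(wr)

    # fork if awsr daemon is not already running
    ps = psutil.process_iter(attrs=["cmdline"])
    procs = 0
    for p in ps:
        cmd = p.info["cmdline"] or []  # None when access is denied
        if len(cmd) > 1 and os.path.basename(cmd[0]).startswith("python") and cmd[1] == sys.argv[0]:
            procs += 1
    if procs < 2:
        sys.stderr.write("Forking new awsr background process\n")
        with open(os.devnull, 'r+b', 0) as DEVNULL:
            # new instance will see env var, and run itself as daemon;
            # copy the environment so AWS_PROFILE, HOME etc. survive
            env = dict(os.environ, AWSR_DAEMON="True")
            p = subprocess.Popen(sys.argv, stdin=DEVNULL, stdout=DEVNULL, stderr=DEVNULL, close_fds=True, env=env)
        run_client()

    elif os.environ.get("AWSR_DAEMON") == "True":
        run_daemon()
    else:
        run_client()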

Yep, just 89 lines of rather primitive code — of course, it's also on GitHub, in case you were wondering.
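One limitation is visible in the code itself: the client joins `sys.argv` with spaces and the daemon splits on spaces, so an argument that contains a space (an S3 key such as `my file.txt`, for instance) arrives as two arguments. A minimal sketch of a hypothetical fix, not awsr's actual protocol, which sends the argument list over the same kind of named pipe as a JSON array (POSIX only; the function names are illustrative):

```python
import json
import os
import tempfile
import threading


def send_request(request_fifo, args):
    """Client side: write the argument list as a single JSON line."""
    # Opening a FIFO for writing blocks until the daemon opens it for reading.
    with open(request_fifo, "w") as out:
        out.write(json.dumps(args) + "\n")


def read_request(request_fifo):
    """Daemon side: read one request; arguments come back exactly as sent."""
    # read() returns at EOF, i.e. once the client has closed its end.
    with open(request_fifo, "r") as inp:
        return json.loads(inp.read())


def roundtrip(args):
    """Push args through a real named pipe, playing both client and daemon."""
    workdir = tempfile.mkdtemp()
    fifo = os.path.join(workdir, "awsr_demo_rd")
    os.mkfifo(fifo)
    received = []
    daemon = threading.Thread(target=lambda: received.append(read_request(fifo)))
    daemon.start()
    send_request(fifo, args)
    daemon.join()
    os.remove(fifo)
    os.rmdir(workdir)
    return received[0]
```

`json.dumps` also escapes any newlines inside arguments, so one request is still exactly one line on the pipe.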

Some Statistics, if You're Still Not Buying It

"Lies, damn lies and statistics," they say. But sometimes, statistics can do wonders when you are trying to prove a point.

As you might expect, our new REPL really shines as the number of individual invocations (API calls) grows, so that's what we will compare.

Let's upload some files, one put-object call per file:


for file in $(find . -type f -name "*.sha1"); do
    aws s3api put-object --acl public-read --body "$file" --bucket target.bucket.name --key base/path/
done

  • Bucket region: us-east-1
  • File type: fixed-length checksums
  • File size: 40 bytes each
  • Additional: public-read ACL

Uploading 70 such files via aws s3api put-object takes:

  • 4 minutes 35 seconds
  • 473.5 KB data (319.5 KB downlink + 154 KB uplink)
  • 70 DNS lookups + SSL handshakes (one for each file)

In comparison, uploading 72 files via awsr s3api put-object takes:

  • 1 minute 28 seconds
  • 115.5 KB data (43.5 KB downlink + 72 KB uplink)
  • 1 DNS lookup + SSL handshake for the whole operation

That is about 3.2 times faster per file (a 320% improvement on latency), or 4.2 times less data per file (420%) if you consider bandwidth.

If you feel like it, watch the outputs (stdout) of the two runs in real time. You will notice that awsr shows a low, consistent latency from the second output onwards, while plain aws takes about as long between every pair of outputs as it did for the first one, apparently because almost everything gets re-initialized for each call.

If you monitor your network interface (say, with Wireshark), you will see the real deal: aws continuously makes DNS queries and SSL handshakes, while awsr makes just one every minute or so.

Counterargument #1: If your files are all in one place or directory hierarchy, you could just use aws s3 cp or aws s3 sync in one go. These will be as performant as awsr, if not more. However, in my case, I wanted to pick only a subset of files in the hierarchy, and there was no easy way of doing that with the aws command alone.

Counterargument #2: If you want to upload to buckets in multiple regions, you will have to batch up the calls region by region (us-east-1 first, ap-southeast-2 next, etc.) and kill awsr after each batch; more on that later.
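The restarts are needed because `patchedInvoke` keeps the first client it creates, so the region (and endpoint) of that first call sticks for the daemon's lifetime. A hypothetical alternative, not part of awsr, keys the cache on the settings that shape a client. A minimal sketch, assuming a botocore-style `session.create_client`; the class name is illustrative:

```python
class ClientCache:
    """Hypothetical replacement for a single cached client.

    One client per (service, region, endpoint, verify) combination, so a
    call with a different --region gets its own client instead of silently
    reusing the first one. `session` is anything with a botocore-style
    create_client(service_name, region_name=..., endpoint_url=..., verify=...).
    """

    def __init__(self, session):
        self._session = session
        self._clients = {}

    def get(self, service_name, region_name=None, endpoint_url=None, verify=None):
        key = (service_name, region_name, endpoint_url, verify)
        if key not in self._clients:
            # first call with these settings: create and remember a client
            self._clients[key] = self._session.create_client(
                service_name, region_name=region_name,
                endpoint_url=endpoint_url, verify=verify)
        return self._clients[key]
```

Inside `patchedInvoke`, the lookup would become `cache.get(service_name, parsed_globals.region, parsed_globals.endpoint_url, parsed_globals.verify_ssl)`. Switching `--profile` is not covered: credentials belong to the session, so a different profile would still need a separate session (or a daemon restart).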

CloudWatch logs

Our serverless IDE Sigma generates quite a lot of CloudWatch Logs — especially when our QA battalion is testing it. To keep things tidy, I prefer to occasionally clean up these logs, via aws logs delete-log-group.


for i in $(aws logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
    echo "$i"
    aws logs delete-log-group --log-group-name "$i"
done


Cleaning up 172 such log groups on us-east-1, via plain aws, takes:

  • 5 minutes 44 seconds
  • 1.51 MB bandwidth (1133 KB downlink, 381 KB uplink)
  • 173 (1 + 172) DNS lookups + SSL handshakes; one for each log group, plus one for the initial listing

In contrast, deleting 252 groups via our new REPL, awsr, takes just:

  • 2 minutes 41 seconds
  • 382 KB bandwidth (177 KB downlink, 205 KB uplink)
  • 4 DNS lookups + SSL handshakes (about one every 60 seconds)

This time, that is about 3.1 times faster per log group (a 310% improvement on latency), or 5.8 times less data per log group (580%) on bandwidth.

CloudWatch metrics

I use this script to occasionally check the sizes of our S3 buckets — to track down and remove any garbage; playing the "scavenger" role:

for bucket in `awsr s3api list-buckets --query 'Buckets[*].Name' --output text`; do
    size=$(awsr cloudwatch get-metric-statistics --namespace AWS/S3 \
        --start-time $(date -d @$((($(date +%s)-86400))) +%F)T00:00:00 --end-time $(date +%F)T00:00:00 \
        --period 86400 --metric-name BucketSizeBytes \
        --dimensions Name=StorageType,Value=StandardStorage Name=BucketName,Value=$bucket \
        --statistics Average --output text --query 'Datapoints[0].Average')
    if [ "$size" = "None" ]; then size=0; fi
    printf "%8.3f  %s\n" $(echo "$size/1048576" | bc -l) "$bucket"
done
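The `date` arithmetic in that script asks for the window from yesterday 00:00 to today 00:00 (one 86400-second period), and the size is reported in MiB with a missing datapoint counted as zero. The same computations in Python, as a sketch with hypothetical helper names; it uses calendar arithmetic rather than subtracting 86400 seconds, which avoids a daylight-saving corner case just after a 23-hour day:

```python
from datetime import date, timedelta


def daily_window(today):
    """(start, end) strings for yesterday 00:00 to today 00:00."""
    # Calendar arithmetic instead of "now minus 86400 seconds", which can
    # land on the wrong date just after a 23-hour (daylight-saving) day.
    yesterday = today - timedelta(days=1)
    return f"{yesterday.isoformat()}T00:00:00", f"{today.isoformat()}T00:00:00"


def size_in_mib(average):
    """The script's rule: 'None' (no datapoint) counts as 0; then / 1048576."""
    if average is None or average == "None":
        return 0.0
    return float(average) / 1048576


def format_row(average, bucket):
    """Same layout as the script's printf "%8.3f  %s"."""
    return f"{size_in_mib(average):8.3f}  {bucket}"
```

Typical use: `start, end = daily_window(date.today())`.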

Checking 45 buckets via aws (45 + 1 API calls, 45 of them to the same CloudWatch endpoint) takes:

  • 94 seconds

Checking 61 buckets (62 API calls) via awsr takes:

  • 44 seconds

That is about 2.88 times faster per API call: a 288% improvement.
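The percentages in this section are per-item ratios: each pair of runs processed a different number of items, so each total is first divided by its item (or API call) count. A small worked check using only the figures quoted above; the dictionary layout is just for illustration:

```python
# (seconds, items) for plain aws vs awsr, taken from the three runs above.
RUNS = {
    "s3api put-object": ((4 * 60 + 35, 70), (60 + 28, 72)),
    "logs delete-log-group": ((5 * 60 + 44, 172), (2 * 60 + 41, 252)),
    "cloudwatch get-metric-statistics": ((94, 46), (44, 62)),
}


def per_item_speedup(aws_run, awsr_run):
    """Ratio of per-item costs (aws / awsr); works for seconds or KB alike."""
    (aws_total, aws_items), (awsr_total, awsr_items) = aws_run, awsr_run
    return (aws_total / aws_items) / (awsr_total / awsr_items)


for name, (aws_run, awsr_run) in RUNS.items():
    print(f"{name}: {per_item_speedup(aws_run, awsr_run):.2f}x")
# s3api put-object: 3.21x
# logs delete-log-group: 3.13x
# cloudwatch get-metric-statistics: 2.88x
```

Applying the same ratio to the kilobyte figures gives about 4.22 for the upload and 5.81 for the log cleanup, matching the 420% and 580% bandwidth numbers.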

The Catch

There are many; more unknowns than knowns, in fact:

Bonus: Hands-on AWS CLI fast Automation Example, FTW!

I run this occasionally to clean up our AWS accounts of old logs and build data. If you are curious, replace the awsr occurrences with aws (and remove the daemon-killing magic), and witness the difference in speed!

Caution: If there are ongoing CodeBuild builds, the last step may keep on looping – possibly even indefinitely, if the build is stuck in BUILD_IN_PROGRESS status. If you run this from a fully automated context, you may need to enhance the script to handle such cases as well.

for p in araProfile meProfile podiProfile thadiProfile ; do
    for r in us-east-1 us-east-2 us-west-1 us-west-2 ca-central-1 eu-west-1 eu-west-2 eu-central-1 \
        ap-northeast-1 ap-northeast-2 ap-southeast-1 ap-southeast-2 sa-east-1 ap-south-1 ; do

        # profile and region changed, so kill any existing daemon before starting
        arg="--profile $p --region $r"
        kill $(ps -ef -C /usr/bin/python | grep -v grep | grep awsr | awk '{print $2}')
        rm -f /tmp/awsr_rd /tmp/awsr_wr

        # log groups
        for i in $(awsr $arg logs describe-log-groups --query 'logGroups[*].logGroupName' --output text); do
            echo "$i"
            awsr $arg logs delete-log-group --log-group-name "$i"
        done

        # CodeBuild projects
        for i in $(awsr $arg codebuild list-projects --query 'projects[*]' --output text); do
            echo "$i"
            awsr $arg codebuild delete-project --name "$i"
        done

        # CodeBuild builds; strangely these don't get deleted when we delete the parent project...
        while true; do
            builds=$(awsr $arg codebuild list-builds --query 'ids[*]' --output text --no-paginate)
            if [[ $builds = "" ]]; then break; fi
            awsr $arg codebuild batch-delete-builds --ids $builds
        done
    done
done
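As the Caution above notes, the last loop never terminates while a build is stuck in BUILD_IN_PROGRESS. A minimal boto3-style sketch of one way to bound it, assuming a CodeBuild client (the function name and the `max_rounds` limit are hypothetical): stop after a fixed number of rounds, or as soon as a round deletes nothing.

```python
def delete_all_builds(codebuild, max_rounds=20):
    """Delete CodeBuild builds in batches without looping forever.

    Stops when list_builds returns nothing, when a round deletes nothing
    (e.g. only in-progress builds remain), or after max_rounds rounds.
    Returns the buildsNotDeleted entries from the last round.
    """
    not_deleted = []
    for _ in range(max_rounds):
        ids = codebuild.list_builds().get("ids", [])
        if not ids:
            return []
        # batch_delete_builds accepts at most 100 ids per call
        result = codebuild.batch_delete_builds(ids=ids[:100])
        not_deleted = result.get("buildsNotDeleted", [])
        if not result.get("buildsDeleted"):
            break  # no progress this round; give up instead of spinning
    return not_deleted
```

Usage, assuming boto3: `delete_all_builds(boto3.Session(profile_name=p).client("codebuild", region_name=r))`; whatever comes back could not be deleted and can be reported.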


In closing: so, there it is!

Feel free to install and try out awsr; after all, there's just one file, with fewer than a hundred lines of code!

Although I cannot make any guarantees, I'll try to eventually hunt down and fix the gaping holes and shortcomings, and any other issues that you or I run into along the way.

Over to you, soldier/beta user!

Published at DZone with permission of Janaka Bandara, DZone MVB. See the original article here.
