This post revisits my earlier evaluation of Google App Engine's post-preview pricing changes and how they affected my project, SMSMyBus. As I noted, the app was projected to cost between $6 and $7 per day under the new platform pricing.
This post is authored by Greg Tracy, co-founder of Asthmapolis and Sharendipity.
Since that post, I’ve been rolling out small, incremental changes to optimize the code and combat all of the known issues. I’m thrilled to report that I have the price down to $0/day. And I’m once again impressed by the snappy and reliable App Engine platform.
Looking back on the changes, I can say that I was doing some bad things, some abusive things, and App Engine was making some bad choices as well. But the end result proves that if its developers optimize and do smart things, they are rewarded. App Engine remains a great solution for my transit API service.
Here’s a history of the changes over the last two months that got me down to $0/day...
1. Instance allocation
The new pricing model charges applications based on their use of instances (hardware resources where your application is running) rather than CPU utilization. A key to keeping your instance cost down is to simply reduce the number of instances that are spinning. Duh. So I grabbed the instance slider in the application settings and yanked it to the left. This doesn't prevent scaling; it just limits my billing for normal traffic flow.
2. Delete data
App Engine data storage (for your database) costs $0.008/GByte-day. Doesn’t sound too expensive, but I had been storing every single API call I had ever gotten. I thought it would be useful for API developers and for analytics. My drive to $0 outweighed that, however, so I deleted all of the history data and got under the free quota for storage.
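To make the arithmetic concrete, here is a rough sketch in Python of how a daily storage bill falls out of that rate. The function name and the free quota parameter are my own assumptions for illustration; check the current quota page for the real free allowance.

```python
# Rate quoted in the post: dollars per GByte stored per day.
PRICE_PER_GB_DAY = 0.008


def daily_storage_cost(stored_gb, free_quota_gb):
    """Estimate the daily storage bill in dollars.

    Assumes only the storage above the free quota is billed.
    free_quota_gb is a hypothetical parameter, not a documented value.
    """
    billable_gb = max(0.0, stored_gb - free_quota_gb)
    return billable_gb * PRICE_PER_GB_DAY
```

Anything at or below the free quota comes out to zero, which is why deleting the history data was enough to take storage off the bill.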
3. Memcached the application's route listings
I was surprised to find that I wasn’t doing this already, but there it was. I have a data structure that maps bus routes and bus stops to scheduling data on the Metro website and it never changes. In some cases - like the static calls from the kiosk clients - I was looking up route listing details in the datastore once every minute!! Fail. I used memcache to keep the common queries in memory and avoid the extra datastore reads.
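The pattern here is usually called cache-aside: check the cache first, and only hit the datastore on a miss. Below is a minimal sketch in Python, not the actual SMSMyBus code. FakeCache is an in-process stand-in for a memcache client with get and set methods, and the lookup function, key format, and route values are all hypothetical placeholders.

```python
class FakeCache(object):
    """In-process stand-in for a memcache client (assumes a get/set API)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None on a cache miss

    def set(self, key, value):
        self._data[key] = value


# Counts datastore lookups so the saving is visible.
stats = {"datastore_reads": 0}


def load_route_listing_from_datastore(stop_id):
    """Hypothetical slow lookup standing in for a real datastore query."""
    stats["datastore_reads"] += 1
    return {"stop_id": stop_id, "routes": ["2", "6"]}  # placeholder data


def get_route_listing(cache, stop_id):
    """Return the route listing, reading the datastore only on a miss."""
    key = "route_listing:%s" % stop_id
    listing = cache.get(key)
    if listing is None:
        listing = load_route_listing_from_datastore(stop_id)
        cache.set(key, listing)
    return listing
```

Calling get_route_listing twice for the same stop should touch the datastore only once. Since the route data never changes, there is no invalidation logic to worry about in this sketch.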
4. Limit access during off hours
One thing that never changes is when the Metro service is running. There are five+ hours a day where the buses aren’t on the street. But some clients are still asking for data. I stubbed out most of the API during these off hours before the code ever gets close to making a datastore or memcache call.
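The idea is to bail out at the very top of the request handler. Here is a small sketch in Python under stated assumptions: the service window, the handler name, and the response shape are hypothetical (the real Metro hours are not given in this post), and lookup stands in for whatever expensive work normally happens.

```python
from datetime import time

# Hypothetical service window, chosen only for illustration.
SERVICE_START = time(5, 0)
SERVICE_END = time(23, 30)


def in_service(now):
    """True if the given time of day falls inside the service window."""
    return SERVICE_START <= now <= SERVICE_END


def handle_arrivals_request(stop_id, now, lookup):
    """Answer off-hours requests without calling lookup at all."""
    if not in_service(now):
        # Stubbed response: no datastore or memcache call is made.
        return {"stop_id": stop_id, "status": "off_hours", "arrivals": []}
    return {"stop_id": stop_id, "status": "ok", "arrivals": lookup(stop_id)}
```

Passing the current time in as an argument keeps the check easy to test. A window that crosses midnight would need a slightly different comparison.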
These four changes brought me down to $0.70 per day. Bam!
5. Asynchronous screen grabs
If you don’t know, behind the API curtain is an ugly screen scraping task that extracts the arrival estimates from the Metro website. So when a client requests arrival data for a stop, the app goes off and requests multiple web pages, machine-reads the information and aggregates all of the results.
The original implementation of the SMS interface did this by creating multiple tasks (one for each route traveling through the respective stop). When a task ran, it stored the results in the datastore. An aggregator task would read those results out of the datastore and piece together the response to the caller.
When the API was created, I couldn’t use background tasks because I had to respond with results in the same HTTP context. That’s when I discovered the great feature, asynchronous url fetch. This essentially let me grab all of the different Metro web pages at the same time. But when I implemented this, I continued to use the datastore as the mechanism for storing and retrieving results. This was just lazy. Under the old pricing, I wasn’t incented to change it other than the fact that it was a bit slow.
Under the new pricing model, this solution was very expensive. The API is continuously running this aggregation algorithm - constantly writing and reading to the datastore for model instances that have a lifespan of under a minute!
I rolled out a change that removed the use of the datastore and instead sorted the aggregated results in memory. This had a dramatic effect on my API quota for datastore reads and writes, especially the write operations, where you get penalized by an order of magnitude for this type of behavior because index updates count against your API quota as well.
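Roughly, the shape of that change looks like the sketch below. It is written in Python and uses a thread pool as a stand-in for App Engine's asynchronous URL fetch, so it is not the code that runs in SMSMyBus. The fetch_page callable, the parse_arrivals helper, and the one-number-per-line page format are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_arrivals(route, page):
    """Hypothetical parser: one arrival estimate (in minutes) per line."""
    return [(int(line), route) for line in page.splitlines() if line.strip()]


def aggregate_arrivals(urls_by_route, fetch_page):
    """Fetch every route page concurrently and merge results in memory.

    urls_by_route maps a route name to its page URL.
    fetch_page is any callable that takes a URL and returns the page text.
    """
    routes = list(urls_by_route)
    with ThreadPoolExecutor(max_workers=max(1, len(routes))) as pool:
        # pool.map returns results in the same order as the input routes.
        pages = list(pool.map(lambda r: fetch_page(urls_by_route[r]), routes))

    arrivals = []
    for route, page in zip(routes, pages):
        arrivals.extend(parse_arrivals(route, page))

    # Soonest arrival first; nothing is written to a datastore.
    arrivals.sort()
    return arrivals
```

The intermediate results live only in a local list for the length of the request, so short-lived model instances never get written, indexed, or read back.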
6. Port the original apps to the API
After optimizing the API, I realized that the original SMSMyBus apps (SMS, chat, email and phone interfaces for the Metro) were now the long pole. Those apps were implemented before the API existed so they weren’t benefiting from the API optimizations. Solution... re-implement them to use the SMSMyBus API.
It should have been done long ago simply as a validation exercise of the API methods. Credit to the elegance and simplicity of the API - this port was simple and only took a couple of hours.
These two changes brought me down to $0.10/day. Badda-bing.
7. Run Appstats on all application interfaces
The last stop on the optimization train was Appstats. A truly great tool in the App Engine toolbox. In just a matter of minutes, you can find the hidden datastore operations that are dragging you down. In my case, it led me to one area that wasn’t being memcached at all. And it revealed an area that was simply using the memcache incorrectly! Love this tool...
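For anyone who hasn't turned it on yet, here is roughly what enabling the Appstats recorder looks like. This assumes the Python runtime and a WSGI-based app, and it goes in a file named appengine_config.py at the root of the application. Treat it as a sketch and check the Appstats documentation for your runtime version.

```python
# appengine_config.py (assumes the App Engine Python runtime)


def webapp_add_wsgi_middleware(app):
    """Hook that App Engine calls to wrap the WSGI app with Appstats."""
    # Imported inside the function, so the import only runs when
    # App Engine calls this hook.
    from google.appengine.ext.appstats import recording
    return recording.appstats_wsgi_middleware(app)
```

With the recorder in place, each request's RPC calls (datastore, memcache, URL fetch) are logged so you can see which handlers are making calls you didn't expect.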
This change brought me down to $0.00/day. Winning!
App Engine remains a great platform for developers who don’t abuse it and who take the time to optimize their applications.
The SMSMyBus API now serves over 6,000 transit requests per day. It’s fast, reliable and flat out fun to use. I’m as proud as ever that I brought this to Madison.
Next step... find a way to fund my SMS users. :)