A lot of enterprises are getting great value out of migrating their applications to the Cloud.
In particular, AWS is the "Big Dog" in this space. Although Microsoft, Google and IBM are out there, I personally believe they are far behind (by approximately 2-3 years in pure depth and breadth of functionality), and that shows up in the revenue too.
AWS customers are saving money mostly because they can rent what they need instead of buying for peak traffic. They are gaining not only increased scalability (e.g. auto-scaling) but also increased flexibility (e.g. EC2 sizing options, deployment regions), reduced response times and reduced development times (from pre-built, pre-configured components like SQS, SNS, SES etc.).
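Here is a back-of-the-envelope Python sketch that makes the rent-versus-buy point concrete. The hourly demand profile is made up for illustration. The comparison counts server-hours only and ignores the fact that an hour of rented capacity and an hour of owned capacity have different prices.

```python
# Hypothetical hourly demand (servers needed) over one day; illustrative only.
HOURLY_DEMAND = [2] * 8 + [6] * 4 + [10] * 4 + [6] * 4 + [3] * 4


def owned_server_hours(demand: list[int]) -> int:
    """Owning hardware means provisioning for the peak hour, all day long."""
    return max(demand) * len(demand)


def rented_server_hours(demand: list[int]) -> int:
    """Renting on demand means paying only for what each hour needs."""
    return sum(demand)


if __name__ == "__main__":
    owned = owned_server_hours(HOURLY_DEMAND)
    rented = rented_server_hours(HOURLY_DEMAND)
    print(f"owned={owned} rented={rented} saved={1 - rented / owned:.0%}")
```

For this made-up profile, provisioning for the peak means 240 server-hours a day, versus 116 when renting. Whether renting actually saves money therefore depends on how spiky your real traffic is and on the unit prices.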
However, anyone who has spent any time in software or IT shivers at the word "migration". Add to that the relative unknown of moving to a data center and IT staff that you rent rather than own or control, and it can be fear-inducing for the uninitiated.
Having recently completed a "Big Bang" migration to AWS, I learned a lot of lessons I would like to share, including a lot about the possible approaches to migration. This article describes a few of them.
First off: what are we migrating?
For the purposes of this article, I am going to assume you have a typical three-tier Web/Mobile application: Web/Client Tier, Application Tier and Data Tier. For Java, that could be a Struts/JSP web tier and EJBs in a J2EE container (the Web and Application tiers), or alternatively a mobile app calling REST services. Either one is backed by a traditional RDBMS (e.g. Oracle / MySQL). If you are thinking of going the NoSQL route at the same time, I would suggest doing the AWS migration first. Once you move to AWS, you may need to redo the setup, profiling and tuning of your NoSQL implementation anyway.
|A typical "in house" 3-tier web architecture|
STEP 1: Learn, learn and learn
Your first step is always to learn as much about AWS as possible, including the architectural and design options available. You have EC2 (your virtual machine instances), SQS (for queueing), SNS (for notifications), SES (Simple Email Service), ELB (Elastic Load Balancing), Route 53 (for DNS) and RDS (Relational Database Service, for your RDBMS). There are plenty more services out there, but for this (relatively) basic 3-tier architecture those key services will get you a long way. Giving your tech staff an intensive on-site training course (delivered by Amazon) is a great idea to get everyone to the same knowledge level FAST!
STEP 2: Pick and choose your services
Next, your architecture and security teams need to make sure your target architecture handles Disaster Recovery / High Availability and Security. The rules are mostly the same as before, just wrapped a bit differently (e.g. failover detection in your load balancer, firewall rules for which ports are allowed). You also get the added "features" and flexibility (complexity!) of Availability Zones and Regions. On the security side, I don't think anything beats having a "Red Team" whose goal in life is to hack your system. Realize too that AWS is always in motion, adding features and tweaking existing ones, so you'll need to be somewhat comfortable with learning as you go. So start simple for the migration and add the cool new services after the switchover. Don't try to migrate AND add a lot of new services at the same time.
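In AWS terms, "firewall rules of what ports are allowed" largely become security group ingress rules, one group per tier. The sketch below builds those rules in the `IpPermissions` shape that boto3's `authorize_security_group_ingress` accepts. The group IDs and ports (443, 8080, 3306) are hypothetical, and `apply_rules` expects a boto3 EC2 client that you create yourself.

```python
"""Sketch: tier-to-tier firewall rules as EC2 security group ingress
permissions. Group IDs and ports are hypothetical placeholders."""

WEB_SG = "sg-web-placeholder"  # hypothetical security group IDs
APP_SG = "sg-app-placeholder"
DB_SG = "sg-db-placeholder"


def tcp_from_cidr(port: int, cidr: str) -> dict:
    """Allow one TCP port from an IP range (e.g. the public internet)."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}]}


def tcp_from_group(port: int, source_group_id: str) -> dict:
    """Allow one TCP port only from instances in another security group."""
    return {"IpProtocol": "tcp", "FromPort": port, "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": source_group_id}]}


# Only the web tier is reachable from outside; each inner tier accepts
# traffic solely from the tier directly in front of it.
INGRESS_RULES = {
    WEB_SG: [tcp_from_cidr(443, "0.0.0.0/0")],
    APP_SG: [tcp_from_group(8080, WEB_SG)],
    DB_SG: [tcp_from_group(3306, APP_SG)],
}


def apply_rules(ec2_client, rules: dict = INGRESS_RULES) -> None:
    """Push every group's rules using a boto3 EC2 client."""
    for group_id, permissions in rules.items():
        ec2_client.authorize_security_group_ingress(
            GroupId=group_id, IpPermissions=permissions)
```

The app and database tiers reference a source security group instead of a CIDR range. This keeps them unreachable from outside, even as instances in the tier in front come and go.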
Finally, once you feel you know enough about AWS and have a target architecture, comes the fun part: migrating all the individual pieces. There are really two basic ways to do it: all at once or bit by bit.
PATTERN #1: "Big Bang" switchover.
One way to do a migration is the "Big Bang":
1) Migrate your domain's DNS records (e.g. www.mycompany.com) to be managed by Route 53, with the records still pointing to your old data center.
2) Build out and test your new target architecture in parallel, migrating all data as required. Test some more! And test some more!
3) At a chosen day and hour, cut over the www.mycompany.com DNS records to point to your AWS load balancer. Smoke test and have some users test. Either it's good (and you're OK!) or it's not (and you fail back).
|Parallel Architectures before the "Big Bang" switchover|
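The three Big Bang steps boil down to two Route 53 record changes. The minimal Python sketch below builds the change batches for boto3's `change_resource_record_sets`. The domain, IP address, load balancer DNS name and hosted zone IDs are placeholders. The 60-second TTL is an assumption, not a recommendation from the text.

```python
"""Sketch of the Big Bang DNS cutover as Route 53 change batches.
All names, IPs and zone IDs are hypothetical placeholders."""


def point_at_old_datacenter(name: str, ip: str, ttl: int = 60) -> dict:
    """Plain A record at the old data center. A short TTL, set well before
    cutover, limits how long resolvers keep caching the old answer."""
    return {"Comment": "serve from old data center",
            "Changes": [{"Action": "UPSERT", "ResourceRecordSet": {
                "Name": name, "Type": "A", "TTL": ttl,
                "ResourceRecords": [{"Value": ip}]}}]}


def point_at_elb(name: str, elb_dns: str, elb_zone_id: str) -> dict:
    """Alias A record at the AWS load balancer (the cutover itself).
    Alias records carry no TTL of their own."""
    return {"Comment": "cut over to AWS",
            "Changes": [{"Action": "UPSERT", "ResourceRecordSet": {
                "Name": name, "Type": "A",
                "AliasTarget": {"HostedZoneId": elb_zone_id,
                                "DNSName": elb_dns,
                                "EvaluateTargetHealth": False}}}]}


def submit(route53_client, hosted_zone_id: str, batch: dict) -> None:
    """Send one change batch using a boto3 Route 53 client."""
    route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=batch)
```

Step 1 submits `point_at_old_datacenter`, step 3 submits `point_at_elb`, and failing back means submitting the first batch again. Resolvers cache answers for up to the TTL, so neither the cutover nor the fail-back reaches every user instantly.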
The pro is that the sequencing is relatively simple for management and developers (very waterfall), and it's relatively simple to test. The downside is that it's risky, as all "big bangs" are.
The risks and problems become clear if you have a 24x7x365 mission-critical system, especially one where you can't just tell your users to log off at a certain time. Similarly, a high-visibility or high-revenue system (even if it's not 24x7) might not be a candidate for a big bang approach. You may find major migration issues hours, days or weeks after switchover, without the ability to (easily) switch back.
So what's the alternative to a "Big Bang"? Clearly, it's piecemeal. You could migrate small components of your architecture one by one, or you could operate two architectures in parallel with some data synchronization between them.
I don't recommend the latter. In my experience, data synchronization systems are some of the hardest to get right all the time, especially when networks are so flaky.
So what does that leave us?
PATTERN #2: Do it in steps via a Hybrid architecture
1) Migrate small, well-defined components first. For example, if you are using JMS, switch to SQS; if you send emails, switch to SES. The rest of your application (Web Tier, App Tier, Data Tier) stays as is for the moment but now calls these services remotely. This is a good first foray. It gets your Ops and Security teams used to IAM roles, and you will learn about Regions and Availability Zones without going all in. Even these small changes might require some architectural rethinking. Even within AWS data centers, calls to SQS are NOT as fast as, say, calls to a local ActiveMQ broker, mostly because SQS durably stores redundant copies of each message across multiple servers.
This is a nice piece of work where you'll learn a lot without mortgaging the farm.
In addition, you'll learn more about pricing, tiers and your REAL billing (which can sometimes come as a surprise!).
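For the JMS-to-SQS swap in step 1, it can help to put the queue behind a small interface so the rest of the application doesn't care which broker it talks to. The sketch assumes `client` is a boto3 SQS client and that the queue URL is your own; the class names are hypothetical. It also batches sends with `send_message_batch` (at most 10 messages per call), which is one way to soften the per-call SQS latency described above.

```python
"""Sketch: a queue interface with an SQS-backed implementation.
`client` is assumed to be a boto3 SQS client; names are hypothetical."""
from typing import Iterable, Protocol


class MessageQueue(Protocol):
    def send_all(self, bodies: Iterable[str]) -> int: ...


class SqsQueue:
    MAX_BATCH = 10  # SendMessageBatch accepts at most 10 entries per call

    def __init__(self, client, queue_url: str):
        self.client = client
        self.queue_url = queue_url

    def send_all(self, bodies: Iterable[str]) -> int:
        """Send messages in batches of up to 10, so each batch costs one
        remote round trip instead of ten. Returns the number of calls."""
        pending = list(bodies)
        calls = 0
        for start in range(0, len(pending), self.MAX_BATCH):
            chunk = pending[start:start + self.MAX_BATCH]
            # Entry Ids only need to be unique within one batch.
            entries = [{"Id": str(i), "MessageBody": body}
                       for i, body in enumerate(chunk)]
            response = self.client.send_message_batch(
                QueueUrl=self.queue_url, Entries=entries)
            calls += 1
            if response.get("Failed"):
                # Partial failures are reported per entry, not raised.
                raise RuntimeError(f"SQS rejected: {response['Failed']}")
        return calls
```

For example, `SqsQueue(boto3.client("sqs"), queue_url).send_all(bodies)`. A JMS-backed class with the same `send_all` method would satisfy the same `MessageQueue` protocol, which keeps a path back to the old broker open.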
2) From there, you need to move more of your architecture over.
One good option exists if you have some API calls for which you don't have a tight SLA (e.g. you don't mind slipping from a 50 ms response time to 1 second) or you don't mind the data being a bit stale (say by minutes or hours). In that case you might want the following:
i) Route 53 migration of your DNS records as before
ii) Set up either MySQL replication from your data center to a slave on AWS or perhaps a basic nightly dump-and-load from your Production Oracle
iii) Use Route 53 to route a large percentage of the requests to your "real" system (back in your old data center). This is done via Weighted Round Robin (weighted routing) in Route 53.
iv) Route the remainder of your requests to AWS. If a request is read-only, hit the local (read-only) RDS instance; otherwise, proxy it back to your old data center. You could proxy directly, or you could set up an SQS queue to do the writes asynchronously and avoid a very expensive (remote) write hit.
|Hybrid architecture to lower switchover risk|
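Step iv can be sketched as a small router in the AWS copy of the application tier. This is a toy model. `read_local`, `proxy_remote` and `enqueue_write` are hypothetical hooks for the local read-only RDS instance, a synchronous call back to the old data center, and an SQS publish, respectively. Treating GET/HEAD/OPTIONS as read-only assumes your API follows HTTP semantics.

```python
"""Toy model of the AWS side of the hybrid architecture. The three
callables are hypothetical stand-ins for real integrations."""
from dataclasses import dataclass
from typing import Callable

READ_METHODS = {"GET", "HEAD", "OPTIONS"}


@dataclass
class HybridRouter:
    read_local: Callable[[str], str]           # query the local RDS copy
    proxy_remote: Callable[[str, str], str]    # synchronous call to old DC
    enqueue_write: Callable[[str, str], None]  # e.g. publish to SQS
    async_writes: bool = False

    def handle(self, method: str, path: str, body: str = "") -> str:
        if method.upper() in READ_METHODS:
            # The local copy may be minutes or hours stale.
            return self.read_local(path)
        if self.async_writes:
            # Accepted for later processing, not yet applied.
            self.enqueue_write(path, body)
            return "202 Accepted"
        # Pay the remote round trip now.
        return self.proxy_remote(path, body)
```

The asynchronous path avoids the remote write hit but gives up read-your-own-writes: users may not see a change until the queue drains and replication catches up. This is why the approach only suits APIs that tolerate stale data.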
Here again, you will have taken your AWS understanding to the next level. If you don't like what you see in Production on Day 1 or Day 20, you can set the Route 53 weight for AWS to zero. But if it works, you will have learned a lot about:
- IAM roles
- EC2 and Security
- Deploying your app onto EC2 (using Chef, Puppet etc.)
- RDS & data migration
- ELB to load balance to EC2 instances locally
- Response time variability and related issues
- Route 53 etc.
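Weighted routing, including the zero-weight rollback, can be expressed as two Route 53 records with the same name and type but different `SetIdentifier`s. The Python sketch below builds that change batch for boto3's `change_resource_record_sets`. The names, IP and zone IDs are placeholders. Mixing a plain A record with an alias record in one weighted set is an assumption worth checking against the Route 53 documentation for your setup.

```python
"""Sketch: weighted Route 53 records splitting traffic between the old
data center and AWS. Names, IPs and zone IDs are hypothetical."""


def weighted_split(name: str, datacenter_ip: str, elb_dns: str,
                   elb_zone_id: str, aws_weight: int,
                   dc_weight: int = 100, ttl: int = 60) -> dict:
    """Build an UPSERT batch for both members of the weighted set."""
    for weight in (aws_weight, dc_weight):
        if not 0 <= weight <= 255:
            raise ValueError("Route 53 weights must be between 0 and 255")
    datacenter = {"Name": name, "Type": "A", "SetIdentifier": "datacenter",
                  "Weight": dc_weight, "TTL": ttl,
                  "ResourceRecords": [{"Value": datacenter_ip}]}
    # Assumption: an alias record may share the weighted set with a plain one.
    aws = {"Name": name, "Type": "A", "SetIdentifier": "aws",
           "Weight": aws_weight,
           "AliasTarget": {"HostedZoneId": elb_zone_id, "DNSName": elb_dns,
                           "EvaluateTargetHealth": True}}
    return {"Changes": [{"Action": "UPSERT", "ResourceRecordSet": record}
                        for record in (datacenter, aws)]}


def aws_share(batch: dict) -> float:
    """Expected fraction of DNS answers pointing at AWS (weight / total)."""
    weights = {c["ResourceRecordSet"]["SetIdentifier"]:
               c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]}
    total = sum(weights.values())
    if total == 0:
        return 1 / len(weights)  # all weights zero: Route 53 splits evenly
    return weights["aws"] / total
```

To ramp up, resubmit the batch with a larger `aws_weight`; to roll back, use `aws_weight=0`. The split is over DNS answers, not requests, so caching resolvers and a few very busy clients can skew the real traffic share.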
Naturally your Security and DBA folks will want close involvement to make sure your data replication is secure and is not opening up any unnecessary holes to the outside world. Your architecture folks will need to keep an eye (with Ops) on latencies and monitoring.
The nice thing about being here is you will have done a LOT of your learning and change-making without HAVING to take the switchover risk immediately.
Get your surprises before you go "all in"!
Also at this stage, you'll learn about three areas of "surprise" in AWS (at least they were a surprise to me!):
1) Billing - it's not what you think it is! Your usage is often very different from your original cost estimate (Hint: it's mostly EC2 + Database)
2) Noisy Neighbors - everything is hunky dory until it's not, because a neighbor on the same physical host is hogging the CPU
3) IOPS - related to Noisy Neighbors, but with I/O. You don't have full control of this data center, so you might find your response times need some tweaking or that you need to buy dedicated (Provisioned) IOPS.
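Noisy neighbors and IOPS contention tend to show up as tail latency rather than as a slower average. The minimal sketch below flags that pattern and assumes you already collect per-request latencies in milliseconds. It uses the nearest-rank percentile. The 5x ratio between p99 and the median is an arbitrary, hypothetical threshold.

```python
"""Sketch: flag latency variability by comparing p99 with the median.
The threshold is hypothetical; real samples come from your monitoring."""
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct%
    of all samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def looks_noisy(samples: list[float], ratio_limit: float = 5.0) -> bool:
    """True when the p99 latency exceeds ratio_limit times the median."""
    return percentile(samples, 99) > ratio_limit * percentile(samples, 50)
```

A host whose median stays flat while its p99 jumps is a hint to look at shared CPU or I/O before blaming your own code.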
You can run this way for a while, letting the architecture bake in and perhaps moving more and more traffic to your AWS infrastructure. But you don't want to maintain parallel architectures (and code paths!) for too long. Eventually you come to a tough choice:
1) Have some writes from your AWS architecture write to the remote data tier (in the old data center) to continue the gradual change
2) Switch over the Data Tier to AWS (but still have some remote writes). At this stage, you might choose to just finalize the switchover and go "all in" rather than take more remote data tier performance hits.
In an ideal world, if you are using MySQL master-slave replication, you will have an easier time completing the switchover without too many "shenanigans" (that's a technical term!). Alternatively, you might make your application a little less chatty with the data tier (a good thing in general) so that the remote writes don't hurt as much. Then you can gradually ramp up your Weighted Round Robin weights to move things over bit by bit until the day you promote RDS to master.
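Making the application less chatty usually means fewer, larger round trips. As a rough, hypothetical illustration: at 30 ms per cross-data-center round trip, 20 single-row writes cost about 600 ms, while one batched write costs roughly one round trip. The sketch below buffers rows and flushes them through `send_batch`. `send_batch` is a hypothetical callable standing in for something like a multi-row INSERT.

```python
"""Sketch: buffer small writes and flush them as one remote call.
`send_batch` is a hypothetical callable supplied by the caller."""
from typing import Any, Callable


class BufferedWriter:
    def __init__(self, send_batch: Callable[[list[Any]], None],
                 max_rows: int = 50):
        self.send_batch = send_batch
        self.max_rows = max_rows
        self.rows: list[Any] = []
        self.round_trips = 0

    def write(self, row: Any) -> None:
        """Queue a row locally; flush automatically when the buffer fills."""
        self.rows.append(row)
        if len(self.rows) >= self.max_rows:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far in a single remote round trip."""
        if self.rows:
            self.send_batch(self.rows)
            self.rows = []  # fresh list; the sent batch is left untouched
            self.round_trips += 1
```

Buffered rows are lost if the process dies before `flush`. In a real service, you would flush at the end of each request or transaction rather than only when the buffer fills.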
Either way by the end of the process you'll have one final "switchover" and be done - you can switch off your old machines and start to enjoy the cost benefits and all the flexibility of "on demand computing".
I'd like to hear whether others have suggestions for AWS migration strategies that have been proven to work.
p.s. One extra bonus: once you've got the "ramp-up and migration" process down, you can use the same process to stand up more instances of your architecture in different regions. Sadly, for now you can't have RDS create a read replica in a different region. But you CAN set up a scheme that puts local writes on an SQS queue / SNS topic and persists them remotely, giving yourself a "roll your own" data replication methodology.
[Edit: I just found out that AWS RDS *does* support cross-region replication.]
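Given the edit above, native RDS cross-region replication is likely the simpler route. If you do roll your own over SQS / SNS, though, the remote side must cope with duplicate and out-of-order delivery. The minimal sketch below assumes each write is published as a change record carrying a per-key version number; the `Change` fields and class names are hypothetical. The applier keeps the highest version seen per key and ignores anything older or repeated.

```python
"""Sketch: idempotent, order-tolerant apply of replicated change records.
Field and class names are hypothetical."""
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    key: str
    version: int            # increases monotonically per key at the writer
    value: Optional[str]    # None means the row was deleted


class RemoteReplica:
    def __init__(self):
        self.rows: dict[str, str] = {}
        self.versions: dict[str, int] = {}  # kept even after deletes

    def apply(self, change: Change) -> bool:
        """Apply a change unless an equal or newer version was already seen.
        Returns False for duplicates and stale, out-of-order deliveries."""
        if change.version <= self.versions.get(change.key, -1):
            return False
        self.versions[change.key] = change.version
        if change.value is None:
            self.rows.pop(change.key, None)
        else:
            self.rows[change.key] = change.value
        return True
```

Versions are retained after a delete so that a late-arriving older update cannot bring a deleted row back.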