For a software product, identifying product/market fit, getting to market, and then scaling to meet business demands all need to happen quickly. Time is still money, and many teams now think in terms of dozens of deploys per day.
As I’ve mentioned frequently, optimizing your time and staying focused on your core product is essential for your business and your customers. You can check out my recent posts on team productivity and cloud services for more specifics about that.
With this increased need to move fast, many teams started using the cloud to get large infrastructure off the ground quickly. But cloud services, especially the latest generation of services built on top of the cloud, like AWS Lambda or Amazon Elastic Container Service, can lead to a level of lock-in that many teams shy away from.
What Lock-in Looks Like
The biggest concern I’ve heard when discussing this topic is potential lock-in to a specific provider. I see three different lock-in scenarios in this kind of infrastructure.
Moving Infrastructure Takes Effort
Changing providers always involves work, and that effort alone ties you to your current provider. Even with an infrastructure built on standard tools and frameworks, you’ll have to transfer your data, change DNS, and test the new setup extensively. Better tooling can make this easier, but it doesn’t remove the pain entirely.
At Codeship, we’ve had great success so far with Packer as the main driver for our build infrastructure. We also try to stay as close as possible to the standard setup of every tool we use, so we don’t get bogged down in the details of a specific service.
Code-Level Lock-in
Code-level lock-in is present when a hosting service requires you to code against its proprietary APIs and build on top of its proprietary platform.
Google App Engine is an example of deep code-level lock-in. It requires you to build your application in a very specific way tailored to their system. This can give you major advantages because it’s very tightly integrated into the infrastructure, but you’re also completely tied to that platform.
Heroku, for example, has gone the opposite way by building a platform on the standard frameworks (e.g., Rails, Django) that people want to build with. It’s not as optimized and automated as Google App Engine, but it gives its customers the sense that they can move away from Heroku at any time.
For many teams, deep lock-in is too risky. Any decision the main infrastructure provider makes could severely affect their company, with no way to move off that provider quickly. Moving off a specific provider is a lot of work, but the option needs to at least be there.
Architectural Lock-in
Architectural lock-in happens when the specifics of a service provider force you into an architectural style that other providers support minimally or not at all. It can happen even when there is little or no code-level lock-in.
An example of a service with minimal code-level lock-in but major architectural lock-in is AWS Lambda. In the first iteration of Lambda, you write Node.js functions, which are invoked either through the API or automatically on specific events in S3, Kinesis, or DynamoDB.
For any sufficiently complex infrastructure, this could lead to dozens or hundreds of very small functions, none of them complex on its own or heavily locked in at the code level. But you can’t take those Node.js functions and run them on another server or hosting provider as they are. You would need to build your own system around them, which means high architectural lock-in.
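To make the distinction concrete, here is a minimal sketch, written in Python rather than Node.js for brevity, of a Lambda-style handler and the kind of dispatcher you would have to build to run it elsewhere. The handler uses the `handler(event, context)` shape Lambda expects and reads the bucket and key fields of an S3 event notification; `list_uploaded_objects`, `LocalEventBus`, and the `"s3:ObjectCreated:*"` routing key are hypothetical names chosen for illustration. The function itself moves easily; the event routing, plus everything this sketch leaves out (delivery guarantees, retries, automatic scaling), is what you would have to rebuild.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]
Handler = Callable[[Event, Any], Any]


def list_uploaded_objects(event: Event, context: Any) -> List[str]:
    """Hypothetical Lambda-style handler: returns "bucket/key" per S3 record.

    Nothing here imports a provider SDK, so the function itself is
    portable; what ties it to Lambda is who invokes it and when.
    """
    paths = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        paths.append(f"{bucket}/{key}")
    return paths


class LocalEventBus:
    """Hypothetical, in-process stand-in for Lambda's event routing.

    It only maps an event source to handlers and calls them synchronously;
    delivery guarantees, retries, and scaling are left out entirely.
    """

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, source: str, handler: Handler) -> None:
        self._subscribers[source].append(handler)

    def publish(self, source: str, event: Event) -> List[Any]:
        # Call every handler registered for this source; context is None here.
        return [handler(event, None) for handler in self._subscribers[source]]


bus = LocalEventBus()
# "s3:ObjectCreated:*" is just a routing key chosen for this sketch.
bus.subscribe("s3:ObjectCreated:*", list_uploaded_objects)
s3_event = {"Records": [{"s3": {"bucket": {"name": "uploads"},
                                "object": {"key": "cat.png"}}}]}
print(bus.publish("s3:ObjectCreated:*", s3_event))  # [['uploads/cat.png']]
```

Even this toy version shows where the lock-in lives: the handler is a dozen portable lines, while the surrounding invocation machinery is the part a provider gives you for free and you would otherwise own.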
On the plus side, there’s a lot of infrastructure we simply don’t have to deal with anymore. Events are fired somewhere in your infrastructure, and your functions will be executed and scaled automatically.
Heroku, AWS, and other cloud providers have seen the writing on the wall and are decreasing code-level lock-in while providing new services that create architectural lock-in.
Evaluate Lock-in Scenarios for Your Team
The first lock-in category, the effort of moving infrastructure, is hard to avoid but can be reduced to a manageable size.
You should definitely introduce tools into your platform that at least make switching providers possible. Tools like Packer, and workflows like regularly backing up your data and restoring it to a different provider, are safeguards that help ensure you can move off your main hosting provider. These safeguards might seem excessive for small teams, but once your engineering team grows to medium size, they’re a necessity.
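A restore drill boils down to two steps: copy everything to a second provider, then restore from that copy and check it against the original. The sketch below assumes each provider’s object store can be read as a mapping of key to bytes; plain dicts stand in for the real SDK clients, and `back_up` and `restore_drill` are hypothetical helpers, not part of Packer or any other tool.

```python
import hashlib
from typing import Dict, List

# Hypothetical stand-in for a provider's object store: key -> object bytes.
# Real code would read and write through each provider's own SDK.
Store = Dict[str, bytes]


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def back_up(primary: Store, backup: Store) -> None:
    """Copy every object from the primary provider to the backup provider."""
    for key, data in primary.items():
        backup[key] = data


def restore_drill(primary: Store, backup: Store) -> List[str]:
    """Restore from the backup and list keys that are missing or differ.

    An empty list means the backup could actually stand in for the primary.
    """
    restored: Store = dict(backup)  # simulate restoring into a fresh environment
    problems = []
    for key, data in sorted(primary.items()):
        if key not in restored:
            problems.append(f"missing: {key}")
        elif checksum(restored[key]) != checksum(data):
            problems.append(f"corrupt: {key}")
    return problems


primary = {"db/users.sql": b"INSERT ...", "avatars/1.png": b"\x89PNG"}
backup: Store = {}
back_up(primary, backup)
print(restore_drill(primary, backup))  # []

backup["db/users.sql"] = b"truncated"
del backup["avatars/1.png"]
print(restore_drill(primary, backup))  # ['missing: avatars/1.png', 'corrupt: db/users.sql']
```

The point of running the restore, not just the backup, is that a copy nobody has ever restored from tells you little about whether you could really move.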
The acceptable tradeoffs for code-level and architectural lock-in are more difficult to define.
I prefer architectural lock-in to code-level lock-in. Architectural lock-in can be circumvented more easily by building small services based on open-source technologies. Even if you keep some services with a hosting provider, you can move others off it. This lets you decide which parts of your infrastructure fit best with which provider.
This only works if the provider’s services are based on open-source technology, though. Otherwise you’re stuck with both code-level and architectural lock-in.
So start by evaluating how much code-level lock-in a specific provider forces you into. Is it easy to abstract the provider’s APIs so you can switch to another provider on a code level in the future?
Next, analyze whether architectural constraints lock you into building in a specific way. Can you reduce the architectural lock-in by using simpler services that work together? Which part of your infrastructure seems hardest to move off a specific provider in the future?
It’s up to every team to decide which of those lock-in scenarios is a good tradeoff. A microservice-oriented architecture built on technology you can run on a variety of providers (e.g., Rails or Node.js) can offset some of that tradeoff. You can build on top of a provider’s services in the beginning and move parts of your infrastructure elsewhere for more control later.
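One common way to answer that question is to hide the provider behind a small interface of your own. The sketch below (Python 3.8+ for `typing.Protocol`) assumes a hypothetical `ObjectStore` protocol with just `put` and `get`; `InMemoryStore` and `LocalDiskStore` stand in for provider-specific adapters, which in practice would wrap each provider’s SDK. Application code such as the hypothetical `save_report` depends only on the interface, so switching providers means writing a new adapter rather than touching every call site.

```python
import tempfile
from pathlib import Path
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Hypothetical provider-neutral interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for one provider's adapter (in practice, a wrapper around its SDK)."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]


class LocalDiskStore:
    """Stand-in for a second provider: stores each object as a file under root."""

    def __init__(self, root: Path) -> None:
        self._root = root

    def put(self, key: str, data: bytes) -> None:
        path = self._root / key
        path.parent.mkdir(parents=True, exist_ok=True)  # keys may contain "/"
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()


def save_report(store: ObjectStore, name: str, body: str) -> str:
    """Application code: depends only on the interface, not on a provider."""
    key = f"reports/{name}.txt"
    store.put(key, body.encode("utf-8"))
    return key


# The same application code runs unchanged against either adapter.
for store in (InMemoryStore(), LocalDiskStore(Path(tempfile.mkdtemp()))):
    key = save_report(store, "weekly", "all green")
    print(type(store).__name__, store.get(key))  # prints b'all green' for both
```

The cost of this approach is that the interface can only expose what every target provider supports, so provider-specific features either stay out of it or leak through the abstraction.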
Let us know about your strategies and experiences with lock-in in your infrastructure in the comments.