As an engineer at Lyft, Leventi explained to his audience that Docker enables them to create a standardized development environment, including all necessary service dependencies. It allows developers to be much more efficient, and it reduces onboarding/ramp-up time.
The Goal: Increase Productivity
Engineering at Lyft means fluid teams and a rapidly growing headcount. Everyone, Leventi stated, does devops. It’s an operation that involves 50+ microservices and 25 server deploys per day.
Lyft has the ambitious goal of enabling brand-new developers to ship code to production on Day One of employment. It’s a particularly strong measure of productivity designed to achieve faster feature development.
Leventi pointed out that common hindrances to developer productivity include:
- problems with VPN
- a build failed, for reasons unknown
- a deploy to production that may break the world
Lyft’s goal is to remove those types of blockers and cut down on common productivity complaints like:
- “It doesn’t work on my box.”
- “I don’t understand how the client got into that state.”
- “It worked in development!”
- “How do I get service x to talk to service y?”
- “How do I test this feature from the client?”
- “How do I get started working on a new team?”
In April 2014, Lyft chose to invest in standardizing and improving developer work environments. It was a decided switch from the methods they had used until then: each dev maintained individual installs of many individual services, staying up to date on changes was a manual task, and orchestration was expensive, just to name a few bottlenecks.
Now, Leventi said, every developer at Lyft runs the standard environment, Devbox. Everyone has the same up-to-date local environment, and Docker containers have all the resources for a live, running environment to test new code.
VMware Fusion runs Docker on MacOS, and Vagrant handles Lyft’s virtualization configuration. Services are started from the command line (./service start api), and Devbox can snapshot an environment for troubleshooting, analysis, or comparison.
Lyft also uses “Onebox,” a standard test/integration environment that can be easily deployed to a cloud environment. That means, Leventi emphasized, all of Lyft, in the cloud, running any combination of builds, on a single EC2 instance. It’s constantly up to date, and every QA engineer has their own environment.
An open-source continuous integration service runs all integration tests between and inside services.
Leventi described Lyft’s service model in detail:
- single fat containers
- fixed static IP address model
- single stateful local container
- auto detect code changes
Each Docker image is a file system snapshot of config management. Building one consists of:
- a git clone of a central ops codebase
- a git clone of a service codebase
- a SaltStack provisioning run
- runit configuration for processes
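The four build steps above can be sketched as an ordered command plan. This is an illustrative sketch only, not Lyft's actual tooling: the repository URLs, the /srv working directory, the per-service runit directory, and the function name build_plan are all assumptions. It also assumes a masterless Salt run (salt-call --local) and the common runit convention of linking a service directory into /etc/service.

```python
def build_plan(ops_repo, service_repo, service_name, workdir="/srv"):
    """Return the commands for one image build, in the order they would run.

    Every path and URL here is hypothetical; the talk did not give them.
    """
    ops_dir = f"{workdir}/ops"
    service_dir = f"{workdir}/{service_name}"
    return [
        # 1. Clone the central ops codebase (config management lives here).
        ["git", "clone", ops_repo, ops_dir],
        # 2. Clone the codebase of the service being packaged.
        ["git", "clone", service_repo, service_dir],
        # 3. Provision the file system with a masterless Salt run.
        ["salt-call", "--local", "state.highstate"],
        # 4. Register the service with runit by linking its (assumed)
        #    runit directory into the runit scan directory.
        ["ln", "-s", f"{service_dir}/runit", f"/etc/service/{service_name}"],
    ]
```

Returning the plan as data rather than executing it keeps the sketch easy to inspect; a real build would run each command inside the image being built.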
When a Lyft dev runs a service image, the container:
- reruns the Salt provision on new SHAs
- starts the runit processes
- terminates if the initial runit checks fail
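The run-time workflow above amounts to a small control flow: provision, start, check, and exit on failure. The sketch below illustrates that ordering under stated assumptions; the function name run_service and its three callable parameters are hypothetical stand-ins for the real Salt and runit steps, which the talk did not show.

```python
from typing import Callable


def run_service(
    provision: Callable[[], None],
    start_processes: Callable[[], None],
    checks_pass: Callable[[], bool],
) -> int:
    """Run the three start-up steps and return a process exit code.

    0 means the container should keep running; 1 means it should terminate.
    """
    # Apply any config or code deltas since the image was built.
    provision()
    # Hand the service processes to the supervisor (runit in Lyft's case).
    start_processes()
    # If the initial health checks fail, signal that the container must stop.
    if not checks_pass():
        return 1
    return 0
```

Passing the steps in as callables keeps the ordering testable without Salt or runit installed.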
As a result, devs can easily apply ops modifications; testing PRs is a matter of changing environment variables; and devs don’t need to wait for an image build, as deltas are applied during runs.
Each environment has its own single host: Devbox has a Mac Docker host using VMware Fusion with shared folders; the CI slave has an AWS Ubuntu Docker host for short-lived containers; and Onebox has an AWS Ubuntu Docker host for long-lived environments.
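One way to picture "changing environment variables" to test a PR is a per-service variable that names the commit to provision. This is a minimal sketch under assumptions: the variable naming scheme (for example API_SHA), the default ref, and the function resolve_sha are hypothetical and were not described in the talk.

```python
import os


def resolve_sha(service, env=None, default="master"):
    """Return the git ref to provision for a service.

    Looks up a hypothetical <SERVICE>_SHA variable and falls back to
    the default ref when it is not set.
    """
    env = os.environ if env is None else env
    key = f"{service.upper().replace('-', '_')}_SHA"
    return env.get(key, default)
```

With this scheme, pointing an environment at a PR means setting one variable before the container starts, and the provisioning run picks up the matching commit.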
To manage state, all stateful processes run inside the same container. For Lyft, that includes:
- SQS Local
- Fake Kinesis
Leventi demoed the process for his audience with a small, sample Python web app, showing that code can be modified live and reflected without having to reload a Docker container.
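Detecting code changes without restarting a container can be done in several ways; the talk did not say which mechanism Lyft uses. The sketch below shows one simple approach, polling file modification times, purely as an illustration. The function names snapshot and changed_files are hypothetical.

```python
import os


def snapshot(root):
    """Map each .py file under root to its last-modified time in nanoseconds."""
    mtimes = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                mtimes[path] = os.stat(path).st_mtime_ns
    return mtimes


def changed_files(before, after):
    """Return the sorted paths that were added, removed, or modified."""
    paths = before.keys() | after.keys()
    return sorted(p for p in paths if before.get(p) != after.get(p))
```

A reloader would take a snapshot on a timer, compare it with the previous one, and restart only the affected process when changed_files returns a non-empty list.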
As Leventi mentioned early on in his talk, a big measure of Lyft’s productivity success is whether or not a brand-new dev can push code to production on Day One of employment. And, he said, a majority do. Feature devs are no longer blocked by devops, and QA client testing is parallelized with separate but identical environments.
Of course, what is productivity without stability? Leventi stated that 99 percent of Lyft’s deploys are successful, and every pull request on every service is integration tested.
There are always lessons to learn as a company pushes the envelope on productivity. Leventi mentioned a few hurdles in particular:
- VMware Fusion can be unstable under load
- Frequent image downloads take time, and devs need to plan downloads
- Bugs in config management can freeze development
- Easy service creation leads to unnecessary services
- It’s easy to approach limits on what can run on a single box
- Static IP allocation isn’t supported in Docker
Lyft is currently exploring Docker usage for production, Leventi said, with ETL jobs in Docker. They’re also experimenting with containers to reduce auto-scale group spin up/down times and containers for atomic deploys, as well as using Docker for one-time actions.
Considering Lyft’s primary goal is to accelerate productivity, it seems they’ve effectively utilized Docker to achieve a standardized environment ready to create more efficient devs.