Application Production Readiness Guide
Deploying an application in production is not the end state but a very crucial stage of the software development lifecycle.
A lot can go wrong in production, especially when you’re developing a new service, deploying for the first time, or releasing a major feature update. This guide can help you create a blueprint for deploying and managing your application in production and reduce the risk of incidents.
Production Readiness
- Architecture review: This is normally the very first step before development starts. But as development progresses, a lot of things change, and the architecture documents become outdated. It’s good practice to keep the documents updated and to verify them once more before the final release.
- Security: Security is another important pillar of a healthy service. It protects the data stored in and shared across the various sub-systems. Verifying that good security practices are followed before releasing the service will help in the long run.
- Service dashboard: Consolidate a set of service monitoring dashboards in one place to provide a holistic view of service health and performance. This will help in understanding the various components and the usage of the application.
- Alarms: Set up a few standard alarms on memory, CPU, GPU utilization, etc. (with a threshold at 70% or 80%) to let your team know about a potential issue and help mitigate it proactively. These alarm events will also help you decide how to scale your application as required. Some data points are unknown at the time of release, and it is fine to start with an arbitrary threshold for alarms on those metrics.
- Scaling, caching, and latency: Every application is built to support a certain set of users and a certain number of transactions, but you often need to be ready to scale up or down based on usage. It’s best practice to define scaling rules that upscale and downscale the application based on various parameters to avoid downtime or customer impact. Also ensure a proper caching mechanism is implemented, both to cache data and to invalidate cached data. This will help you maintain your SLA for low latency.
- Beta testing: Beta testing provides a developer environment to test the flow, interactions, and end-to-end working of the application. This helps in building confidence.
- Gamma testing or UAT (User Acceptance Testing): It’s always better to test your application or feature with a selected set of early (“Braveheart”) users who can provide feedback based on real usage. This also lets you test your application before opening it to a wider audience.
- Runbooks: Some tasks require manual operations, e.g., onboarding users, migrating existing customers, etc. These operations should be properly documented to avoid any issues.
- UT (unit test) coverage: Define minimum coverage criteria, say 80% or 90%, and write unit tests to meet them. Higher coverage lowers the chance of an undetected bug.
- Integration tests: A unit test is a great way to capture an issue with a function or a unit of code, but an integration test allows you to test functionality as a whole. It also helps ensure that future changes do not break existing functionality.
- CM (Change Management) process: The Runbooks item above covers how to perform manual steps. The Change Management process ensures that those steps are actually followed and that each change is documented. This establishes a consistent practice for manual changes and reduces the risk of issues in production.
- CI/CD Pipeline: Avoid manual intervention as much as you can. A full CI/CD pipeline will ensure the changes are properly reviewed, unit tested, and integration tested.
- Dev testing: Perform as much dev testing as you can for all the possible scenarios you can think of. This raises the quality of the deliverable and improves the safety of the code deployed in production.
- Incident mitigation plan: However well the system is designed and developed, there will always be some known or unknown risks associated with it. It is good practice to document a risk mitigation plan in case an incident happens. For example, what if the web server crashes, or what if web page redirection is broken? A mitigation plan acts as a guide that lets any developer on the team act quickly during an incident. “Develop the best; prepare for the worst!”
- Dependency documentation: If your application depends on one or more other applications, review each dependency, document it, and create a mitigation plan for cases such as the dependency starting to throttle your requests or going down.
- Logging: Ensure proper logging is implemented at the appropriate levels, such as info, error, warn, and debug. Also ensure that no sensitive data is printed in the logs and that an appropriate log level is configured for each stage.
- Load testing: Performing a load test on your application will show you the load at which the application might break. This will help you implement a proper scaling mechanism to avoid downtime.
- Rollback mechanism: What if a faulty deployment reaches production? What if some untested code reaches production? What if the deployed code is not behaving right in production? To mitigate such issues, a proper rollback mechanism should be established and documented so that you can quickly restore the application to a previously known good state.
- A/B testing: For a major feature update, it’s better to collect metrics from separate A and B groups of customers. These metrics provide insight into the released feature and its adoption, and help you decide whether to roll it out to a wider audience or roll it back.
- Roll-out mechanism: Alpha, beta, gamma, UAT, and production are various stages that you could set up to target different kinds of customers. But in order to release your application or a major feature to a wider audience, define a proper rollout strategy, such as rolling out to a certain geographic user base, to a certain percentage of users, or to a certain number of users. Also define how you will expand to a wider audience once the initial rollout is a success.
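To make the Alarms item more concrete, here is a minimal sketch of a threshold check in Python. The `Alarm` class, the `breached_alarms` function, and the metric names are hypothetical and only for illustration; in practice you would configure this in your monitoring system rather than hand-roll it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    metric: str           # hypothetical metric name, e.g. "cpu"
    threshold_pct: float  # alarm fires at or above this utilization


def breached_alarms(alarms, readings):
    """Return the metrics whose latest reading meets or exceeds its threshold."""
    breached = []
    for alarm in alarms:
        value = readings.get(alarm.metric)
        # A missing reading is skipped here; a real system may want to alarm on it.
        if value is not None and value >= alarm.threshold_pct:
            breached.append(alarm.metric)
    return breached


# Thresholds in the 70% to 80% range, as suggested in the Alarms item.
ALARMS = [Alarm("cpu", 80.0), Alarm("memory", 80.0), Alarm("gpu", 70.0)]

if __name__ == "__main__":
    latest = {"cpu": 91.5, "memory": 42.0, "gpu": 70.0}
    print(breached_alarms(ALARMS, latest))  # prints ['cpu', 'gpu']
```

Starting with a simple static threshold like this is usually enough for a first release, and the numbers can be revisited once real usage data is available.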
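For the caching part of the scaling, caching, and latency item, the sketch below shows one possible way to cache data with a time-to-live and to invalidate it explicitly. The `TTLCache` class and its `loader` callback are assumptions made for this example, not part of any particular library.

```python
import time


class TTLCache:
    """A tiny in-memory cache with expiry and explicit invalidation (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._store = {}     # key -> (value, expiry time)

    def get(self, key, loader):
        """Return the cached value, calling loader(key) on a miss or after expiry."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        value = loader(key)
        self._store[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. right after the underlying data has changed."""
        self._store.pop(key, None)
```

Call `invalidate` from the code path that updates the underlying data, so readers do not keep seeing a stale value until the entry expires.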
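The dependency documentation item asks what happens when a dependency throttles or goes down. One common mitigation is to retry with exponential backoff and then fall back to a degraded response. The sketch below assumes a hypothetical `Throttled` exception raised by your dependency client; the function name and default values are also illustrative.

```python
import time


class Throttled(Exception):
    """Hypothetical error raised by a dependency client when it is rate limited."""


def call_with_fallback(call, fallback, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Try call() up to `attempts` times, backing off between tries, then use fallback()."""
    for attempt in range(attempts):
        try:
            return call()
        except (Throttled, ConnectionError):
            # Wait base_delay, then 2x, 4x, ... before the next try.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Every attempt failed: serve a degraded result instead of failing the request.
    return fallback()
```

The fallback could return cached data or a default value. Which one is acceptable depends on your application, so it is worth writing that decision down in the mitigation plan.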
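As an illustration of the Logging item, the following sketch uses Python's standard `logging` module to set a log level per stage and to mask sensitive values before they are written. The stage names, the `LEVEL_BY_STAGE` mapping, and the `password=`/`token=` pattern are assumptions for this example; adapt them to whatever counts as sensitive in your application.

```python
import logging
import re

# Assumed stages and levels: verbose in dev, quieter in production.
LEVEL_BY_STAGE = {"dev": logging.DEBUG, "beta": logging.INFO, "prod": logging.WARNING}

# Assumed pattern for sensitive key=value pairs.
_SECRET = re.compile(r"(password|token)=\S+", re.IGNORECASE)


def redact(text):
    """Replace the value of any password=... or token=... pair."""
    return _SECRET.sub(r"\1=[REDACTED]", text)


class RedactFilter(logging.Filter):
    """Mask sensitive values in the fully formatted message."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True


def build_logger(stage):
    """Return a logger whose level depends on the deployment stage."""
    logger = logging.getLogger(f"app.{stage}")
    logger.setLevel(LEVEL_BY_STAGE[stage])
    if not logger.handlers:  # avoid adding duplicate handlers on repeated calls
        handler = logging.StreamHandler()
        handler.addFilter(RedactFilter())
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

With this setup, `build_logger("prod").info(...)` is dropped while `build_logger("dev").info(...)` is emitted, and a message such as `login failed password=hunter2` is written with the value masked.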
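The roll-out item mentions releasing to a certain percentage of users or to a certain geographic user base. A minimal sketch of one way to do this is shown below, using a stable hash so that each user always lands in the same bucket. The function names, the feature name, and the region codes are hypothetical.

```python
import hashlib


def bucket(user_id, feature):
    """Map a user to a stable bucket in the range 0..99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def is_enabled(user_id, feature, percent, allowed_regions=None, region=None):
    """Decide whether a user sees the feature at the current rollout percentage."""
    # Optional geographic gate: users outside the allowed regions never see it.
    if allowed_regions is not None and region not in allowed_regions:
        return False
    return bucket(user_id, feature) < percent


if __name__ == "__main__":
    # Raising the percentage only adds users; nobody already enabled is dropped.
    for pct in (5, 25, 100):
        enabled = sum(is_enabled(str(u), "new-checkout", pct) for u in range(1000))
        print(pct, enabled)
```

Because the bucket depends only on the user and the feature, increasing `percent` widens the audience without flipping the experience for users who already have the feature. The same bucketing could also be used to split users into A and B groups for A/B testing.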
Conclusion
In conclusion, detailed documentation that captures these data points not only makes development safer but also leads to safer deployments and a healthier service.
Published at DZone with permission of Pranav Kumar Chaudhary. See the original article here.