Effective Reporting for Cloud Well-Architected Frameworks
Follow a personal experience of producing an effective report for a well-architected assessment, and use the resulting template/guide in similar situations.
“It sucks…We can’t show this to the customer,” was my colleague's reaction to my first report draft.
I performed a well-architected assessment for a customer and created a report filled with shortcomings, vulnerabilities, and technical how-to fixes. Then, I shared it with my non-developer colleagues for a peer review. They were baffled, and after a few minutes of back-and-forth debate, one of them politely conveyed to me that it sucked and that we could not show it to the customer.
In retrospect, they were right: written from a developer's perspective, my report was merely a long list of technical issues, full of jargon that only cloud engineers could make sense of. However, we had to present the report to the CIO and the executive committee for funding and approval. Therefore, with the help of my colleagues and through a few iterations, I created an easy-to-understand report for a different type of audience. Later on, the customer approved it and embarked on an improvement process. Based on that experience, I am going to provide you with a template/guide that you can leverage in similar situations. My customer was using Azure, but this guide is platform agnostic.
Before jumping into the guide, here are some general tips about a well-architected assessment:
- Currently, all major cloud providers have their own frameworks for cloud assessment, and these keep evolving (e.g., Google Cloud Architecture Framework, AWS Well-Architected Tool, Microsoft Azure Well-Architected Framework). Use the framework of your cloud provider, as it fits your purpose and gives you granular assessment, solution, and reporting options for the services you use. However, do not limit yourself to it; try to leverage tools from other providers as well. In my case, the customer used Azure, so I used Azure’s tool for the service-specific part of the report and leveraged AWS’s because it provided (at least at the time) a better perspective on the overall aspects of cloud operation.
- Cloud assessment frameworks typically consist of five major pillars: security, cost optimization, operational excellence, reliability, and performance efficiency. To prevent any future disappointment, ask your customer to prioritize the pillars according to their needs before you start the assessment. Make it clear that you must prioritize your efforts accordingly, and that improving a cloud environment is an ongoing effort: it can get better, but it will always be imperfect.
- Don’t blindly trust the answers that the customer/development team gives to framework questionnaires. They often underestimate the severity of issues or overestimate how sound their methodology is. Go through the answers together with them and verify each one.
- Avoid personal attacks and finger pointing at all costs throughout the whole process. Make this clear to all respondents, and ask them to be as honest as possible (it is the only way to diagnose root causes and ultimately fix them).
- Whenever your proposed solution/service costs additional money, clearly mention it in your report. Every time.
Here is the format we used. Within each section, try to answer the listed questions (sample answers are included as a guide). You can customize the whole thing according to your needs.
1. Executive Summary
This section has to be understandable by all types of audiences and give a 50,000-foot view of the whole thing.
- What is this report about?
- What did you assess? (Pillars, etc.)
- How did you do the assessment? (Workshops, hands-on investigation, etc.)
- Why is the customer using the cloud? What are the key takeaways? In our case, design, security, operational control, and governance were the main points/issues; therefore, I elaborated on them. In a nutshell, most of the issues stemmed from a lack of centralized monitoring, automation, and control. However, remember to mention positive aspects, too. It cannot be all bad. Do not be like me, who initially bombarded the report with only bad things. Customers should not feel dumb.
- What is your solution? How are you trying to help the customer to overcome those issues?
- What are the milestones to improve the situation? Briefly summarize what each milestone will achieve.
- How did you do the assessment? Workshops, hands-on investigation, etc.
- Which assumptions did you receive from the customer?
- What is the priority of the pillars?
- Which workloads are covered? Do you have to deal with mission-critical workloads? If yes, what are SLAs, etc.? Is there any recovery objective defined?
- List any other assumptions that readers have to be aware of.
Summary of Findings
This section gives a fast, yet insightful view of the situation. Keep it understandable for a non-technical audience; avoid using jargon. Keep this section concise; it should be readable within 1-2 minutes.
It cannot be all bad. List some positive aspects of the environment/team/operation/strategy.
Yes, this title is much more diplomatic than the one I originally used.
This section outlines more details about the most important issues (from the highest-priority pillars) along with recommendations. For more clarity, associate each issue with a severity level, preferably with some color. Briefly advise the customer on how to approach the issues. In our case, “critical” issues required immediate attention; otherwise, they could result in data breaches or security incidents. Here is an example:
Provide a corresponding effort-reward matrix for the aforementioned issues. You may want to advise the customer to start with the bottom-right issues, as they can provide quick wins.
Finding Summary by Pillar
This section targets a technical audience. Create a subsection for each of the pillars, and list all the discovered points (negative and positive). Be specific. Here are some examples:
- Does not have a fully automated integration and deployment setup
- Has a limited process or mechanism to mitigate deployment risks
- Has codified and automated a few components/operations
- Uses Git for version control but does not follow any established Git workflow
- Does not have any process or mechanism to audit, manage and minimize permissions
- Relies on a centralized identity provider, which makes it easier to audit and to secure user authentication.
Remediation by Pillar
This section targets a technical audience. List action points to address major issues from each pillar. Be specific. Here are a few examples:
- Operational Excellence
- Currently, resources have tags, but there is no mechanism to ensure consistency across all resources. [Customer] can achieve that by creating policies in Azure Policy (more info).
- Enable vulnerability assessment. Azure has an integrated tool that can help [customer] do that (more info). Please note that this can result in additional costs. [Customer] can get updated cost info here.
- Enable Azure Endpoint Protection to help against malware (more info).
- A system-assigned managed identity allows cloud services to authenticate without the need to store credentials. Go to the resource's Identity section and enable it.
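Before writing any policy for the tagging point above, it can help to show the customer how inconsistent the current tags are. The following is a minimal sketch, not part of the original assessment: the required tag keys, the resource names, and the inventory structure are all hypothetical, and the inventory is assumed to come from an exported resource list rather than from any particular Azure API.

```python
# Hypothetical tagging standard; the real required keys come from the customer.
REQUIRED_TAGS = {"owner", "environment", "cost-center"}


def find_tag_gaps(resources):
    """Return {resource name: sorted list of missing required tag keys}."""
    gaps = {}
    for res in resources:
        # Compare keys case-insensitively so "Owner" and "owner" count as one tag.
        present = {key.lower() for key in res.get("tags", {})}
        missing = sorted(REQUIRED_TAGS - present)
        if missing:
            gaps[res["name"]] = missing
    return gaps


# Made-up inventory for illustration only.
inventory = [
    {"name": "vm-web-01",
     "tags": {"Owner": "team-a", "environment": "prod", "cost-center": "1001"}},
    {"name": "sql-core", "tags": {"owner": "team-b"}},
    {"name": "storage-logs", "tags": {}},
]

for name, missing in find_tag_gaps(inventory).items():
    print(f"{name}: missing {', '.join(missing)}")
```

A short list like this is easier for a non-technical reader to digest than a policy definition, and it gives the technical team a concrete starting point for the remediation.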
3. Effort and Reward
Make it clear that improvement is a matter of trade-offs. One cannot make everything perfect, but gradual improvement (aligned with one’s interests) is the most effective way to start. Then, provide a high-level reward-effort matrix for the current state, consistent with the sub-sections addressed in the previous section. Here is an example:
How do you recommend that the customer address the issues? Here are a few recommendations:
- Start with bottom-right issues as they provide quick wins.
- Centralized governance and monitoring are prerequisites for all the pillars. To achieve those, the customer can leverage built-in tools, at the least. Do not forget to mention that such services may cost additional money.
- Recommend that the customer take a data-driven approach: start by leveraging the cloud provider’s built-in tools, review their reports regularly, and optimize the environment accordingly.
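The effort-reward matrix can be drafted from a simple scored list before it is drawn. The sketch below is only an illustration under stated assumptions: the 1-to-5 scores, the threshold, the quadrant labels, and the sample findings are invented here, and a "quick win" is assumed to mean low effort with high reward.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    title: str
    effort: int  # 1 (low) to 5 (high), estimated by the assessor
    reward: int  # 1 (low) to 5 (high), estimated by the assessor


def quadrant(finding, threshold=3):
    """Place a finding in one of four quadrants (labels are illustrative)."""
    high_effort = finding.effort >= threshold
    high_reward = finding.reward >= threshold
    if high_reward and not high_effort:
        return "quick win"
    if high_reward and high_effort:
        return "major project"
    if not high_reward and not high_effort:
        return "fill-in"
    return "reconsider"


# Hypothetical scores for illustration only.
findings = [
    Finding("Set up a DevOps pipeline", effort=4, reward=5),
    Finding("Ensure consistent tagging", effort=2, reward=4),
    Finding("Enable vulnerability assessment", effort=1, reward=5),
    Finding("Rename legacy resource groups", effort=4, reward=1),
]

# Quick wins first: highest reward, then lowest effort.
quick_wins = sorted(
    (f for f in findings if quadrant(f) == "quick win"),
    key=lambda f: (-f.reward, f.effort),
)
for f in quick_wins:
    print(f"{f.title} (effort {f.effort}, reward {f.reward})")
```

Agreeing on the scores with the customer matters more than the exact scale; the point is that the recommended starting order is reproducible rather than a matter of taste.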
4. Proposed Roadmap
Recommend that the customer fix critical issues ASAP. Then propose milestones, each made up of action points. Here are some examples:
- Configure built-in logging and monitoring tools in Azure components.
- Configure and leverage Azure central governance tools, review them at least weekly, and tune the architecture accordingly.
- Address suspicious authentications/authorizations in AD review.
- Automate security remedial actions with the help of Azure Policy and other tools mentioned in the Central Governance section.
- Set up and follow a suitable version control working model.
- Codify all the remaining applications and configurations into ARM templates.
- Set up a DevOps pipeline and avoid any kind of manual deployment.
- Ensure consistent tagging throughout resources.
Outline the dependencies between the proposed actions from the roadmap. The following is an example. The figure is for illustration only, to help the customer understand the relationships between the activities.
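If the dependency figure is hard to draw by hand, the ordering can be derived from a plain dependency list. This is a rough sketch using Python's standard `graphlib` module (Python 3.9+); the dependencies between the roadmap actions are assumptions made up for illustration, not the ones from the actual engagement.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions that must be finished first.
# These dependencies are hypothetical.
dependencies = {
    "Configure logging and monitoring": set(),
    "Set up version control working model": set(),
    "Configure central governance tools": {"Configure logging and monitoring"},
    "Codify remaining resources as ARM templates": {
        "Set up version control working model"
    },
    "Set up DevOps pipeline": {"Codify remaining resources as ARM templates"},
    "Automate security remediation": {"Configure central governance tools"},
}


def plan_waves(deps):
    """Group actions into waves; actions in one wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, actions in enumerate(plan_waves(dependencies), start=1):
    print(f"Wave {number}: {', '.join(actions)}")
```

Each wave maps naturally onto a milestone, and a circular dependency surfaces as an error instead of going unnoticed in a diagram.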
Add any missing detail or notes.