According to Flexera’s State of the Cloud reports, “optimizing the existing use of the cloud” has been the top initiative among surveyed organizations for the sixth year in a row. Yet I couldn’t find much written that treats cloud cost management holistically.
Cloud cost management (also called cloud financial management or FinOps) is a socio-technical ensemble: addressing it holistically and with lasting impact requires analysis from both a technical and a social angle.
In this article, I attempt a socio-technical analysis of the typical organizational problems underlying cloud cost overruns. The analysis is intended to help you, the stakeholders of cloud cost management (CXOs, business/product owners, architects, and developers), align your actions with each other and balance the technical and social dimensions.
What Is a Socio-Technical System?
Any organization, or a part of it, is a socio-technical system because any organization employs people with a certain skillset, who work to achieve set goals, following laid-down processes, using particular technology, operating on a foundational infrastructure, and sharing certain cultural norms.
Any socio-technical system has three social dimensions along with three technical ones. Together, the six dimensions are infrastructure, technology, process, people, culture, and goals.
With this brief explanation of socio-technical systems, let me move on to cloud cost management as seen through each of those six lenses.
Cloud Cost Management—via Social and Technical Lenses
Infrastructure: Foundational Blocks
The infrastructure aspect of the socio-technical system is all about the foundational blocks that need to be in place to manage costs in an efficient way. For example, if there is no proper classification of cost corresponding to each department/portfolio, then allocating a cost budget for each department would make no sense.
At the organization level, an aligned account structure and cost-center-tagged resources lay the foundation for managing cloud spend.
Some typical problems pertaining to infrastructure are:
Lack of an account structure aligned to organizational departments.
Lack of a tagging strategy that maps every AWS resource to its cost center.
Finance, business, operations, and application teams operating in silos.
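To make the tagging point concrete, here is a minimal sketch of how cost can be rolled up per cost center once resources carry a cost-center tag. The line items, field names, and the `cost-center` tag key are all hypothetical; real billing exports will have a different shape.

```python
from collections import defaultdict

# Hypothetical billing line items; the field names and values are
# assumptions for illustration, not a real billing export format.
LINE_ITEMS = [
    {"resource_id": "i-001", "cost": 120.0, "tags": {"cost-center": "marketing"}},
    {"resource_id": "i-002", "cost": 80.0, "tags": {"cost-center": "payments"}},
    {"resource_id": "db-01", "cost": 200.0, "tags": {"cost-center": "payments"}},
    {"resource_id": "vol-9", "cost": 15.5, "tags": {}},
]


def allocate_by_cost_center(line_items, tag_key="cost-center"):
    """Sum cost per cost center and list resources that lack the tag."""
    totals = defaultdict(float)
    untagged = []
    for item in line_items:
        center = item["tags"].get(tag_key)
        if center:
            totals[center] += item["cost"]
        else:
            # Without the tag, this cost cannot be attributed to any department.
            untagged.append(item["resource_id"])
    return dict(totals), untagged


totals, untagged = allocate_by_cost_center(LINE_ITEMS)
print(totals)
print(untagged)
```

The list of untagged resources is the useful by-product here: it shows how much of the spend cannot be budgeted against any department until the tagging gap is closed.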
Technology: Tools To Achieve
“Without ‘extra’ skills to handle change, specialization will precede extinction.” — The Social Design of Technical Systems by Brian Whitworth and Adnan Ahmad
The technology dimension of the system refers to the availability of tools or mechanisms and deficiencies in those tools to monitor the cost and optimize it.
Typical technology problems are a lack of tools, or a failure to use the available tools, to:
Explore and monitor costs.
Budget and report costs.
Show plans to optimize rates and opportunities to optimize usage.
Identify and eliminate resource waste.
Identify cost-efficient alternatives.
Most of the services offered by cloud vendors will come under this dimension as they serve as a tool to achieve your cost objective.
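As an illustration of what a waste-identification tool does at its core, the sketch below flags resources whose average CPU utilization stays under a threshold. The utilization samples and the 5% threshold are assumptions made up for this example; a real tool would pull metrics from the vendor's monitoring service and weigh more signals than CPU alone.

```python
from statistics import mean

# Hypothetical daily average CPU utilization (%) per resource.
CPU_SAMPLES = {
    "i-001": [2.1, 1.8, 3.0, 2.5],
    "i-002": [45.0, 52.3, 48.9, 60.1],
    "i-003": [4.9, 5.2, 3.1, 4.0],
}


def find_idle_resources(samples, threshold_pct=5.0):
    """Return sorted ids of resources whose mean CPU is below the threshold."""
    return sorted(
        resource_id
        for resource_id, values in samples.items()
        if values and mean(values) < threshold_pct
    )


idle = find_idle_resources(CPU_SAMPLES)
print(idle)
```

Resources flagged this way are candidates for review, not for automatic deletion; the owning team still has to confirm whether the low usage is expected.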
Process: Setup to Engage
“Let the process match its objective—democratic results need democratic means.” — The Social Design of Technical Systems
The process aspect focuses on the processes and procedures that are in place to routinely engage people to track their cloud spend and act on it.
Typical process problems with respect to cloud cost management are a lack of processes to:
Identify resource waste and find optimization opportunities.
Report cloud-spend information to all stakeholders.
Measure the quality of cloud consumption.
Collaborate with the IT team, business, and finance.
Cloud vendors offer many services that automate most of these processes.
People: Skills and Collaboration
“Let those with the problem change the system, not absent managers.”— The Social Design of Technical Systems
The people aspect focuses on people's skills in the cloud and its cost management offerings, and on their collaboration to manage and improve the value of cloud consumption. For example, knowing about the Graviton2 instances offered by AWS can help you choose them for your RDS instances running open-source database engines, enabling a significant price-performance improvement.
Typical people-related problems in the context of cloud cost management are a lack of:
Knowledge/training on the cloud (Savings Plans, cost-efficient instance types, etc.).
FinOps (cloud cost management) skill—a CoE team.
Operating rhythm between application, operations, business, and finance teams.
Cloud vendors offer services that enable upskilling architects, developers, and leaders on the cloud and its cost management.
Culture: Ways of Working
“Give information first to those it affects.”— The Social Design of Technical Systems
Culture refers to sharing of information among and within teams, sharing the responsibilities, handling failures, and collaborating across teams.
In this context, sharing cloud spend information, sharing cost-savings responsibilities, handling failures of cost-optimization attempts, and rewarding collaboration between finance, business, application, and operations teams will play a big role in managing cloud spend.
Cultural problems are:
Tolerance of poor standards—lack of standards/compliance checks.
Lack of transparency—cloud cost information not cascaded to all levels.
Culture of top-down managerial concern—cost is not part of the application level non-functional requirement.
To improve the culture of your organization for better management of cloud spend, you can take the following steps:
Send customized, granular bills to the respective business units regularly, increasing cost visibility for all teams so that they can act on it.
Set up a center of excellence team for cloud cost management to establish standards and assist application and operations teams to achieve those standards.
Introduce cost fitness as a non-functional requirement in application architecture.
Goals: Targets To Achieve
“Give employees clear goals but let them decide how to achieve them.”— The Social Design of Technical Systems
The goals aspect of the system refers to the definition and design of the targets the team needs to pursue to manage costs and improve the value of cloud consumption. Examples include architectural cost targets, business portfolio level cost budgets, and compliance dashboards.
Typical problems seen around this dimension are:
Lack of focus on cost-related targets.
Organization standards (such as cost budget for an application) derived without the participation of respective teams.
Failure to separate what-is-necessary from what-is-desirable.
No consideration of the quality of cloud consumption.
To improve in this dimension, you need to work on the following:
Quality of cloud consumption can be measured only if unit metrics are calculated. For example, for a social media site, cloud spend per engaged user is a unit metric. This metric helps in understanding whether the cloud spend is in proportion to the growth of user traffic. Unit metrics can be business oriented (such as $/journey) or engineering oriented (such as $/GB stored). While business-oriented unit metrics provide a top-down view for financial reporting, engineering-oriented unit metrics help explain cost spikes and forecast costs better.
Budgets need to be defined for each application in consultation with the respective teams and should be optimized from there on. Budgets can be defined on cost, utilization, RI utilization/coverage, and savings plan utilization/coverage. A dashboard of cost metrics can shorten the feedback loop for application teams to work on cost as a non-functional requirement.
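The two ideas above, unit metrics and per-application budgets, can be combined in a simple monthly review. The sketch below computes cloud spend per engaged user and flags months that exceed the budget or where the unit cost jumps. All figures, the budget value, and the 10% tolerance are made up for illustration.

```python
# Hypothetical monthly figures for one application; all numbers are made up.
MONTHLY = [
    {"month": "2023-01", "spend": 10000.0, "engaged_users": 50000},
    {"month": "2023-02", "spend": 12000.0, "engaged_users": 60000},
    {"month": "2023-03", "spend": 18000.0, "engaged_users": 66000},
]
BUDGET = 15000.0  # assumed monthly budget agreed with the application team


def unit_cost(record):
    """Cloud spend per engaged user for one month."""
    return record["spend"] / record["engaged_users"]


def review(months, budget, tolerance=0.10):
    """Flag months over budget or with a unit-cost rise above the tolerance."""
    findings = []
    previous = None
    for record in months:
        cost = unit_cost(record)
        findings.append({
            "month": record["month"],
            "unit_cost": round(cost, 4),
            "over_budget": record["spend"] > budget,
            # A spike means spend grew faster than the user base did.
            "unit_cost_spike": previous is not None
            and cost > previous * (1 + tolerance),
        })
        previous = cost
    return findings


findings = review(MONTHLY, BUDGET)
for finding in findings:
    print(finding)
```

In this made-up data, February's spend rises but the cost per engaged user stays flat, so the growth is in proportion to traffic. March is different: spend outpaces user growth, which is the kind of signal a raw spend figure alone would not separate from healthy growth.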
Looking at each dimension provides a different perspective on cloud cost management problems, and addressing them across all dimensions is what gives lasting impact.
Apart from the cloud vendor’s native services for cloud cost management, many third-party tools are available in the market to complement them, which you can leverage if need be.
“Numbers, the sizes of flows, are dead last on my list of powerful interventions. Diddling with the details, arranging the deck chairs on the Titanic” — Donella Meadows, Thinking in Systems
Conclusion
By now, you should have a fair idea of the kinds of problems that occur in an organization trying to manage its cloud spend. “Where do I start?”, “What should be prioritized?”, and “Where to intervene in this socio-technical system?” are the questions you might now have in mind.
Controlling the numbers is not a mature way to address a problem; attacking the causes behind the numbers is. To find answers to these questions, let’s turn towards systems thinking in the next article.
Lastly, tell me in the comments section about any other cost management problems that I missed, and drop a like if you found this article useful.