Accelerate Innovation by Shifting Left FinOps: Part 6

In this final installment, we will discuss a worked-out example of how to apply cost optimization FinOps techniques early in the solution lifecycle.

Sreekanth Iyer

Raghu Rajalingam

Dec. 15, 23 · Analysis

In the first two parts of this series, we covered the importance of cost models and how to create and refine them. In the subsequent parts, we learned how to optimize workload components across infrastructure, applications, and data. In this final part, we present the impact and results of applying the cost optimization and shift-left FinOps techniques to a cloud-native application, using the sample workload introduced in the first post.

Reference (Part 1 of this series): The Workload

Candidate Architecture for the representative Cloud Native Application

This candidate workload consists of components across multiple layers of the architecture. The cloud-native application ingests and processes data to enrich and analyze it, and then outputs the data along with reports and insights for a user. We take a cloud-agnostic approach and break down the workload and optimization techniques across infrastructure, application, and data components. The subsequent sections illustrate the gains and impact of applying these techniques across the various layers.

Building and Refining the Cost Model

Reference (Part 2 of this series): Creating and refining the FinOps Cost Model

In Part 2, we elaborated the requirements to find a balance between model complexity and the granularity of the performance levels expected from the solution, and refined the models accordingly. The following table, for example, helps determine compute requirements based on the data ingestion timelines for the workload.

MONTHLY SOURCE DATA SIZE IN GIB	DATA PROCESSING TIME IN HOURS (APPROX)	NUMBER OF INSTANCES (PROCESSING HOURS/24, ROUNDED UP)	COMPUTE INSTANCE TYPE	STORAGE SIZE IN GIB
>= 300	196	9	CPU: 4, RAM: 32 GB	50
200 - <300	42	2	CPU: 4, RAM: 32 GB	30
100 - <200	20	1	CPU: 4, RAM: 32 GB	20
50 - <100	17	1	CPU: 4, RAM: 32 GB	10
20 - <50	10	1	CPU: 4, RAM: 32 GB	5
5 - <20	6	1	CPU: 4, RAM: 32 GB	2
< 5	2	1	CPU: 4, RAM: 32 GB	0.5
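
The instance counts in the table follow from dividing the processing hours by 24 and rounding up. A minimal Python sketch of that arithmetic is shown here; the function name `instances_needed` is hypothetical, and the minimum of one instance is an assumption read off the table rather than something stated in the series.

```python
import math

HOURS_PER_DAY = 24  # the table divides processing hours by 24


def instances_needed(processing_hours: float) -> int:
    """Return the instance count for a given number of processing hours.

    Assumption: the result is rounded up and never drops below one instance,
    which matches every row of the table above.
    """
    return max(1, math.ceil(processing_hours / HOURS_PER_DAY))


# Processing hours taken from the table above.
for hours in (196, 42, 20, 17, 10, 6, 2):
    print(f"{hours:>3} h -> {instances_needed(hours)} instance(s)")
```

For the largest tier, 196 hours divided by 24 is roughly 8.2, which rounds up to the 9 instances shown in the table.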

Similarly, based on the monthly source data and the response time needed for the queries, the databases are sized.

QUERY TYPE	ELAPSED TIME	SLOTS CONSUMED	SLOT TIME CONSUMED
Query for line items	19s	81.95	25 min 57 sec
Query for aggregated records	15s	90.47	22 min 37 sec
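
The three measured columns in this table are related: dividing the slot time consumed by the elapsed time reproduces the slots-consumed figure, which suggests that column is an average over the query's run. The short Python sketch below checks that relationship against the two rows; the helper names are hypothetical.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'min sec' duration to seconds."""
    return minutes * 60 + seconds


def average_slots(slot_time_seconds: float, elapsed_seconds: float) -> float:
    """Average number of slots in use: total slot time / wall-clock time."""
    return slot_time_seconds / elapsed_seconds


# Values taken from the table above.
line_items = average_slots(to_seconds(25, 57), 19)
aggregated = average_slots(to_seconds(22, 37), 15)

print(f"Line items:         {line_items:.2f} slots")   # 81.95
print(f"Aggregated records: {aggregated:.2f} slots")   # 90.47
```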

A workload planning capability, such as the one provided by Cloudability, is recommended for creating the cost model and for tracking, monitoring, and analyzing spending through development, deployment, and operations.

Cost Optimization Techniques for Infrastructure

Reference (Part 3 of this series): Cost optimization techniques for infrastructure

The table below lists the potential optimizations and techniques for the infrastructure layer, applied to the example workload.

#	Optimization Objective	Area to Optimize	Approx. Cost Saving	Techniques / Considerations
1	Implement a controller to dynamically scale in/out the EC2 instances required to run the application	Compute	~10% of compute cost	Maintaining the job statistics incurs an additional cost, but the overall compute cost is reduced. Note: For a single data domain, this optimization may look costly because of that additional cost, but it becomes cost-beneficial when the workload is large.
2	Split the data processing jobs so that the data processing part runs on spot instances	Compute	~60% of compute cost	Spot instances can bring substantial savings, but splitting should not produce an excessive number of jobs.
3	Implement an S3 and Cloud storage clean-up policy	Storage	~50% of storage cost	Adding a data archival rule that applies after a certain period of time (based on the business requirement) is cost-beneficial over the longer run of the application.
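
As an illustration of the storage clean-up row, the Python sketch below builds an S3 lifecycle configuration that archives objects after a set number of days and deletes them later. The prefix, rule ID, day counts, and the choice of the `GLACIER` storage class are all hypothetical placeholders; the series does not specify them, and the right values depend on the business requirement.

```python
import json


def build_lifecycle_configuration(prefix: str,
                                  archive_after_days: int,
                                  expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration: archive first, then expire.

    All argument values used below are illustrative assumptions.
    """
    if expire_after_days <= archive_after_days:
        raise ValueError("expiration must come after the archival transition")
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical policy: archive after 90 days, delete after 365 days.
config = build_lifecycle_configuration("processed/", 90, 365)
print(json.dumps(config, indent=2))
```

A dictionary of this shape is what boto3's `put_bucket_lifecycle_configuration` accepts as its `LifecycleConfiguration` argument; the call itself is omitted so the sketch runs without AWS credentials.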

Cost Optimization Techniques for Applications

Reference (Part 4 of this series): Cost optimization techniques for applications

Indicative figures for the potential optimizations, with the impact and savings from applying the application component techniques to the example workload, are given below.

#	Optimization Objective	Area to Optimize	Approx. Cost Saving	Techniques / Considerations
1	Implement EMR to do the data processing instead of using EC2 instances	Application	~50% of compute cost	EMR processes the data faster with a higher configuration, so the running time of the cluster is shorter and the cost is lower than with ASG-based compute infrastructure. Note: For a single customer, this optimization may look a bit costly because of the additional cost at the cluster level, but it becomes cost-beneficial when the workload is high.
2	Review job slicing and job scheduling	Application	~40% of compute cost	Categorize jobs by T-shirt size (big, medium, and small) and allocate compute accordingly. Introduce parallel processing by splitting the incoming file into smaller chunks, and adjust the job size and schedule to make use of spot instances with optimal duration and availability. Use spot instances in place of on-demand instances to lower the overall cost.
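
The job slicing row can be sketched in Python as two small helpers: one assigns a T-shirt size from the input size, and one splits an incoming file into byte ranges that can be processed in parallel. The size thresholds and function names are hypothetical; the series does not state where the boundaries between small, medium, and big jobs lie.

```python
# Hypothetical upper bounds in GiB; anything at or above the last bound is "big".
SIZE_LIMITS_GIB = [("small", 5), ("medium", 50)]


def tshirt_size(input_gib: float) -> str:
    """Classify a job as small, medium, or big by its input size."""
    for label, upper_bound in SIZE_LIMITS_GIB:
        if input_gib < upper_bound:
            return label
    return "big"


def split_into_chunks(total_bytes: int, chunk_bytes: int) -> list[tuple[int, int]]:
    """Return (offset, length) byte ranges that together cover the file."""
    if chunk_bytes <= 0:
        raise ValueError("chunk_bytes must be positive")
    return [
        (start, min(chunk_bytes, total_bytes - start))
        for start in range(0, total_bytes, chunk_bytes)
    ]


print(tshirt_size(2), tshirt_size(20), tshirt_size(300))
print(split_into_chunks(10, 4))
```

Smaller chunks shorten each job, which makes a spot interruption cheaper to retry, but too small a chunk size multiplies the number of jobs to schedule.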

Cost Optimization Techniques for Data

Reference (Part 5 of this series): Cost optimization techniques for data

Indicative figures for the potential optimizations, with the impact and savings from applying the data component techniques to the example workload, are given below.

#	Optimization Objective	Area to Optimize	Approx. Cost Saving	Techniques / Considerations
1	Read from file storage instead of DynamoDB	Data / Read API calls	~30% of data cost	Make enrichment data available as files in S3 rather than in DynamoDB. File storage does not incur the additional RCU (Read Capacity Unit) cost that DynamoDB does, which helps when fetching multiple records from the data store at a time.
2	Use a single partition instead of two partitions to fetch the data	Data / Schema changes	~80% of data cost	Implement filters to limit the data fetch to a single partition.
3	Optimize BigQuery slot usage and requirement	Data / Query optimization	~15% of data cost	Estimate the maximum number of slots needed for a given period. This can be estimated from existing functionality or from other projects that use similar functionality, such as GCP BigQuery. Based on past experience and the current requirement for the example cloud-native application, the goal should be to buy the optimal number of slots for a longer period. Optimizing the queries and reducing the number of columns helps reduce the number of slots consumed.
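
To make the RCU cost in the first row concrete, the Python sketch below estimates the read capacity a DynamoDB lookup pattern needs. It relies on the DynamoDB rule that one RCU covers one strongly consistent read per second (or two eventually consistent reads per second) of an item up to 4 KB. The item size and read rate in the example are hypothetical, not measurements from the example workload.

```python
import math

RCU_BLOCK_BYTES = 4 * 1024  # one RCU covers a read of up to 4 KB


def read_capacity_units(item_size_bytes: int,
                        reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """Estimate the RCUs needed to sustain a given read rate."""
    blocks_per_read = math.ceil(item_size_bytes / RCU_BLOCK_BYTES)
    units = blocks_per_read * reads_per_second
    if strongly_consistent:
        return units
    # Eventually consistent reads cost half as much.
    return math.ceil(units / 2)


# Hypothetical enrichment lookup: 6 KiB items read 100 times per second.
print(read_capacity_units(6 * 1024, 100))                             # 200
print(read_capacity_units(6 * 1024, 100, strongly_consistent=False))  # 100
```

Serving the same enrichment data from files in S3 removes this provisioned read capacity from the bill, although S3 request and transfer charges still apply and should be included in the cost model.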

Conclusion

In this final part of the series, we shared a worked-out example to illustrate the cost optimization techniques for the functional components and their impact. The objective of this series is to articulate the value of shifting FinOps left, into the application solution design phase itself. We look forward to sharing details on how to apply this methodology to non-functional components to reduce the cost of security, observability, and availability.
