

OpenShift Container Platform 3.11 Cost Optimization on Public Cloud Platforms

A developer gives a tutorial on optimizing the performance of OpenShift containers using some shell scripts. Read on to learn more!

By Ganesh Bhat · Dec. 31, 20 · Tutorial

I was part of a cloud migration project for one of our clients (a UK utility company), where we installed OpenShift clusters across different environments on top of the AWS cloud to deploy code and run the DevOps pipelines.

We had five different OpenShift clusters, viz., Development, Test, Pre-production, Production, and Automation. Each cluster had a different number of nodes: the master and infra node counts were constant across all environments (three each), while the number of worker nodes varied with each cluster's workload. As a result, the AWS cost was significantly high given the number of nodes (AWS EC2 instances) we were running across the clusters.

We produced a solution to shut down the cluster gracefully during non-business hours without impacting any development activity. This graceful shutdown included all application pods which were running on the worker nodes and all OpenShift system-related pods which were running on infra nodes.

But there was a challenge!

When we used the OpenShift drain command to drain the nodes, some pods were not fully deleted because of session timeouts or because some pods took too long to delete. Though we excluded OpenShift system-related DaemonSet pods while draining, deleting all application pods within the grace period without the session timing out was a challenge. We had to implement looping logic in the shell script to overcome this issue.

The implementation involved a series of shell scripts which were called one after the other in the following order.

Shutdown: 

  • Unschedule nodes
  • Application pod scale down
  • Drain nodes
  • Stop EC2 instances

Startup: 

  • Start EC2 instances
  • Schedule back nodes
  • Application scale up
  • Sanity check

This implementation helped our client save substantially on their AWS infrastructure costs. We will take a look at the implementation in the sections below.

In the above sequence, the application pod scale down (application scale down) is optional as it is purely project specific.
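As an illustration of that optional scale-down step, a wrapper like the following could zero out all DeploymentConfigs in a set of projects before draining. This is a sketch: the project names are hypothetical, and the exact list would be project specific.

```shell
# Hypothetical project names; adjust to your environment.
PROJECTS="app-dev app-test"

scale_down_apps() {
  for project in $PROJECTS; do
    echo "Scaling down all DeploymentConfigs in ${project}..."
    # --all targets every dc in the project; --replicas=0 stops the pods
    oc scale dc --all --replicas=0 -n "${project}"
  done
}
```

The replica counts would need to be recorded somewhere (a file or labels) if the startup script is expected to restore the original scale rather than a fixed value.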

AWS Monthly Cost

[Table not shown: AWS monthly cost by environment]

The table depicts the difference between the initial implementation (July) and the implementation after our tweaks to the shell code (August).

Disclaimer:

Our implementation is for Red Hat OpenShift v3.11, and the same implementation also works for the upstream community version, OKD. If you would like to implement this in a client's project, it is always recommended to ask the vendor whether, and how, restarting the nodes is supported; some vendors will not support restarting the underlying nodes once they are set up. Also, it is not recommended to shut down master nodes, as they host many system-related pods and API components.

Below is the snippet of logic we used in the entire implementation.

Environment Shutdown

Step 1: Unschedule Nodes

oc adm manage-node <node-ip> --schedulable=false 

(Repeat the step for all nodes in the cluster)
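Rather than repeating the command by hand, the per-node step can be looped over every compute node. This is a sketch: the `node-role.kubernetes.io/compute=true` label is the OCP 3.11 default for worker nodes and may differ in your cluster.

```shell
# Cordon every compute node in one pass.
# Assumes worker nodes carry the default 3.11 compute label.
unschedule_nodes() {
  for node in $(oc get nodes -l node-role.kubernetes.io/compute=true -o name); do
    echo "Cordoning ${node}..."
    # Strip the "node/" prefix that -o name adds
    oc adm manage-node "${node#node/}" --schedulable=false
  done
}
```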

Step 2: Drain all Nodes on the Cluster

The logic below makes sure that the session doesn't time out and that all pods are deleted from the node.

Shell

TIMEOUT=180

# Make sure that TIMEOUT is a multiple of CHECK_LOOP_DELAY
CHECK_LOOP_DELAY=10

# Get the number of loops
NUMBER_OF_LOOPS=$((TIMEOUT / CHECK_LOOP_DELAY))

##################################################################
echo "Starting to evict pods on Worker Node-1 with oc adm command and timeout of ${TIMEOUT} seconds..."
(
    echo "Executing oc adm drain on <node-ip>"
    oc adm drain <node-ip> --ignore-daemonsets --force --grace-period=30 --delete-local-data
)&

# Capture the subshell PID
SUBPID=$!
echo "oc adm command sub-shell PID: ${SUBPID}"

(
    # Wait for up to TIMEOUT seconds, polling every CHECK_LOOP_DELAY seconds
    x=0
    while [ $x -le ${NUMBER_OF_LOOPS} ]
    do
        sleep ${CHECK_LOOP_DELAY}
        if ! ps -p ${SUBPID} > /dev/null
        then
            echo "oc adm completed"
            break
        else
            echo -n "."
        fi
        x=$(( x + 1 ))
    done
    # If the drain is still running after TIMEOUT, kill it so the
    # calling script can move on
    if ps -p ${SUBPID} > /dev/null
    then
        echo "Going to kill the subshell: ${SUBPID}"
        kill -9 ${SUBPID}
        echo "Killed sub-shell"
    fi
)&

# Wait for the two subshells either to complete successfully or get killed
wait

Step 3: Shutdown Nodes

Once the pods on the nodes are completely drained, it is time to shut down the EC2 instances.
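For AWS, the stop step can be scripted with the AWS CLI. The sketch below assumes the cluster's worker and infra instances carry a hypothetical `cluster=openshift-dev` tag; match the filter to your own tagging scheme.

```shell
# Stop all running EC2 instances tagged as part of this cluster.
# The tag key/value here is an assumption, not a required convention.
stop_cluster_instances() {
  instance_ids=$(aws ec2 describe-instances \
      --filters "Name=tag:cluster,Values=openshift-dev" \
                "Name=instance-state-name,Values=running" \
      --query "Reservations[].Instances[].InstanceId" \
      --output text)
  if [ -n "$instance_ids" ]; then
      echo "Stopping instances: ${instance_ids}"
      aws ec2 stop-instances --instance-ids $instance_ids
  else
      echo "No running instances found"
  fi
}
```

Filtering on `instance-state-name=running` makes the script safe to re-run; an already-stopped cluster simply yields an empty instance list.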

Environment Startup


Step 1: Depending on the cloud provider, you'll need to include scripts at the beginning to start the EC2 instances for the environment.

Step 2: Schedule the nodes which were unscheduled before.

oc adm manage-node <nodeIP> --schedulable=true 

Step 3: Once the nodes are ready and scheduling is enabled, the pods will automatically be scheduled on these nodes.

If the implementation has an application scale-up script, run it additionally once the cluster nodes are up and running.
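The final sanity check can be scripted as well. This minimal sketch just confirms that every node reports Ready and that no pod is stuck outside the Running/Completed phases; a real check would likely also curl application health endpoints.

```shell
# Post-startup sanity check: all nodes Ready, no stuck pods.
sanity_check() {
  # Count nodes whose STATUS column is not exactly "Ready"
  not_ready=$(oc get nodes --no-headers | awk '$2 != "Ready"' | wc -l)
  if [ "$not_ready" -gt 0 ]; then
      echo "WARNING: ${not_ready} node(s) not Ready"
      return 1
  fi
  # Count pods whose STATUS column is neither Running nor Completed
  bad_pods=$(oc get pods --all-namespaces --no-headers \
      | awk '$4 != "Running" && $4 != "Completed"' | wc -l)
  if [ "$bad_pods" -gt 0 ]; then
      echo "WARNING: ${bad_pods} pod(s) not Running/Completed"
      return 1
  fi
  echo "Sanity check passed"
}
```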


Opinions expressed by DZone contributors are their own.
