DevOps, Benchmarking, and ML in the Product Development LifeCycle

In this article, explore why benchmarking is a key part of DevOps and how machine learning models can help predict those benchmarks.

Aravindan Varadan · Nov. 06, 18 · Opinion

Every new software product release introduces features and functionality that directly affect the existing technical benchmarks. Benchmarking as part of DevOps helps shape the product in many ways. For example, it helps maintain application performance even after new features are added.

Typical technical benchmarks include transactions processed per second (tps), server processor requirements, memory requirements, storage capacity, etc. In this article, I explain why benchmarking is a key part of DevOps and how machine learning models can help predict these benchmarks.

Benchmarking Technical Stacks

Benchmarking as part of the product development lifecycle (PDLC) has the following advantages:

  • Maintain/tune tps and performance after new features are added to the product
  • Prepare environments with appropriate storage, memory, etc.
  • Set client expectations upfront so they can purchase hardware accordingly

A few practical examples:

  • The baselined database size requirement is 2 TB per year, and it grows with every new product version release
  • A 16 GB heap is required to run the application in the application server
  • A new feature introduced in the product is CPU-intensive and may have to be deployed on a dedicated server

Benchmarking as Part of a Typical DevOps Cycle

It is important to perform non-functional testing in addition to regular functional testing. Benchmarking exercises are typically done as part of the testing phase in the DevOps cycle.

[Figure: benchmarking within the DevOps cycle]

Load/stress testing and performance testing help in benchmarking the servers.
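
For instance, a very simplified throughput measurement during a load test might look like the sketch below; run_transaction is a hypothetical stand-in for a real product transaction:

import time

def run_transaction():
    # Hypothetical stand-in for a real product transaction.
    time.sleep(0.001)

def measure_tps(n=1000):
    # Time n transactions and report throughput in transactions per second.
    start = time.perf_counter()
    for _ in range(n):
        run_transaction()
    elapsed = time.perf_counter() - start
    return n / elapsed

print("Baselined throughput: {:.0f} tps".format(measure_tps()))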

How Machine Learning Helps

Machine learning is useful when we need to predict future values from a large volume of data. In this article, I explain how the database size for a new client can be predicted based on their requirements.

The features listed below affect the size of a product's database; sample data is given in brackets for each feature. This example is based on a fund accounting banking product. (A sketch of how one such record might be encoded follows the list.)

  • Concurrent users count (1,000)
  • Number of transactions processed every day (1.6 million)
  • Number of funds (35 thousand)
  • Number of positions (10 million)
  • Product features that are enabled/disabled for clients (yes/no flags)
  • Audit log storage options (yes/no flags)
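
As an illustration, a single training record built from these features might be encoded as below (column names are hypothetical; the last column is the target, the observed DB size):

import pandas as pd

# Hypothetical encoding of one training sample; yes/no flags become 1/0.
sample = pd.DataFrame([{
    "concurrent_users": 1000,
    "daily_transactions": 1_600_000,
    "funds": 35_000,
    "positions": 10_000_000,
    "feature_x_enabled": 1,
    "audit_log_enabled": 1,
    "db_size_gb": 2048,   # target: observed database size (value is illustrative)
}])
print(sample)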

DevOps, Benchmarking, and ML... All Together

After repeated rounds of load/stress/performance testing, we should have input metrics with which we can predict the application behavior for a defined set of parameters and values. These baselined parameters and values act as input to train the machine learning model. Furthermore, once the product is rolled out to (new) clients, start collecting metrics from the client application's runtime environment (the client's production environment reflects true application behavior). These samples are also fed back to continuously train the ML model.

A part of such sample data is shown below.

[Figure: sample data collected from client environments]

Since millions of transactions are processed at every client site every day, the number of records is enormous.

The workflow below explains how benchmarking and ML fit into the DevOps cycle. In this workflow, DB size prediction is shown as an example.

[Figure: DevOps workflow combining benchmarking and ML, with DB size prediction as the example]

At the end of this process, ML is used to predict the database size for any new client, based on their parameter values and requirements for a particular product release. Once the product is rolled out to this client, their data is collected too, and the cycle repeats.
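
A minimal sketch of that feedback step, assuming the accumulated training samples and the freshly collected client metrics live in CSV files (the file names here are hypothetical):

import pandas as pd

# Accumulated training set plus a fresh batch of metrics collected
# from the client's production environment.
training = pd.read_csv("samples_from_client_production.csv")
new_batch = pd.read_csv("new_client_metrics.csv")

# Append the new samples and persist them so the next training run sees them.
training = pd.concat([training, new_batch], ignore_index=True)
training.to_csv("samples_from_client_production.csv", index=False)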

Which ML Model Can Be Used?

Well, it really depends. In our case, I used a linear regression model because the DB size grows as the other parameter values increase.

[Figure: linear regression fit of DB size against a single parameter]

In our case, we need to predict DB size based on multiple inputs, not just one:

  • Number of transactions
  • Number of positions
  • Number of funds
  • Number of users

The multiple linear regression model takes the form y = b0 + b1x1 + b2x2 + ... + bnxn, where y is the DB size, the xi are the input features above, and the bi are the fitted coefficients.
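
To make this concrete, here is a minimal synthetic sketch (not the article's data) showing how the intercept b0 and the coefficients b1..b4 can be read off a fitted scikit-learn model:

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 4 input features, as in the list above.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(100, 4))
y = 5 + X @ np.array([3.0, 2.0, 1.5, 0.5])  # known coefficients, for illustration

lm = LinearRegression().fit(X, y)
print("b0 =", lm.intercept_)   # recovers ~5
print("b1..b4 =", lm.coef_)    # recovers ~[3.0, 2.0, 1.5, 0.5]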

Significance of R-Squared

The R-squared value indicates the percentage of variance in y that is explained by the model. A value close to 1 (100%) indicates a strong relationship between x and y: y is highly influenced by changes in the x parameter values, which makes it easier to predict y from x.

[Figures: regression fits illustrating high and low R-squared values]

When the fitted line is flat or explains little of the variance, the x parameters have little influence on y. Revisit the x parameters and eliminate the unwanted ones to get better results.

Another use of this metric is to check how the product/application behavior changes with every new version. A large deviation in R-squared from one release to the next indicates that something in the application design or code needs to be fixed.
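
One way to track this, sketched under the assumption that each release's samples are kept separately (datasets_by_release is a hypothetical mapping from a release tag to its (X, y) samples):

from sklearn.linear_model import LinearRegression

def r2_by_release(datasets_by_release):
    # Fit one model per release and record its R-squared,
    # so large release-to-release deviations stand out.
    scores = {}
    for release, (X, y) in datasets_by_release.items():
        model = LinearRegression().fit(X, y)
        scores[release] = model.score(X, y)
    return scores

# e.g. r2_by_release({"v1.0": (X_v10, y_v10), "v1.1": (X_v11, y_v11)})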

Code Snippet

The code snippet below shows how the DB size for a new customer is predicted using a linear regression model.

import numpy as np
import pandas as pd

# Import the existing dataset. This is already baselined data taken from clients' running environments.
dataset = pd.read_csv('samples_from_client_production.csv')
X = dataset.iloc[:, 0:7].values
y = dataset.iloc[:, 7].values

# X contains concurrent users, daily NAV, monthly transactions, etc.
print(X[:5])

# y contains the observed DB size growth
print(y[:5])

# Split the dataset: 2/3 for training, 1/3 for testing
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

# Fit a multiple linear regression model to the training set
from sklearn.linear_model import LinearRegression
lm = LinearRegression()

# Train the model
lm.fit(X_train, y_train)

# Try some predictions on the held-out test set
y_pred = lm.predict(X_test)

# Compare the predictions with the actual test-set values using the root mean squared error
from sklearn.metrics import mean_squared_error
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = lm.score(X_train, y_train)
print("Root mean squared error :: {}".format(rmse))  # Close to 0 means a good fit
print("Score (R-squared) :: {}".format(r2))          # Near 1 is the best fit

# Now it's time to predict the DB size for the new client based on their X feature values.
datasetforprediction = pd.read_csv('test_data_from_new_client.csv')  # assume these values come from the new client's requirements
X_forprediction = datasetforprediction.iloc[:, 0:7].values

# Use this data as input to the previously trained model to predict the DB size for the new client
y_predicted = lm.predict(X_forprediction)
print("Predicted DB size :: {}".format(y_predicted))

Summary

To wrap up, continuous benchmarking is an integral part of DevOps. It is important to collect samples from stable/production environments, continuously feed them into the datasets, and regularly retrain the machine learning model. This helps in predicting the desired parameter accurately for new clients. In this example, I used DB size to explain the process in a simple way, but typical benchmarking/capacity sizing would involve several other parameters such as transactions per second (tps), latency, hardware requirements, etc.

Happy to hear any comments/feedback!
