Analyzing Application Workload Data with Apprenda and R
One of the most important kinds of data a Platform as a Service (PaaS) can leverage is its knowledge of the guest applications that run within its purview. A PaaS should know all sorts of things about guest applications: their architecture, dependencies, scale across infrastructure, and more.
Data including application resource utilization metrics (CPU, RAM, etc.) are key for things like data center capacity planning, policy enforcement, and application isolation in the enterprise. A PaaS such as Apprenda provides this information through a centralized single lens – in our case, a collection of RESTful APIs – making it easier than ever before to run analytics on application metrics in the data center.
Apprenda’s approach as a PaaS is to provide developers and platform operators with helpful information through platform extensibility and APIs. This is because there are plenty of tools in the data center that provide advanced analytic capabilities, so long as you can feed them the information they need. We integrate with tools like System Center, New Relic, and more all the time because these are the tools our customers have invested in, and they are great at what they do.
Our job is not to reinvent these tools but instead to provide data. Apprenda captures information about applications such as their duration of deployment, resource policy (allocation of CPU and memory), actual utilization of resources, scale (number of instances), custom metadata, and more. All of this information can be fed into data center tools that help IT make important, data-driven decisions.
In the land of DevOps, however, it is not uncommon for folks to use this data in creative and innovative ways. Often this means using the mechanism “du jour,” which can be a scripting language (PowerShell), a programming language (R), or an entire runtime (Node.js) to quickly and effectively grab, process, and manipulate data.
As an example, let’s look at R, a powerful programming language centered on data mining and statistical analysis. It provides straightforward facilities for many types of data-analytics techniques and is extensible through community-maintained packages.
In the simple example below, I use standard R functions plus three packages (easily included using R’s install.packages() function):
1. jsonlite for parsing JSON data that the Apprenda API returns.
2. httr for handling the HTTP requests necessary to authenticate and retrieve data.
3. plotrix for help rendering a plot of retrieved data.
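A minimal sketch of checking for these three packages before use. The helper name `missing_packages` is made up for illustration, and the install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Return the subset of `pkgs` that is not installed on this machine.
# `missing_packages` is a hypothetical helper name used only for this sketch.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("jsonlite", "httr", "plotrix")
to_install <- missing_packages(needed)

# Uncomment to install whatever is missing from CRAN:
# if (length(to_install) > 0) install.packages(to_install)
```

Once the packages are installed, `library(httr)`, `library(jsonlite)`, and `library(plotrix)` attach them for the session.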
From there it’s pretty straightforward. The first step is to authenticate with your Apprenda environment:
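The original listing for this step is not reproduced here, so the following is only a sketch of what authentication with httr could look like. The endpoint path, the request body fields, and the name of the token field in the response are all assumptions; check them against the API documentation for your Apprenda version. The function name `apprenda_login` is made up for illustration.

```r
# Sketch only: log in and return a session token.
# ASSUMPTIONS (not confirmed by the article): the default `auth_path`,
# the body field names, and the `token_field` name in the JSON response.
apprenda_login <- function(base_url, username, password, tenant_alias,
                           auth_path = "/authentication/api/v1/sessions/developer",
                           token_field = "apprendaSessionToken") {
  resp <- httr::POST(
    url = paste0(base_url, auth_path),
    body = list(username = username,
                password = password,
                tenantAlias = tenant_alias),
    encode = "json"  # send the body as JSON
  )
  httr::stop_for_status(resp)  # fail loudly on a 4xx/5xx response
  parsed <- httr::content(resp, as = "parsed", type = "application/json")
  parsed[[token_field]]
}

# Hypothetical usage (placeholder host and credentials):
# token <- apprenda_login("https://apps.example.com", "me@example.com",
#                         "my-password", "my-tenant")
```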
I’ve now stored my Apprenda session token in a variable called ‘token.’ I’ll include that token as a header in my API call to get application data:
GET() is a function provided by the httr package that simplifies an HTTP request to the API. I’ve added the Apprenda session token to the HTTP Headers for authentication, and included a query string parameter that will help return all currently running application workloads on the platform. The data that is returned is parsed and stored in the variable (in R, a vector) called ‘r’ which now has 151 records, one for each application workload. Each record in ‘r’ has 15 variables (properties) that we can use to run analytics across the entire collection of results.
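The call described above might be sketched as follows. The header name `ApprendaSessionToken` is an assumption, and the article does not name the workloads endpoint or the query string parameter it used, so both are left as arguments for the caller to supply. `get_workloads` is a made-up helper name.

```r
# Sketch only: fetch workload records and parse the JSON response.
# ASSUMPTION: the session token travels in a header named by `token_header`.
# `workloads_url` and `query` must come from your platform's API docs.
get_workloads <- function(workloads_url, token, query = list(),
                          token_header = "ApprendaSessionToken") {
  resp <- httr::GET(
    url = workloads_url,
    httr::add_headers(.headers = stats::setNames(token, token_header)),
    query = query
  )
  httr::stop_for_status(resp)
  # A JSON array of objects is typically parsed into a data frame,
  # one row per record and one column per property.
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical usage, with `workloads_url` set to the workloads endpoint:
# r <- get_workloads(workloads_url, token)
# nrow(r); ncol(r)
```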
For the purposes of illustration, I’m going to use the variable componentType, which represents Apprenda’s knowledge of the type of application workload that was deployed. There are seven self-explanatory types: UserInterface, PublicUserInterface, WindowsService, JavaWebApplication, LinuxService, WcfService, and Database. When the collection is then grouped by componentType, it becomes pretty simple to plot a chart showing the distribution of workload by the type of component:
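Grouping by componentType can be sketched with base R's `table()`. The data frame below is made-up sample data standing in for the API result; only the `componentType` column name comes from the article.

```r
# Made-up sample standing in for the parsed API result.
workloads <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  componentType = c("UserInterface", "WcfService", "Database",
                    "WcfService", "UserInterface", "WcfService"),
  stringsAsFactors = FALSE
)

# Count workloads per component type, largest group first.
type_counts <- sort(table(workloads$componentType), decreasing = TRUE)

# The same distribution as percentages of all workloads.
type_share <- round(100 * prop.table(type_counts), 1)

print(type_counts)
print(type_share)
```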
The resulting plot (pie3D() comes from the plotrix package) looks like this:
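A sketch of how such a chart could be drawn with `pie3D()`. The counts are made-up sample values, and `make_labels` and `plot_component_types` are hypothetical helper names. The plot is drawn only in an interactive session such as RStudio.

```r
# Made-up counts per component type, for illustration only.
type_counts <- c(UserInterface = 2L, WcfService = 3L, Database = 1L)

# Build slice labels such as "WcfService (3)".
make_labels <- function(counts) paste0(names(counts), " (", counts, ")")

# Draw a 3D pie chart of the counts using plotrix.
plot_component_types <- function(counts) {
  if (!requireNamespace("plotrix", quietly = TRUE)) {
    stop("Package 'plotrix' is required for pie3D().")
  }
  plotrix::pie3D(as.numeric(counts),
                 labels = make_labels(counts),
                 explode = 0.1,
                 main = "Workloads by component type")
}

# Only draw when running interactively.
if (interactive()) plot_component_types(type_counts)
```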
I’ve had conversations with IT folks who couldn’t describe the architectural makeup of their application portfolio in any level of detail, yet in this case we pulled the data in real time with one line of R. Admittedly, a pie chart is a pretty watered-down way to look at this information, but the point is that the data is available and can be grouped, filtered, manipulated, and analyzed very simply with R.
For this example, I used the open-source edition of RStudio. Some other powerful information that could be gleaned from the platform’s APIs:
1. The average discrepancy between resource allocation and actual utilization per workload. (This is helpful in capacity planning.)
2. The longest-running application workload.
3. The most distributed applications. (This could aid in scaling decisions.)
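The three ideas above can be sketched in base R. Everything in this data frame is made up: the column names (`application`, `allocatedMemoryMb`, `usedMemoryMb`, `deployedAt`) are hypothetical and would need to be mapped to whatever properties the API actually returns.

```r
# Made-up workload records with hypothetical column names.
workloads <- data.frame(
  application = c("billing", "billing", "portal", "reports"),
  allocatedMemoryMb = c(512, 512, 1024, 256),
  usedMemoryMb = c(300, 350, 400, 250),
  deployedAt = as.POSIXct(c("2015-01-10 08:00:00", "2015-02-01 09:30:00",
                            "2014-11-20 14:00:00", "2015-03-05 11:15:00"),
                          tz = "UTC"),
  stringsAsFactors = FALSE
)

# 1. Average gap between allocated and used memory per workload.
workloads$memoryGapMb <- workloads$allocatedMemoryMb - workloads$usedMemoryMb
mean_gap <- mean(workloads$memoryGapMb)

# 2. Longest-running workload: the one with the earliest deployment time.
oldest_row <- which.min(as.numeric(workloads$deployedAt))
longest_running <- workloads$application[oldest_row]

# 3. Most distributed application: the one with the most workload records.
instances <- sort(table(workloads$application), decreasing = TRUE)
most_distributed <- names(instances)[1]
```

With this sample, the average gap is 251 MB, the longest-running workload belongs to `portal`, and `billing` is the most distributed application.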
There are many more. A PaaS such as Apprenda is, by nature, in a unique spot in the data center stack because it maintains knowledge of both infrastructure and applications. It also serves as a hub for data that, when analyzed creatively, provides new insights. These insights are an opportunity for enterprises to enhance their practices to better serve developers and applications while operating more efficiently than ever.
Published at DZone with permission of Matthew Ammerman. See the original article here.