The Challenges of Getting Log Data from Cloud Services


It’s not surprising that the active use of cloud services can generate an immense amount of log data, or that analyzing that data can allow you to more effectively deploy and utilize your cloud services. But what might surprise you is the different ways the major cloud service providers generate log data and how they suggest you utilize it.

As an example, Google is probably the best known of the cloud service providers, with the Google Cloud Platform offering a range of services from the Google Drive storage platform to the complete Google Apps suite of business software. However, what they don’t offer is a standard data logging engine across the entire platform.

For example, if you are using Google Compute Engine, it’s common practice to use one of the standard boot images provided, either Debian or CentOS 6.2, creating a 64-bit x86 virtual machine instance. For logging purposes, you would then treat that instance like any other Linux server deployment, using the standard log data. The same is true, of course, of any cloud service that lets you create your own server or application instance, such as Amazon EC2.
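
As a concrete illustration, here is a minimal Python sketch of that "treat it like any other Linux server" approach. It assumes the default syslog locations on those stock images (/var/log/syslog on Debian, /var/log/messages on CentOS) and simply tallies log entries per process, the kind of routine analysis you would run on any self-managed box.

```python
# A minimal sketch: read the standard syslog file on a GCE (or EC2) Linux
# instance and tally entries per process. Assumes the default log locations
# for the stock images discussed above.
import os
import re
from collections import Counter

LOG_CANDIDATES = ["/var/log/syslog", "/var/log/messages"]

def syslog_process_counts(paths=LOG_CANDIDATES):
    """Return a Counter of log lines per process name."""
    # Classic syslog layout: "MMM DD HH:MM:SS host process[pid]: message"
    pattern = re.compile(r"^\w{3}\s+\d+\s+[\d:]+\s+\S+\s+([\w./-]+?)(?:\[\d+\])?:")
    counts = Counter()
    for path in paths:
        if not os.path.exists(path):
            continue
        with open(path, errors="replace") as handle:
            for line in handle:
                match = pattern.match(line)
                if match:
                    counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for process, count in syslog_process_counts().most_common(10):
        print(f"{process}: {count}")
```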

But for the log data from specific Google-provided services that are fully operated by Google, such as Google’s Chrome OS, Google Drive, or Google Apps, Google offers only a single way to look at that data: the Log Analyzer tool in the Google Apps Toolbox. There, you can only import log data collected from individual instances running on client computers and analyze the logs created by running the various services. The Google Apps administrator can also enable specific audit logs on some components of the service, such as the Admin console, the Marketplace app, and Google Docs. Each log must be explicitly enabled, and log data is only collected while the logs are enabled.
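
For the audit logs that can be enabled, there is also a programmatic route. The hedged sketch below assumes your domain has the Admin SDK Reports API enabled and that you already hold authorized admin credentials (the `credentials` object is a placeholder); it shows one way to pull recent Admin console audit events, not the only supported path.

```python
# A hedged sketch of pulling Admin console audit events programmatically,
# assuming the Admin SDK Reports API is enabled for the domain and that
# `credentials` already holds authorized admin credentials (obtaining them
# is out of scope here). Package: google-api-python-client.
from googleapiclient.discovery import build

def recent_admin_events(credentials, max_results=25):
    """List recent Admin console audit log entries for all users."""
    reports = build("admin", "reports_v1", credentials=credentials)
    response = reports.activities().list(
        userKey="all",              # events for every user in the domain
        applicationName="admin",    # the Admin console audit log
        maxResults=max_results,
    ).execute()
    return response.get("items", [])

# Example usage (credentials setup omitted):
# for event in recent_admin_events(credentials):
#     print(event["id"]["time"], event["events"][0]["name"])
```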

Google draws a very distinct line between users of its primarily end-user-targeted services, such as Google Apps, and users of its developer-targeted services, where applications are built to run on its cloud platform. The presumption is that developers will provide their own hooks for logging and data analysis, rather than Google providing a user-focused interface for analyzing how the cloud services are being used.

Microsoft Azure, on the other hand, is pretty much the complete opposite. Users can implement data logging with Windows Azure Diagnostics, and to anyone who has worked with Windows Server log data, the process will feel very familiar. In fact, all of the usual log data from Windows Server can be gathered by Windows Azure Diagnostics. And like Windows Server, Windows Azure Diagnostics can be configured remotely by third-party programs, giving the user a great deal of flexibility in how the data is analyzed.

With the standard Windows Azure tools, users can analyze usage patterns on Windows Azure storage, track the flow of Windows Azure applications, run diagnostics, and drill down using custom sets of Windows Azure performance counters. Fundamentally, Microsoft treats Windows Azure log analysis the same way it treats Windows Server log analysis, with the primary uses being tracking application behavior and storage.
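
Because Windows Azure Diagnostics typically transfers its output to a storage account (trace logs conventionally land in the WADLogsTable), that data can also be read back programmatically. The sketch below is a hedged example using the azure-data-tables Python package; the connection string is a placeholder, and it assumes your diagnostics configuration is pushing trace logs to table storage.

```python
# A hedged sketch of reading Windows Azure Diagnostics output from table
# storage, assuming diagnostics has been configured to transfer trace logs
# to the conventional WADLogsTable in your storage account.
# Package: azure-data-tables. CONNECTION_STRING is a placeholder.
from azure.data.tables import TableServiceClient

CONNECTION_STRING = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=..."

def recent_wad_logs(limit=20):
    """Yield recent entries from the WADLogsTable written by Azure Diagnostics."""
    service = TableServiceClient.from_connection_string(CONNECTION_STRING)
    table = service.get_table_client("WADLogsTable")
    for index, entity in enumerate(table.list_entities()):
        if index >= limit:
            break
        # Typical WAD columns include Role, RoleInstance, Level, and Message.
        yield {
            "role": entity.get("Role"),
            "instance": entity.get("RoleInstance"),
            "level": entity.get("Level"),
            "message": entity.get("Message"),
        }

if __name__ == "__main__":
    for entry in recent_wad_logs():
        print(entry)
```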

While Amazon EC2 logging is much like Google Compute Engine (you treat your instances as you would any other servers), Amazon Web Services primarily suggests that customers use Amazon Elastic MapReduce (EMR), the Amazon-branded Hadoop cluster, to analyze the large amount of log data created by applications running on AWS. This involves pulling the logs from your Apache web servers, for example, and then using the Amazon EMR Command Line Interface and the AWS Management Console to produce output that a report-generation tool can massage into a useful form. However, the core tool for analyzing log data from applications across the AWS product line is still the Amazon Elastic MapReduce service.
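
To make that concrete, a job like the one described above often boils down to something as small as the following Hadoop Streaming mapper (Python), which EMR can run against Apache access logs staged in S3. The combined log format and the summing reducer it would pair with are assumptions for illustration, not a prescription from AWS.

```python
#!/usr/bin/env python3
# A minimal sketch of a Hadoop Streaming mapper for EMR: reads Apache
# access-log lines from stdin and emits "status_code<TAB>1" pairs, which a
# summing reducer would roll up into a per-status-code report. Assumes the
# stock Apache combined log format.
import re
import sys

# 'IP - user [timestamp] "METHOD /path HTTP/1.1" status bytes ...'
LOG_PATTERN = re.compile(r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')

def main():
    for line in sys.stdin:
        match = LOG_PATTERN.match(line)
        if match:
            # Hadoop Streaming expects tab-separated key/value pairs.
            print(f"{match.group(1)}\t1")

if __name__ == "__main__":
    main()
```

Paired with a simple summing reducer, the output is a per-status-code count that a reporting tool can pick up directly.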

If your basic takeaway from this is that none of the services make it easy to quickly analyze or utilize the huge amount of log data generated, well, you are correct. But don’t despair yet! In a follow-up article, we’ll take a look at a few examples of the exact steps necessary to get useful information using the provided tools for these three services.

Published at DZone with permission of Trevor Parsons, DZone MVB.
