Over a million developers have joined DZone.

How to Setup Realtime Analytics over Logs with ELK Stack

· Big Data Zone

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

Once we know something, we find it hard to imagine what it was like not to know it.

- Chip & Dan Heath, Authors of Made to Stick, Switch

Update: I have recently published a book on ELK stack titled - Learning ELK Stack , more details can be found here

What is the ELK stack ?

The ELK stack is ElasticSearchLogstash and Kibana. These three provide a fully working real-time data analytics tool for getting wonderful information sitting on your data.

ElasticSearch

ElasticSearch,built on top of Apache Lucene, is a search engine with focus on real-time analysis of the data, and is based on the RESTful architecture. It provides standard full text search functionality and powerful search based on query. ElasticSearch is document-oriented/based and you can store everything you want as JSON. This makes it powerful, simple and flexible.

Logstash

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use.In ELK Stack logstash plays an important role in shipping the log and indexing them later which can be supplied to Elastic Search.

Kibana

Kibana is a user friendly way to view, search and visualize your log data, which will present the data stored from Logstash into ElasticSearch, in a very customizable interface with histogram and other panels which provides real-time analysis and search of data you have parsed into ElasticSearch.

How Do I Get It  ?

http://www.elasticsearch.org/overview/elkdownloads/

How Do They Work Together ?

Logstash is essentially a pipelining tool. In a basic, centralized installation a logstash agent, known as the shipper, will read input from one to many input sources and output that text wrapped in a JSON message to a broker. Typically Redis, the broker, caches the messages until another logstash agent, known as the collector, picks them up, and sends them to another output. In the common example this output is Elasticsearch, where the messages will be indexed and stored for searching. The Elasticsearch store is accessed via the Kibana web application which allows you to visualize and search through the logs. The entire system is scalable. Many different shippers may be running on many different hosts, watching log files and shipping the messages off to a cluster of brokers. Then many collectors can be reading those messages and writing them to an Elasticsearch cluster.

ELK

(E)lasticSearch (L)ogstash  (K)ibana (The ELK Stack)

How Do I Fetch Useful Information Out of Logs? 

Fetching useful information from logs is one of the most important part of this stack and is being done in logstash using its grok filters and a set of input , filter and output plugins which helps to scale this functionality for taking various kinds of inputs ( file,tcp, udp, gemfire, stdin, unix, web sockets and even IRC and twitter and many more) , filter them using (groks,grep,date filters etc.) and finally write ouput to ElasticSearch,redis,email,HTTP,MongoDB,Gemfire , Jira , Google Cloud Storage etc.

A Bit More About Log Stash

grok

Filters 

Transforming the logs as they go through the pipeline is possible as well using filters. Either on the shipper or collector, whichever suits your needs better. As an example, an Apache HTTP log entry can have each element (request, response code, response size, etc) parsed out into individual fields so they can be searched on more seamlessly. Information can be dropped if it isn’t important. Sensitive data can be masked. Messages can be tagged. The list goes on.

e.g.

input {
file {
path => ["var/log/apache.log"]
type => "saurzcode_apache_logs"
}
}
filter {
grok {
match => ["message","%{COMBINEDAPACHELOG}"]
}
}
output{
stdout{}
}

Above example takes input from an apache log file applies a grok filter with %{COMBINEDAPACHELOG}, which will index apache logs information on fields and finally output to Standard Output Console.

Writing Grok Filters

Writing grok filters and fetching information is the only task that requires some serious efforts and if done properly will give you great insights in to your data like Number of Transations performed over time, Which type of products have most hits etc.

Below links will help you a lot in writing grok filters and test them with ease -

Grok Debugger

http://grokdebug.herokuapp.com/

Grok Patterns Lookup

https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns

References


Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks

Topics:
bigdata ,analytics ,big data ,logstash ,elastic search ,kibana ,elk stack

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

SEE AN EXAMPLE
Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.
Subscribe

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}