How to Setup Realtime Analytics over Logs with ELK Stack

By Saurabh Chhajed · Aug. 26, 14 · Tutorial

Once we know something, we find it hard to imagine what it was like not to know it.

- Chip & Dan Heath, Authors of Made to Stick, Switch

Update: I have recently published a book on the ELK stack titled Learning ELK Stack; more details can be found here.

What Is the ELK Stack?

The ELK stack is ElasticSearch, Logstash, and Kibana. Together, these three tools provide a fully working real-time data analytics pipeline for extracting useful information from your data.

ElasticSearch

ElasticSearch, built on top of Apache Lucene, is a search engine focused on real-time analysis of data and exposed through a RESTful API. It provides standard full-text search as well as powerful structured queries. ElasticSearch is document-oriented: everything you store is a JSON document, which makes it powerful, simple, and flexible.

Logstash

Logstash is a tool for managing events and logs. You can use it to collect logs, parse them, and store them for later use. In the ELK stack, Logstash plays the important role of shipping the logs and indexing them so they can be supplied to ElasticSearch.

Kibana

Kibana is a user-friendly way to view, search, and visualize your log data. It presents the data that Logstash has stored in ElasticSearch through a highly customizable interface of histograms and other panels, providing real-time analysis and search over everything you have parsed into ElasticSearch.

How Do I Get It?

http://www.elasticsearch.org/overview/elkdownloads/

How Do They Work Together?

Logstash is essentially a pipelining tool. In a basic, centralized installation, a Logstash agent known as the shipper reads input from one or more sources and sends that text, wrapped in a JSON message, to a broker. The broker, typically Redis, caches the messages until another Logstash agent, known as the collector, picks them up and sends them to another output. In the common case this output is Elasticsearch, where the messages are indexed and stored for searching. The Elasticsearch store is accessed via the Kibana web application, which lets you visualize and search through the logs. The entire system is scalable: many different shippers may run on many different hosts, watching log files and shipping messages off to a cluster of brokers, while many collectors read those messages and write them to an Elasticsearch cluster.
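
As a rough sketch of that layout, the shipper and collector could use Logstash configurations along these lines (the file path, Redis host, and Redis key below are illustrative assumptions, not values from this article):

    # shipper.conf - runs on each application host
    input {
      file {
        path => ["/var/log/apache.log"]   # log file(s) to watch
        type => "apache"
      }
    }
    output {
      # hand events to the Redis broker
      redis {
        host => "broker.example.com"      # assumed broker host
        data_type => "list"
        key => "logstash"
      }
    }

    # collector.conf - runs on the indexing host
    input {
      # pull events off the same Redis list
      redis {
        host => "broker.example.com"
        data_type => "list"
        key => "logstash"
      }
    }
    output {
      # index events into Elasticsearch for Kibana to query
      elasticsearch {
        host => "localhost"
      }
    }

Scaling out then just means running more shippers against the same Redis key and more collectors in front of the Elasticsearch cluster.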

(E)lasticSearch (L)ogstash (K)ibana (The ELK Stack)

How Do I Fetch Useful Information Out of Logs? 

Fetching useful information from logs is one of the most important parts of this stack, and it is done in Logstash using its grok filters together with a set of input, filter, and output plugins. These plugins make the functionality easy to scale: you can take input from many kinds of sources (file, tcp, udp, gemfire, stdin, unix sockets, web sockets, even IRC and Twitter, and many more), filter it (grok, grep, date filters, etc.), and finally write output to ElasticSearch, Redis, email, HTTP, MongoDB, Gemfire, Jira, Google Cloud Storage, and so on.
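
For instance, a single pipeline can mix and match these plugins; the sketch below (the file path and TCP port are assumptions for illustration) reads from both a file and a TCP socket and writes to both ElasticSearch and the console:

    input {
      # tail a local log file
      file {
        path => ["/var/log/syslog"]
        type => "syslog"
      }
      # also accept events sent over TCP
      tcp {
        port => 5000
        type => "tcp_events"
      }
    }
    output {
      # send everything to Elasticsearch...
      elasticsearch {
        host => "localhost"
      }
      # ...and echo it to the console for debugging
      stdout {}
    }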

A Bit More About Logstash

Filters

Transforming the logs as they go through the pipeline is also possible using filters, either on the shipper or on the collector, whichever suits your needs better. As an example, an Apache HTTP log entry can have each element (request, response code, response size, etc.) parsed out into individual fields so they can be searched more easily. Information can be dropped if it isn't important, sensitive data can be masked, and messages can be tagged. The list goes on.

For example:

    input {
      # read Apache access log entries from a file
      file {
        path => ["/var/log/apache.log"]
        type => "saurzcode_apache_logs"
      }
    }
    filter {
      # parse each line into named fields using the combined Apache log pattern
      grok {
        match => ["message", "%{COMBINEDAPACHELOG}"]
      }
    }
    output {
      # print the parsed events to the console
      stdout {}
    }

The example above takes input from an Apache log file, applies a grok filter with %{COMBINEDAPACHELOG}, which breaks each Apache log entry into individual named fields, and finally writes the output to the standard output console.
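
To illustrate the other filter operations mentioned earlier (parsing dates, masking data, tagging, and dropping), the filter block could be extended along these lines; the masking rule and the "healthcheck" condition are illustrative assumptions, not part of the original example:

    filter {
      grok {
        match => ["message", "%{COMBINEDAPACHELOG}"]
      }
      # use the request timestamp as the event's @timestamp
      date {
        match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
      }
      # mask the client IP's last octet and tag the event
      mutate {
        gsub => ["clientip", "\d+$", "xxx"]
        add_tag => ["apache", "saurzcode"]
      }
      # discard noise we don't care about
      if [request] =~ /healthcheck/ {
        drop {}
      }
    }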

Writing Grok Filters

Writing grok filters to fetch the right information is the only task that requires serious effort, and if done properly it will give you great insights into your data, such as the number of transactions performed over time or which types of products get the most hits.
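
As a sketch, for an application log line such as "2014-08-26 10:15:00 PURCHASE item=book-123 amount=250" (an assumed format, not taken from this article), a custom grok filter could pull out exactly the fields you want to chart:

    filter {
      grok {
        # extract timestamp, transaction type, item id, and amount as named fields
        match => ["message", "%{TIMESTAMP_ISO8601:logtime} %{WORD:transaction} item=%{NOTSPACE:item} amount=%{NUMBER:amount}"]
      }
    }

Once those fields are indexed into ElasticSearch, Kibana histograms over logtime or term panels over item give you the kind of insight described above.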

The links below will help you a lot in writing grok filters and testing them with ease:

Grok Debugger

http://grokdebug.herokuapp.com/

Grok Patterns Lookup

https://github.com/elasticsearch/logstash/tree/v1.4.2/patterns

References

  • http://www.elasticsearch.org/overview/
  • http://logstash.net/
  • http://rashidkpc.github.io/Kibana/about.html

