Logstash, ElasticSearch and Kibana Integration for Clickstream Weblog Ingestion


In this post I will show how to build a quick and easy demo application for clickstream weblog ingestion, search, and visualization. We will use Logstash for log ingestion, ElasticSearch for storage, and Kibana for a pretty dashboard. For the clickstream weblogs I am using log data from the ECML/PKDD 2005 Discovery Challenge.

You can download the complete weblogs after registering there. The logs are delimited by semicolons (;) and contain the following fields, in order:

  • shop_id
  • unixtime
  • client ip
  • session
  • visited page
  • referrer

Here is a sample log line (note the empty client-IP field):

14;1075658407;;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0

To build this demo we need a Logstash configuration file (let's name it clickstream.conf) that specifies the inputs, filters, and outputs. The clickstream.conf file looks like this:

input {
  file {
    # path to the clickstream log
    path => "/home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  csv {
    # define the columns present in the weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # split the visited page into path and parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # unixtime is in epoch seconds, so convert it to a normal timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # convert the IP to latitude/longitude using the GeoLiteCity database from Maxmind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}

output {
  # store output in the local ElasticSearch cluster
  elasticsearch {
    host => ""
  }
}

To start the Logstash agent, run the command below:

java -jar logstash-1.2.2-flatjar.jar agent -f clickstream.conf

Now the logs will be indexed into ElasticSearch. A sample record in ElasticSearch looks like this:


    _index: logstash-2004.02.01
    _type: logs
    _id: I1N0MboUR0O1O3RZ-qXqnw
    message: 14;1075658407;;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
    @timestamp: 2004-02-01T18:00:07.000Z
    @version: 1
    type: weblog
    host: HMECL000315.happiestminds.com
    session: f07f39ec63abf67f965684f3fa5729c4
    referrer: http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
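The date filter produced the @timestamp above from the epoch value 1075658407 in the message field; a quick sanity check in plain Python (outside Logstash) confirms the conversion:

```python
from datetime import datetime, timezone

# 'unixtime' from the sample record, in epoch seconds (UTC)
ts = datetime.fromtimestamp(1075658407, tz=timezone.utc)
print(ts.isoformat())  # 2004-02-01T18:00:07+00:00
```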

So we have parsed a complex log message into simpler components: unixtime was converted to a datetime, the client IP to latitude-longitude, and the page visited by the client was extracted from the request. Now, using Kibana, we can quickly build a dashboard with these panels:
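The page split done by the grok filter can be approximated in Python with an ordinary regular expression (a rough stand-in for %{URIPATH} and %{URIPARAM}, not the exact grok semantics):

```python
import re

# Roughly mimic the grok pattern %{URIPATH:page_visited}(?:%{URIPARAM:page_params})?
# using the 'page' value from the sample record.
page = "/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez"
m = re.match(r"(?P<page_visited>/[^?]*)(?P<page_params>\?.*)?$", page)

print(m.group("page_visited"))  # /findp/
print(m.group("page_params"))   # ?&id=63&view=1,2,3,14,20,15&p_14=nerez
```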

This histogram shows page-landing counts over different time intervals.

This map shows client locations.

And in this table we can see the various attributes of each clickstream event.



Opinions expressed by DZone contributors are their own.
