
Logstash, ElasticSearch and Kibana Integration for Clickstream Weblog Ingestion

by Rishav Rohit · Jan. 25, 14

In this post I show how to build a quick demo application for clickstream weblog ingestion, search, and visualization. We will use Logstash to ingest the logs, Elasticsearch to store and index them, and Kibana to build a dashboard on top. For the clickstream weblogs I am using log data from the ECML/PKDD 2005 Discovery Challenge.

You can download the complete weblogs after registering there. Each weblog record is delimited by semicolons (;) and contains the following fields, in order:

  • shop_id
  • unixtime
  • client ip
  • session
  • visited page
  • referrer

Here are some sample log lines:

15;1075658406;212.96.166.162;052ecba084545d8348806f087b6e09bb;/ls/?&id=77&view=2,6,31&pozice=20;http://www.shop5.cz/ls/?id=77
12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
17;1075658408;194.108.232.234;be0970125c4eb3ee4fc380be05b3c58f;/ls/?id=155&sort=45;http://www.shop7.cz/ls/?id=155&sort=45
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/
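
To make the field layout concrete, here is a minimal Python sketch that splits one of the sample lines above into named fields. It is only an illustration and not part of the Logstash pipeline; the function name parse_line is hypothetical, and the field names are borrowed from the column list used later in this post.

```python
# Field names in the order listed above (hypothetical helper, for illustration only).
FIELDS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_line(line):
    """Split one semicolon-delimited weblog line into a dict of named fields."""
    values = line.rstrip("\n").split(";")
    if len(values) != len(FIELDS):
        # A line with a different number of fields does not match the layout.
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


sample = (
    "12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;"
    "/ct/?c=153;http://www.shop3.cz/"
)
record = parse_line(sample)
print(record["shop_id"], record["page"], record["referrer"])
```

All values stay strings at this stage; converting unixtime or looking up the IP address is a separate step.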

To build this demo we need a Logstash configuration file (let's name it clickstream.conf) that specifies the inputs, filters, and outputs. The clickstream.conf file looks like this:

input {
  file {
    # path for clickstream log
    path => "/home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}

filter {
  csv {
    # define columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get visited page and page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # the unixtime field is in epoch seconds; convert it to a normal timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # convert the IP to latitude-longitude using the GeoLiteCity database from MaxMind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}

output {
  # store output in local elasticsearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}

To start the Logstash agent, run the following command:

java -jar logstash-1.2.2-flatjar.jar agent -f clickstream.conf

Logstash now indexes the log into Elasticsearch. A sample record in Elasticsearch looks like this:

{
    _index: logstash-2004.02.01
    _type: logs
    _id: I1N0MboUR0O1O3RZ-qXqnw
    _version: 1
    _score: 1
    _source: {
        message: [14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0]
        @timestamp: 2004-02-01T18:00:07.000Z
        @version: 1
        type: weblog
        host: HMECL000315.happiestminds.com
        path: /home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log
        shop_id: 14
        unixtime: 1075658407
        client_ip: 80.188.85.210
        session: f07f39ec63abf67f965684f3fa5729c4
        referrer: http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
        page_visited: /findp/
        page_params: ?&id=63&view=1,2,3,14,20,15&p_14=nerez
        geoip: {
            latitude: 50.08330000000001
            longitude: 14.466700000000003
            coordinates: [14.466700000000003, 50.08330000000001]
        }
    }
}
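
As a rough cross-check of the derived values in this record, the Python sketch below reproduces a few of them outside Logstash. It is an approximation under stated assumptions: the helper names are hypothetical, the page split uses a plain split on the first "?" rather than grok's URIPATH and URIPARAM patterns, and the daily index name assumes the logstash-YYYY.MM.dd naming seen in the _index value above. The GeoIP lookup is left out because it needs the MaxMind database.

```python
from datetime import datetime, timezone


def to_timestamp(unixtime):
    """Convert epoch seconds (string or int) to an ISO-8601 UTC timestamp."""
    dt = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def daily_index(unixtime):
    """Build a per-day index name from the event time (assumed naming scheme)."""
    dt = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return dt.strftime("logstash-%Y.%m.%d")


def split_page(page):
    """Split a page value into (path, params); params is None without a query string."""
    path, sep, query = page.partition("?")
    return path, (sep + query) if sep else None


def coordinates(longitude, latitude):
    """Return [longitude, latitude] as floats, longitude first as in the record."""
    return [float(longitude), float(latitude)]


timestamp = to_timestamp("1075658407")
index_name = daily_index("1075658407")
page_visited, page_params = split_page("/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez")
coords = coordinates("14.466700000000003", "50.08330000000001")
print(timestamp, index_name, page_visited, page_params, coords)
```

Note that the coordinates array holds longitude first and latitude second, which matches the order of the two add_field entries in the geoip filter.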

So we have parsed a complex log message into simpler components: unixtime is converted to a timestamp, the client IP to latitude and longitude, and the page field is split into the visited page and its parameters. Using Kibana we can now quickly build a dashboard with the following panels:


This histogram shows the count of page landings over different time intervals.


This map shows the client locations.


And this table shows the attributes of each clickstream event.


Opinions expressed by DZone contributors are their own.
