Logstash, ElasticSearch and Kibana Integration for Clickstream Weblog Ingestion
In this blog I am going to showcase how we can develop a quick and easy demo application for clickstream weblog ingestion, search and visualization. We will ingest the logs with Logstash, store them in ElasticSearch and build a dashboard with Kibana. For the clickstream weblog I am using log data from the ECML/PKDD 2005 Discovery Challenge.
You can download the complete weblogs after registering there. The fields in each log line are delimited by semicolons (;) and appear in the following order:
- shop_id
- unixtime
- client ip
- session
- visited page
- referrer
Here are some sample log lines:
15;1075658406;212.96.166.162;052ecba084545d8348806f087b6e09bb;/ls/?&id=77&view=2,6,31&pozice=20;http://www.shop5.cz/ls/?id=77
12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;/ct/?c=153;http://www.shop3.cz/
12;1075658407;212.65.194.144;86140090a2e102f1644f29e5ddadad9b;/ls/?id=34;http://www.shop3.cz/ct/?c=155
14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
17;1075658408;194.108.232.234;be0970125c4eb3ee4fc380be05b3c58f;/ls/?id=155&sort=45;http://www.shop7.cz/ls/?id=155&sort=45
12;1075658409;62.24.70.41;851f20e644eb8bf82bfdbe4379050e2e;/txt/?c=734;http://www.shop3.cz/onakupu/
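To make the field layout concrete, here is a minimal Python sketch that splits one of the sample lines into named fields. It is only an illustration of the format, not part of the Logstash pipeline; the function name `parse_line` and the key spellings `client_ip` and `page` are my own choices for this example.

```python
import csv
from io import StringIO

# Field names in the order listed above; "client_ip" and "page" are
# illustrative spellings of "client ip" and "visited page".
FIELDS = ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]


def parse_line(line):
    """Split one semicolon-delimited weblog line into a dict keyed by field name."""
    row = next(csv.reader(StringIO(line), delimiter=";"))
    if len(row) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(row)}")
    return dict(zip(FIELDS, row))


sample = (
    "12;1075658406;195.146.109.248;05aa4f4db0162e5723331042eb9ce8a7;"
    "/ct/?c=153;http://www.shop3.cz/"
)
record = parse_line(sample)
print(record["shop_id"], record["page"], record["referrer"])
```

All values come back as strings here; converting `unixtime` to a timestamp is a separate step.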
For this demo we need to create a Logstash configuration file (let's name it clickstream.conf) that specifies the inputs, filters and outputs. The clickstream.conf file looks like this:
input {
  file {
    # path for clickstream log
    path => "/home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log"
    # define a type for all events handled by this input
    type => "weblog"
    start_position => "beginning"
    # the clickstream log is in character set ISO-8859-1
    codec => plain { charset => "ISO-8859-1" }
  }
}
filter {
  csv {
    # define columns present in weblog
    columns => ["shop_id", "unixtime", "client_ip", "session", "page", "referrer"]
    separator => ";"
  }
  grok {
    # get visited page and page parameters
    match => ["page", "%{URIPATH:page_visited}(?:%{URIPARAM:page_params})?"]
    remove_field => ["page"]
  }
  date {
    # the unixtime field is in epoch seconds; convert it to a normal timestamp
    match => ["unixtime", "UNIX"]
  }
  geoip {
    # look up latitude and longitude for the IP using the GeoLiteCity database from MaxMind
    source => "client_ip"
    fields => ["latitude", "longitude"]
    target => "geoip"
    add_field => ["[geoip][coordinates]", "%{[geoip][longitude]}"]
    add_field => ["[geoip][coordinates]", "%{[geoip][latitude]}"]
  }
  mutate {
    # convert geoip.coordinates to float values
    convert => ["[geoip][coordinates]", "float"]
  }
}
output {
  # store output in local elasticsearch cluster
  elasticsearch {
    host => "127.0.0.1"
  }
}
To start the Logstash agent, run the following command:
java -jar logstash-1.2.2-flatjar.jar agent -f clickstream.conf
Now the log will be indexed into ElasticSearch. A sample record in ElasticSearch looks like this:
{
  _index: logstash-2004.02.01
  _type: logs
  _id: I1N0MboUR0O1O3RZ-qXqnw
  _version: 1
  _score: 1
  _source: {
    message: [
      14;1075658407;80.188.85.210;f07f39ec63abf67f965684f3fa5729c4;/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez;http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
    ]
    @timestamp: 2004-02-01T18:00:07.000Z
    @version: 1
    type: weblog
    host: HMECL000315.happiestminds.com
    path: /home/rishav.rohit/Desktop/clickstream/_2004_02_01_19_click_stream.log
    shop_id: 14
    unixtime: 1075658407
    client_ip: 80.188.85.210
    session: f07f39ec63abf67f965684f3fa5729c4
    referrer: http://www.shop4.cz/ls/?&p_14=nerez&id=63&view=1%2C2%2C3%2C14%2C20%2C15&&aktul=0
    page_visited: /findp/
    page_params: ?&id=63&view=1,2,3,14,20,15&p_14=nerez
    geoip: {
      latitude: 50.08330000000001
      longitude: 14.466700000000003
      coordinates: [14.466700000000003, 50.08330000000001]
    }
  }
}
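As a sanity check on the record above, the following Python sketch reproduces two of the derived fields from the raw values: the `@timestamp` computed from `unixtime`, and the split of the visited page into `page_visited` and `page_params`. It only approximates what the date and grok filters do (a plain string split stands in for the URIPATH/URIPARAM patterns), and the helper names `epoch_to_iso` and `split_page` are made up for this example.

```python
from datetime import datetime, timezone


def epoch_to_iso(unixtime):
    """Convert epoch seconds to an ISO 8601 UTC timestamp with a .000 millisecond part."""
    dt = datetime.fromtimestamp(int(unixtime), tz=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%S.000Z")


def split_page(page):
    """Split a page into its path and its query string (keeping the leading '?').

    A rough stand-in for the grok pattern; it does no URI validation.
    """
    path, sep, query = page.partition("?")
    return path, (sep + query) if sep else None


timestamp = epoch_to_iso("1075658407")
page_visited, page_params = split_page("/findp/?&id=63&view=1,2,3,14,20,15&p_14=nerez")

print(timestamp)     # 2004-02-01T18:00:07.000Z
print(page_visited)  # /findp/
print(page_params)   # ?&id=63&view=1,2,3,14,20,15&p_14=nerez
```

The values printed match the `@timestamp`, `page_visited` and `page_params` fields of the sample record.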
So we have parsed a complex log message into simpler components, converted the unixtime field to a datetime and the client IP to latitude and longitude, and extracted the page visited by the client. Now, using Kibana, we can quickly build a dashboard with these panels:
This histogram shows the count of page landings for each time interval.
This is a map pointing to client locations.
And in this table we can see the different attributes of each clickstream event.
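To illustrate what the histogram panel is counting, here is a small Python sketch that groups event times into fixed-width buckets. This is not how Kibana or ElasticSearch computes the panel; it is just the same idea applied by hand to the `unixtime` values of the six sample log lines, with a hypothetical helper named `histogram` and a 2-second interval chosen only so the tiny sample produces more than one bucket.

```python
from collections import Counter


def histogram(unixtimes, interval_seconds):
    """Count events per bucket; each bucket is keyed by its start time in epoch seconds."""
    counts = Counter((t // interval_seconds) * interval_seconds for t in unixtimes)
    return dict(sorted(counts.items()))


# unixtime values of the six sample log lines shown earlier
sample_times = [1075658406, 1075658406, 1075658407, 1075658407, 1075658408, 1075658409]
buckets = histogram(sample_times, interval_seconds=2)
print(buckets)  # {1075658406: 4, 1075658408: 2}
```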