Logstash - Sequence of Grok Blocks is Important

By Bipin Patwardhan · May 14, 2015

Introduction

Logstash is one of the popular open source tools used to parse application logs. It is commonly packaged with two other tools, namely Elasticsearch and Kibana, to form the ELK stack.

Recently, I was parsing some data from an application log. A sample of the log is shown below.

2015-04-17 16:23:21,581 - DEBUG - [process16] sample.vo.RemoteApplicationEntityVO - Bean is null.
2015-04-17 16:23:29,296 - DEBUG - [processor4] sample.filter.AuthorizationFilter - User james authorized to access: printReport
2015-04-17 16:24:25,997 - DEBUG - [processor1] sample.entity.RemoteContext - Removing remote application entity.
2015-04-17 16:24:25,997 - DEBUG - [processor1] sample.entity.RemoteContext - sample.vo.RemoteApplicationEntityVO@xxys345
2015-04-17 16:24:25,997 - ERROR - [processor1] sample.entity.RemoteContext - Exception in setStorageFailureEnabled
java.lang.NullPointerException
	at sample.entity.RemoteContext.setStorageEnabled(RemoteContext.java:627)
	at sample.entity.RemoteContext.removeArchiveEntity(RemoteContext.java:496)
	at sample.entity.RemoteContext.removeEntity(RemoteContext.java:459)
	at sample.struts.PrintReport.execute(PrintReportAction.java:136)
2015-04-17 16:32:03,805 - ERROR - [processor5] sample.ArchiveContext - Exception validating remote archive admin
java.rmi.ConnectException: Connection refused to host: 127.0.0.1; nested exception is: 
	java.net.ConnectException: Connection refused
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:128)
	. . .
	at sample.struts.SearchAction.execute(SearchAction.java:246)
Caused by: java.net.ConnectException: Connection refused
	at sun.rmi.transport.tcp.TCPEndpoint.newSocket(TCPEndpoint.java:56)
	... 42 more
2015-04-19 13:57:45,046 - DEBUG - [processor5] sample.struts.LoginAction - User Logging in: jenny
2015-04-19 13:57:45,046 - DEBUG - [processor5] sample.struts.LoginAction - Creating a login request for: jenny
2015-04-19 13:57:45,760 - ERROR - [processor5] sample.struts.LoginAction - User credential invalid - User jenny password is incorrect

Parsing the data

To parse this data, I initially used the following pattern in the Logstash script. The grok statement that applies this pattern also adds a field named ‘subType’, which can be used during analytics and report generation. I am naming this the ‘log’ pattern.

match => { "message" => "%{TOMCAT_DATESTAMP:logTimeString}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{SPACE}-%{SPACE}\[%{CUSTOM_PROCESS_NAME:processName}\]%{SPACE}%{JAVACLASS:class}%{SPACE}-%{SPACE}%{JAVALOGMESSAGE:logmessage}" }

For input data that matches this pattern, Logstash creates a JSON record as shown below.

{
  "message" => "2015-04-17 16:32:03,805 - ERROR - [processor5] sample.ArchiveContext - Exception validating remote archive admin ",
  "@version" => "1",
  "@timestamp" => "2015-05-05T10:06:37.075Z",
  "host" => "127.0.0.1:63036",
  "type" => "webadmin_log",
  "tags" => [],
  "logTimeString" => "2015-04-17 16:32:03,805",
  "level" => "ERROR",
  "processName" => "processor5",
  "class" => " sample.ArchiveContext ",
  "logmessage" => "Exception validating remote archive admin ",
  "subType" => "log",
  "logTime" => "2015-04-17T11:02:03.805Z"
}

On executing the script, I noted that this pattern matched each and every line of the input. While that is a good thing - needing only one pattern for parsing - a drawback is that it reduces our ability to differentiate between the various lines of input data. For example, with this single pattern, every record is tagged as a ‘log’ record, and there is no difference between a line of input that reports an exception and a line of input that reports an incorrect login.

The aim of parsing is not only to convert plain text into a more suitable format, but also to tag it so that we can perform meaningful analysis and reporting at a later stage. In Logstash, this is possible by using multiple patterns and differentiating between input lines using suitable markers in the data. In the sample, the last input line gives the reason for a user login failure. If this line can be tagged separately, a report listing all incorrect login attempts in a day, week, or month can easily be generated. If the line is tagged with the generic ‘log’ tag, additional business logic will be needed to go through the parsed data and identify the lines specific to login failures. For easier recollection, I am replicating the last two lines from the sample data.

2015-04-19 13:57:45,046 - DEBUG - [processor5] sample.struts.LoginAction - Creating a login request for: jenny
2015-04-19 13:57:45,760 - ERROR - [processor5] sample.struts.LoginAction - User credential invalid - User jenny password is incorrect
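
As the next section shows, this failed-login line is given its own ‘subType’ value (‘credential’). Once records carry such a tag, reporting becomes a matter of filtering on that field. As a purely illustrative sketch that is not part of the original script, failed-login records could, for example, be routed to their own file from a Logstash output block; the destination path is hypothetical:

output {
  if [subType] == "credential" {
    # hypothetical destination for failed-login records
    file { path => "/var/log/reports/failed_logins.log" }
  }
}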

Using multiple patterns

To parse the input, I decided to use another pattern for the last line of input. The pattern I defined is shown below. I am naming this the ‘credential’ pattern.

match => { "message" => "%{TOMCAT_DATESTAMP:logTimeString}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{SPACE}-%{SPACE}\[%{CUSTOM_PROCESS_NAME:processName}\]%{SPACE}%{JAVACLASS:class}%{SPACE}-%{SPACE}%{CUSTOM_TEXT_DATA:textData}%{SPACE}-%{SPACE}%{JAVALOGMESSAGE:logmessage}" }

The ‘subType’ field in this case is assigned the value ‘credential’ in the corresponding grok block. I then placed this pattern after the ‘log’ pattern (the one whose ‘subType’ value is ‘log’). On execution, I expected to see a record with the ‘subType’ value ‘credential’. To my surprise, all records had the ‘subType’ value ‘log’. The reason for this behaviour was the sequence of grok blocks in the Logstash script.

As Logstash executes the statements in the ‘filter’ block in a top-down manner for each line of input, it matches the input against the first pattern it encounters. Because the two patterns are very similar, the first (‘log’) pattern also successfully matches the last line from the sample. Moreover, if the second pattern is not placed inside a ‘_grokparsefailure’ check, Logstash will happily match the last line from the sample with the second pattern as well. This results in a record that contains duplicate values for many fields, such as logTimeString, level, class, processName, and subType, and generating reports from such records is problematic. A sample record is shown below. Hence, each subsequent pattern is enclosed in a ‘_grokparsefailure’ condition check, which ensures that Logstash applies a pattern only if the previous pattern did not match the input data.
{
  "message" => "2015-04-19 13:57:45,760 - ERROR - [processor5] sample.struts.LoginAction - User credential invalid - User jenny password is incorrect",
  "@version" => "1",
  "@timestamp" => "2015-05-05T10:22:29.234Z",
  "host" => "127.0.0.1:62462",
  "type" => "webadmin_log",
  "logTimeString" => [ [0] "2015-04-19 13:57:45,760", [1] "2015-04-19 13:57:45,760"],
  "level" => [ [0] "ERROR", [1] "ERROR"],
  "processName" => [ [0] "processor5", [1] "processor5"    ],
  "class" => [ [0] "sample.struts.LoginAction", [1] "sample.struts.LoginAction"],
  "textData" => "User credential invalid ",
  "logmessage" => [ [0] "User jenny password is incorrect", [1] "User credential invalid - User jenny password is incorrect"],
  "subType" => [ [0] "credential", [1] "log"],
  "logTime" => "2015-04-19T08:27:45.760Z"
}

As the ‘log’ pattern was more generic, it matched even those input lines that I wanted to be matched by the ‘credential’ pattern. Hence, I had to change the pattern matching sequence and place the ‘credential’ pattern check before the ‘log’ pattern check. After this change, the Logstash output was as expected. The final Logstash script is given below.

input {
  tcp {
    type => "webadmin_log"
    port => 9999
  }
}
filter {
  if [type] == "webadmin_log" {
    # user credential invalid
    grok {
      patterns_dir => "./patterns"
      match => { "message" => "%{TOMCAT_DATESTAMP:logTimeString}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{SPACE}-%{SPACE}\[%{CUSTOM_PROCESS_NAME:processName}\]%{SPACE}%{JAVACLASS:class}%{SPACE}-%{SPACE}%{CUSTOM_TEXT_DATA:textData}%{SPACE}-%{SPACE}%{JAVALOGMESSAGE:logmessage}" }
      add_field => { "subType" => "credential" }
    }
    # other log lines
    if "_grokparsefailure" in [tags] {
      grok {
        patterns_dir => "./patterns"
        match => { "message" => "%{TOMCAT_DATESTAMP:logTimeString}%{SPACE}-%{SPACE}%{LOGLEVEL:level}%{SPACE}-%{SPACE}\[%{CUSTOM_PROCESS_NAME:processName}\]%{SPACE}%{JAVACLASS:class}%{SPACE}-%{SPACE}%{JAVALOGMESSAGE:logmessage}" }
        add_field => { "subType" => "log" }
        remove_tag => ["_grokparsefailure"]
      }
  }
		
  if "_grokparsefailure" not in [tags] {
    date {
      locale => "en"
      match => ["logTimeString", "YYYY-MM-dd HH:mm:ss,SSS"]
      #timezone => "Asia/Calcutta"
      target => "logTime"
    }
  }
}
output {
  if "_grokparsefailure" not in [tags] {
      stdout {
      codec => "rubydebug"
    }
  }
}
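
To exercise the script end to end, Logstash can be started with this configuration and the sample log replayed into the TCP input. A minimal sketch, assuming the configuration is saved as webadmin.conf and the sample lines as webadmin.log (both file names are hypothetical), run from the Logstash installation directory:

# start Logstash with the configuration above
bin/logstash -f webadmin.conf

# in another terminal, stream the sample log lines to the TCP input on port 9999
nc localhost 9999 < webadmin.log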

Conclusion

While tools like Logstash make it easier for us to parse application logs and convert plain text into structured documents, parsing the input to create the correct document structure is of prime importance. If the documents are not populated with the correct values, we will face challenges during the analytics and reporting stages. For finer control during those stages, we may wish to define Logstash grok patterns that allow us to classify the input data. While doing so, we need to ensure that a line of input matches one and only one pattern in the script. Additionally, we need to arrange the grok patterns so that the more specific patterns are tried before the more generic patterns are applied to the input data.

In summary, while parsing data, not only is the pattern itself important, but the sequence in which patterns are applied to the input data is equally important.
