
Starting With Regular Expressions in Splunk


In this post, you will learn how to write regular expressions (regex) for Splunk, one of the most widely used platforms for data monitoring and analysis.


Splunk is one of the most widely used platforms for data monitoring and analysis. It provides various index and search patterns to retrieve the data you need and arrange it in tabular form using the built-in stats, chart, or table features. These are quite easy to use when the raw event data is in a consistent format and the required values are already mapped to defined fields in Splunk.

However, things change when the raw event has no fixed format: part of the data is mapped to fields, and part is a plain XML or JSON payload combined with other data such as time details or unique ID values (as in the examples below). Creating a new field each time for a specific piece of data from an XML tag or JSON key-value pair becomes tedious. It is also hard to predict how such a field will behave on other raw events that are not identical to the event it was extracted from, so a search might not return the desired result every time.

XML along with data:

source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload=<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName></employee>
<employee><firstName>Anna</firstName> <lastName>Smith</lastName>
</employee><employee><firstName>Peter</firstName> <lastName>Jones</lastName></employee></employees>

JSON along with data:

source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload={"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}

In scenarios like the ones above, where the raw event mixes field-tagged data with an XML or JSON payload, a regular expression can be added to the search string to pull the desired values out of the XML or JSON, along with the other data, and arrange them in a tabular format.

But learning to write regular expressions is not that easy, so where should you start? How do you know whether you are on the right path? Testing each pattern against the data as you write it, so that nothing is missed, is not easy either. I found it quite difficult and had to google a lot before I found a way to learn to write basic regular expressions that I could use in Splunk to get the desired data out of the scenarios above. I am writing this article to share how I got started.

Before learning regular expressions, you need to understand how to write a simple expression that mirrors your data. Everything in a regular expression is a character, and the pattern has to match the exact sequence of characters as it appears in the data. In day-to-day use, the data we act on contains letters, numbers, some special characters (%#$@!), or some basic tag types. We use only these to form a pattern over the whole dataset; that pattern becomes our regular expression and can be used in Splunk along with the search string.

The link below is a site I found while learning regular expressions. It gives insight into the various pattern types used in regular expressions, along with some hands-on exercises.

https://regexone.com

Go through the first 15 chapters carefully, try to match the whole pattern specified in each hands-on exercise, explore different patterns, and check the solutions as well. The first 15 chapters will give you a basic understanding of regular expressions. After exploring the site above, copy one of the sample events above and go to the website below. It is one of the best regex testers I have come across for testing regular expressions to be used in Splunk.

Note: Please change the RegEx engine to PCRE server, as it supports the (?P<name>...) named-group syntax used later in this article.

https://regexr.com/


Below is an image showing how this engine looks.

It is divided into three parts:

  1. The expression writing part.

  2. The data part where you need to put the data to be acted upon.

  3. Tools, including a character-by-character explanation of your expression (this will help you learn more about how the regular expression works).


You don't need to start the expression at the first character of the event. If the data you need is already available in a Splunk field, or your search keywords already narrow down the events clearly, you can start the pattern at a later point in the event, such as "Payload=".

Try out the expressions below, look at the result, and explore further. Start by typing one character of the expression at a time and watch which part of the dataset gets highlighted as a result. The patterns below use only constructs that are covered on the regular expression learning website above.

Payload=([\s\S\w\W])

Payload=([\s\S\w\W]+)
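To see the difference between the two expressions outside of a regex tester, here is a minimal sketch in Python. It assumes Python's re module as a stand-in for the PCRE engine (the two are not identical, but they behave the same for these patterns), and it uses the JSON sample event from above.

```python
import re

# JSON sample event from the article, kept as one multi-line string.
event = '''source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload={"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}'''

# Without a quantifier, the character class matches exactly one character.
one_char = re.search(r'Payload=([\s\S\w\W])', event)

# With "+", it matches one or more characters, including line breaks.
whole = re.search(r'Payload=([\s\S\w\W]+)', event)

print(repr(one_char.group(1)))  # only the first character after "Payload="
print(whole.group(1))           # everything after "Payload=" to the end of the event
```

The class [\s\S\w\W] matches any character at all, so [\s\S] alone would behave the same way here.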

Now we will learn how to get the first name and how to start with grouping in regular expressions.

First, we need to know what grouping is in terms of regular expressions. Grouping lets us extract exact pieces of information out of a larger match in the dataset. In Splunk, we can then use the group names as fields and feed them to stats, chart, etc.

The syntax for grouping is:

(?P<name>Value)

The name can be anything of your choice; try to make it describe the value you are trying to fetch.
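As a small illustration of the named-group syntax, the sketch below pulls the transaction ID out of the header of the sample event. It is written in Python, whose re module accepts the same (?P<name>...) syntax; the group name txid is a hypothetical name chosen for this example and does not come from the article.

```python
import re

# Header portion of the sample event from the article.
line = 'source=proboo Time=2018-02-20T12:00:30 transactionID=254g'

# "txid" is a hypothetical group name picked for this example.
match = re.search(r'transactionID=(?P<txid>\S+)', line)

print(match.group('txid'))  # the captured value, looked up by group name
print(match.groupdict())    # all named groups as a dictionary
```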

Below is part of the regex string used for extracting the first name and the last name out of the above XML and JSON payloads.

For XML type:

Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>\S+)<\/firstName>\s<lastName>(?P<lname>\S+)<\/lastName><\/employee>

Breakdown of the above pattern:

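([\s\S\w\W]+) - selects everything after "Payload=", including line breaks.

<firstName>(?P<fname>\S+)<\/firstName> - extracts the value between the opening and closing firstName tags and stores it in fname.

\S+ or (.*) - selects the value. \S+ matches one or more non-whitespace characters; (.*) matches any run of characters up to the end of the line.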

Or, using (.*) in place of \S+:

Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>(.*))<\/firstName>\s<lastName>(?P<lname>(.*))<\/lastName><\/employee>
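The XML pattern can be checked against the sample event with a short Python sketch (again assuming Python's re module in place of PCRE). Because the leading ([\s\S\w\W]+) is greedy, the named groups capture only the last employee entry that fits the rest of the pattern. The second pattern, per_employee, is not from the article; it is a hypothetical variant added here to show one way of reaching every entry.

```python
import re

# XML sample event from the article.
event_xml = """source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload=<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName></employee>
<employee><firstName>Anna</firstName> <lastName>Smith</lastName>
</employee><employee><firstName>Peter</firstName> <lastName>Jones</lastName></employee></employees>"""

# The article's pattern, written on one logical line.
article_pattern = re.compile(
    r"Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>\S+)"
    r"<\/firstName>\s<lastName>(?P<lname>\S+)<\/lastName><\/employee>"
)
m = article_pattern.search(event_xml)
last_employee = (m.group("fname"), m.group("lname"))
print(last_employee)  # one (first name, last name) pair

# Hypothetical variant (an assumption, not the article's pattern):
# no greedy prefix, so finditer can visit every employee entry.
per_employee = re.compile(
    r"<firstName>(?P<fname>[^<]+)<\/firstName>\s*<lastName>(?P<lname>[^<]+)<\/lastName>"
)
all_employees = [(x.group("fname"), x.group("lname"))
                 for x in per_employee.finditer(event_xml)]
print(all_employees)
```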

For JSON:

Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>\S+)\",\s\"lastName\":\"(?P<lname>\S+)\"([\s\S\w\W]+))

or

Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>(.*))\",\s\"lastName\":\"(?P<lname>(.*))\"([\s\S\w\W]+))
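The two JSON patterns can be compared the same way. This is a sketch in Python, assuming its re module behaves like PCRE for these constructs; it runs both variants against the JSON sample event and prints what each named group captured.

```python
import re

# JSON sample event from the article.
event_json = '''source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload={"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}'''

# Variant using \S+ for the values.
with_S = re.compile(
    r'Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>\S+)\",'
    r'\s\"lastName\":\"(?P<lname>\S+)\"([\s\S\w\W]+))'
)
# Variant using (.*) for the values.
with_dot = re.compile(
    r'Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>(.*))\",'
    r'\s\"lastName\":\"(?P<lname>(.*))\"([\s\S\w\W]+))'
)

m1 = with_S.search(event_json)
m2 = with_dot.search(event_json)

print(m1.group('fname'), m1.group('lname'))
print(m2.group('fname'), m2.group('lname'))
print(m1.group('Fullpayload'))  # the whole payload after "Payload="
```

On this sample, both variants capture the same entry, because the greedy prefix runs to the last "firstName" key that still lets the rest of the pattern match.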

Now, where do you put the above string in Splunk? It goes after your search criteria, in a rex command:

| rex field=_raw "your regular expression goes between these double quotes" | stats count by fname

For example:

source=lambda "* employee*" | rex field=_raw "Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>\S+)\",\s\"lastName\":\"(?P<lname>\S+)\"([\s\S\w\W]+))" | stats count by fname

| rex field=_raw is how you tell Splunk to apply a regular expression to the raw event.
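To make the rex and stats steps more concrete, here is a rough Python analogue. It is only a sketch of the idea, not how Splunk works internally: the events list is hypothetical data invented for this example, the pattern is shortened to the firstName/lastName part, and collections.Counter stands in for "stats count by fname".

```python
import re
from collections import Counter

# Hypothetical single-employee events standing in for Splunk raw events.
events = [
    'source=proboo transactionID=1 Payload={ "firstName":"John", "lastName":"Doe" }',
    'source=proboo transactionID=2 Payload={ "firstName":"Anna", "lastName":"Smith" }',
    'source=proboo transactionID=3 Payload={ "firstName":"John", "lastName":"Jones" }',
]

# Shortened form of the article's JSON pattern (named groups only).
rex = re.compile(r'\"firstName\":\"(?P<fname>\S+)\",\s\"lastName\":\"(?P<lname>\S+)\"')

# Rough analogue of "| rex field=_raw ... | stats count by fname".
counts = Counter()
for raw in events:
    m = rex.search(raw)
    if m:  # events without a match contribute nothing
        counts[m.group('fname')] += 1

for fname, count in counts.items():
    print(fname, count)
```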

For more details, refer to the official Splunk documentation on using regular expressions in Splunk.

I hope the above article helps you out in starting with regular expressions in Splunk. Please comment if you have any queries.


Topics:
splunk, regex, big data, regular expressions
