Starting With Regular Expressions in Splunk

In this post, you will learn how to write regular expressions (regex) for Splunk, one of the most widely used platforms for data monitoring and analysis.

By Rajat Pareek · Jul. 11, 18 · Tutorial

Splunk is one of the most widely used platforms for data monitoring and analysis. It provides various index and search patterns to retrieve the data you want and arrange it in a tabular format using Splunk's built-in stats, chart, or table commands. These are quite easy to use when the raw event data is in a consistent format and the required values are already mapped to defined fields in Splunk.

However, things change when the data in the raw event is not in a specific format: part of the data is tagged to a field, and part is a plain XML or JSON payload lumped in with other data such as time details or unique ID values (as in the examples below). Creating a new field each time for a specific piece of data from an XML tag or JSON key-value pair becomes tedious. It is also hard to know how such a field extraction will behave on other raw events that are not identical to the one it was built from, so it might not return the desired result for every search.

XML along with data:

source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload=<employees>
<employee>
<firstName>John</firstName> <lastName>Doe</lastName></employee>
<employee><firstName>Anna</firstName> <lastName>Smith</lastName>
</employee><employee><firstName>Peter</firstName> <lastName>Jones</lastName></employee></employees>

JSON along with data:

source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload={"employees":[
{ "firstName":"John", "lastName":"Doe" },
{ "firstName":"Anna", "lastName":"Smith" },
{ "firstName":"Peter", "lastName":"Jones" }
]}

In scenarios like the above, where the raw event mixes field-tagged data with an XML/JSON payload, regular expressions can be written alongside the search string to fetch the desired values out of the XML or JSON, together with the other data, and arrange them in a tabular format.

But learning to write a regular expression is not that easy, so where should you start? How do you know whether you are on the right path? Testing a pattern against the data after each step, so that nothing is missed, is not easy either. I found it quite difficult and had to google a lot before I found a way to learn enough basic regular expressions to get my desired data out of the above scenarios in Splunk. I am writing this article to share how I got started.

Before learning regular expressions, it helps to understand how to write a simple expression that mirrors your data. Everything in a regular expression is a character, and we need to write a pattern that matches the exact sequence of characters as it appears in our data. In day-to-day use, the data we act upon contains letters, numbers, some special characters (%#$@!), or some basic tag types. We use only these building blocks to form a pattern over the whole dataset, which in turn becomes our regular expression and can be used in Splunk along with the search string.

I am sharing the below link, which I found while searching to learn regular expressions. This website will provide you with insights into the various pattern types used to write regular expressions along with some hands-on experience as well.

https://regexone.com

Go through the first 15 chapters carefully, try to match the whole pattern specified in each hands-on exercise, explore different patterns, and check out the solutions as well. The first 15 chapters will give you a basic understanding of regular expressions. After exploring the above link, copy one of the above examples and go to the website below, which is one of the best regex testers I came across for testing regular expressions to be used in Splunk.

Note: Please change the regex engine to the PCRE (server) engine, as it supports the named grouping syntax used later in this article.

https://regexr.com/

Below is an image showing how this engine looks.

It is divided into three parts:

  1. The expression writing part.

  2. The data part where you need to put the data to be acted upon.

  3. Tools: a character-by-character explanation of your expression (this will help you learn more about how the regular expression works).

If the data is already available in a Splunk field, you don't need to write the expression starting from the first character of the data set, unless you have not clearly specified your search keywords in the search part of the Splunk query.

Try out the expressions below and see the result, then explore further. Start by typing one character of the expression at a time and watch which part of the data set gets highlighted as you go. The patterns below use only constructs covered on the regular expression learning website above.

Payload=([\s\S\w\W])

Payload=([\s\S\w\W]+)
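
To see the difference between the two expressions above outside of a regex tester, here is a minimal sketch in Python, whose re module accepts the same character classes. The event string is a shortened, hypothetical version of the JSON example and is only an assumption for illustration.

```python
import re

# Shortened, hypothetical event modeled on the JSON example above.
event = (
    'source=proboo transactionID=254g Payload={"employees":[\n'
    '{ "firstName":"John", "lastName":"Doe" }\n'
    ']}'
)

# Without a quantifier, the character class matches exactly one character.
one_char = re.search(r"Payload=([\s\S\w\W])", event)

# With +, it matches one or more of any character, newlines included,
# so the group runs to the end of the event.
everything = re.search(r"Payload=([\s\S\w\W]+)", event)

print(repr(one_char.group(1)))
print(repr(everything.group(1)))
```

The first group holds only the character right after "Payload=", while the second holds the whole payload across all lines.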

Now we will learn how to get the first name and how to start with grouping in regular expressions.

First, we need to know what grouping means in regular expressions. Grouping helps us extract exact pieces of information out of a bigger match in the data set. In Splunk, we can then use these group names as fields and feed them to stats, chart, and so on.

The syntax for grouping is:

(?P<name>Value)

The name can be anything of your choice; try to make it describe the value you are trying to fetch.
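
As a small illustration of named groups, the sketch below uses Python's re module, which accepts the same (?P<name>...) syntax. The group name txn and the one-line event are hypothetical and chosen only for this example.

```python
import re

# Hypothetical one-line event, reusing the field names from the samples above.
event = "source=proboo Time=2018-02-20T12:00:30 transactionID=254g"

# (?P<txn>\S+) captures the run of non-whitespace characters that follows
# "transactionID=" and stores it under the name "txn".
match = re.search(r"transactionID=(?P<txn>\S+)", event)

print(match.group("txn"))   # look the value up by group name
print(match.groupdict())    # all named groups as a dictionary
```

In Splunk, a named group like this becomes a field with the same name once the expression is applied to the event.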

Below is part of the regex string used for extracting the first name and the last name out of the above XML and JSON payloads.

For XML type:

Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>\S+)<\/firstName>\s<lastName>(?P<lname>\S+)<\/lastName><\/employee>

Breakdown of the above pattern:

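([\s\S\w\W]+) - matches everything after "Payload=" up to the <employee> tag. Because + is greedy, this runs up to the last <employee> block that fits the rest of the pattern.

>(?P<fname>\S+)<\ - this part extracts the value between the firstName opening and closing tags and stores it in fname.

\S+ or (.*) - matches the value itself.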

or

Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>(.*))<\/firstName>\s<lastName>(?P<lname>(.*))<\/lastName><\/employee>
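
The XML pattern can be checked against the sample event with a short Python sketch (Python's re module is used here only as a stand-in for a PCRE tester). Note that the greedy prefix makes the pattern land on the last employee in the event. The second pattern, which drops the prefix and uses lazy quantifiers to collect every employee, is a variation of my own and not part of the expressions above.

```python
import re

# The XML sample event from above, line breaks included.
event = (
    "source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload=<employees>\n"
    "<employee>\n"
    "<firstName>John</firstName> <lastName>Doe</lastName></employee>\n"
    "<employee><firstName>Anna</firstName> <lastName>Smith</lastName>\n"
    "</employee><employee><firstName>Peter</firstName> <lastName>Jones</lastName></employee></employees>"
)

# The pattern from the article, written on a single line.
pattern = re.compile(
    r"Payload=([\s\S\w\W]+)<employee><firstName>(?P<fname>\S+)<\/firstName>"
    r"\s<lastName>(?P<lname>\S+)<\/lastName><\/employee>"
)
last = pattern.search(event)
print(last.group("fname"), last.group("lname"))

# Variation (an assumption, not from the article): no greedy prefix and
# lazy +? quantifiers, so each employee produces its own match.
each = re.compile(
    r"<firstName>(?P<fname>\S+?)<\/firstName>\s<lastName>(?P<lname>\S+?)<\/lastName>"
)
all_names = [(m.group("fname"), m.group("lname")) for m in each.finditer(event)]
print(all_names)
```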

For JSON:

Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>\S+)\",\s\"lastName\":\"(?P<lname>\S+)\"([\s\S\w\W]+))

or

Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>(.*))\",\s\"lastName\":\"(?P<lname>(.*))\"([\s\S\w\W]+))
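
The same kind of check works for the JSON pattern. This is a minimal sketch in Python using the JSON sample event from above; the backslashes before the double quotes are left out because they are only needed when the expression sits inside a double-quoted string, as it does in a Splunk search.

```python
import re

# The JSON sample event from above, line breaks included.
event = (
    'source=proboo Time=2018-02-20T12:00:30 transactionID=254g Payload={"employees":[\n'
    '{ "firstName":"John", "lastName":"Doe" },\n'
    '{ "firstName":"Anna", "lastName":"Smith" },\n'
    '{ "firstName":"Peter", "lastName":"Jones" }\n'
    ']}'
)

# Same pattern as above, minus the \" escapes (not needed in a
# single-quoted Python raw string).
pattern = re.compile(
    r'Payload=(?P<Fullpayload>([\s\S\w\W]+)"firstName":"(?P<fname>\S+)",'
    r'\s"lastName":"(?P<lname>\S+)"([\s\S\w\W]+))'
)

m = pattern.search(event)
print(m.group("fname"), m.group("lname"))
print(m.group("Fullpayload"))
```

As with the XML pattern, the greedy prefix means fname and lname come from the last employee in the payload, while Fullpayload holds everything after "Payload=".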

Now, where does the above string go in Splunk? It comes after your search criteria, piped into the rex command:

| rex field=_raw "your regular expression goes between these double quotes" | stats count by fname

source=lambda "* employee*" | rex field=_raw "Payload=(?P<Fullpayload>([\s\S\w\W]+)\"firstName\":\"(?P<fname>\S+)\",\s\"lastName\":\"(?P<lname>\S+)\"([\s\S\w\W]+))" | stats count by fname

| rex field=_raw -> this is how you specify that the regular expression should be applied to the raw event in Splunk.
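
To get a feel for what the rex and stats steps do together, the pipeline can be roughly imitated in Python. This is only a sketch of the idea, not how Splunk works internally, and the events in the list are hypothetical ones invented for this illustration.

```python
import re
from collections import Counter

# Hypothetical raw events, one employee each, invented for this sketch.
events = [
    'source=lambda Payload={"employees":[{ "firstName":"John", "lastName":"Doe" }]}',
    'source=lambda Payload={"employees":[{ "firstName":"Anna", "lastName":"Smith" }]}',
    'source=lambda Payload={"employees":[{ "firstName":"John", "lastName":"Jones" }]}',
    'source=lambda status=ok',  # no Payload, so no fname is extracted
]

# The JSON pattern from above, without the \" escapes.
pattern = re.compile(
    r'Payload=(?P<Fullpayload>([\s\S\w\W]+)"firstName":"(?P<fname>\S+)",'
    r'\s"lastName":"(?P<lname>\S+)"([\s\S\w\W]+))'
)

# Rough equivalent of "| rex ... | stats count by fname":
# extract fname per event, then count events per distinct value.
counts = Counter()
for raw in events:
    m = pattern.search(raw)
    if m:  # events without a match contribute nothing to the count
        counts[m.group("fname")] += 1

for fname, count in sorted(counts.items()):
    print(fname, count)
```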

Below is the link to the official Splunk documentation on using regular expressions in Splunk:

  • Splunk docs

I hope the above article helps you out in starting with regular expressions in Splunk. Please comment if you have any queries.
