Regular Expressions With C# and .NET 7

This article takes you step-by-step through creating a console app to explore regular expressions via some cool new .NET 7 features.

By Mark Price · Dec. 07, 22 · Tutorial

This article is an (adapted) excerpt from the book C# 11 and .NET 7 – Modern Cross-Platform Development Fundamentals, and takes you step-by-step through creating a console app to explore regular expressions via some cool new .NET 7 features: the [StringSyntax] attribute and source-generated regular expressions.

Pattern Matching With Regular Expressions

Regular expressions are useful for validating input from the user. They are very powerful and can get very complicated. Almost all programming languages have support for regular expressions and use a common set of special characters to define them. Let's try out some example regular expressions.

Use your preferred code editor to add a new Console App/console project named WorkingWithRegularExpressions to your solution/workspace.

In Program.cs, delete the existing statements, and then import the following namespace:

 
using System.Text.RegularExpressions; // Regex


Checking for Digits Entered as Text

We will start by implementing the common example of validating number input.

In Program.cs, add statements to prompt the user to enter their age and then check that it is valid using a regular expression that looks for a digit character, as shown in the following code:

 
Write("Enter your age: "); 
string input = ReadLine()!; // null-forgiving

Regex ageChecker = new(@"\d"); 

if (ageChecker.IsMatch(input))
{
  WriteLine("Thank you!");
}
else
{
  WriteLine($"This is not a valid age: {input}");
}


Note the following about the code:

  • The @ character makes the string a verbatim string literal, which switches off the C# compiler's handling of escape characters in the string. Escape characters are prefixed with a backslash. For example, \t means a tab and \n means a new line. When writing regular expressions, we need to disable this feature. To paraphrase the television show The West Wing, "Let backslash be backslash."
  • Once C# escape characters are disabled with @, the backslash sequences reach the Regex class unchanged and can be interpreted by the regular expression engine. For example, \d means digit.
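As a minimal sketch of the point above, the following compares a regular string literal with a verbatim one. The variable names are illustrative only and are not part of the article's project.

```csharp
using System;
using System.Text.RegularExpressions;

// A regular string literal needs the backslash itself escaped...
string escaped = "\\d";
// ...while a verbatim string literal (prefixed with @) keeps it as typed.
string verbatim = @"\d";

// Both literals produce the same two-character string: a backslash and a 'd'.
bool sameText = escaped == verbatim;
Console.WriteLine(sameText);        // True
Console.WriteLine(verbatim.Length); // 2

// Either form can be passed to Regex; the verbatim one is easier to read.
bool hasDigit = new Regex(verbatim).IsMatch("route 66");
Console.WriteLine(hasDigit);        // True
```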

Run the code, enter a whole number such as 34 for the age, and view the result, as shown in the following output:

 
Enter your age: 34 
Thank you!


Run the code again, enter carrots, and view the result, as shown in the following output:

 
Enter your age: carrots
This is not a valid age: carrots


Run the code again, enter bob30smith, and view the result, as shown in the following output:

 
Enter your age: bob30smith
Thank you!


The regular expression we used is \d, which means one digit. However, it does not specify what can be entered before and after that one digit. This regular expression could be described in English as "Enter any characters you want as long as you enter at least one digit character."

In regular expressions, you indicate the start of some input with the caret ^ symbol and the end of some input with the dollar $ symbol. Let's use these symbols to indicate that we expect nothing else between the start and end of the input except for a digit.

Change the regular expression to ^\d$, as shown in the following code:

 
Regex ageChecker = new(@"^\d$");


Run the code again and note that it rejects any input except a single digit. We want to allow one or more digits. To do this, we add a + after the \d expression to modify the meaning to one or more.

Change the regular expression, as shown in the following code:

 
Regex ageChecker = new(@"^\d+$");


Run the code again and note that the regular expression now only accepts input made up entirely of digits, that is, zero or a positive whole number of any length.
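To compare the three patterns side by side, here is a small sketch that runs each of them against a few sample inputs. The sample values and variable names are made up for illustration. The last few lines show a detail the walkthrough does not cover: by default, \d in .NET also matches decimal digits from other scripts, and RegexOptions.ECMAScript is one way to restrict it to 0 through 9.

```csharp
using System;
using System.Text.RegularExpressions;

Regex anyDigit = new(@"\d");      // a digit anywhere in the input
Regex oneDigit = new(@"^\d$");    // exactly one digit and nothing else
Regex allDigits = new(@"^\d+$");  // one or more digits and nothing else

string[] samples = { "7", "34", "bob30smith", "carrots", "" };

foreach (string sample in samples)
{
  Console.WriteLine(
    $"'{sample}': {anyDigit.IsMatch(sample)}, {oneDigit.IsMatch(sample)}, {allDigits.IsMatch(sample)}");
}

// By default, \d matches any Unicode decimal digit, not only 0-9.
string arabicIndic = "\u0663\u0664"; // Arabic-Indic digits three and four
bool unicodeAccepted = allDigits.IsMatch(arabicIndic);

// ECMAScript mode narrows \d to the ASCII digits 0-9.
Regex asciiDigits = new(@"^\d+$", RegexOptions.ECMAScript);
bool asciiOnlyAccepted = asciiDigits.IsMatch(arabicIndic);

Console.WriteLine($"{unicodeAccepted}, {asciiOnlyAccepted}"); // True, False
```

If an age should only ever contain the characters 0 to 9, the character class [0-9] is an alternative to the ECMAScript option.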

Regular Expression Performance Improvements

The .NET types for working with regular expressions are used throughout the .NET platform and many of the apps built with it. As such, they have a significant impact on performance, but until recently, they had not received much optimization attention from Microsoft.

With .NET 5 and later, the System.Text.RegularExpressions namespace has rewritten internals to squeeze out maximum performance. Common regular expression benchmarks using methods like IsMatch are now five times faster. And the best thing is, you do not have to change your code to get the benefits!

With .NET 7 and later, the IsMatch method of the Regex class now has an overload for a ReadOnlySpan<char> as its input, which gives even better performance.
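A minimal sketch of the span overload, assuming a project that targets .NET 7 or later. The input string and variable names are hypothetical. The idea is to check a slice of a larger string without first allocating a substring.

```csharp
using System;
using System.Text.RegularExpressions;

Regex digitsOnly = new(@"^\d+$");

string line = "age=34;name=Bob";

// Slice out the characters between '=' and ';' without allocating a new string.
int start = line.IndexOf('=') + 1;
int length = line.IndexOf(';') - start;
ReadOnlySpan<char> ageText = line.AsSpan(start, length);

// The anchors ^ and $ apply to the span, not to the original string.
bool ageIsValid = digitsOnly.IsMatch(ageText);
Console.WriteLine(ageIsValid); // True
```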

Splitting a Complex Comma-Separated String

Let's consider how we would split a complex string, like the following example of film titles:

 
"Monsters, Inc.","I, Tonya","Lock, Stock and Two Smoking Barrels"


The string value uses double quotes around each film title. We can use these to identify whether we need to split on a comma (or not). The Split method is not powerful enough, so we can use a regular expression instead.

You can read a fuller explanation in this Stack Overflow article that inspired this task.

To include double quotes inside a string value, we prefix them with a backslash, or we can use a raw string literal, which is available in C# 11 or later.

Add statements to store a complex comma-separated string variable, and then split it in a dumb way using the Split method, as shown in the following code:

 
// C# 1 to 10: Use escaped double-quote characters \"
// string films = "\"Monsters, Inc.\",\"I, Tonya\",\"Lock, Stock and Two Smoking Barrels\"";

// C# 11 or later: Use """ to start and end a raw string literal
string films = """
"Monsters, Inc.","I, Tonya","Lock, Stock and Two Smoking Barrels"
""";

WriteLine($"Films to split: {films}");

string[] filmsDumb = films.Split(',');

WriteLine("Splitting with string.Split method:"); 
foreach (string film in filmsDumb)
{
  WriteLine(film);
}


Add statements to define a regular expression to split and write the film titles in a smart way, as shown in the following code:

 
Regex csv = new(
  "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");

MatchCollection filmsSmart = csv.Matches(films);

WriteLine("Splitting with regular expression:"); 
foreach (Match film in filmsSmart)
{
  WriteLine(film.Groups[2].Value);
}


In the last section, you will see how you can get a source generator to auto-generate XML comments for a regular expression to explain how it works. This is really useful for regular expressions that you might have copied from a website.

Run the code and view the result, as shown in the following output:

 
Splitting with string.Split method: 
"Monsters
 Inc." 
"I
 Tonya" 
"Lock
 Stock and Two Smoking Barrels" 
Splitting with regular expression: 
Monsters, Inc.
I, Tonya
Lock, Stock and Two Smoking Barrels
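To see which capture group holds what, here is a small sketch that applies the same pattern to a line mixing quoted and unquoted fields. The input and variable names are made up for illustration. Group 1 captures the opening quote when one is present, and group 2 holds the field text. This is a sketch of how the pattern behaves on simple input, not a full CSV parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

Regex csvFields = new(
  "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)");

// One quoted field with a comma, one unquoted field, one more quoted field.
string mixed = "\"Monsters, Inc.\",Up,\"I, Tonya\"";

List<string> titles = new();
foreach (Match field in csvFields.Matches(mixed))
{
  // The whole match can include the leading comma and the quotes;
  // Groups[2] contains only the field text.
  titles.Add(field.Groups[2].Value);
}

Console.WriteLine(string.Join(" | ", titles));
// Monsters, Inc. | Up | I, Tonya
```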


Activating Regular Expression Syntax Coloring

If you use Visual Studio 2022 as your code editor, then you probably noticed that when passing a string value to the Regex constructor, you see color syntax highlighting, as shown below:

Regular expression color syntax highlighting when using the Regex constructor

Why does this string get syntax coloring for regular expressions when most string values do not? Let's find out.

Right-click in the new constructor, select Go To Implementation, and note that the string parameter named pattern is decorated with an attribute named StringSyntax that has the string constant StringSyntaxAttribute.Regex passed to it, as shown in the following code:

 
public Regex([StringSyntax(StringSyntaxAttribute.Regex)] string pattern) :
  this(pattern, culture: null)
{
}


Right-click in the StringSyntax attribute, select Go To Implementation, and note there are 12 recognized string syntax formats that you can choose from, including Regex, as shown in the following partial code:

 
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field | AttributeTargets.Parameter, AllowMultiple = false, Inherited = false)]
public sealed class StringSyntaxAttribute : Attribute
{
  public const string CompositeFormat = "CompositeFormat";
  public const string DateOnlyFormat = "DateOnlyFormat";
  public const string DateTimeFormat = "DateTimeFormat";
  public const string EnumFormat = "EnumFormat";
  public const string GuidFormat = "GuidFormat";
  public const string Json = "Json";
  public const string NumericFormat = "NumericFormat";
  public const string Regex = "Regex";
  public const string TimeOnlyFormat = "TimeOnlyFormat";
  public const string TimeSpanFormat = "TimeSpanFormat";
  public const string Uri = "Uri";
  public const string Xml = "Xml";
  …
}


In the WorkingWithRegularExpressions project, add a new class file named Program.Strings.cs, and modify its content to define some string constants, as shown in the following code:

 
partial class Program
{
  const string digitsOnlyText = @"^\d+$";

  const string commaSeparatorText = 
    "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)";
}


Note that the two string constants do not have any color syntax highlighting yet.

In Program.cs, replace the literal string with the string constant for the digits-only regular expression, as shown in the following code:

 
Regex ageChecker = new(digitsOnlyText);


In Program.cs, replace the literal string with the string constant for the comma separator regular expression, as shown in the following code:

 
Regex csv = new(commaSeparatorText);


Run the console app and confirm that the regular expression behavior is as before.

In Program.Strings.cs, import the namespace for the [StringSyntax] attribute and then decorate both string constants with it, as shown in the following code:

 
using System.Diagnostics.CodeAnalysis; // [StringSyntax]

partial class Program
{
  [StringSyntax(StringSyntaxAttribute.Regex)]
  const string digitsOnlyText = @"^\d+$";

  [StringSyntax(StringSyntaxAttribute.Regex)]
  const string commaSeparatorText = 
    "(?:^|,)(?=[^\"]|(\")?)\"?((?(1)[^\"]*|[^,\"]*))\"?(?=,|$)";
}


In Program.Strings.cs, add another string constant for formatting a date, as shown in the following code:

 
[StringSyntax(StringSyntaxAttribute.DateTimeFormat)]
const string fullDateTime = "";


Click inside the empty string, type a letter d, and note the IntelliSense, as shown below:

IntelliSense activated due to the StringSyntax attribute

Finish entering the date format and as you type note the IntelliSense: dddd, d MMMM yyyy.
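As a quick check of what that format string produces, here is a sketch that applies it to a fixed date. The date and the use of the invariant culture are choices made for this example so that the output is the same on every machine. The constant is a local here, so it does not carry the [StringSyntax] attribute.

```csharp
using System;
using System.Globalization;

const string fullDateTime = "dddd, d MMMM yyyy";

// A fixed date and the invariant culture keep the output predictable.
DateTime published = new(2022, 12, 7);
string formatted = published.ToString(fullDateTime, CultureInfo.InvariantCulture);

Console.WriteLine(formatted); // Wednesday, 7 December 2022
```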

At the end of the digitsOnlyText string literal, add a \ and note the IntelliSense to help you write a valid regular expression, as shown below:

IntelliSense for writing a regular expression

The [StringSyntax] attribute is a new feature introduced in .NET 7. It is up to your code editor to recognize it. .NET 7 libraries have more than 350 parameters, properties, and fields that are now decorated with this attribute.
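The attribute is not limited to constants. As a sketch, assuming .NET 7 or later, you can also decorate a parameter of your own method so that supporting editors treat arguments at each call site as a regular expression. CountMatches is a hypothetical helper written for this example.

```csharp
using System;
using System.Diagnostics.CodeAnalysis;
using System.Text.RegularExpressions;

// Hypothetical helper: the attribute is a hint to the editor only;
// it does not validate the pattern or change runtime behavior.
static int CountMatches(
  string input,
  [StringSyntax(StringSyntaxAttribute.Regex)] string pattern)
{
  return Regex.Matches(input, pattern).Count;
}

// The matches are "1", "22", and "333".
int numberCount = CountMatches("a1b22c333", @"\d+");
Console.WriteLine(numberCount); // 3
```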

Improving Regular Expression Performance With Source Generators

When you pass a string literal or string constant to the constructor of Regex, the class parses the string and transforms it into an internal tree structure that represents the expression in an optimized way that can be executed efficiently by a regular expression interpreter.

You can also compile regular expressions by specifying a RegexOptions value, as shown in the following code:

 
Regex ageChecker = new(digitsOnlyText, RegexOptions.Compiled);


Unfortunately, compiling has the negative effect of slowing down the initial creation of the regular expression. After creating the tree structure that would then be executed by the interpreter, the compiler then has to convert the tree into IL code, and then that IL code needs to be JIT compiled into native code. If you only run the regular expression a few times, it is not worth compiling it, which is why it is not the default behavior.
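To get a rough feel for that startup cost, here is a sketch that times construction plus the first match for each mode. It takes a single measurement, so it is not a proper benchmark, and the interpreted case runs first and also absorbs some one-time warm-up of the Regex class itself. Treat the numbers as indicative only. The variable names are illustrative.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

const string pattern = @"^\d+$";

// Time construction plus the first match, where compilation costs show up.
Stopwatch timer = Stopwatch.StartNew();
Regex interpreted = new(pattern);
bool interpretedResult = interpreted.IsMatch("12345");
TimeSpan interpretedTime = timer.Elapsed;

timer.Restart();
Regex compiled = new(pattern, RegexOptions.Compiled);
bool compiledResult = compiled.IsMatch("12345");
TimeSpan compiledTime = timer.Elapsed;

// Both modes give the same answer; only the timings differ.
Console.WriteLine($"Interpreted: {interpretedResult} in {interpretedTime.TotalMilliseconds} ms");
Console.WriteLine($"Compiled:    {compiledResult} in {compiledTime.TotalMilliseconds} ms");
```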

.NET 7 introduces a source generator for regular expressions which recognizes if you decorate a partial method that returns Regex with the [GeneratedRegex] attribute. It generates an implementation of that method which implements the logic for the regular expression.

Let's see it in action.

In the WorkingWithRegularExpressions project, add a new class file named Program.Regexs.cs, and modify its content to define some partial methods, as shown in the following code:

 
using System.Text.RegularExpressions; // [GeneratedRegex]

partial class Program
{
  [GeneratedRegex(digitsOnlyText, RegexOptions.IgnoreCase)]
  private static partial Regex DigitsOnly();

  [GeneratedRegex(commaSeparatorText, RegexOptions.IgnoreCase)]
  private static partial Regex CommaSeparator();
}


In Program.cs, replace the new constructor with a call to the partial method that returns the digits-only regular expression, as shown in the following code:

 
Regex ageChecker = DigitsOnly();


In Program.cs, replace the new constructor with a call to the partial method that returns the comma separator regular expression, as shown in the following code:

 
Regex csv = CommaSeparator();


Hover your mouse pointer over the partial methods and note that the tooltip describes the behavior of the regular expression, as shown below:

Tooltip for a partial method shows a description of the regular expression

Right-click the DigitsOnly partial method, select Go To Definition, and note that you can review the implementation of the auto-generated partial methods, as shown below:

The auto-generated source code for the regular expression

Run the console app and confirm that the functionality is the same as before.

You can learn more about the improvements to regular expressions with .NET 7.

Published at DZone with permission of Mark Price. See the original article here.
