String Split and Join With Escaping

Anders Abel talks escape characters and regular expressions.

by Anders Abel · Jun. 17, 16 · Tutorial

.NET offers the simple string.Split() and string.Join() methods for splitting and joining delimited strings. But what if there is no separator character that is guaranteed never to occur in the strings? Then the separator character must be escaped wherever it occurs. And then the escape character must be escaped too… And this turns out to be quite an interesting algorithm to write.
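
A minimal illustration of the problem, using only the built-in methods: once one of the strings contains the separator, a round trip through string.Join() and string.Split() no longer returns the original strings. The sample values are made up for this sketch.

```csharp
using System;

// Naive join/split with no escaping at all.
string[] original = { "a,b", "c" };             // two strings, the first contains ','
string joined = string.Join(",", original);     // "a,b,c": the boundary is now ambiguous
string[] roundTripped = joined.Split(',');      // three parts instead of the original two

Console.WriteLine(joined);
Console.WriteLine(roundTripped.Length);
```

Nothing in "a,b,c" records that the first comma was data and the second was a separator, which is exactly the information escaping has to preserve.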

I thought that this functionality would be built in, but as far as I could find out, it isn’t. If there is a built-in way, please leave a comment to educate me. Since this is string manipulation, regular expressions are also an option, but…

Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.

Jamie Zawinski

Solving this with a regular expression would require some black-magic double look-behind assertion that I wouldn’t understand even when I wrote the code, much less later when I came back to fix some bug. So I went for implementing it myself.

Design Considerations

This is just a small helper that I’m writing as part of a bigger project. It is not performance critical, so I haven’t spent any time optimizing it. But I did think a bit about performance implications. One approach that I immediately ruled out was to build up the split strings character by character while looping through the input string. It would make the implementation quite easy to follow, but it would allocate a new string for each character checked in the source string. That is a bit too much pressure on the garbage collector for my taste.
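
The per-character allocation comes from building each segment by string concatenation. For comparison, here is a sketch of a character-by-character splitter that collects each segment in a StringBuilder instead, so it allocates one result string per segment rather than one per character. This is an alternative technique, not the article's implementation. SplitWithBuilder and its parameters are hypothetical names, and the comma and slash defaults are assumed to match the article's delimiter and escape character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Alternative sketch (not the article's code): scan character by character,
// appending to a StringBuilder so no string is created per character.
static string[] SplitWithBuilder(string source, char delimiter = ',', char escape = '/')
{
    var result = new List<string>();
    var current = new StringBuilder();
    for (int i = 0; i < source.Length; i++)
    {
        char c = source[i];
        if (c == escape)
        {
            // The character after an escape char is taken literally.
            i++;
            if (i == source.Length)
            {
                throw new FormatException("Unpaired escape character at end of input.");
            }
            current.Append(source[i]);
        }
        else if (c == delimiter)
        {
            // End of a segment: materialize it as a string and start over.
            result.Add(current.ToString());
            current.Clear();
        }
        else
        {
            current.Append(c);
        }
    }
    // The final segment is always added, so "a," yields "a" and "".
    result.Add(current.ToString());
    return result.ToArray();
}

string[] parts = SplitWithBuilder("a/,b//c,/,,//,");
Console.WriteLine(string.Join(" | ", parts));
```

With this input the sketch yields four strings: a,b/c, a lone comma, a lone slash and a final empty string. Unlike the article's Split, it returns a single empty string for empty input and throws a FormatException on an unpaired trailing escape character.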

So I went for an iterative approach where I loop through the string, keeping track of where the current segment started and checking whether the end of the segment has been found. I think that the resulting code is fairly readable. But because of some edge cases, it is more complex, with more quirks, than I first imagined. With the comma (,) as the delimiter and the slash (/) as the escape character, consider the following escaped and joined strings:

  • aa,bb,cc
  • ,aa,,bb,
  • a/,b//c,/,,//,
  • a/,

Strings can be empty – even the final one. They can end with an escape sequence. And an escaped escape character can precede a delimiter where the string should be split.
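
To make the list concrete, the sketch below produces each of the four strings by escaping and joining. JoinEscaped is a hypothetical helper that applies the same escaping order as the Join method in the code below: escape characters are doubled first, then delimiters are prefixed with an escape character. For example, the third string encodes a,b/c, a lone comma, a lone slash and an empty final string.

```csharp
using System;
using System.Linq;

// Hypothetical helper: double '/' first, then prefix ',' with '/', then join.
// Doing it in the other order would double the '/' added before each ','.
static string JoinEscaped(params string[] strings) =>
    string.Join(",", strings.Select(s => s.Replace("/", "//").Replace(",", "/,")));

Console.WriteLine(JoinEscaped("aa", "bb", "cc"));        // aa,bb,cc
Console.WriteLine(JoinEscaped("", "aa", "", "bb", ""));  // ,aa,,bb,
Console.WriteLine(JoinEscaped("a,b/c", ",", "/", ""));   // a/,b//c,/,,//,
Console.WriteLine(JoinEscaped("a,"));                    // a/,
```

The "//," near the end of the third string is the case the splitter must get right: the slash pair is an escaped slash, so the comma that follows is a real delimiter.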

The Code

using System;
using System.Collections.Generic;
using System.Linq;

/// <summary>
/// Helpers for delimited strings, with support for escaping the delimiter
/// character.
/// </summary>
public static class DelimitedString
{
  const string DelimiterString = ",";
  const char DelimiterChar = ',';

  // Use a single / as escape char, avoid \ as that would require
  // all escape chars to be escaped in the source code...
  const char EscapeChar = '/';
  const string EscapeString = "/";

  /// <summary>
  /// Join strings with a delimiter and escape any occurrence of the
  /// delimiter and the escape character in the strings.
  /// </summary>
  /// <param name="strings">Strings to join</param>
  /// <returns>Joined string</returns>
  public static string Join(params string[] strings)
  {
    return string.Join(
      DelimiterString,
      strings.Select(
        // Escape the escape char first; otherwise the escape chars added
        // in front of delimiters would be doubled as well.
        s => s
        .Replace(EscapeString, EscapeString + EscapeString)
        .Replace(DelimiterString, EscapeString + DelimiterString)));
  }

  /// <summary>
  /// Split a delimited string, respecting escaped delimiter characters.
  /// </summary>
  /// <param name="source">Joined string from <see cref="Join(string[])"/></param>
  /// <returns>Unescaped, split strings</returns>
  /// <exception cref="FormatException">The source ends with an unpaired
  /// escape character.</exception>
  public static string[] Split(string source)
  {
    var result = new List<string>();

    int segmentStart = 0;
    for (int i = 0; i < source.Length; i++)
    {
      bool readEscapeChar = false;
      if (source[i] == EscapeChar)
      {
        // Skip to the escaped character; it is never a delimiter.
        readEscapeChar = true;
        i++;
        if (i == source.Length)
        {
          // Join never produces a lone trailing escape char.
          throw new FormatException(
            "Delimited string ends with an unpaired escape character.");
        }
      }

      if (!readEscapeChar && source[i] == DelimiterChar)
      {
        result.Add(UnEscapeString(
          source.Substring(segmentStart, i - segmentStart)));
        segmentStart = i + 1;
      }

      // Last character reached: the rest is the final (possibly empty) segment.
      if (i == source.Length - 1)
      {
        result.Add(UnEscapeString(source.Substring(segmentStart)));
      }
    }

    return result.ToArray();
  }

  // Unescape delimiters first, then collapse doubled escape chars. For
  // segments produced by Join, this order restores the original string.
  static string UnEscapeString(string src)
  {
    return src.Replace(EscapeString + DelimiterString, DelimiterString)
      .Replace(EscapeString + EscapeString, EscapeString);
  }
}

The code is part of Kentor.AuthServices, is also available on GitHub, and is covered by tests.


Published at DZone with permission of Anders Abel, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.