Parsing in C#: All the Tools and Libraries You Can Use (Part 3)

We wrap up this three-part series by looking at parser combinators and at a few tools that you should, and shouldn't, use in development.

Welcome back! If you missed the first two parts, you can check them out here: Part 1; Part 2.

Parser Combinators

Parser combinators allow you to create a parser simply with C# code, by combining different pattern matching functions that are equivalent to grammar rules. They are generally considered to be best suited for simpler parsing needs. Given that they are just C# libraries, you can easily introduce them into your project: you do not need any specific generation step and you can write all of your code in your favorite editor. Their main advantage is that they integrate into your usual workflow and IDE.

In practice, this means that they are very useful for all the little parsing problems you find. If the typical developer encounters a problem that is too complex for a simple regular expression, these libraries are usually the solution. In short, if you need to build a parser, but you don't actually want to, a parser combinator may be your best option.
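To make the idea concrete, here is a minimal hand-rolled sketch of what such a library does under the hood. It does not use any of the libraries discussed below: the `Parser<T>` delegate, the `Combinators` class, and the `KeyValueParser` example are names invented for this illustration. Each small function plays the role of a grammar rule, and the `SelectMany` extension method is what enables the LINQ-like syntax that several of these libraries offer.

```csharp
using System;

// A parser maps (input, position) to a result plus the next position, or null on failure.
public delegate (T Result, int Next)? Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches a single character that satisfies the predicate.
    public static Parser<char> Satisfy(Func<char, bool> predicate) => (input, pos) =>
    {
        if (pos < input.Length && predicate(input[pos]))
            return (input[pos], pos + 1);
        return null;
    };

    // Matches one or more characters and returns them as a string.
    public static Parser<string> Many1(Parser<char> item) => (input, pos) =>
    {
        int start = pos;
        var r = item(input, pos);
        while (r.HasValue)
        {
            pos = r.Value.Next;
            r = item(input, pos);
        }
        if (pos == start) return null;
        return (input.Substring(start, pos - start), pos);
    };

    // Alternative: tries the first parser and falls back to the second.
    public static Parser<T> Or<T>(this Parser<T> first, Parser<T> second) =>
        (input, pos) => first(input, pos) ?? second(input, pos);

    // Transforms the result of a successful parse.
    public static Parser<U> Select<T, U>(this Parser<T> parser, Func<T, U> map) => (input, pos) =>
    {
        var r = parser(input, pos);
        if (!r.HasValue) return null;
        return (map(r.Value.Result), r.Value.Next);
    };

    // Sequence: runs one parser after another; this is what query syntax compiles to.
    public static Parser<V> SelectMany<T, U, V>(
        this Parser<T> parser, Func<T, Parser<U>> next, Func<T, U, V> project) => (input, pos) =>
    {
        var first = parser(input, pos);
        if (!first.HasValue) return null;
        var second = next(first.Value.Result)(input, first.Value.Next);
        if (!second.HasValue) return null;
        return (project(first.Value.Result, second.Value.Result), second.Value.Next);
    };
}

// Hypothetical usage: parses inputs such as "width=42" or "size:7".
public static class KeyValueParser
{
    private static readonly Parser<string> Name =
        Combinators.Many1(Combinators.Satisfy(char.IsLetter));

    private static readonly Parser<int> Number =
        Combinators.Many1(Combinators.Satisfy(char.IsDigit)).Select(s => int.Parse(s));

    private static readonly Parser<char> Separator =
        Combinators.Satisfy(c => c == '=').Or(Combinators.Satisfy(c => c == ':'));

    public static readonly Parser<(string Key, int Value)> Pair =
        from key in Name
        from separator in Separator
        from value in Number
        select (Key: key, Value: value);
}
```

Calling `KeyValueParser.Pair("width=42", 0)` yields the pair `("width", 42)` and the next position `8`, while an input such as `"=42"` yields `null`. A real library adds what this sketch leaves out, most notably error reporting and backtracking control.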

Sprache and Superpower

Sprache is a simple, lightweight library for constructing parsers directly in C# code.

It doesn't compete with "industrial strength" language workbenches — it fits somewhere in between regular expressions and a full-featured toolset like ANTLR.

The documentation says it all, except how to use it. You will mostly learn how to use it by reading tutorials, including one we have written for Sprache. Nevertheless, it is quite popular and is cited in the credits for ReSharper.

There is no grammar: you use the functions it provides as you would in normal code.

// from our tutorial. Command is just a class of our program
public static Parser<Command> Command = Parse.Char('<').Then(_ => Parse.Char('>'))
                                                .Return(SpracheGameCore.Command.Between)
                                            .Or(Parse.Char('<')
                                                .Return(SpracheGameCore.Command.Less))
                                            .Or(Parse.Char('>')
                                                .Return(SpracheGameCore.Command.Greater))
                                            .Or(Parse.Char('=')
                                                .Return(SpracheGameCore.Command.Equal));

// another example of the nice LINQ-like syntax for combining parser functions
public static Parser<Play> Play =
        (from action in Command
        from value in Number
        select new Play(action, value, null))
    .Or(from firstValue in Number
        from action in Command
        from secondValue in Number
        select new Play(action, firstValue, secondValue));

Superpower is a parser combinator library based on Sprache. It generates friendlier error messages through its support for token-based parsers.

Superpower comes from the same author. It is a slightly more advanced tool with an equal lack of documentation and, being newer, it has no tutorials either.

Both Sprache and Superpower support .NET Standard 1.0.

Parseq, Parsley, and LanguageExt.Parsec

These are three ports of the famous Parsec library for Haskell. Each has its own reasons to be chosen over the others.

Parseq is a monadic parser combinator library written for C#. It can parse context-sensitive, infinite-lookahead grammars.

Parseq seems to be a straight port of the Haskell library, but there is no documentation. If you already know how to use Parsec it might be a good choice; otherwise, you are on your own.

Parsley is a monadic parser combinator library inspired by Haskell's Parsec and F#'s FParsec. It can parse context-sensitive, infinite look-ahead grammars but it performs best on predictive (LL[1]) grammars.

Parsley is a parser combinator, but it has a separate lexer and parser phase. In practical terms, this means that it is simple to use, but it is also familiar to experienced creators of parsers. There is a limited amount of documentation, but there is also a complete JSON example that is used as an integration test.

Parsley supports .NET Standard 1.1.

// an example from the documentation
var text = new Text("1 2 3 a b c");
var lexer = new Lexer(new Pattern("letter", @"[a-z]"),
                      new Pattern("number", @"[0-9]+"),
                      new Pattern("whitespace", @"\s+", skippable: true));

// in real usage you are probably going to use a LINQ-like syntax to get the tokens
Token[] tokens = lexer.ToArray();

The Parsec library is almost an exact replica of the Haskell Parsec library. It can be used to parse very simple blocks of text up to entire language parsers.

LanguageExt.Parsec is a port of the Haskell library included in a larger library designed to bring functional features to C# 6. There is a minimal amount of documentation to get you started.

LanguageExt.Parsec supports .NET Standard 1.3.

// example from the documentation
var spaces = many(satisfy(Char.IsWhiteSpace));
var word = from w in many1(letter)   // letter = satisfy(Char.IsLetter)
           from s in spaces
           select w;
var parser = many1(word);
var result = parse(parser, "two words");

It is obviously the best choice if you also need a bit of F# in your C#, but it is quite good on its own.

Pidgin

Pidgin is a parser combinator library, a lightweight, high-level, declarative tool for constructing parsers.

Pidgin is a new parser combinator library that is already quite mature and useful. Like Sprache, it is easy to use and supports a nice LINQ-like syntax. It also has a few advantages over Sprache: it is more actively maintained, is faster, consumes less memory, supports binary input, and includes support for advanced features such as recursive structures or operator precedence.

Recursive structures are made possible by a specific operator that allows you to defer the execution of a parser to another section of the code. The operator precedence is managed with a class made to deal with expressions.
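As a rough sketch of how these two features fit together, the following arithmetic evaluator uses `Rec` to let a parenthesised term refer to the expression parser declared after it, and `ExpressionParser.Build` with an operator table to handle precedence. It assumes the Pidgin NuGet package and follows the API as shown in the project's README at the time of writing; `ArithmeticParser`, `Tok`, `Binary`, and `Evaluate` are names invented for this illustration, so check the calls against the version you install.

```csharp
using System;
using Pidgin;
using Pidgin.Expression;
using static Pidgin.Parser;
using static Pidgin.Parser<char>;

// Hypothetical example class, not taken from the Pidgin repository.
public static class ArithmeticParser
{
    // Runs a parser, then discards any trailing whitespace.
    private static Parser<char, T> Tok<T>(Parser<char, T> parser) =>
        Try(parser).Before(SkipWhitespaces);

    // Parses an operator symbol and yields the function to apply to its operands.
    private static Parser<char, Func<int, int, int>> Binary(char symbol, Func<int, int, int> apply) =>
        Tok(Char(symbol)).ThenReturn(apply);

    private static readonly Parser<char, int> Number = Tok(DecimalNum);

    // Rec defers the lookup of Expr, which is only assigned further down.
    private static readonly Parser<char, int> Term =
        Number.Or(Rec(() => Expr).Between(Tok(Char('(')), Tok(Char(')'))));

    // Rows listed first bind more tightly than the rows that follow.
    private static readonly Parser<char, int> Expr = ExpressionParser.Build(
        Term,
        new[]
        {
            Operator.InfixL(Binary('*', (a, b) => a * b)),
            Operator.InfixL(Binary('+', (a, b) => a + b))
                .And(Operator.InfixL(Binary('-', (a, b) => a - b)))
        });

    // Requires the whole input to be consumed; throws on a parse error.
    public static int Evaluate(string input) => Expr.Before(End).ParseOrThrow(input);
}
```

With this table, `ArithmeticParser.Evaluate("1 + 2 * 3")` multiplies before adding, and parentheses override the default precedence through the recursive `Term` rule.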

The following is a partial JSON example from the repository.

public static class JsonParser
{
    private static readonly Parser<char, char> LBrace = Char('{');
    [..]
    private static readonly Parser<char, char> ColonWhitespace =
        Colon.Between(SkipWhitespaces);

    [..]

    private static readonly Parser<char, IJson> Json =
        JsonString.Or(Rec(() => JsonArray)).Or(Rec(() => JsonObject));

    private static readonly Parser<char, IJson> JsonArray = 
        Json.Between(SkipWhitespaces)
            .Separated(Comma)
            .Between(LBracket, RBracket)
            .Select<IJson>(els => new JsonArray(els.ToImmutableArray()));

 [..]
}

The documentation is good and covers many aspects: a tutorial/reference, suggestions to speed up your code, and a comparison with other parser combinator libraries. The repository also contains examples on JSON and XML. The tutorial/reference is not as deep as one would like, but it gets you started. The author also gave a talk at NDC that includes a tutorial about Pidgin.

Best Way to Parse C#: Roslyn

Roslyn provides open-source C# and Visual Basic compilers with rich code analysis APIs. It enables building code analysis tools with the same APIs that are used by Visual Studio.

There is one special case that could be managed in a more specific way: the case in which you want to parse C# code in C#. In such cases, you should use the .NET Compiler Platform, which is a compiler as a service, better known as Roslyn. It is open source and also the official C# parser, so there is no better choice.

In practical terms, it works as a library that you can use to parse C#, but also to generate C# and do everything a compiler can do. The only weak point may be the abundant but somewhat badly organized documentation. Luckily, you can read a few tutorials we have written for Roslyn.
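As a small illustration of the parsing side, the sketch below turns a string of C# source into a syntax tree and lists the names of the methods it declares. It assumes the Microsoft.CodeAnalysis.CSharp NuGet package is referenced; `MethodLister` and `MethodNames` are names made up for this example.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;
using Microsoft.CodeAnalysis.CSharp.Syntax;

// Hypothetical helper, written for this illustration.
public static class MethodLister
{
    // Returns the names of all method declarations found in the given source text.
    public static List<string> MethodNames(string source)
    {
        // Parse the text into a syntax tree; no compilation or references are needed for this.
        var tree = CSharpSyntaxTree.ParseText(source);
        var root = tree.GetRoot();

        // Walk every node of the tree and keep only the method declarations.
        return root.DescendantNodes()
                   .OfType<MethodDeclarationSyntax>()
                   .Select(method => method.Identifier.Text)
                   .ToList();
    }
}
```

For example, `MethodLister.MethodNames("class C { void A() {} int B() => 1; }")` returns the names `A` and `B`. Anything that needs type information, such as resolving what a call refers to, requires building a compilation and asking for a semantic model, which goes beyond this sketch.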

Tools That We Cannot Recommend

We also want to list some tools that people usually mention and that are interesting, but that we could not include in this analysis for several reasons.

Irony

Irony is a development kit for implementing languages on the .NET platform.

Irony is a parser generator that does not rely on a grammar, but on overloading operators in C# to express grammar constructs. It also includes an interpreter. It has not been updated since a 2013 beta release and it does not seem that it ever had a stable version, although there is a recently modified version that supports .NET Core.

GOLD

GOLD is a free parsing system that is designed to support multiple programming languages.

In practical terms, it is an IDE that supports the creation of BNF grammars to generate parsers in many languages, including Assembly, C, C#, D, Java, Pascal, Python, Visual Basic, .NET, and Visual C++. It has been relevant enough to have its own Wikipedia article, but it has not been updated since 2012.

TinyPG

Tiny Parser Generator is an interesting tool presented in a popular CodeProject article that also spawned a fork. It is a tool with a simple IDE that can generate a lexer, scanner, and parse tree representation. It can also generate a syntax highlighter for a text box. It is neat, but we cannot recommend it because it was never really meant for professional use and it is no longer updated.

Summary

As we said in the sister article about parsing in Java, the world of parsers is a bit different from the usual world of programmers. That is because a lot of good tools come directly from academia, and, in that sector, Java is more popular than C#. So there are fewer parsing tools for C# than for Java. Also, some, like ANTLR, are written in Java, but can produce C# code. This does not mean that there are no good options, but there are fewer of them.

In fact, if you need a complete parser generator for a .NET Core project, your only option is ANTLR, though you have more choices available for parser combinators.

On the other hand, if you need to parse C#, you can use the official compiler very easily, so that is a plus.

We cannot really say what software you should use. What is best for one user might not be the best for somebody else. And we all know that the most technically correct solution might not be ideal in real life with all its constraints. So we wanted to share what we have learned about the best options for parsing in C#.

Thanks to Lee Humphries for his feedback on this article and to Benjamin Hodgson for pointing us to Pidgin.
