Parsing in C#: All the Tools and Libraries You Can Use (Part 3)
We wrap up this three-part series by looking at parser combinators and at a few tools that you should, and shouldn't, use in development.
Welcome back! If you missed the first two parts, you can check them out here: Part 1; Part 2.
Parser Combinators
Parser combinators allow you to create a parser simply with C# code, by combining different pattern matching functions that are equivalent to grammar rules. They are generally considered to be best suited for simpler parsing needs. Given that they are just C# libraries, you can easily introduce them into your project: you do not need any specific generation step and you can write all of your code in your favorite editor. Their main advantage is the possibility of being integrated in your traditional workflow and IDE.
In practice, this means that they are very useful for all the little parsing problems you find. If the typical developer encounters a problem that is too complex for a simple regular expression, these libraries are usually the solution. In short, if you need to build a parser, but you don't actually want to, a parser combinator may be your best option.
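To make the idea concrete, here is a minimal hand-rolled sketch of what "combining pattern matching functions" can look like in plain C#. It does not use any of the libraries discussed in this article: the Parser<T> delegate and the Satisfy, Then, Or, Many1, Select, and Return helpers are all names we made up for illustration, and real libraries are considerably more refined (error messages, backtracking control, and so on).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical parser type: given the input and a position, it returns the
// parsed value plus the next position, or null when it does not match.
public delegate (T Result, int Next)? Parser<T>(string input, int pos);

public static class Combinators
{
    // Matches one character that satisfies the predicate.
    public static Parser<char> Satisfy(Func<char, bool> predicate) =>
        (input, pos) =>
        {
            if (pos < input.Length && predicate(input[pos]))
                return (input[pos], pos + 1);
            return null;
        };

    // Always succeeds with the given value and consumes nothing.
    public static Parser<T> Return<T>(T value) => (input, pos) => (value, pos);

    // Sequence, like the grammar rule "A B": run the first parser, then the
    // parser built from its result.
    public static Parser<U> Then<T, U>(this Parser<T> first, Func<T, Parser<U>> next) =>
        (input, pos) =>
        {
            var r = first(input, pos);
            if (!r.HasValue) return null;
            return next(r.Value.Result)(input, r.Value.Next);
        };

    // Choice, like the grammar rule "A | B": try the first parser, then the second.
    public static Parser<T> Or<T>(this Parser<T> first, Parser<T> second) =>
        (input, pos) => first(input, pos) ?? second(input, pos);

    // Repetition, like "A+": one or more matches, collected in a list.
    public static Parser<List<T>> Many1<T>(this Parser<T> item) =>
        (input, pos) =>
        {
            var items = new List<T>();
            for (var r = item(input, pos); r.HasValue; r = item(input, pos))
            {
                items.Add(r.Value.Result);
                pos = r.Value.Next;
            }
            if (items.Count == 0) return null;
            return (items, pos);
        };

    // Transforms the result of a successful parse.
    public static Parser<U> Select<T, U>(this Parser<T> parser, Func<T, U> map) =>
        parser.Then(value => Return(map(value)));
}

public static class Demo
{
    // number := digit+
    static readonly Parser<int> Number =
        Combinators.Satisfy(char.IsDigit).Many1()
            .Select(digits => int.Parse(new string(digits.ToArray())));

    // sum := number '+' number
    public static readonly Parser<int> Sum =
        Number.Then(left =>
            Combinators.Satisfy(c => c == '+').Then(_ =>
                Number.Select(right => left + right)));

    public static void Main()
    {
        var result = Sum("12+30", 0);
        // prints 42
        Console.WriteLine(result.HasValue ? result.Value.Result.ToString() : "no match");
    }
}
```

Each helper plays the role of a grammar construct (sequence, alternative, repetition), which is why the final Sum definition reads much like the rule it implements.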
Sprache and Superpower
The documentation of Sprache says it all, except how to use it. You mostly learn how to use it by reading tutorials, including one we have written for Sprache. Despite this, it is quite popular and it is cited in the credits for ReSharper.
There is no grammar: you use the functions it provides as you would in normal code.
// from our tutorial. Command is just a class of our program
public static Parser<Command> Command = Parse.Char('<').Then(_ => Parse.Char('>'))
        .Return(SpracheGameCore.Command.Between)
    .Or(Parse.Char('<')
        .Return(SpracheGameCore.Command.Less))
    .Or(Parse.Char('>')
        .Return(SpracheGameCore.Command.Greater))
    .Or(Parse.Char('=')
        .Return(SpracheGameCore.Command.Equal));

// another example of the nice LINQ-like syntax for combining parser functions
public static Parser<Play> Play =
    (from action in Command
     from value in Number
     select new Play(action, value, null))
    .Or(from firstValue in Number
        from action in Command
        from secondValue in Number
        select new Play(action, firstValue, secondValue));
Superpower comes from the same author. It is a slightly more advanced tool with an equal lack of documentation and, because it is newer, there are no tutorials for it either.
Both Sprache and Superpower support .NET Standard 1.0.
Parseq, Parsley, and LanguageExt.Parsec
These are three ports of the famous Parsec library for Haskell. Each of them gives you some reason to choose it over the others.
Parseq seems to be a straight port of the Haskell library. There is no documentation, so if you already know how to use Parsec it might be a good choice; otherwise you are on your own.
Parsley is a parser combinator, but it has separate lexer and parser phases. In practical terms, this means that it is simple to use, but it is also familiar to experienced creators of parsers. There is a limited amount of documentation, but there is a complete JSON example, which is used as an integration test.
Parsley supports .NET Standard 1.1.
// an example from the documentation
var text = new Text("1 2 3 a b c");
var lexer = new Lexer(new Pattern("letter", @"[a-z]"),
                      new Pattern("number", @"[0-9]+"),
                      new Pattern("whitespace", @"\s+", skippable: true));

// in real usage you are probably going to use a LINQ-like syntax to get the tokens
Token[] tokens = lexer.ToArray();
LanguageExt.Parsec is a port of the Haskell library that is part of a larger library designed to bring functional features to C# 6. There is a minimal amount of documentation to get you started.
LanguageExt.Parsec supports .NET Standard 1.3.
// example from the documentation
var spaces = many(satisfy(Char.IsWhiteSpace));
var word = from w in many1(letter) // letter = satisfy(Char.IsLetter)
           from s in spaces
           select w;
var parser = many1(word);
var result = parse(parser, "two words");
It is obviously the best choice if you also need a bit of F# in your C#, but it is quite good on its own, too.
Pidgin
Pidgin is a parser combinator library, a lightweight, high-level, declarative tool for constructing parsers.
Pidgin is a new parser combinator library that is already quite mature and useful. Like Sprache, it is easy to use and supports a nice LINQ-like syntax. It also has a few advantages over Sprache: it is more actively maintained, is faster, consumes less memory, supports binary input, and includes support for advanced features such as recursive structures or operator precedence.
Recursive structures are made possible by a specific operator that allows you to defer the execution of a parser to another section of the code. The operator precedence is managed with a class made to deal with expressions.
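To see why deferring matters, consider the following sketch in plain C#. It is not Pidgin code: the Parser delegate, the Rec helper, and the Char, Then, and Or functions are hypothetical stand-ins that we wrote only to illustrate the mechanism. A recursive rule needs to refer to a parser that is still being built, so a direct reference would read a field that is still null. Wrapping the reference in a lambda postpones the lookup until parsing time.

```csharp
using System;

// Hypothetical recognizer type: returns the position after the match,
// or null when the input does not match.
public delegate int? Parser(string input, int pos);

public static class Deferred
{
    // Defers the lookup of a parser until it is actually run.
    public static Parser Rec(Func<Parser> getParser) =>
        (input, pos) => getParser()(input, pos);

    // Matches exactly one given character.
    public static Parser Char(char c) =>
        (input, pos) => pos < input.Length && input[pos] == c ? pos + 1 : (int?)null;

    // Sequence: the second parser starts where the first one stopped.
    public static Parser Then(this Parser first, Parser second) =>
        (input, pos) =>
        {
            var next = first(input, pos);
            return next.HasValue ? second(input, next.Value) : null;
        };

    // Choice: try the first parser, then fall back to the second.
    public static Parser Or(this Parser first, Parser second) =>
        (input, pos) => first(input, pos) ?? second(input, pos);

    // nested := '(' nested ')' | 'x'
    // While this initializer runs, the Nested field is still null, so the
    // recursive reference is wrapped in Rec and resolved only when parsing.
    public static readonly Parser Nested =
        Char('(').Then(Rec(() => Nested)).Then(Char(')'))
            .Or(Char('x'));

    public static void Main()
    {
        Console.WriteLine(Nested("((x))", 0).HasValue); // balanced input
        Console.WriteLine(Nested("((x)", 0).HasValue);  // missing a closing parenthesis
    }
}
```

Without the Rec wrapper, the initializer would pass the null field to Then and parsing would fail with a NullReferenceException as soon as the recursive branch was reached.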
The following is a partial JSON example from the repository.
public static class JsonParser
{
    private static readonly Parser<char, char> LBrace = Char('{');
    [..]
    private static readonly Parser<char, char> ColonWhitespace =
        Colon.Between(SkipWhitespaces);
    [..]
    private static readonly Parser<char, IJson> Json =
        JsonString.Or(Rec(() => JsonArray)).Or(Rec(() => JsonObject));

    private static readonly Parser<char, IJson> JsonArray =
        Json.Between(SkipWhitespaces)
            .Separated(Comma)
            .Between(LBracket, RBracket)
            .Select<IJson>(els => new JsonArray(els.ToImmutableArray()));
    [..]
}
The documentation is good and covers many aspects: a tutorial/reference, suggestions to speed up your code, and a comparison with other parser combinator libraries. The repository also contains examples on JSON and XML. The tutorial/reference is not as deep as one would like, but it gets you started. The author also gave a talk at NDC that includes a tutorial about Pidgin.
Best Way to Parse C#: Roslyn
There is one special case that can be handled in a more specific way: the case in which you want to parse C# code in C#. In that case, you should use the .NET Compiler Platform, better known as Roslyn, which is a compiler as a service. It is open source and it is also the official C# parser, so there is no better choice.
In practical terms, it works as a library that you can use to parse C#, but also to generate C# and do everything a compiler can do. The only weak point may be the abundant, but somewhat badly organized documentation. Luckily you can read a few tutorials we have written for Roslyn.
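As a small taste of what using Roslyn as a library looks like, the following sketch parses a snippet of C# and lists the names of the methods it declares. It assumes that the Microsoft.CodeAnalysis.CSharp NuGet package is referenced by the project; the RoslynDemo class, the MethodNames helper, and the sample source are our own illustrative choices.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CodeAnalysis.CSharp;        // assumes the Microsoft.CodeAnalysis.CSharp package
using Microsoft.CodeAnalysis.CSharp.Syntax;

public static class RoslynDemo
{
    // Returns the names of all the methods declared in the given source code.
    public static List<string> MethodNames(string source)
    {
        // Parse the text into a syntax tree and take its root node
        var tree = CSharpSyntaxTree.ParseText(source);
        var root = tree.GetRoot();

        // Walk the tree and keep only the method declarations
        return root.DescendantNodes()
                   .OfType<MethodDeclarationSyntax>()
                   .Select(method => method.Identifier.Text)
                   .ToList();
    }

    public static void Main()
    {
        const string source = @"
            class Greeter
            {
                void Hello() { }
                int Add(int a, int b) { return a + b; }
            }";

        foreach (var name in MethodNames(source))
            Console.WriteLine(name);
    }
}
```

The same syntax tree can be inspected much more deeply (parameters, bodies, trivia such as comments), and with a compilation you can also get semantic information, but even this purely syntactic pass is enough for many analysis tasks.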
Tools That We Cannot Recommend
We also want to list some tools that people often mention and that are interesting, but that we could not include in this analysis for several reasons.
Irony
Irony is a development kit for implementing languages on the .NET platform.
Irony is a parser generator that does not rely on a grammar, but on overloading operators in C# to express grammar constructs. It also includes an interpreter. It has not been updated since a 2013 beta release, and it does not seem that it ever had a stable version, although there is a recently modified version that supports .NET Core.
GOLD
In practical terms, GOLD is an IDE that supports the creation of BNF grammars to generate parsers in many languages, including Assembly, C, C#, D, Java, Pascal, Python, Visual Basic .NET, and Visual C++. It has been relevant enough to have its own Wikipedia article, but it has not been updated since 2012.
TinyPG
Tiny Parser Generator is an interesting tool presented in a popular CodeProject article that also spawned a fork. It is a tool with a simple IDE that can generate a scanner, a parser, and a parse tree representation. It can also generate a syntax highlighter for a text box. It is neat, but we cannot recommend it because it was never really meant for professional use and it is not updated anymore.
Summary
As we said in the sister article about parsing in Java, the world of parsers is a bit different from the usual world of programmers. That is because a lot of good tools come directly from academia, and, in that sector, Java is more popular than C#. So there are fewer parsing tools for C# than for Java. Also, some, like ANTLR, are written in Java, but can produce C# code. This does not mean that there are no good options, but there are fewer of them.
In fact, if you need a complete parser generator for a .NET Core project, your only option is ANTLR, though you have more choices available among parser combinators.
On the other hand, if you need to parse C# you have the chance to use the official compiler very easily, so that is a plus.
We cannot really say what software you should use. What is best for one user might not be the best for somebody else. And we all know that the most technically correct solution might not be ideal in real life with all its constraints. So we wanted to share what we have learned about the best options for parsing in C#.
Thanks to Lee Humphries for his feedback on this article and to Benjamin Hodgson for pointing us to Pidgin.
Published at DZone with permission of Gabriele Tomassetti, DZone MVB. See the original article here.