
Lap Around Roslyn CTP: Syntax Rewriting with Symbol Information


Last time around, we were replacing the 42 numeric literal with 43. This time let’s pretend to do something more useful. Suppose you really don’t like developers calling the Console.Write method and insist on using Console.WriteLine instead. You might be slightly reluctant to use find-and-replace, because—just like last time—you don’t want to modify Console.Write calls within comments, within string literals, or—and this is vicious—calls to the Console.Write method on something that is not the System.Console class from the mscorlib assembly, like maybe a property called Console!
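To make that last case concrete, here is a small self-contained sketch. The Logger class and the Program.Console property are hypothetical names invented for illustration. Both calls in Main contain the text Console.Write, but only the second one binds to System.Console, so a textual find-and-replace would wrongly change the first:

```csharp
using System;

// Hypothetical type, invented for this example; it is not part of the BCL.
class Logger
{
    public void Write(string message)
    {
        System.Console.WriteLine("[log] " + message);
    }
}

class Program
{
    private static readonly Logger _logger = new Logger();

    // A property that happens to be called Console. Inside Program, the
    // simple name Console resolves to this member, not to System.Console.
    private static Logger Console
    {
        get { return _logger; }
    }

    static void Main()
    {
        Console.Write("hello");        // binds to Logger.Write
        System.Console.Write("hello"); // binds to System.Console.Write
    }
}
```

Telling these two calls apart requires knowing what each name is bound to, which is exactly the information the parser alone does not have.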

The C# parser, which we met in its SyntaxTree incarnation, doesn’t bind InvocationExpression nodes to the actual method being invoked. All it cares about is the proper structure of the expression. For all it cares, Console could be a private class, and not the BCL one.

Enter the semantic model (SemanticModel class), which represents everything the compiler knows about your code after binding the syntax tree to symbols. In this case, the semantic model will give us a symbol for the Console.Write invocation expression, and we’ll be able to tell which Console.Write is being invoked and replace it accordingly.

To obtain a SemanticModel instance, we need to provide Roslyn with all the information to perform binding—i.e., the assembly references for our code. Recall that we could create a SyntaxTree without specifying those! This information is wrapped by a Compilation class instance, which can be used (eventually) to emit actual code.

Compilation compilation = Compilation.Create(
    "MyCompilation",
    CompilationOptions.Default,
    new SyntaxTree[] { tree },
    new MetadataReference[] {
        new AssemblyFileReference(
            typeof(object).Assembly.Location)
    },
    null, null);
SemanticModel model = compilation.GetSemanticModel(tree);

Now that we have the semantic model, we can pass it to our rewriter’s constructor:

/// <summary>
/// Replaces Console.Write calls with equivalent
/// Console.WriteLine calls.
/// </summary>
class MyConsoleWriteRewriter : SyntaxRewriter
{
    private readonly SemanticModel _semanticModel;

    public MyConsoleWriteRewriter(SemanticModel model)
    {
        _semanticModel = model;
    }

    protected override SyntaxNode
        VisitInvocationExpression(
            InvocationExpressionSyntax node)
    {
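        // Ask the semantic model which symbol this invocation binds to.
        // Symbol can be null (e.g. when binding fails), so use a safe
        // cast and a null check instead of a hard cast.
        SemanticInfo info =
            _semanticModel.GetSemanticInfo(node);
        MethodSymbol symbol = info.Symbol as MethodSymbol;
        if (symbol != null &&
            symbol.Name == "Write" &&
            symbol.ContainingType.Name == "Console" &&
            symbol.ContainingNamespace.Name == "System" &&
            symbol.ContainingAssembly.Name == "mscorlib")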
        {
            MemberAccessExpressionSyntax old =
                (MemberAccessExpressionSyntax)
                node.Expression;
            return node.ReplaceNode(
                old,
                old.Update(
                    old.Expression,
                    old.OperatorToken,
                    Syntax.IdentifierName("WriteLine")));
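        }
        // Not a System.Console.Write call: let the base class visit the
        // children, so invocations nested inside this one (for example
        // in a lambda argument) are still rewritten.
        return base.VisitInvocationExpression(node);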
    }
}

To understand what’s going on here, let’s take a look at the structure of the InvocationExpression node for a typical Console.Write call:

[Figure: syntax tree of a typical Console.Write call]

The InvocationExpression, in this case, consists of a MemberAccessExpression, which specifies the method to invoke, and an ArgumentList that specifies the arguments. Because we trust Console.WriteLine to accept the same arguments Console.Write accepts, we don’t need to touch the ArgumentList node. Moreover, we don’t even need to touch the first IdentifierName under the MemberAccessExpression (Console); all we need to replace is the second IdentifierName (Write).

Therefore, we return a new node from our VisitInvocationExpression method whenever we have something to replace the existing node with. Specifically, we ask the semantic model to give us symbol information for the method invocation expression—if it matches the System.Console.Write method from the mscorlib assembly, we keep the entire expression except the method name identifier.

Of course, to apply this rewriter to our tree, we need to pass it the SemanticModel instance retrieved earlier:

SyntaxNode newRoot =
    new MyConsoleWriteRewriter(model).Visit(tree.Root);
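Putting the pieces together, a minimal end-to-end driver might look like the sketch below. It assumes the MyConsoleWriteRewriter class shown earlier is compiled into the same project. Several details are assumptions about the CTP API rather than something this article confirms: the Roslyn.Compilers and Roslyn.Compilers.CSharp namespaces, SyntaxTree.ParseCompilationUnit as the parsing entry point, and GetFullText as the way to get a node's text back. The RewriterDemo class and the sample input are invented for illustration.

```csharp
using System;
using Roslyn.Compilers;          // assumed CTP namespace
using Roslyn.Compilers.CSharp;   // assumed CTP namespace

class RewriterDemo
{
    static void Main()
    {
        // Sample input: the text "Console.Write" appears in a comment,
        // in a string literal, and as a real call.
        // ParseCompilationUnit is assumed to be the CTP parsing entry point.
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(@"
using System;
class Program
{
    static void Main()
    {
        // Console.Write in a comment
        Console.Write(""Console.Write in a string"");
    }
}");

        // Same compilation setup as shown earlier in the article.
        Compilation compilation = Compilation.Create(
            "MyCompilation",
            CompilationOptions.Default,
            new SyntaxTree[] { tree },
            new MetadataReference[] {
                new AssemblyFileReference(
                    typeof(object).Assembly.Location)
            },
            null, null);
        SemanticModel model = compilation.GetSemanticModel(tree);

        // Run the rewriter over the original tree.
        SyntaxNode newRoot =
            new MyConsoleWriteRewriter(model).Visit(tree.Root);

        // GetFullText is assumed to return the node's source text,
        // including trivia such as comments and whitespace.
        Console.WriteLine(newRoot.GetFullText());
    }
}
```

The intent is that only the real call is renamed to Console.WriteLine, while the comment and the string literal are left alone, because the rewriter only ever looks at invocation nodes and their bound symbols.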

This actually starts looking useful. Next time, we won’t bother with rewriting, but instead perform more complicated analysis of the syntax tree and semantic model, including data flow and control flow within a method.


I have been recently posting short updates and links on Twitter as well as on this blog. You can follow me: @goldshtn

Published at DZone with permission of Sasha Goldshtein, DZone MVB. See the original article here.

