Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Adventures in Parsing C: ASTs for Switch Statements

DZone's Guide to

Adventures in Parsing C: ASTs for Switch Statements

· Web Dev Zone
Free Resource

Learn how to build modern digital experience apps with Crafter CMS. Download this eBook now. Brought to you in partnership with Crafter Software

Last week I received an email from a user of pycparser that mentioned the strange AST that results when pycparser parses a switch statement.

Let’s take the following snippet of C code for example. Don’t look for semantic sense in it – it’s just used to test the parser:

switch (myvar) {
    case 10:
        k = 10;
        p = k + 1;
        return 10;
    case 20:
    case 30:
        return 20;
    default:
        break;
}

And the AST pycparser was generating for this code:

Switch:
  ID: myvar
  Compound:
    Case:
      Constant: int, 10
      Assignment: =
        ID: k
        Constant: int, 10
    Assignment: =
      ID: p
      BinaryOp: +
        ID: k
        Constant: int, 1
    Return:
      Constant: int, 10
    Case:
      Constant: int, 20
      Case:
        Constant: int, 30
        Return:
          Constant: int, 20
    Default:
      Break:

There are two problems here:

  1. Only the first statement inside each case is made a child of that case – the other statements are siblings.
  2. Two consecutive case statements without any other statements in between (fall-through) cause the second case to become the child of the first one. If additional consecutive case statements follow, they nest even further.

Since the parser follows the C grammar pretty closely, I immediately went to look into the C99 standard, and indeed, this is exactly the parse tree that it mandates. Here’s the relevant portion of the language grammar (from section A.2.3):

(6.8) statement:
              labeled-statement
              compound-statement
              expression-statement
              selection-statement
              iteration-statement
              jump-statement
(6.8.1) labeled-statement:
              identifier : statement
              case constant-expression : statement
              default : statement

Note that a case (and default, which is equivalent to case in this whole discussion) must be followed by one, and only one other statement. This explains why pycparser parses the code above the way it does.

However, the goal of pycparser is not to generate a parse tree. It is to generate an abstract syntax tree (AST), which follows the language semantics rather than its grammar. Hey, I already wrote about this stuff!

So today I fixed this part of pycparser, by adding a dedicated AST transformation after parsing a switch statement. The transformation isn’t really complicated, and the AST pycparser generates now is much friendlier. Here it is, for the same code:

Switch:
  ID: myvar
  Compound:
    Case:
      Constant: int, 10
      Assignment: =
        ID: k
        Constant: int, 10
      Assignment: =
        ID: p
        BinaryOp: +
          ID: k
          Constant: int, 1
      Return:
        Constant: int, 10
    Case:
      Constant: int, 20
    Case:
      Constant: int, 30
      Return:
        Constant: int, 20
    Default:
      Break:

As you can see, the problems mentioned above were fixed. This fix is available in the pycparser Mercurial repository and will be part of the next release.

 

Crafter is a modern CMS platform for building modern websites and content-rich digital experiences. Download this eBook now. Brought to you in partnership with Crafter Software.

Topics:

Published at DZone with permission of Eli Bendersky, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}