Adventures in Parsing C: ASTs for Switch Statements
Join the DZone community and get the full member experience.
Join For FreeLast week I received an email from a user of pycparser that mentioned the strange AST that results when pycparser parses a switch statement.
Let’s take the following snippet of C code for example. Don’t look for semantic sense in it – it’s just used to test the parser:
switch (myvar) { case 10: k = 10; p = k + 1; return 10; case 20: case 30: return 20; default: break; }
And the AST pycparser was generating for this code:
Switch: ID: myvar Compound: Case: Constant: int, 10 Assignment: = ID: k Constant: int, 10 Assignment: = ID: p BinaryOp: + ID: k Constant: int, 1 Return: Constant: int, 10 Case: Constant: int, 20 Case: Constant: int, 30 Return: Constant: int, 20 Default: Break:
There are two problems here:
- Only the first statement inside each case is made a child of that case – the other statements are siblings.
- Two consecutive case statements without any other statements in between (fall-through) cause the second case to become the child of the first one. If additional consecutive case statements follow, they nest even further.
Since the parser follows the C grammar pretty closely, I immediately went to look into the C99 standard, and indeed, this is exactly the parse tree that it mandates. Here’s the relevant portion of the language grammar (from section A.2.3):
(6.8) statement: labeled-statement compound-statement expression-statement selection-statement iteration-statement jump-statement (6.8.1) labeled-statement: identifier : statement case constant-expression : statement default : statement
Note that a case (and default, which is equivalent to case in this whole discussion) must be followed by one, and only one other statement. This explains why pycparser parses the code above the way it does.
However, the goal of pycparser is not to generate a parse tree. It is to generate an abstract syntax tree (AST), which follows the language semantics rather than its grammar. Hey, I already wrote about this stuff!
So today I fixed this part of pycparser, by adding a dedicated AST transformation after parsing a switch statement. The transformation isn’t really complicated, and the AST pycparser generates now is much friendlier. Here it is, for the same code:
Switch: ID: myvar Compound: Case: Constant: int, 10 Assignment: = ID: k Constant: int, 10 Assignment: = ID: p BinaryOp: + ID: k Constant: int, 1 Return: Constant: int, 10 Case: Constant: int, 20 Case: Constant: int, 30 Return: Constant: int, 20 Default: Break:
As you can see, the problems mentioned above were fixed. This fix is available in the pycparser Mercurial repository and will be part of the next release.
Published at DZone with permission of Eli Bendersky. See the original article here.
Opinions expressed by DZone contributors are their own.
Trending
-
RBAC With API Gateway and Open Policy Agent (OPA)
-
What ChatGPT Needs Is Context
-
Extending Java APIs: Add Missing Features Without the Hassle
-
Is Podman a Drop-in Replacement for Docker?
Comments