Sam's Blog

18 Oct

ManagedMyC: Intro to building an AST

This is part 3 of [many?] posts about creating an ANTLR-based language service for Visual Studio.

Sure, a scanner and parser are cool, and syntax highlighting is nice. But the real power in the Visual Studio language services comes from their IntelliSense abilities, and supporting those effectively requires building and processing an AST. In this section, I’ll show how to integrate the ANTLR automatic AST features, including a tree parser, into the ManagedMyC language service. Among other things, I’ll assume the reader is already familiar with ANTLR syntax for parsers, rewrites, and tree parsers.

There are plenty of other places to learn about those separately; my goal is to show how to start incorporating the existing knowledge into a usable language service. This article only discusses adding the tree grammar to your language service and using it to process a bare-bones tree created in the parser. For this article, there are no new UI/language service features supported. In the next article, I’ll show how to use the tree parser output to implement a solid TypeAndMemberDropdownBars implementation.

Building an AST in the parser

At this point, we’ll focus on a minimal AST needed to support a limited number of features. We want to know:

  • The type and name of declared global variables
  • The signature and location (span) of functions defined at global scope

In the grammar MyC.g3, add options indicating we intend to build an AST.

options
{
    language=CSharp2;
    output=AST;
    ASTLabelType=CommonTree;
}

Then build the minimal AST. Pseudo-tokens (AST_FUNCDEF, AST_PARAMS, and AST_DECL) are used as the root of some trees, both to identify the kind of subtree and to keep the tree parser LL(1) for speed. Only the parser rules are included here, and I left out most of the tokens to save space.

tokens
{
    // all our previous tokens still go here
    END_BLOCK_COMMENT = '*/';

    AST_FUNCDEF;
    AST_PARAMS;
    AST_DECL;
}

@lexer::namespace { ManagedMyC }
@parser::namespace { ManagedMyC }

program
    :   declarations
        EOF
    ;

declarations
    :   declaration*
    ;

declaration
    :   declaration_
    ;

declaration_
    :   class1? type? IDENTIFIER paren_params block
        { Region( $block.start, $block.stop ); }
        -> ^(AST_FUNCDEF class1? type? IDENTIFIER paren_params block)
    |   simple_declaration
    ;

simple_declarations1
    :   simple_declaration+
    ;

simple_declaration
    :   semi_declaration ';'!
    ;

semi_declaration
    :   class1? type IDENTIFIER
        (',' IDENTIFIER)*
        -> ^(AST_DECL class1? type IDENTIFIER+)
    ;

params1
    :   parameter (','! parameter)*
    ;

parameter
    :   type^ IDENTIFIER
    ;

paren_params
    :   l='(' params1? r=')'
        { Match($l, $r); }
        -> ^(AST_PARAMS params1?)
    ;

class1
    :   'static'
    |   'auto'
    |   'extern'
    ;

type
    :   'int'
    |   'void'
    ;

// for now, we don't need to know the contents of a block, just where it starts and ends.
block
    :   open_block (block_content1!)? close_block
        { Match($open_block.start, $close_block.start); }
    ;

open_block
    :   '{'
    ;

close_block
    :   '}'
    ;

// From here on it's the same as in the first version of the grammar
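
To make the target shape concrete, here is a small sketch that builds by hand the trees I would expect the rewrite rules above to produce for two sample inputs, then prints them with ToStringTree(). It only uses CommonTree and CommonToken from the ANTLR C# runtime. The AstShapeDemo class, its methods, and the token type numbers are made up for illustration; the real type values come from the generated MyCParser.

```csharp
using System;
using Antlr.Runtime;
using Antlr.Runtime.Tree;

// Hypothetical demo class, not part of the ManagedMyC project.
public static class AstShapeDemo
{
    // Placeholder token types; the generated parser defines the real ones.
    const int AST_DECL = 100, AST_FUNCDEF = 101, AST_PARAMS = 102, OTHER = 103;

    // Creates a node with the given text and attaches the children in order.
    static CommonTree Node(int type, string text, params CommonTree[] children)
    {
        CommonTree node = new CommonTree(new CommonToken(type, text));
        foreach (CommonTree child in children)
            node.AddChild(child);
        return node;
    }

    // Expected tree for the input:  static int x, y;
    public static CommonTree BuildDeclaration()
    {
        return Node(AST_DECL, "AST_DECL",
            Node(OTHER, "static"), Node(OTHER, "int"),
            Node(OTHER, "x"), Node(OTHER, "y"));
    }

    // Expected tree for the input:  int f(int a, int b) { }
    public static CommonTree BuildFunction()
    {
        return Node(AST_FUNCDEF, "AST_FUNCDEF",
            Node(OTHER, "int"), Node(OTHER, "f"),
            Node(AST_PARAMS, "AST_PARAMS",
                Node(OTHER, "int", Node(OTHER, "a")),   // from: type^ IDENTIFIER
                Node(OTHER, "int", Node(OTHER, "b"))),
            Node(OTHER, "{"), Node(OTHER, "}"));        // block keeps only its braces
    }

    public static void Main()
    {
        Console.WriteLine(BuildDeclaration().ToStringTree());
        // prints: (AST_DECL static int x y)
        Console.WriteLine(BuildFunction().ToStringTree());
        // prints: (AST_FUNCDEF int f (AST_PARAMS (int a) (int b)) { })
    }
}
```

Notice what is missing from the printed trees: the commas, the semicolon, and the parentheses around the parameters never make it into the AST, because they are either suppressed with ! or simply left out of the rewrite.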

Adding a tree grammar to the project

We’ll name the tree grammar MyCWalker, so the grammar file is MyCWalker.g3, the generated class goes in MyCWalker.cs, and the hand-written half of the partial class goes in MyCWalkerHelper.cs. When I add a new grammar to a project, the first thing I do is create empty files with those names and place them in the project folder. Then I unload the project in Visual Studio, and right click > Edit ManagedMyC.csproj. Add the following items to the project:

  <ItemGroup>
    <Antlr3 Include="MyCWalker.g3">
      <OutputFiles>MyCWalker.cs;MyCWalker.tokens</OutputFiles>
    </Antlr3>
    <Compile Include="MyCWalker.cs">
      <AutoGen>True</AutoGen>
      <DesignTime>True</DesignTime>
      <DependentUpon>MyCWalker.g3</DependentUpon>
    </Compile>
    <Compile Include="MyCWalkerHelper.cs">
      <DependentUpon>MyCWalker.g3</DependentUpon>
    </Compile>
  </ItemGroup>

Save the modified project file, close it, and then right click on the project in Solution Explorer > Reload.

Creating a tree grammar to match the parser’s generated AST

tree grammar MyCWalker;

options
{
    language=CSharp2;
    tokenVocab=MyC;
    ASTLabelType=CommonTree;
}

@namespace { ManagedMyC }

program
    :   declarations
    ;

declarations
    :   declaration*
    ;

declaration
    :   declaration_
    ;

declaration_
    :   ^(AST_FUNCDEF class1? type? IDENTIFIER parameters block)
    |   simple_declaration
    ;

simple_declarations1
    :   simple_declaration+
    ;

simple_declaration
    :   ^(AST_DECL class1? type IDENTIFIER+)
    ;

parameters
    :   ^(AST_PARAMS parameter*)
    ;

parameter
    :   ^(type IDENTIFIER)
    ;

class1
    :   'static'
    |   'auto'
    |   'extern'
    ;

type
    :   'int'
    |   'void'
    ;

// notice we excluded the block_content1 tree in the parser, so we skip it here too
block
    :   open_block /*block_content1?*/ close_block
    ;

open_block
    :   '{'
    ;

close_block
    :   '}'
    ;

Calling the tree parser during parse operations

First, we edit the MyCWalkerHelper.cs file to add a WalkAST function that the parser can use.

using Antlr.Runtime.Tree;
 
namespace ManagedMyC
{
    partial class MyCWalker
    {
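        // Wrap the parser's AST in a node stream for the generated tree parser.
        public MyCWalker( MyCParser parser, CommonTree tree )
            : this( new CommonTreeNodeStream( tree ) )
        {
            // Share the parser's token stream so tree nodes can be mapped back to tokens.
            ( (CommonTreeNodeStream)input ).TokenStream = parser.TokenStream;
        }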
 
        public void WalkAST()
        {
            program();
        }
    }
}
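
Since the constructor hands the parser's token stream to the node stream, a tree node can be traced back to the tokens it covers, which is what we'll eventually need for function spans. Here is a minimal sketch of that lookup, assuming the runtime has recorded token boundaries on the node. TreeSpanHelper and TryGetSpan are hypothetical names, not part of the original project.

```csharp
using Antlr.Runtime;
using Antlr.Runtime.Tree;

namespace ManagedMyC
{
    // Hypothetical helper, shown only to illustrate the node-to-token mapping.
    public static class TreeSpanHelper
    {
        // Looks up the first and last tokens covered by a tree node.
        // Returns false when the node carries no usable token boundaries.
        public static bool TryGetSpan( ITokenStream tokens, CommonTree node,
                                       out IToken first, out IToken last )
        {
            first = null;
            last = null;
            if ( tokens == null || node == null )
                return false;

            int start = node.TokenStartIndex;
            int stop = node.TokenStopIndex;
            if ( start < 0 || stop < start )
                return false;

            first = tokens.Get( start );
            last = tokens.Get( stop );
            return true;
        }
    }
}
```

Inside the walker, the token stream would be the one assigned in the constructor above, and the Line and CharPositionInLine properties of the two tokens give the start and end of the span.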

The main entry point to the parser is MyCParser.Parse() in MyCParserHelper.cs. We add code to that function to automatically process the tree after a parse:

public bool Parse()
{
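    // Run the start rule; with output=AST the return value carries the tree.
    program_return result = program();

    // Hand the tree to the tree parser and walk it.
    MyCWalker walker = new MyCWalker( this, (CommonTree)result.Tree );
    walker.WalkAST();

    // For now, Parse always reports success.
    return true;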
}

What’s next

I’m not including a new set of source code with this post because the next set of source already includes the TypeAndMemberDropdownBars implementation that I’ll be explaining in the next article.
