Comments on: Custom Visual Studio language services: ManagedMyC meets ANTLR

By: Visual Studio Pre-Build events and cashing

Visual Studio Pre-Build events and cashing — Wed, 19 Sep 2012 13:44:06 +0000

[...] is updated every single time! Hope this can ease some frustration out there. Credit is due to Sam’s Blog for supplying the solution for this problem. *Actually it is two numbers behind because Visual [...]

By: pumR

pumR — Thu, 14 May 2009 14:55:46 +0000

Hi Guys! The real problem for me is to get the language service running on my custom editor not on the core editor. I hope that somebody can help me. How can I set a specific language service to e.g. a textfield? Any links or snippets would help a lot! Thx!

By: James N

James N — Wed, 28 Jan 2009 17:11:43 +0000

Almost. The asterisk keeps getting interpreted as a special character by WordPress. That was supposed to be "When you encounter /[asterisk]" and "when you later encounter [asterisk]/".

By: James N

James N — Wed, 28 Jan 2009 17:06:22 +0000

Urgh, that got mangled. Let me try again.

I use MPLex's start conditions just like the ManagedMyC example does, e.g. you provide a different set of patterns that only apply when you are inside a comment, and prefix them with <COMMENT>. When you encounter /, you put "BEGIN<COMMENT>" in your rule to enter the COMMENT start condition; when you later encounter "/", you use "BEGIN<INITIAL>" to end it.
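To make the start-condition idea concrete, here is a minimal sketch in Python rather than MPLex syntax. The rule table, the token kind names, and the `tokenize` function are all hypothetical and only mimic the behaviour described above: each start condition has its own pattern set, matching `/*` switches to COMMENT, and matching `*/` switches back to INITIAL. Unlike a lex-style scanner, this sketch takes the first matching rule rather than the longest match.

```python
import re

# Hypothetical rule table: start condition -> list of
# (pattern, token kind, next start condition).
# A token kind of None means "skip"; a next condition of None means "stay".
RULES = {
    "INITIAL": [
        (re.compile(r"/\*"), "comment", "COMMENT"),    # plays the role of BEGIN<COMMENT>
        (re.compile(r"[A-Za-z_]\w*"), "identifier", None),
        (re.compile(r"\d+"), "number", None),
        (re.compile(r"\s+"), None, None),               # skip whitespace
        (re.compile(r"."), "other", None),
    ],
    "COMMENT": [
        (re.compile(r"\*/"), "comment", "INITIAL"),     # plays the role of BEGIN<INITIAL>
        (re.compile(r"[^*]+"), "comment", None),
        (re.compile(r"\*"), "comment", None),           # a lone asterisk inside the comment
    ],
}


def tokenize(text, condition="INITIAL"):
    """Return (tokens, final start condition); the first matching rule wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        for pattern, kind, next_condition in RULES[condition]:
            match = pattern.match(text, pos)
            if match:
                if kind is not None:
                    tokens.append((kind, match.group()))
                pos = match.end()
                if next_condition is not None:
                    condition = next_condition
                break
        else:
            raise ValueError(f"no rule matches at position {pos}")
    return tokens, condition
```

Returning the final start condition lets a caller tell whether the input ended inside an unterminated comment, and lets scanning resume in that condition on the next chunk of text.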

By: James N

James N — Wed, 28 Jan 2009 17:00:21 +0000

...the ManagedMyC sample from the Visual Studio SDK... The most important thing to note at this point: many parts of this sample are inefficient, clumsy, and/or just done the wrong way.

Hmm... I just made a project implementing a language service for a simple C-like language, using the ManagedMyC sample as a starting point. Could you elaborate on what this sample does incorrectly? (Though I've found some bugs in it already.)

About the manual coding vs. use of predicates issue with block comments: I use MPLex's start conditions just like the ManagedMyC example does, e.g. you provide a different set of patterns that only apply when you are inside a comment, and prefix them with . When you encounter "/", you put "BEGIN" in your rule to enter the COMMENT start condition; when you later encounter "/", you use "BEGIN" to end it. I don't know if ANTLR has anything like start conditions, but they were made to handle cases like this. I also use them to handle string literals and preprocessor statements, and they work like a charm.

By: 280Z28

280Z28 — Tue, 06 Jan 2009 19:21:32 +0000

Hi Mike,

I've done 3 different things for 3 different languages. Each one was successful (good performance) for source files of 20000+ lines / 500+ KB.

UnrealScript:

I'm not compiling UnrealScript, so the grammar is solely used for IntelliSense purposes. The lexer rules in this grammar support the method described in this post, and the colorizer is implemented as a larger version of what's in this post.

StringTemplate:

I updated the lexer rules in Group.g3 to meet the colorizer requirements described in this post. I don't like this as much because the implementation of the StringTemplate library, which is completely independent of the language service, must now meet special requirements so the language service works. This type of dependency is unacceptable, so I'll be changing over to the method I use for the ANTLR v3 Grammar language service.

ANTLR v3:

I reference the C# port of the ANTLR tool to gather IntelliSense information / full source parsing. To implement the colorizer, I copied all of the lexer rules from ANTLR.g3 and made a new AntlrColorizerLexer.g3 inside the language service. I then updated this lexer to support the colorizer. If the ANTLR lexer spec changes in the future, I will have to update this lexer in the language service to reflect the changes, but I believe this is an acceptable situation.

Finally, regarding the use of manual coding instead of predicates: predicates of this form greatly impact the performance of the lexer. The method for implementing a colorizer as described here offers good performance and provides easy access to the original token information from the lexer at any point in the code via the TokenInfo. The StartIndex and EndIndex give the location, and the Token member (int) holds the lexer token type.
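As a rough illustration of the hand-coded approach, the following Python sketch scans one line at a time and carries a single flag between lines, in the spirit of the InBlockComment state variable discussed in this thread. It is not the code from the post: the `TokenInfo` dataclass is a hypothetical stand-in whose fields only echo the StartIndex, EndIndex, and Token members mentioned above, the end index is assumed to be inclusive, and the `TEXT` and `COMMENT` token type numbers are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical token type numbers; a real lexer would supply its own.
TEXT, COMMENT = 0, 1


@dataclass
class TokenInfo:
    start_index: int  # index of the token's first character on the line
    end_index: int    # index of the token's last character (assumed inclusive)
    token: int        # lexer token type


def colorize_line(line, in_block_comment):
    """Scan one line; return (token infos, flag to pass to the next line)."""
    infos = []
    pos = 0
    while pos < len(line):
        if not in_block_comment:
            opener = line.find("/*", pos)
            if opener != pos:
                # Plain text up to the next comment opener (or end of line).
                end = len(line) if opener < 0 else opener
                infos.append(TokenInfo(pos, end - 1, TEXT))
                pos = end
                continue
            in_block_comment = True
            search_from = pos + 2  # skip the opener so "/*/" is not read as closed
        else:
            search_from = pos      # comment continues from a previous line
        closer = line.find("*/", search_from)
        end = len(line) if closer < 0 else closer + 2
        infos.append(TokenInfo(pos, end - 1, COMMENT))
        in_block_comment = closer < 0
        pos = end
    return infos, in_block_comment


def colorize(lines):
    """Colorize consecutive lines, threading the comment flag through them."""
    state = False
    result = []
    for line in lines:
        infos, state = colorize_line(line, state)
        result.append(infos)
    return result
```

Because the only state carried across lines is one boolean, a line can be re-colorized on its own as long as the flag at the end of the previous line is known.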

By: Mike Pagel

Mike Pagel — Mon, 05 Jan 2009 20:17:19 +0000

Hi Sam,

thanks for this post. I do have a few questions, though.

It seems you have introduced "multi-line tokens", and your way of handling them, more or less in order to stick with the Babel framework's approach of using only lexer tokens for colorization. But I believe this is not optimal:

(1) Lexer tokens like "Identifier" will appear in different scenarios (parser rules) where they are, e.g., class names or object names. These are already colored differently in Visual Studio, so the two colors cannot result from the same token.

You therefore must solve this by adding a state machine to the lexer, which essentially turns the regular lexer language into a context-free language. Since this is typically not supported by lexer generators, you have to hand-code the state machine, as done in your sample through the introduction of the state variable InBlockComment. For comments this is acceptable, but what about the class vs. object name example? You essentially would have to build parts of the AST to understand in the lexer (!) what kind of Identifier you are just scanning. That will be a lot of effort, won't it?

(2) Then you add the switch/case ("if" in your sample) statement that evaluates the current lexer state to the handwritten NextToken method. That is quite hard to maintain, as the state machine is now split into two parts: state-modifying actions in the lexer grammar, and guards and transition detection in NextToken(). I am wondering whether ANTLR's semantic predicates would do better here.

Is the approach shown in this post really scalable to support real languages of some size (whatever that means...)?

I'd be happy to hear about your thoughts. Thanks a lot, Mike
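Point (1) can be illustrated with a small Python sketch. Everything here is hypothetical: the token kinds, the `classify_identifiers` name, and the rule that an identifier directly after `class` or `new` is a class name. It shows how far one token of context gets you, and where it stops being enough.

```python
def classify_identifiers(tokens):
    """Relabel identifiers using only the previous token's text as context.

    tokens is a list of (kind, text) pairs as a lexer might produce them.
    """
    classified = []
    previous_text = None
    for kind, text in tokens:
        if kind == "identifier" and previous_text in ("class", "new"):
            kind = "class name"
        classified.append((kind, text))
        previous_text = text
    return classified


# Token stream for: class Foo { } Foo x = new Foo ( ) ;
tokens = [
    ("keyword", "class"), ("identifier", "Foo"), ("other", "{"), ("other", "}"),
    ("identifier", "Foo"), ("identifier", "x"), ("other", "="),
    ("keyword", "new"), ("identifier", "Foo"), ("other", "("), ("other", ")"),
    ("other", ";"),
]
result = classify_identifiers(tokens)
```

In this sketch the `Foo` after `class` and the `Foo` after `new` are relabeled, but the `Foo` in the declaration `Foo x` stays a plain identifier, because recognizing it requires knowing that `Foo` was declared as a class, which is parser-level or symbol-table information.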

By: Sam’s Blog » Blog Archive » ManagedMyC: Type and member dropdown bars

Sun, 19 Oct 2008 22:34:13 +0000

[...] Here’s the source code for the ManagedMyC sample at this point. Since I surely missed things, you can always diff this code versus the original source from my first post on this subject. [...]