Syntax Rules

Matthew Daly edited this page Jul 23, 2022 · 4 revisions

Syntax rules are at the core of maple's syntax highlighter, although a few other features use them as well. Rules are defined "naively" using regular expressions (RegEx). While this limits how much detail some syntax rules can capture, it makes it trivial to add support for new languages.

Writing syntax rules

The list of syntax categories grows with each release of maple, though this will slow over time as the list becomes more "complete". Currently, it is not possible to define custom categories, though a syntax file can omit any categories it does not need.

It is also important to note that there are no restrictions on the RegEx inside syntax categories. This means that the quality and performance of the lexer can vary between languages, since there is no guarantee that syntax files are accurate in all edge cases nor that they use the most efficient RegEx possible. That said, any syntax file included with maple has been tested in real-world use and is considered reasonably accurate and performant.

NOTE: The order in which syntax rules are defined determines their order of priority when parsed. For example, in cs.xml BooleanLiteral is defined before Alphabetical so that "true" and "false" are matched as Booleans, not standard alphabetical tokens.
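
The priority rule above can be illustrated with a small first-match-wins tokenizer. This is only a sketch in Python, not maple's actual lexer: the `RULES` list, its patterns, and the `tokenize` helper are assumptions made for illustration, with only the BooleanLiteral-before-Alphabetical ordering taken from the cs.xml example.

```python
import re

# Hypothetical, trimmed-down rule list; only the ordering mirrors the cs.xml example.
RULES = [
    ("BooleanLiteral", re.compile(r"\b(?:true|false)\b")),
    ("Alphabetical", re.compile(r"[A-Za-z_]\w*")),
    ("NumberLiteral", re.compile(r"\d+")),
]


def tokenize(text, rules=RULES):
    """Return (category, lexeme) pairs; the first rule that matches at a position wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        for category, pattern in rules:
            match = pattern.match(text, pos)
            if match and match.end() > pos:
                tokens.append((category, match.group()))
                pos = match.end()
                break
        else:
            pos += 1  # no rule matched here (e.g. whitespace); skip one character
    return tokens
```

With this ordering, `tokenize("true x 42")` labels `true` as BooleanLiteral. If Alphabetical were listed first, the same text would be labelled Alphabetical, which is the situation the note warns about.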

Syntax rules are defined in [extension].xml files and are loaded automatically according to the file extension. For example, *.cs files will always load cs.xml. A simple example of a syntax rule is as follows:

<Syntax type="HexLiteral" insensitive="true">0x[0-9a-f]+</Syntax>

The category of syntax is set by the type property, and every syntax rule must specify a valid type (one of the categories listed at the end of this page). In this case, the RegEx has been marked case insensitive with the insensitive property, though not every rule sets it. The RegEx itself is the text content of the XML element.

NOTE: Certain characters, such as < and &, are not allowed as literal text within XML elements and must be replaced by ampersand codes (> is usually escaped as well). So, a look-behind in RegEx is written (?&lt;=a) rather than (?<=a). The most common codes for writing RegEx are &lt;, &gt;, and &amp;.
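
To see how the escaping works out in practice, the following Python sketch parses a rule element and compiles its RegEx. It is not how maple itself loads rules: `compile_rule` is a hypothetical helper, the look-behind rule is made up for the example, and treating insensitive="true" as a case-insensitive flag is an assumption based on the description above.

```python
import re
import xml.etree.ElementTree as ET


def compile_rule(xml_text):
    """Parse one <Syntax> element and return (category, compiled pattern)."""
    element = ET.fromstring(xml_text)
    # By this point the XML parser has already turned &lt; back into <.
    # Assumption: insensitive="true" maps to a case-insensitive match.
    flags = re.IGNORECASE if element.get("insensitive") == "true" else 0
    return element.get("type"), re.compile(element.text, flags)


# The rule from the example above.
hex_type, hex_rule = compile_rule(
    '<Syntax type="HexLiteral" insensitive="true">0x[0-9a-f]+</Syntax>'
)

# Hypothetical rule: digits directly after '#', written with an escaped look-behind.
_, lookbehind_rule = compile_rule(
    '<Syntax type="NumberLiteral">(?&lt;=#)[0-9]+</Syntax>'
)
```

After parsing, `lookbehind_rule.pattern` holds the plain `(?<=#)[0-9]+`, so the ampersand codes only matter inside the XML file. Writing the look-behind unescaped makes the element malformed, and the parser rejects it before the RegEx is ever seen.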

Keywords

Keywords are also supported within syntax files. Keywords are defined as simple strings and the Lexer will turn Alphabetical tokens into Keyword tokens if the text matches. A block of keywords is written like so:

<Keywords>
    <Keyword>public</Keyword>
    <Keyword>static</Keyword>
    ...
</Keywords>

NOTE: Keyword is a token category just like any other and can be defined with RegEx as well.
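
The keyword step described above can be sketched as a pass over already-lexed tokens. This Python example is illustrative only: `promote_keywords` is a hypothetical helper, and the exact, case-sensitive comparison is an assumption, since the page only says the text must match.

```python
# Keywords taken from the <Keywords> block above.
KEYWORDS = {"public", "static"}


def promote_keywords(tokens, keywords=KEYWORDS):
    """Re-label Alphabetical tokens whose text exactly matches a keyword."""
    return [
        ("Keyword", text) if category == "Alphabetical" and text in keywords
        else (category, text)
        for category, text in tokens
    ]
```

Only Alphabetical tokens are candidates here, so the word `static` inside a StringLiteral token keeps its original category.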

Other properties

Syntax files can include additional information used by other maple features. If a property is defined both in properties.xml and in a syntax file, the value from the syntax file takes precedence.

| Property | Description |
| --- | --- |
| DefaultEncoding | The encoding to use when saving this type of file, either utf8 or ascii. |
| CommentPrefix | For example, // in C#. Used by the comment command. |
| AutocompletePairings | A string of 2-character autocomplete pairings. For example, (){} means that ( will autocomplete with ) and { with }. |
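
As an illustration of the AutocompletePairings format, the Python sketch below splits such a string into opening and closing characters. `parse_pairings` is a hypothetical helper, and rejecting odd-length strings is an assumption; the page does not say how maple handles malformed values.

```python
def parse_pairings(pairings):
    """Split a string such as "(){}" into {opening: closing} entries."""
    # Assumption: a string with an unpaired trailing character is an error.
    if len(pairings) % 2 != 0:
        raise ValueError("pairings must contain an even number of characters")
    return {pairings[i]: pairings[i + 1] for i in range(0, len(pairings), 2)}
```

For the example value `(){}`, this maps `(` to `)` and `{` to `}`.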

List of syntax categories

  • NumberLiteral
  • StringLiteral
  • CharacterLiteral
  • HexLiteral
  • BooleanLiteral
  • Alphabetical
  • Break
  • Grouping
  • Comment
  • Operator
  • Url
  • Function
  • Keyword
  • SpecialChar
