tree-sitter-compiler

Incremental parsers for node

Usage no npm install needed!

<script type="module">
  import treeSitterCompiler from 'https://cdn.skypack.dev/tree-sitter-compiler';
</script>

README

tree-sitter compiler

Incremental parsers for node

Build Status

Installation

npm install tree-sitter-compiler

Creating a language

Create a grammar.js in the root directory of your module. This file should create and export a grammar object using tree-sitter's helper functions:

module.exports = grammar({
  name: "arithmetic",

  extras: $ => [$.comment, /\s/],

  rules: {
    program: $ => repeat(choice(
      $.assignment_statement,
      $.expression_statement
    )),

    assignment_statement: $ => seq(
      $.variable, "=", $.expression, ";"
    ),

    expression_statement: $ => seq(
      $.expression, ";"
    ),

    expression: $ => choice(
      $.variable,
      $.number,
      prec(1, seq($.expression, "+", $.expression)),
      prec(1, seq($.expression, "-", $.expression)),
      prec(2, seq($.expression, "*", $.expression)),
      prec(2, seq($.expression, "/", $.expression)),
      prec(3, seq($.expression, "^", $.expression))
    ),

    variable: $ => (/\a\w+/)

    number: $ => (/\d+/)

    comment: $ => (/#.*/)
  }
});

Run tree-sitter compile. This will generate a C function for parsing your language, a C++ function that exposes the parser to javascript, and a binding.gyp file for compiling these sources into a native node module.

Grammar syntax

The grammar function takes an object with the following keys:

  • name - the name of the language.
  • rules - an object whose keys are rule names and whose values are Grammar Rules. The first key in the map will be the start symbol. See the 'Rules' section for how to construct grammar rules.
  • extras - an array of Grammar Rules which may appear anywhere in a document. This construct is used to useful for things like whitespace and comments in programming languages.
  • conflicts - an array of arrays of Grammar Rules which are known to conflict with each other in an LR(1) parser. You'll need to use this if writing a grammar that is not LR(1).

Rules

  • $.property references - match another rule with the given name.
  • String literals - match exact strings.
  • RegExp literals - match strings according to ECMAScript regexp syntax. Assertions (e.g. ^, $) are not yet supported.
  • choice(rule...) - matches any one of the given rules.
  • repeat(rule) - matches any number of repetitions of the given rule.
  • seq(rule...) - matches each of the given rules in sequence.
  • blank() - matches the empty string.
  • optional(rule) - matches the given rule or the empty string.