Andrew Dupont c674d8ff9b Update README for `web-tree-sitter` 1 month ago

Building a custom web-tree-sitter

Tree-sitter parsers often use external C scanners, and those scanners sometimes call functions from the C standard library. For this to work in a WASM environment, web-tree-sitter has to anticipate which stdlib functions will be needed and export them. If a Tree-sitter parser uses stdlib function X, but X is not in web-tree-sitter’s list of exported symbols, the parser will throw an error whenever it reaches a code path that calls X.
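One way to learn in advance which stdlib functions a parser will ask for is to inspect the imports of its compiled .wasm file. The following is a minimal sketch, not part of web-tree-sitter or Pulsar: findMissingSymbols is a hypothetical helper, and it assumes the parser was built as an Emscripten side module that imports host functions from the "env" module. It relies only on the standard WebAssembly.Module constructor and WebAssembly.Module.imports.

```javascript
// Hypothetical helper: list the functions a compiled parser imports from
// "env" that are missing from a given list of exported symbol names.
function findMissingSymbols(wasmBytes, exportedSymbols) {
  // Compile (without instantiating) so the import table can be read.
  const module = new WebAssembly.Module(wasmBytes);
  const exported = new Set(exportedSymbols);
  return WebAssembly.Module.imports(module)
    // Assumption: Emscripten side modules take host functions from "env".
    .filter((imp) => imp.module === 'env' && imp.kind === 'function')
    .map((imp) => imp.name)
    .filter((name) => !exported.has(name));
}
```

Some function imports may be satisfied by web-tree-sitter itself rather than by the stdlib list, so treat the result as a list of candidates to investigate, not a verdict.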

For this reason, Pulsar builds a custom web-tree-sitter. Whenever someone integrates a new Tree-sitter parser into a Pulsar grammar, they might find that the parser relies on a stdlib function we haven’t included yet. In that case they can let us know, and we can update our web-tree-sitter build to export that function.

The need for this will decrease over time as C++ scanners are deprecated and as parsers are increasingly encouraged to use only a fixed subset of stdlib functions, but for now it’s still necessary.

We also use the custom build to detect a common failure scenario (a parser trying to call a stdlib function that hasn’t been exported) so that we can log a helpful warning to the console when it happens.

Check out the modified branch for the version we’re targeting

At the time of writing, Pulsar was targeting web-tree-sitter version 0.23.0, so our fork has a branch called v0-23-0-modified. That branch contains a modified stdlib-symbols.txt file and a modified script for building web-tree-sitter.

When we target a newer version of web-tree-sitter, a similar branch should be created from the corresponding upstream tag. The commits applied on the previous modified branch should cherry-pick onto the new one without much trouble.
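The branch name appears to follow a simple pattern derived from the version number. As a hedged sketch, the two hypothetical helpers below build that name and the git invocations for recreating the branch. They assume upstream tags are named like v0.23.0 and that the upstream tree-sitter repository is configured as a remote called upstream; the commits to cherry-pick are passed in, since they depend on what was applied to the previous branch.

```javascript
// Hypothetical: derive a branch name following the v0-23-0-modified pattern.
function modifiedBranchName(version) {
  return `v${version.replace(/\./g, '-')}-modified`;
}

// Hypothetical: git argument lists for recreating the modified branch.
// Assumes a remote named "upstream" and tags named like "v0.23.0".
function branchCommands(version, commitsToCherryPick) {
  return [
    ['git', 'fetch', '--tags', 'upstream'],
    ['git', 'checkout', '-b', modifiedBranchName(version), `v${version}`],
    ['git', 'cherry-pick', ...commitsToCherryPick],
  ];
}
```

Each entry can be passed to child_process.execFileSync(cmd[0], cmd.slice(1)). If a cherry-pick conflicts, git stops and the conflict has to be resolved by hand.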

Add whatever functions are needed to stdlib-symbols.txt

For instance, one of the parsers we use depends on the C stdlib function isalnum, which web-tree-sitter doesn’t export by default. So we can add the line

  "isalnum",

in an appropriate place in stdlib-symbols.txt, then rebuild web-tree-sitter so that the WASM-built version of that parser has that function available to it.
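Adding an entry can also be scripted. This is a minimal sketch with a hypothetical helper name; it assumes each entry sits on its own line in the quoted, comma-terminated form shown above and that entries are kept in alphabetical order. If the file is not actually sorted, the helper still inserts the new line among the existing entries; whether order matters to the build is not something this sketch checks.

```javascript
// Hypothetical helper: return the text of stdlib-symbols.txt with `name`
// added. Assumes one `"symbol",` entry per line, kept in alphabetical order.
function addStdlibSymbol(text, name) {
  const lines = text.split('\n');
  const entryPattern = /^(\s*)"([^"]+)",\s*$/;
  let insertAt = -1; // index of the first entry that sorts after `name`
  let lastEntry = -1; // index of the last entry line seen
  let indent = ''; // indentation copied from the first entry

  for (let i = 0; i < lines.length; i++) {
    const match = entryPattern.exec(lines[i]);
    if (!match) continue; // skip any line that is not an entry
    const [, lineIndent, symbol] = match;
    if (symbol === name) return text; // already listed; nothing to do
    if (lastEntry === -1) indent = lineIndent;
    if (insertAt === -1 && symbol > name) insertAt = i;
    lastEntry = i;
  }

  if (lastEntry === -1) throw new Error('no symbol entries found');
  const position = insertAt === -1 ? lastEntry + 1 : insertAt;
  lines.splice(position, 0, `${indent}"${name}",`);
  return lines.join('\n');
}
```

Returning the text unchanged when the symbol is already present makes the helper safe to rerun.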

If a third-party Tree-sitter grammar needs something more esoteric, we should encourage its authors to follow current best practices for Tree-sitter parsers. We may still want to add that function to our build, though.

Run script/build-wasm from the root of the tree-sitter checkout

To build web-tree-sitter for a particular version, make sure you’re using the matching version of Emscripten. This document is useful for matching tree-sitter versions with Emscripten versions.
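Before building, it can help to confirm which Emscripten version is on your PATH. A minimal sketch follows; parseEmccVersion and checkEmscriptenVersion are hypothetical helpers, and the parsing assumes emcc --version reports an x.y.z version number on its first line. The expected version is a parameter because it has to come from the compatibility document mentioned above.

```javascript
// Hypothetical helpers for checking the local Emscripten version.
const { execFileSync } = require('child_process');

// Extracts the first x.y.z version number from the first line of
// `emcc --version` output; returns null if none is found.
function parseEmccVersion(output) {
  const match = /(\d+\.\d+\.\d+)/.exec(output.split('\n')[0]);
  return match ? match[1] : null;
}

// Throws unless the emcc on PATH reports exactly `expected`.
function checkEmscriptenVersion(expected) {
  const output = execFileSync('emcc', ['--version'], { encoding: 'utf8' });
  const actual = parseEmccVersion(output);
  if (actual !== expected) {
    throw new Error(`Expected Emscripten ${expected}, found ${actual ?? 'an unknown version'}`);
  }
  return actual;
}
```

Calling checkEmscriptenVersion with the version listed for your target tree-sitter release, just before running script/build-wasm, fails fast instead of producing a build from a mismatched toolchain.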

The default build-wasm script now skips minification, so we no longer have to un-minify the JavaScript output.

Add a warning message

When a parser tries to use a stdlib function that isn’t exported by web-tree-sitter, the error that’s thrown isn’t very helpful. So we try to detect when that scenario is about to happen and log a warning to the console to help users who might otherwise be befuddled.

This may be automated in the future, but for now you can modify tree-sitter.js so that this function…

function resolveSymbol(sym) {
  var resolved = resolveGlobalSymbol(sym).sym;
  if (!resolved && localScope) {
    resolved = localScope[sym];
  }
  if (!resolved) {
    resolved = moduleExports[sym];
  }
  return resolved;
}

…has an extra check at the end:

function resolveSymbol(sym) {
  var resolved = resolveGlobalSymbol(sym).sym;
  if (!resolved && localScope) {
    resolved = localScope[sym];
  }
  if (!resolved) {
    resolved = moduleExports[sym];
  }
  if (!resolved) {
    console.warn(`Warning: parser wants to call function ${sym}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);
  }
  return resolved;
}

The function in question is generated by Emscripten. The added check roughly mirrors what we’d get by building with assertions enabled, though it is less generic and more tailored to Pulsar. If Emscripten’s implementation changes, you should still be able to find the equivalent logic.
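Until this step is automated, a small script can apply the patch. The sketch below is hypothetical and relies on the generated resolveSymbol still containing a `return resolved;` line. It inserts the same check shown above, matching the indentation of that line, leaves an already-patched file unchanged, and throws rather than guessing if the Emscripten output has changed shape.

```javascript
// Hypothetical helper: add Pulsar's missing-symbol warning to a generated
// tree-sitter.js. The `${sym}` below is literal text in a normal string;
// it becomes a template expression inside the patched file.
const WARNING_LINES = [
  'if (!resolved) {',
  '  console.warn(`Warning: parser wants to call function ${sym}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);',
  '}',
];

function patchResolveSymbol(source) {
  // Leave already-patched output alone so the script can be rerun safely.
  if (source.includes('parser wants to call function')) return source;

  const start = source.indexOf('function resolveSymbol(sym) {');
  if (start === -1) {
    throw new Error('resolveSymbol not found; the Emscripten output may have changed');
  }
  const returnIndex = source.indexOf('return resolved;', start);
  if (returnIndex === -1) {
    throw new Error('`return resolved;` not found inside resolveSymbol');
  }
  // Reuse the indentation of the return statement for the inserted lines.
  const lineStart = source.lastIndexOf('\n', returnIndex) + 1;
  const indent = source.slice(lineStart, returnIndex);
  const check = WARNING_LINES.map((line) => indent + line).join('\n') + '\n';
  return source.slice(0, lineStart) + check + source.slice(lineStart);
}
```

A caller would read lib/binding_web/tree-sitter.js, pass its contents through patchResolveSymbol, and write the result back.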

Copy it to vendor

Under lib/binding_web you’ll find the built files tree-sitter.js and tree-sitter.wasm. Copy both to Pulsar’s vendor/tree-sitter directory. Relaunch Pulsar and smoke-test a couple of existing grammars to make sure nothing broke.
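The copy itself is easy to script. In this sketch, copyBuiltFiles is a hypothetical helper, and treeSitterRoot and pulsarRoot are paths to your local checkouts of the tree-sitter fork and of Pulsar; it uses only Node’s built-in fs and path modules.

```javascript
// Hypothetical helper: copy the built web-tree-sitter files into Pulsar.
const fs = require('fs');
const path = require('path');

function copyBuiltFiles(treeSitterRoot, pulsarRoot) {
  const from = path.join(treeSitterRoot, 'lib', 'binding_web');
  const to = path.join(pulsarRoot, 'vendor', 'tree-sitter');
  fs.mkdirSync(to, { recursive: true }); // no-op if the directory exists
  for (const file of ['tree-sitter.js', 'tree-sitter.wasm']) {
    fs.copyFileSync(path.join(from, file), path.join(to, file));
  }
  return to;
}
```

The script only moves files; relaunching Pulsar and smoke-testing grammars is still a manual step.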

Commit it

Be sure to mention the version you’re upgrading to in the commit message so that grammar authors can tell which version of web-tree-sitter they should target.

(It’s a stretch goal to include this information in a more structured format so that it can be inspected at runtime.)
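One possible shape for that stretch goal, sketched under loud assumptions: a hypothetical version.json file written next to the vendored files, recording the web-tree-sitter version so that code can read it at runtime. The file name, its format, and both helper names are illustrations only.

```javascript
// Hypothetical: record and read the vendored web-tree-sitter version.
// The file name `version.json` and its shape are assumptions.
const fs = require('fs');
const path = require('path');

function writeVendoredVersion(vendorDir, version) {
  const file = path.join(vendorDir, 'version.json');
  fs.writeFileSync(file, JSON.stringify({ webTreeSitter: version }, null, 2) + '\n');
  return file;
}

function readVendoredVersion(vendorDir) {
  const raw = fs.readFileSync(path.join(vendorDir, 'version.json'), 'utf8');
  return JSON.parse(raw).webTreeSitter;
}
```

Writing the file during the copy step would keep it in sync with the commit that upgrades web-tree-sitter.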