Andrew Dupont · c674d8ff9b · Update README for `web-tree-sitter` · 1 month ago
Tree-sitter parsers often use external C scanners, and those scanners sometimes call functions from the C standard library. For this to work in a WASM environment, `web-tree-sitter` must anticipate which stdlib functions will be needed and export them. If a Tree-sitter parser uses stdlib function X, but X is not in the list of exported symbols, the parser will fail and throw an error whenever it reaches a code path that calls the missing function.
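To see ahead of time which functions a compiled parser will ask its host for, one option is to compare the parser’s WebAssembly imports against the exported symbol list. The sketch below is hypothetical (`findMissingImports` is not part of `web-tree-sitter`) and assumes that Emscripten side modules import their stdlib functions from the `env` module. It relies only on the descriptor shape (`module`, `name`, `kind`) returned by the standard `WebAssembly.Module.imports()`.

```javascript
// Hypothetical helper: list the functions a parser's .wasm file expects
// from its host that are not in our list of exported stdlib symbols.
// Assumption: Emscripten side modules import these functions from "env".
function findMissingImports(imports, exportedSymbols) {
  const exported = new Set(exportedSymbols);
  return imports
    // Only function imports from "env" can be stdlib calls; memory,
    // tables, globals, and GOT.* entries are skipped.
    .filter((imp) => imp.module === "env" && imp.kind === "function")
    .map((imp) => imp.name)
    .filter((name) => !exported.has(name));
}
```

For example, `findMissingImports(WebAssembly.Module.imports(new WebAssembly.Module(fs.readFileSync(wasmPath))), symbols)` would list candidates for the parser at `wasmPath`. Some results may be functions that `web-tree-sitter` supplies by other means, so treat the output as a starting point rather than a verdict.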
For this reason, Pulsar builds a custom `web-tree-sitter`. Whenever someone integrates a new Tree-sitter parser into a Pulsar grammar, they may find that the parser relies on a stdlib function we haven’t exported yet. In that case they can let us know, and we can update our `web-tree-sitter` build to export that function.
The need for this will shrink over time as C++ scanners are deprecated and parsers are increasingly encouraged to use a fixed subset of stdlib functions, but for now it’s still necessary.
We also use the custom build to detect a common failure scenario (a parser trying to call a stdlib function that hasn’t been exported) so that we can log a helpful warning to the console when it happens.
At the time of writing, Pulsar was targeting `web-tree-sitter` version 0.23.0, so our fork has a branch called `v0-23-0-modified`. That branch contains a modified `stdlib-symbols.txt` file and a modified script for building `web-tree-sitter`.
When we target a newer version of `web-tree-sitter`, a similar branch should be created from the corresponding upstream tag. The commits applied on the previous modified branch should cherry-pick onto the new one without much trouble.
stdlib-symbols.txt
For instance, one of the parsers we use depends on the C stdlib function `isalnum`, which `web-tree-sitter` doesn’t export by default. So we can add the line `"isalnum",` in an appropriate place in `stdlib-symbols.txt`, then rebuild `web-tree-sitter` so that the WASM build of that parser has that function available to it.
If a third-party Tree-sitter grammar needs something more esoteric, we should encourage its authors to follow current best practices for Tree-sitter parsers, but we may still want to add that dependency to the build.
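If “an appropriate place” means alphabetical order (an assumption; check the actual file), the edit can be scripted. This hypothetical Node.js helper assumes, based on the `"isalnum",` example, that `stdlib-symbols.txt` holds one quoted, comma-terminated name per line, already sorted.

```javascript
// Matches one symbol line, e.g. `"isalnum",`, capturing the symbol name.
const SYMBOL_LINE = /^\s*"([^"]+)",\s*$/;

// Hypothetical helper: returns the text of stdlib-symbols.txt with
// `"name",` inserted in alphabetical order. Assumes one quoted,
// comma-terminated symbol per line, already sorted.
function addStdlibSymbol(text, name) {
  const lines = text.split("\n");
  const symbols = lines.map((line) => {
    const match = SYMBOL_LINE.exec(line);
    return match ? match[1] : null;
  });
  if (symbols.includes(name)) {
    return text; // Already exported; nothing to change.
  }
  // Insert before the first symbol that sorts after the new one...
  let index = symbols.findIndex((sym) => sym !== null && sym > name);
  if (index === -1) {
    // ...or right after the last symbol if the new one sorts last.
    let last = -1;
    symbols.forEach((sym, i) => {
      if (sym !== null) last = i;
    });
    index = last + 1;
  }
  lines.splice(index, 0, `"${name}",`);
  return lines.join("\n");
}
```

Rebuilding afterward is still required; this only edits the list.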
script/build-wasm from the root
To build `web-tree-sitter` for a particular version, make sure you’re using the matching version of Emscripten. This document is useful for matching up tree-sitter versions with Emscripten versions.
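Before building, it can help to confirm which Emscripten is on your `PATH`. In this hedged sketch, `parseEmccVersion` and `assertEmccVersion` are hypothetical helpers, the expected version is whatever the compatibility document lists for your target tree-sitter release, and the parsing assumes `emcc --version` prints a first line of the form shown in the comment.

```javascript
// Hypothetical helper: pull the version number out of `emcc --version` output.
// Assumes a first line like (version number illustrative):
// "emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.64 (...)"
function parseEmccVersion(output) {
  const match = /^emcc \(.*?\) (\d+\.\d+\.\d+)/m.exec(output);
  return match ? match[1] : null;
}

// Throws if the installed Emscripten differs from the version the target
// tree-sitter release expects (supplied by the caller).
function assertEmccVersion(output, expected) {
  const actual = parseEmccVersion(output);
  if (actual !== expected) {
    throw new Error(
      `Expected Emscripten ${expected}, but emcc reports ${actual ?? "an unrecognized version"}.`
    );
  }
  return actual;
}
```

In practice the output could come from Node’s `child_process.execFileSync("emcc", ["--version"], { encoding: "utf8" })`.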
The default `build-wasm` script now skips minification, so we no longer have to un-minify the JavaScript output.
When a parser tries to use a stdlib function that `web-tree-sitter` doesn’t export, the resulting error isn’t very helpful. So we try to detect when that is about to happen and log a warning to the console to help users who might otherwise be befuddled.
This may be automated in the future, but for now you can modify `tree-sitter.js` so that this function…
```javascript
function resolveSymbol(sym) {
  var resolved = resolveGlobalSymbol(sym).sym;
  if (!resolved && localScope) {
    resolved = localScope[sym];
  }
  if (!resolved) {
    resolved = moduleExports[sym];
  }
  return resolved;
}
```
…has an extra check at the end:
```javascript
function resolveSymbol(sym) {
  var resolved = resolveGlobalSymbol(sym).sym;
  if (!resolved && localScope) {
    resolved = localScope[sym];
  }
  if (!resolved) {
    resolved = moduleExports[sym];
  }
  // Nothing provides this symbol: warn before the parser fails obscurely.
  if (!resolved) {
    console.warn(`Warning: parser wants to call function ${sym}, but it is not defined. If parsing fails, this is probably the reason why. Please report this to the Pulsar team so that this parser can be supported properly.`);
  }
  return resolved;
}
```
The function in question is generated by Emscripten and roughly matches what we’d get by building with assertions enabled (though it is less generic and more tailored to Pulsar). If Emscripten changes the implementation, you should still be able to find the equivalent logic.
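Until this step is automated, a small Node.js script could apply the change. This is a sketch, assuming the generated function is declared as `function resolveSymbol(sym)` and ends with `return resolved;` exactly as in the snippet above. `patchResolveSymbol` is hypothetical, and other Emscripten output (for example an arrow-function form) would need a different match.

```javascript
// Hypothetical patch step for Emscripten's generated tree-sitter.js.
// Assumes resolveSymbol() is declared as `function resolveSymbol(sym)` and
// ends with `return resolved;`; other Emscripten versions may differ.
const WARNING_LINES = [
  "if (!resolved) {",
  // Plain string, so `${sym}` is kept literally and interpolated at runtime.
  "  console.warn(`Warning: parser wants to call function ${sym}, but it is not defined. " +
    "If parsing fails, this is probably the reason why. Please report this to the Pulsar team " +
    "so that this parser can be supported properly.`);",
  "}",
];

function patchResolveSymbol(source) {
  const start = source.indexOf("function resolveSymbol(sym)");
  if (start === -1) {
    throw new Error("resolveSymbol() not found; the Emscripten output may have changed.");
  }
  // Leave an already-patched file alone so the script can be rerun safely.
  if (source.includes("parser wants to call function", start)) {
    return source;
  }
  const ret = source.indexOf("return resolved;", start);
  if (ret === -1) {
    throw new Error("`return resolved;` not found inside resolveSymbol().");
  }
  // Insert the check just above the return, reusing its indentation.
  const lineStart = source.lastIndexOf("\n", ret) + 1;
  const indent = source.slice(lineStart, ret);
  const check = WARNING_LINES.map((line) => indent + line).join("\n") + "\n";
  return source.slice(0, lineStart) + check + source.slice(lineStart);
}
```

A caller would read `tree-sitter.js` with `fs.readFileSync`, pass the text through `patchResolveSymbol`, and write the result back.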
vendor
Under `lib/binding_web` you’ll find the built files `tree-sitter.js` and `tree-sitter.wasm`. Copy both to Pulsar’s `vendor/tree-sitter` directory. Relaunch Pulsar and smoke-test a couple of existing grammars to make sure you didn’t break anything.
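The copy step can be scripted. Here is a minimal Node.js sketch, assuming a local tree-sitter checkout and a Pulsar checkout whose root paths you pass in. `copyBuildArtifacts` is a hypothetical helper, not an existing script in either repository.

```javascript
const fs = require("fs");
const path = require("path");

// The two build outputs to copy into Pulsar.
const ARTIFACTS = ["tree-sitter.js", "tree-sitter.wasm"];

// Hypothetical helper: copies the built files from a tree-sitter checkout
// into Pulsar's vendor directory. Both root paths are supplied by the caller.
function copyBuildArtifacts(treeSitterRoot, pulsarRoot) {
  const from = path.join(treeSitterRoot, "lib", "binding_web");
  const to = path.join(pulsarRoot, "vendor", "tree-sitter");
  fs.mkdirSync(to, { recursive: true });
  return ARTIFACTS.map((name) => {
    const dest = path.join(to, name);
    fs.copyFileSync(path.join(from, name), dest); // Throws if a build output is missing.
    return dest;
  });
}
```

The smoke test in Pulsar is still a manual step afterward.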
Be sure to mention the version you’re upgrading to in the commit message so grammar authors can tell which version of `web-tree-sitter` they should target.
(It’s a stretch goal to include this information in a more structured format so that it can be inspected at runtime.)