Remix.run Logo
FjordWarden 4 hours ago

This is like the difference between an orange and fruit juice. You can squeeze an orange to extract its juices, but that is not the only thing you can do with it, nor is it the only way to make fruit juice.

I use tree-sitter for developing a custom programming language, you still need an extra step to get from CST to AST, but the overall DevEx is much quicker that hand-rolling the parser.

danielvaughn 4 hours ago | parent | next [-]

Every time I get to sing Treesitters praise, I take the opportunity to. I love it so much. I've tried a bunch of parser generators, and the TS approach is so simple and so good that I'll probably never use anything else. The iteration speed lets me get into a zen-like state where I just think about syntax design, and I don't sweat the technical bits.

lioeters 3 hours ago | parent | prev | next [-]

> extra step to get from CST to AST

Could you elaborate on what this involves? I'm also looking at using tree-sitter as a parser for a new language, possibly to support multiple syntaxes. I'm thinking of converting its parse trees to a common schema, that's the target language.

I guess I don't quite get the difference between a concrete and abstract syntax tree. Is it just that the former includes information that's irrelevant to the semantics of the language, like whitespace?

WilcoKruijer an hour ago | parent | next [-]

In this context you could say that CST -> AST is a normalization process. A CST might contain whitespace and comments, an AST almost certainly won't.

An example: in a CST `1 + 0x1 ` might be represented differently than `1 + 1`, but they could be equivalent in the AST. The same could be true for syntax sugar: `let [x,y] = arr;` and `let x = arr[0]; let y = arr[1];` could be the same after AST normalization.

You can see why having just the AST might not be enough for syntax highlighting.

As a side project I've been working on a simple programming language, where I use tree-sitter for the CST, but first normalize it to an AST before I do semantic analysis such as verifying references.

FjordWarden 3 hours ago | parent | prev | next [-]

TS returns a tree with nodes, you walk the nodes with a visitor pattern. I've experimented with using tree-sitter queries for this, but for now not found this to be easier. Every syntax will have its own CST but it can target a general AST if you will. At the end they can both be represented as s-expressions and but you need rules to go from one flavour of syntax tree to the other.

AST is just CST minus range info and simplified/generalised lexical info (in most cases).

direwolf20 3 hours ago | parent | prev [-]

That's correct.

mattnewport 3 hours ago | parent | prev | next [-]

Yeah, you can even use tree-sitter to implement a language server, I've done this for a custom scripting language we use at work.

lowbloodsugar 3 hours ago | parent | prev [-]

N00b question: Language parsers gives me concrete information, like “com.foo.bar.Baz is defined here”. Does tree sitter do that or does it say “this file has a symbol declaration for Baz” and elsewhere for that file “there is a package statement for ‘com.foo.bar’” and then I have to figure that out?

FjordWarden 3 hours ago | parent [-]

You have to figure this out for yourself in most cases. Tree sitter does have a query language based on s-expressions, but it is more for questions like "give me all the nodes that are literals", and then you can, for example, render those with in single draw call. Tree sitter has incremental parsing, and queries can be fixed at a certain byte range.