Remix.run Logo
rs545837 12 hours ago

Great point. C/C++ with macros and preprocessor directives is where tree-sitter's error recovery gets stretched. We support both C and C++ in sem-core(https://github.com/Ataraxy-Labs/sem) but the entity extraction is best-effort for heavily macro'd code. For most application-level C++ it works well, but something like the Linux kernel would be rough. Honestly that's an argument for gritzko's AST-native storage approach where the parser can be more tightly integrated.

pfdietz 7 hours ago | parent [-]

It's an argument against preprocessors for programming languages.

Tree-sitter's error handling is constrained by its intended use in editors, so incrementality and efficiency are important. For diffing/merging, a more elaborate parsing algorithm might be better, for example one that uses an Earley/CYK-like algorithm but attempts to minimize some error term (which a dynamic programming algorithm could be naturally extended to.)

rs545837 2 hours ago | parent [-]

Interesting idea. Tree-sitter's trade-off (speed + incrementality over completeness) makes sense for editors but you're right that for merge/diff a more thorough parser could be worth the cost since it's a cold path, not real-time. We only parse three file versions at merge time so spending an extra 50ms on a better parse would be fine. Worth exploring, thanks for the pointer.