Against Query Based Compilers(matklad.github.io)
56 points by surprisetalk 2 days ago | 32 comments
Panzerschrek an hour ago | parent | next [-]

I have a language server implementation for my programming language. Since it was developed much later than the compiler, a query-based redesign wasn't viable. So I just adapted the compiler codebase for language-server needs in a tricky way: on each document change I redo parsing, but not the other compilation steps, and use the parse results to perform lookups in name tables built previously, for some older document state. That state is costly to update, so it's refreshed only once in a while, and only if the input document is syntactically correct. This approach works surprisingly well (it's fast and completion is mostly correct) and doesn't require throwing the whole compiler code away.
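A minimal sketch of that strategy in Python (the toy parser, the validity check, and the table format are all invented for illustration): parsing is redone on every edit, but the name tables are rebuilt only from syntactically valid input, so completion keeps answering from the older state while the document is broken.

```python
# Toy sketch of the "fresh parse, stale name tables" language-server strategy.
import re

def parse(text):
    """'Parse': collect identifiers; the document is 'valid' iff parens balance."""
    idents = re.findall(r"[A-Za-z_]\w*", text)
    valid = text.count("(") == text.count(")")
    return {"idents": idents, "valid": valid}

class LanguageServer:
    def __init__(self):
        self.symbols = set()  # name tables from an older, valid document state

    def on_change(self, text):
        ast = parse(text)               # cheap: redone on every keystroke
        if ast["valid"]:                # costly rebuild only on valid input
            self.symbols = set(ast["idents"])
        return ast

    def complete(self, prefix):
        # Completion consults the possibly stale tables, so it keeps
        # working even while the current text is syntactically broken.
        return sorted(n for n in self.symbols if n.startswith(prefix))
```

For example, after `on_change("def foo(): bar()")` followed by the broken edit `on_change("def foo(: ba")`, `complete("ba")` still answers from the older tables.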

What's more, I can't imagine how people write language servers that fully reprocess the document on each change (even with a query-based approach). How do they deal with broken syntax? Isn't it generally impossible to recover from syntax errors well enough that the end result is as good as expected?

armchairhacker a minute ago | parent [-]

I know Haskell’s LSP years ago just stopped on broken syntax, so the only diagnostic would be the syntax error and you would lose code actions. That was annoying, especially because the LSP was slow.

JetBrains (not an LSP) works up to the syntax error, then recovers and works after. In some cases that are not hard to trigger (e.g. after a closing delimiter), the recovery is wrong and everything after is messed up. But in practice that has rarely been an issue; if I want those later diagnostics and code actions, I fix the delimiters, and the analysis updates quickly.

recursivecaveat 5 hours ago | parent | prev | next [-]

Mass unqualified imports is a feature that I really dislike in languages. At least please make it less syntactically convenient so people are not drawn to it. Trawling around to find out where symbols come from has wasted a lot more of my time than unqualified imports have ever saved. Using a very short alias gets you 90% of the way there anyways.

boxed 5 minutes ago | parent | next [-]

In Swift it's even worse. All symbols in neighbor files are available. No imports.

AlotOfReading an hour ago | parent | prev | next [-]

I recently designed a build system, and one of the key decisions was banning unqualified target names. The neat side effect was that it also became possible to ask the language itself where something was defined, with only a single additional line of code in the runtime.

p1necone 4 hours ago | parent | prev | next [-]

My plan for this in my current toy language project is to allow things like 'import * from Foo', but save a package.lock esque file somewhere on first build - after that you need to run some kind of '--update' command to bring in totally new names.

The problem I'm trying to solve is less general discoverability, though, and more ensuring that purely additive changes in libraries aren't technically breaking due to the risk of name clashes.
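A sketch of how such a lockfile could work (the file name, format, and function are made up here): the first build snapshots which names the wildcard import resolved to, and later builds bind only the pinned names, so an additive library change can't introduce a clash until you explicitly update.

```python
# Hypothetical "import lockfile" for wildcard imports, as described above.
import json, os

def resolve_star_import(module_names, lockfile="imports.lock.json", update=False):
    """Pin the names brought in by a wildcard import.

    module_names: the names the imported module currently exports.
    Returns the set of names the build is allowed to bind.
    """
    if update or not os.path.exists(lockfile):
        # First build (or an explicit --update): snapshot current exports.
        with open(lockfile, "w") as f:
            json.dump(sorted(module_names), f)
        return set(module_names)
    with open(lockfile) as f:
        pinned = set(json.load(f))
    # Names added to the library since the snapshot are simply not bound.
    return pinned & set(module_names)
```

With a lockfile pinning `{a, b}`, a library that later also exports `new` still only contributes `a` and `b` until the build is run with `update=True`.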

jiggawatts 5 hours ago | parent | prev [-]

> Trawling around to find out where symbols come from has wasted a lot more of my time than unqualified imports have ever saved.

Why aren't you using an IDE with "navigate to definition" conveniently bound to something like middle-click or ctrl-click?

I haven't used a language/IDE combination without this feature in decades.

alpinisme 4 hours ago | parent | next [-]

Sometimes I just want to open a project quickly and take a look without getting all dependencies installed and the lsp working and so on. Sometimes I want to look at another team’s project even though it’s in another language than my team works in, but I don’t want to set up that language in my environment.

This is particularly true with cpp projects, where wrestling with cmake and co just isn’t how i want to spend my time to answer a question about why something is slow or what a serialized object shape should be under various circumstances.

jiggawatts 2 hours ago | parent [-]

> Sometimes I just want to open a project quickly

Every time I've heard someone say a version of this, invariably they've spent more time doing things manually than the properly mechanised method would have taken.

There's a seductive immediacy to certain quick & dirty manual processes that, in sum, are slower if measured with a stopwatch. It could be as trivial as missing an issue because, without syntax highlighting, you didn't notice that something was commented out or a string was quoted incorrectly.

Similarly, a few decades back I argued with many developers who insisted that the fancy keyboard bindings of Emacs or Vim made them faster, but when raced against someone with an IDE they lost badly every time, no exceptions.

hombre_fatal 2 hours ago | parent [-]

Huh? Their example could be just reading code in github or reading diffs. You shouldn’t need to pull code into a development environment just so you can GoToDefinition to understand what’s going on.

There’s all sorts of workflows where vim would mog the IDE workflow you’re really excited about, like pressing E in lazy git to make a quick tweak to a diff. Or ctrl-G in claude code.

I wouldn’t be so sure you’ve cracked the code on the best workflow that has no negative trade offs. Everyone thinks that about their workflow until they use it long enough to see where it snags.

jiggawatts an hour ago | parent [-]

> You shouldn’t need to

... but you do, more often than the quick & dirty approach really allows.

I was just watching the Veritasium episode on the XZ tools hack, which was in part caused by poor tooling.

The attacker purposefully obfuscated his change, making a bunch of "non-changes" such as rearranging whitespace and comments to hide the fact that he didn't actually change the C code to "fix" the bug in the binary blob that contained the malware payload.

You will miss things like this without the proper tooling.

I use IDEs in a large part because they have dramatically better diff tools than CLI tools or even GitHub.

> you’ve cracked the code on the best workflow

I would argue that the ideal tooling doesn't even exist yet, which is why I don't believe that I've got the best possible setup nailed. Not yet.

My main argument is this:

Between each keypress in a "fancy text editor" of any flavour, an ordinary CPU could have processed something like 10 billion instructions. If you spend even a minute staring at the screen, you're "wasting" trillions of possible things the computer could be doing to help you.

Throw a GPU into the mix and the waste becomes absurd.

There's an awful lot the computer could be doing to help developers avoid mistakes, make their code more secure, analyse the consequences of each tiny change, etc...

It's very hard to explain without writing something the length of War & Peace, so let me leave you with a real world example of what I mean from a related field:

There's two kinds of firewall GUIs.

One kind shows you the real-time "hit rate" of each rule, showing packets and bytes matched, or whatever.

The other kind doesn't.

One kind dramatically reduces "oops" errors.

The other kind doesn't. It's the most common type however, because it's much easier to develop as a product. It's the lazy thing. It's the product broken down into independent teams doing their own thing: the "config team" doing their thing and the "metrics" team doing theirs, no overlap. It's Conway's law.

IDEs shouldn't be fancy text editors. They should be constantly analysing the code to death, with AIs, proof assistants, virtual machines, instrumentation, whatever. Bits and pieces of this exist now, scattered, incomplete, and requiring manual setup.

One day we'll have these seamlessly integrated into a cohesive whole, and you'd be nuts to use anything else.

... one day.

akst an hour ago | parent | prev | next [-]

The fact that you're talking to someone with this frustration shows maybe there are people with use cases other than yours?

When IDEs do resolve this, it tends to be because they've built an index to look up these identifiers, which is likely taking up a portion of your memory. A language that statically tells you where an identifier comes from will consume fewer resources, and your IDE can collapse the imports anyway.

So not sure why you feel so strongly about a language design whose ambiguity necessitates consuming additional resources to show you your little drop-down menu.

jitl 4 hours ago | parent | prev [-]

the difference is "i see this information at a glance" versus "i need to move the cursor to each unknown name to ask my ide a question". i think doing the command-click or even just hovering a bunch of unknown symbols is "trawling around". this goes doubly when doing code reviews; i'd prefer code make sense to my grug brain when i view it outside an ide. (sure, i can pull the code down to do an in-depth review in my ide, but the tax for unqualified names is starting to pile up...)

jiggawatts 2 hours ago | parent | next [-]

> the difference is "i see this information at a glance" versus "i need to move the cursor to each unknown name to ask my ide a question".

Is it really "at a glance"?

In most languages either by convention or requirement the "import" or "using" statements are collected at the beginning of a file. Once you've scrolled down even a few lines, the context is gone.

Also, determining what exactly is bound where is decidedly non-trivial in many languages due to keywords such as "var" and "let", overloaded function/method definitions, etc...

Sure, a human can do this with 95% or better accuracy, but that 5% can be a killer during a complex troubleshooting session if you guess wrong.

That's why I strongly prefer IDEs and having a purely mechanical process to resolve the dependencies so I can know exactly what things are instead of hoping my intuitions were correct.

jcgrillo 3 hours ago | parent | prev [-]

similarly, I can step through a program in the debugger and see how it works--or at least that it works--but the whole point of having a programming language in the first place is that (if I use it right) I can understand the program without running it

armchairhacker 2 days ago | parent | prev | next [-]

> In Zig, every file can be parsed completely in isolation...every name needs to be explicitly declared (there’s no use *)...

> In contrast, you can’t really parse a file in Rust...Expanding macros requires name resolution, which, in Rust, is a crate-wide, rather than a file-wide operation...Similarly, the nature of the trait system is such that impl blocks relevant to a particular method call can be found almost anywhere...

matklad doesn't even mention dynamic languages, where perfect name resolution is undecidable. "Fast static analysis, easy static analysis, language designed for static analysis - pick two".

Rust's IDE integration is fast and deep, and I've heard TypeScript's is too, so "easy static analysis" may not be important today. I believe it will be as coding evolves due to LLMs, though I have no evidence and I'm not quite sure how.

rendaw 4 hours ago | parent | next [-]

> "Fast static analysis, easy static analysis, language designed for static analysis - pick two"

Where does this come from and what's the explanation?

armchairhacker 28 minutes ago | parent [-]

I made it up, although maybe I’m not the first? The third should actually be “language not designed around static analysis”

- You can write a static analysis that is fast and straightforward if you design the language around it (fast + easy + !language)

- Otherwise, you’ll run into features that prevent that. Ex (stolen from an above comment): name resolution within unqualified imports

- For many such features, you can write a brute-force solution. Ex: scan all possible definitions within the unqualified import until you find the target name (!fast + easy + language)

- Or you can optimize this solution, using incremental cached lookup tables (fast + !easy + language)

Of course, there are languages with features that make most static analyses for them neither fast nor easy (ex: for name resolution, eval or macros). But it holds in many cases, ex:

- Parsing: basic syntax (!language), GLR parser or LL parser with big alternatives (!fast), IELR parser or LL parser with smart alternatives (!easy)

- Type inference: many type annotations (!language), brute-force enumerate candidates (!fast), Hindley-Milner or bidirectional, possibly extended for features like trait resolution (!easy)

- Allocation: manual (!language), put everything on the heap (!fast), heuristics, implicit boxing and unboxing (!easy)

There’s a trend (I also don’t know where it originates) that most “data processing” problems can be solved with an easier slower or trickier faster algorithm (hence, competitive programming). Static analyses are this class of problem, but for them (sometimes) there is a third option: simplify the problem by putting hints in the language.
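The name-resolution bullets above can be made concrete with a toy contrast between the (!fast + easy) and (fast + !easy) options under wildcard imports (the modules and names here are invented):

```python
# Toy contrast: brute-force scan vs. precomputed index for resolving a
# name across wildcard imports. Modules and exports are illustrative.
MODULES = {
    "math_utils": ["clamp", "lerp"],
    "string_utils": ["trim", "pad"],
}

def resolve_slow(name, imported_modules):
    # !fast + easy: scan every wildcard-imported module on each lookup.
    for mod in imported_modules:
        if name in MODULES[mod]:
            return mod
    return None

def build_index(imported_modules):
    # fast + !easy: precompute a lookup table. The hard part in a real
    # compiler is keeping this index correct across incremental edits.
    return {n: mod for mod in imported_modules for n in MODULES[mod]}

index = build_index(["math_utils", "string_utils"])

def resolve_fast(name):
    return index.get(name)
```

Both give the same answers; the difference only shows up in how lookup cost scales with the number of imported modules, and in the bookkeeping the indexed version needs when a module's exports change.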

foolswisdom 6 hours ago | parent | prev [-]

I would really appreciate it if rust-analyzer were faster, actually. It feels even worse given that you need to save the file before it updates the type checking (though I assume that's because it would be too slow to feel smooth while typing?).

Rusky 4 hours ago | parent [-]

The reason rust-analyzer doesn't update diagnostics until you save is historical. Originally, people tried to build IDE support by reusing rustc itself, but this proved too slow and cumbersome at the time.

Rust-analyzer reimplemented the frontend in a more IDE-friendly architecture, but focused more on name resolution than on type checking. So it delegated diagnostics to literally just running `cargo check`.

As parts of rustc get rewritten over time (the trait solver, borrow checker) they have also been made more IDE-friendly and reusable, so rust-analyzer is slowly gaining the ability to surface more type checking diagnostics as you edit, without delegating to `cargo check`.

snthpy an hour ago | parent | prev | next [-]

> Similarly, the nature of the trait system is such that impl blocks relevant to a particular method call can be found almost anywhere. For every trait method call, you get a dependency on the impl block that supplies the implementation, *but you also get a dependency on non-existence of conflicting impls in every other file*!

The last clause is a great point and one I will try to remember.

smj-edison an hour ago | parent | prev | next [-]

It seems like he's not arguing against query-based compilers per se, but rather he's arguing against the thinking that often comes with it. Namely, that you can amortize an expensive compilation process across incremental changes (which often doesn't work in practice because of the avalanche effect).

jitl 3 hours ago | parent | prev | next [-]

i agree on the whole that query is a good second line of defense versus the primary strategy. however you can do a query-based strategy and still use a diff/patch approach for some queries: just make the previous state of a query one of its inputs.

query based compilers are equivalent to component UI frameworks / signals / self-adjusting computation / spreadsheets. signia signals library implements this idea of using the previous output + diffs to compute the next output (https://signia.tldraw.dev/docs/incremental#using-diffs-in-co...)
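A minimal illustration of that idea, independent of any particular library: a derived value is computed from scratch on a cold start, then patched from its own previous output plus a diff of its input.

```python
# Sketch of a query that consumes its previous output plus a diff of its
# input instead of recomputing from scratch (in the style of signals
# libraries and self-adjusting computation). Structures are illustrative.

def sum_query(items, prev=None, diff=None):
    """Maintain sum(items); `diff` is a list of ("add"/"remove", value) patches."""
    if prev is None or diff is None:
        return sum(items)            # cold start: full recomputation
    total = prev
    for op, value in diff:           # warm path: patch the previous output
        total += value if op == "add" else -value
    return total
```

For example, `sum_query([1, 2, 3])` computes 6 the slow way; on the next change, `sum_query([], prev=6, diff=[("add", 10), ("remove", 2)])` patches it to 14 without touching the full input.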

zarzavat 3 hours ago | parent | prev | next [-]

> In contrast, you can’t really parse a file in Rust. Rust macros generate new source code, so parsing can’t be finished until all the macros are expanded.

Rust needs macros because the language is extremely strict and verbose. Macros are the main escape hatch, allowing you to simplify your code while maintaining memory safety without a GC.

Zig doesn't have memory safety as a goal so it seems like an unfair comparison.

Philpax 2 hours ago | parent | next [-]

I'm a Rust main, but this argument seems... incorrect? You would not need macros for Rust to remain a usable memory-safe language. They certainly make it easier, but they're not necessary. It would be perfectly possible to design a variant of Rust that gets you to 80-90% of Rust's usability, with the same safety, without macros.

armchairhacker 20 minutes ago | parent | prev | next [-]

A lot of the abstraction that Rust’s macros enable can be implemented via generics and traits, especially if they were extended (ex: keyword generics, variadic generics, orphan trait instances). Except traits are the other thing that complicates Rust’s name resolution…

foltik 2 hours ago | parent | prev | next [-]

Macros in Rust have nothing to do with memory safety. They're typically used for compile-time codegen (derive, println!) and very occasionally for syntactic abstraction (vec!).

jibal an hour ago | parent | prev [-]

This makes no sense at all.

The "comparison" (it isn't one; it's a technical point, and "fairness" is not remotely relevant, since Rust isn't being attacked and doesn't need defending) is between a language with macros and a language without macros. Why the languages have or don't have macros isn't relevant.

Zig is strict and verbose and could benefit from macros but for numerous reasons doesn't have them.

Zig also has no GC and does have memory safety as a goal (but not the primary one, and not provable memory safety)--but none of this is at all relevant to the point the OP made, which is strictly about one language having macros and the other language not having them.

jiggawatts 5 hours ago | parent | prev [-]

> This is provably impossible to make usefully incremental. The encryption can be implemented as a graph of function calls, and you can apply the general incremental recipe to it. It just won’t be very fast.

The specific example is bogus.

Merkle trees and their many variants exist to solve precisely this problem.
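For example, a Merkle-style root hash over a set of leaves localizes the effect of a change: only the hashes on the path from a modified leaf to the root differ. The sketch below shows the structure only; a real incremental implementation would cache the interior hashes and recompute just that path.

```python
# Generic Merkle-root sketch (not tied to any compiler).
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root hash of a list of leaf byte-strings (odd trailing node promoted)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            pair = level[i] + (level[i + 1] if i + 1 < len(level) else b"")
            nxt.append(h(pair))
        level = nxt
    return level[0]
```

Changing any single leaf changes the root, but with cached interior nodes only O(log n) hashes need recomputing, which is exactly the property incremental systems want.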

For more compiler-specific scenarios there exist vaguely similar solutions to the issues introduced by incremental compilation such as splitting up monolithic executables into many small dynamically loaded libraries (only during development time), or taking that to the extreme, hot code reloading at the function level.

> ... only because Zig language is carefully designed to allow this.

Is the key point. Rust wasn't designed for incremental compilation, and most legacy languages like C only allow it in a useful way because they were designed in the era of "kilobytes" of system memory and hence are very frugal in their design.

Other than Zig, the only modern language I'm aware of that was "co-designed" for efficient compilation and incremental reload is Jai.

> Expanding macros requires name resolution, which, in Rust, is a crate-wide, rather than a file-wide operation.

Sure, but macros change infrequently, so that ought to be a highly cacheable pure function for most edits.
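That intuition can be sketched as a memoized pure function keyed on the macro definition and the call-site tokens (the representation here is invented; real expansion also depends on name resolution, which is what complicates the caching in practice):

```python
# Hypothetical memoized macro expansion: a pure function of
# (definition, call-site tokens), so edits elsewhere hit the cache.
from functools import lru_cache

@lru_cache(maxsize=None)
def expand_macro(macro_def, input_tokens):
    # Both arguments must be hashable (here: tuples), which is what
    # makes the result cacheable across unrelated edits.
    _name, body = macro_def
    return tuple(body.replace("$x", tok) for tok in input_tokens)
```

As long as neither the macro definition nor the call site changes, repeated expansion is a cache hit rather than a re-run.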

> The above scheme works only if the language has a property that changing the body of function foo (not touching its signature) can’t introduce type errors into an unrelated function bar.

AFAIK Rust and most other statically typed languages have this property. Optimisations such as inlining can mess with the cacheability of code-gen in all languages.

chrismorgan 3 hours ago | parent | next [-]

> Rust and most other statically typed languages have this property.

Actually, auto traits thwart this in some places for Rust: https://rust-lang.github.io/impl-trait-initiative/explainer/....

skybrian 3 hours ago | parent | prev | next [-]

Doesn't Go qualify? It has a fast compiler and it was designed that way.

jibal an hour ago | parent | prev [-]

> Optimisations such as inlining can mess with the cacheability of code-gen in all languages.

The statement was about introducing type errors, not cacheability of code-gen.