Remix clone Hacker News

This is very cool. Extremely fast lexical tokenizer is the basis for a fast compiler. Zig has good integration and support for SIMD operations that's perfect for this kind of things. It's definitely doable. I did a proof of concept on using SIMD to operate on 32-byte chunk to parse identifiers a while back.

https://github.com/williamw520/misc_zig/blob/main/identifier...

▲

norir 3 days ago | parent [-]

When I run a profiler on a compiler I wrote (which parses at somewhere between 500K-1MM lines per second without a separate lexer), parsing barely shows up. I'd be very surprised if the zig compiler is spending more than 5% of the time tokenizing.

I assume there is some other use case that is motivating this work.

▲

tarix29 3 days ago | parent [-]

I imagine it would be quite useful for building a responsive language server, where parsing is a more significant portion of the work

▲

seanmcdirmid 3 days ago | parent [-]

No, the problem for a language server is incremental performance, not batch performance. Although there are a lot of bad implementations out there that just reparse the entire buffer on each edit (without the error recovery benefits an incremental parser would give you).

▲

adev_ 3 days ago | parent [-]

> No, the problem for a language server is incremental performance, not batch performance

"When something is fast enough, people start to use it differently" - Linus Torvalds.

Make your parser able to parse the current file at 30FPS and you do not need incremental parsing anymore nor error recovery. That is probably part of the idea here.

▲

dzaima 3 days ago | parent | next [-]

Here that can go both ways - SIMD parsing can allow handling arbitrary changes in reasonable time for files below like maybe 100MB (i.e. everything non-insane), whereas incremental parsing can allow handling small changes in truly-arbitrary-size files in microseconds. A trade-off between better average-case and worst-case time. (of course the ideal thing would be both, but that's even more non-trivial)

▲

awson 2 days ago | parent | prev | next [-]

Absolutely.

Quite a long time ago I was working on a some business application's reporting facility.

It used to take about an hour, and my development reduced this time to a 1 or 2 seconds ballpark.

This was HUGE. And changed the way users create these reports forever.

▲

seanmcdirmid 2 days ago | parent | prev [-]

It’s not good enough. Incremental parsers can save trees across edits, and you can hang type information off of those trees, so you just aren’t saving parsing time, you are saving type checking time as well. Even if you have a super fast batch parser, you are screwing yourself in other areas that are actually much more expensive.

	▲	adev_ 2 days ago \| parent [-]
		Agreed. But all things considered: The runtime cost of type checking is highly dependent on the type system / meta-programming complexity of your language. For simple languages (Golang?) with a pretty well designed module system: it should be doable to reach ~500KLOC/sec (probably even 1MLOC/s in some case) so more than enough for an interactive usage. And for complex languages with meta-programming capabilities: they are indeed slow to type check. But are also a giant pain in the butt to cache without side effects for incremental parsing. It is 2025 and clangd / intellisense still fail to do that reliably for C++ codebases that rely heavily on template usage. So it does not seem a so-crazy approach to me: It is trading a complexity problem for a performance one.