Remix.run Logo
liontwist 7 days ago

This is a great idea! I had thought about doing this with a lisp interpreter. I had identified a few key advantages:

- homogenous allocation means no alignment gaps - linear access win in garbage collection - indices smaller than pointers - type discriminated index can save some size

I haven’t verified whether those actually work out in the details. I’ll read your blog article.

Don’t bother with these comments immediately comparing it to V8 (a multi billion dollar venture). I don’t know how many creative projects they’ve done before.

You may be be interested in looking at Fabrice Bellard’s JS engine for ideas.

mbrock 7 days ago | parent | next [-]

I actually made a Lisp interpreter in Zig a couple of years ago that has each object type in a separate heap array. In fact each field of each object type has its own array: every CDR is in one contiguous array. This was mostly for fun and to experiment with data-driven techniques using Zig metaprogramming. The code turned out relatively clean and simple.

https://github.com/mbrock/wisp

GC is stop&copy which as a side effect compacts each of those arrays and improves locality. I think most lists should end up having their CDRs next to each other in memory making iteration very cache friendly. But I didn't verify any performance qualities, beyond making it efficient enough for basic use.

It also has delimited continuation control, compiles to WebAssembly, and hooks promises into the continuation system, among some other pretty cool features!

aapoalas 7 days ago | parent | next [-]

Well I'll be damned! That sounds very much like what I want Nova to eventually be :) We don't have fields split apart at present, mostly because Rust doesn't make that quite as easy as I would want to. Otherwise it sounds like it's very much all the same, in a good way.

I'll definitely be taking a look at wisp, thank you very much for the link! If you ever have the time, I'd love seeing a comparison of this sort of engine design against a more traditional one.

Sorry, what is "CDR" in this context though?

mbrock 7 days ago | parent [-]

Quick reply to the cdr thing: car/cdr are old Lisp names for the head/tail fields of linked list cells! :)

aapoalas 7 days ago | parent [-]

Ah, of course!

liontwist 7 days ago | parent | prev | next [-]

Yes. the right thing to do is to treat a list as a general case and other uses of cons as special case

codr7 7 days ago | parent [-]

I've flipped that idea around in a few of my own language designs, where pairs are the central feature and lists are just pairs with pair cdrs. Works fine from what I can see.

liontwist 7 days ago | parent [-]

Yes pairs is the 1980s lisp design, but it’s not good for modern caches. Both obviously work.

mbrock 7 days ago | parent | prev [-]

Oh yeah, continuation pointers also have their own array like every other field kind, which should have similar benefits as list traversal but for continuation copying... It's a really interesting design area, I think. Zig makes it easy!

aapoalas 7 days ago | parent [-]

Yeah, I'm quite envious of the MultiArrayList or whatever it was that Zig has: If only Rust had that sort of a type built-in <3

mbrock 7 days ago | parent [-]

That's how I got interested in this kind of memory layout in the first place. I wanted a nice Lisp for WebAssembly and had recently gotten into Zig. When I started defining the word structure I remembered Andy Kelley's talk about using data-oriented design to make the Zig compiler fast, so I thought I'd try it, and the more I thought about it the more reasonable it seemed.

There are like a dozen object types with different growing multiarrays. Words are 32 bit with 1 for GC state and 27 for index and the rest are the type tag. Ints are 28 bits. Byte arrays have their own heap too, as well as general 32 bit vectors.

aapoalas 7 days ago | parent | prev [-]

Thank you for the encouragement! Avoiding alignment gaps is indeed pretty great: I have a vision of packing Arrays into 9 bytes split over two or three cache lines.

On typed indexes: If we accept only about 2^24 possible index values then we could use a 32 bit integer for our Values, or at least for Objects (if we want to keep 7 bytes worth of stack data, which is pretty hard to pass on).

I love the comments comparing Nova to V8: That's what I want to aim for after all :) I'm not sure I've heard of Fabrice Bellard's JS engine, thanks, I'll take a look!

NoahKAndrews 7 days ago | parent [-]

Your blog mentions QuickJS, which I believe is the mentioned engine by Fabrice

aapoalas 7 days ago | parent [-]

Oops :D