| |
| ▲ | aapoalas 7 days ago | parent | next [-] | | The impetus for the engine design is indeed, as you say, "someone decided to will it into existence." A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine? So I decided to find out. Nova itself has already been created at that point and I was part of the project, but it was little more than a README. I then started to push it towards my vision, and the rest is not-quite-history. | | |
| ▲ | kitd 7 days ago | parent | next [-] | | A friend of mine who works in the gaming industry told me about the Entity Component System architecture and I thought: Hey, wouldn't that work for a JavaScript engine? That was the first thing I thought of when I saw your description. But the reason ECS works well is cache coherence. (Why) would a general-purpose runtime environment like a JS engine benefit from ECS? Or alternatively, have you seen performance improvements as a result? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | I guess the opposite could also be asked: Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order. It comes down to statistics: Large data sets in a general-purpose runtime environment are still created through parsing or looping, and they are consumed by looping. A human can manually create small data sets of entirely heterogenous data, but anything more than a 100 items is already unlikely. Finally, the garbage collector is a kind of "System" in the ECS sense. So even if the JavaScript code has managed to create very nonlinear data sets, the garbage collector will still enjoy benefits. (Tracing the data is still "pointer chasing" but when tracing we don't need to trace in the data order but can instead gather a collection of heap references we've seen, sort them in order and then trace them.) | | |
| ▲ | kaoD 7 days ago | parent [-] | | > Why would a game benefit from ECS? A player in the game can do basically anything, there's no guarantee that things are always perfectly accessed in a linear order. There's actually a guarantee that things are mostly going to be accessed in a linear order because player actions don't matter to the execution of the simulation. The whole simulation is run at 1/FPS intervals across the whole set of entities, regardless of player input (or lack thereof). In an ECS the whole World is run by Systems, which operate on Components. This is why cache locality works there: when the Movement System is acting, it's operating on the Position Component for all (or at least many) Entities, so linear array access pattern is very favorable. Any other component in your cache is going to be unused until the next system runs (and then the Position Component will become the useless data in cache). That's why you'd rather have an array of Components in cache instead of an array of Entities. This access pattern is very suitable for games because the simulation is running continuously in an infinite loop (the game loop) consisting of even more loops (the Systems running), but not so much for general purpose computation where access patterns are mostly random. (EDIT: or rather, local to each "entity".) | | |
| ▲ | aapoalas 7 days ago | parent [-] | | It is very true that a general purpose computation can theoretically do anything and mess linearity of access patterns entirely up. But in practice programs do most of their work in very linear fashion. It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other. So in a sense we can say that the JavaScript program itself becomes the System with a capital S. That is not to say that Nova's heap vectors will necessarily make sense: The two big possible stumbling blocks are 1) growing of heap vectors possibly taking too long, and 2) compacting of heap vectors during GC taking too long. The first point basically comes down to the fact that, at present, each heap vector is truly a single Rust Vec. When it can no longer fit all the heap data into it, it needs to reallocate. Imagine you have 2 billion ordinary objects, and suddenly the ordinary objects vector needs to reallocate: This will cause horrible stalls in the VM. This can be mitigated at the cost of splitting each heap vector into chunks, but this of course comes at the cost of an extra indirection and some lack of linearity in the memory layout. The second point is more or less a repeat of the first: Imagine you have 2 billion ordinary objects, and suddenly a single one at the beginning of the vector is removed by GC: The GC has to now move every object remaining in the vector down a step to make the vector dense again. This is something that I cannot really do anything about: I can make this less frequent by introducing a "minor GC" but eventually a "major GC" must happen and something like this can then be experienced. I can only hope that this sort of things are rare. The alternative would be to do a "swap to tail", so the last item in the vector is moved to take the removed item's place. But that then means that linear access is no longer guaranteed. It also plays havoc on how our GC is implemented but that's kind of a side point. Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense. | | |
| ▲ | kaoD 7 days ago | parent | next [-] | | > It's not by chance that eg. V8 will try to write objects parsed from a JSON array of objects one right after the other. Yes, but note this is still a different pattern of access (array of "entities"). V8 does this because it assumes that e.g. `foo.name` is very likely going to be accessed along with `foo.lastName` (which is likely the 99% case for general computing) whereas ECS assumes `foo.name` is very likely going to be accessed along with `foo2.name`, `foo3.name`, ..., `fooN.name` (which is the 99% case for videogame timestep loops). > Software engineering is architecture is full of trade-offs :) I'm just hoping that the ones we've made will prove to make sense. To clarify: my comment is not a criticism of Nova's design decisions. I was only trying to clarify the answer to "Why would a game benefit from ECS?" for those foreign to ECS's existential motive. I'm sure Nova's tradeoffs make sense for some workloads and I wish you the best! | | |
| ▲ | aapoalas 7 days ago | parent [-] | | Thank you very much for your well-wishes <3 > Yes, but note this is still a different pattern of access (array of "entities"). I was referring to the `[foo, foo2, foo3]` objects themselves; V8 does use an "cache local" placement for those so you'll find them laid out in memory as: > [foo_proto, foo_elems, foo_props, foo_name, foo_lastName, foo2_proto, foo2_elems, foo2_props, foo2_name, foo2_lastName, ...] For what it's worth, I am interested in laying object properties out in an ECS like manner in Nova, so the properties would be laid out as `[foo.name, foo2.name, foo3.name, ...]`, but currently the properties are laid out similarly to V8, `[foo.name, foo.lastName]`. The only difference is that we do not have "in object properties". That being said: I am obviously biased, but I do wonder if an ECS-like layout wouldn't be nearly universally beneficial. Thinking of the `foo.name` and `foo.lastName` access: If those are on the same cache line then accessing the two only reads one cache line. This is nice. But if there are more properties in the objects (and there often are), then those will pollute the cache. If you do this access once, it doesn't matter. If you do this a million times, now the cache pollution becomes a real issue: In Node.js even the optimal case would be that you read read 625,000 cache lines worth of data, only to discard 250,000 cache lines of it. If instead we use an ECS-like layout, then accessing these two properties reads two 10100cache lines: That's bad, but on the other hand if this happens once then it won't even make a blip on the screen. If a million of these accesses are done, you could think that we'd suddenly be slow as molasses but now the ECS-like layout is probably going to help you: You're more likely reading the next `name` and `lastName` property values on each access. If you have it bad and only half of the property data you read is actually the `name` and `lastName` properties you want, then you read 750,000 cache lines and lose out to the traditional engine by 100,000 cache lines. If you get 67% "hit rate" then you break even. And that's comparing to the case where the objects only contain `name` and `lastName` and nothing more. It of course all comes down to statistics but... I'm very interested in the potential benefits here :) Again, thank you for your comments, I've enjoyed discussing and pondering this <3 | | |
| ▲ | dinfuehr 7 days ago | parent [-] | | Would this mean that each shape/structure/map get its own vector for each field in order for this to work? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | That would be one way; it would offer the best theoretical memory layout for well-behaved programs. But, I expect it to be very painful to work with and to come with some performance penalties in mixed use cases due to the extra indirection required. No, my thinking is that properties would be stored into tables based on their size class: All objects that have 4-7 properties are in the same table, and all of their first property would be in the same slice, second property in another etc. | | |
| ▲ | munificent 7 days ago | parent [-] | | That sounds to me like you'll end up getting little benefit from ECS then. Let's say the JavaScript program is iterating over a hundred thousand instances of some Foo class which happens to have 6 properties. You'd ideally want good use of cache, but if your object vector that has all of the Foo objects also happens to have all sorts of instances of other types that have the same field count mixed in there, then you're going to spend a lot of time skipping over those unrelated objects and refreshing the cache. I know that ECS is treated as a silver bullet by a lot of people, but my experience is that it really only works well when the data you're working with is statically typed so that you can actually partition into arrays where each array does represent a single meaningful type. | | |
| ▲ | aapoalas 7 days ago | parent [-] | | That is definitely possible: If it turns out to not make sense then I can at least always go back go the current system where all properties of an object as stored in a single array. It's not as ECS'ssy as one would hope but it's at least proven technology :D |
|
|
|
|
| |
| ▲ | tubs 7 days ago | parent | prev | next [-] | | Virtual alloc your vectors so you can add more backing memory without having to modify the addresses of existing items. Compaction can reap only empty pages but you’ll still need some moving to avoid over fragmentation. | | |
| ▲ | aapoalas 7 days ago | parent [-] | | Yeah, virtual alloc for the Vec backing memory is something I hope to do _one day_. It's not a very pressing concern however, as it requires going much lower in the stack. |
| |
| ▲ | dinfuehr 7 days ago | parent | prev [-] | | > The GC has to now move every object remaining in the vector down a step to make the vector dense again Is there something which forces you to compact everything here? Or could you do what most GCs do and track that free entry in a free list? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | I would have to keep a free list per vector, and there's a lot of those, and it would defeat the point of keeping data packed and temporally colocated: I want to ensure that data was created together stays together and only ever gets more compact as it grows older. Filling in empty slots would mean that likely unrelated data comes and pollutes the cache for the old data :( | | |
| ▲ | dinfuehr 6 days ago | parent [-] | | I see. Thanks for the answers btw! I get your point but a few gaps here and there likely don't matter at all for performance. At least it's a lot better than making everything super compact all the time. Assuming you are splitting vectors at some points into chunks: In such a world you could choose to get rid of chunks with a lot of gaps and move the remaining entries into other chunks. At that point you really have a regular GC. And the free list could be stored in the vector itself. E.g. if an entry is empty it would store the pointer to the next free entry. So all you need is a single head/tail index for each vector. I also wonder how you handle pointers which could point into one of many vectors. E.g. a field could easily point either to an object or an array. Do you plan to pack this vector id into the 32-bit value? If so wouldn't there be a lot of dispatch like this as well: if (index & VECTOR_ID_MASK == OBJECTS_VECTOR_ID) { return objects[index&VECTOR_INDEX_MASK]; } else { .. } I hope it's clear what I mean with this. | | |
| ▲ | aapoalas 6 days ago | parent [-] | | Thank you for the discussion! A few gaps won't matter, and that to me speaks of a split between major and minor GC making sense. However, I'm not really sold on that meaning a free list making sense. For one, if I split the heap vectors into single value parts, then holding free slot data in any of them will become somewhat complicated. Hence at least for the foreseeable future I'm 100% in on the compacted heap vectors idea :) Time will tell if the aggressive compacting makes sense or not. Our JavaScript Values are the full 8 bytes (yes, this is large and it pains me, but it does give us all integers on stack, most doubles on stack, and up to 7 bytes of string data on stack), so a field that can point to any kind of object stores a byte tag and a u32 index. I might pack this down to 1+3 bytes or so, at the cost of supporting smaller maximum number of objects in the engine. JS Value itself would still probably remain 8 bytes because of the stack data benefits. There is indeed dynamic dispatching through match statements, though it generally happens at the specification method level. Eg. A specification method to get a property from an object will match on the tag and then dispatch to a concrete method with the index as parameter. The indexes are typed as well, so from this point on we statically know we're dealing with eg. an Array. So there is dynamic dispatch yes, but we try to eliminate it at the earliest opportunity. We probably still have more of that than a traditional engine would have though: A traditional engine will keep the tag on the heap and there is some dynamic dispatch done based on that, but at least your data lookup isn't based on dispatch. |
|
|
|
|
|
|
| |
| ▲ | specialist 6 days ago | parent | prev [-] | | I think you're on to something (important). Decomposing structs into separate arrays (heaps) is becoming a thing. eg Rust and others are introducing language features to manually (explicitly) do so. It could be cool if the runtime just handled it. I stumbled across a new research language with new syntax for just this purpose, to better express iteration and lambdas. IIRC. Sorry, I was looking for something else (got nerdsniped by u/hinkley's mention of Erlang's "set-theoric types" ), and didn't bookmark it. If I find it again, I'll forward the link. Maybe someone else here knows what I'm talking about. | | |
| |
| ▲ | andai 7 days ago | parent | prev | next [-] | | This is cool, but I'm wondering (1) Why doesn't V8, whose whole point is performance, lay out memory in an optimal way? (2) Will Nova need to also implement all of V8's other optimizations, to see if Nova's layout makes any significant difference? | | |
| ▲ | aapoalas 7 days ago | parent | next [-] | | V8 could probably implement the backing object "trick" with some trouble. I'm half-hoping that Nova will show it to be worth their while and that they will eventually do it. It will be a major refactoring of the engine, however. The heap vector "trick" is basically impossible, I believe. It wouldn't be a refactoring so much as it would be a complete rewrite of the engine. The entirety of V8 assumes it deals in pointers, and all of that would need to change to using indexes instead. I will eat my hat if they do it. Without heap vectors they can still split object data apart using pointer-keyed hash maps, so maybe they could take advantage of some of the ideas still. V8 does offer ways to run code without optimisations, which we can use for a more apples-to-apples comparison. The most important optimisation that Nova really needs before any big performance comparisons become meaningful is property access inline caching, which requires implementing object shapes. I'd say that once object shapes are done, then limited performance comparisons can probably be made, especially if V8's JIT is disabled. | | |
| ▲ | Leszek 7 days ago | parent | next [-] | | To be fair, pointer compression is morally and memory-wise similar to indexing a vector. | | |
| ▲ | aapoalas 7 days ago | parent [-] | | Yup, and to a degree the whole "heap references as indexes" idea was inspired by pointer compression. Not in a direct sense of "hey, look at that, what if I took it a step further?" but as I was thinking of the indexes I realised that it looks a lot like pointer compression, and that made me think it is a viable idea. |
| |
| ▲ | reverius42 7 days ago | parent | prev [-] | | So is the whole point of this project to convince V8 to adopt a particular optimization? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | Not really: In my daydreams Nova becomes the premier JS engine in the world and takes the crown from V8. If V8 went all in and basically just copied all of Nova... I'd probably still develop Nova, as I don't want to work with C++ that much. If V8 copied all of Nova AND adopted Rust, I might consider laying Nova to rest and going into V8 development. But I'd probably also be really angry at V8 just taking all of Nova's good ideas and peddling them off as their own without crediting Nova. So probably I'd still keep developing Nova while stewing in my anger and inability to do anything about it :) I hope Nova can be a spark that ignites the JavaScript world into a bit of a renaissance with some of its ideas, but the point is not to burn bright and burn out. The point is to burn bright and stay lit. | | |
| ▲ | rob74 7 days ago | parent [-] | | > But I'd probably also be really angry at V8 just taking all of Nova's good ideas and peddling them off as their own without crediting Nova. Who knows, maybe they'd even give you credit (while still taking the idea)? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | It could definitely happen. It would be a hard decision for me then :) |
|
|
|
| |
| ▲ | munificent 7 days ago | parent | prev | next [-] | | Optimal memory layout isn't something you can know in advance. The optimal way to lay out objects in memory is exactly in the order that they will be accessed, but the runtime doesn't know that until after the user program has accessed them. And if the program accesses a set of objects in different orders at different times, there is no one optimal layout. I did ask Lars Bak once if they spent a lot of time thinking about cache usage and organizing objects in memory to take best advantage of it and, if I recall correctly, his answer was basically "no". They definitely think about it terms of packing objects into small amounts of memory. But in a dynamically typed language like JavaScript where every property is a reference to some other object elsewhere in memory, using the cache well is just profoundly hard. Hell, it's hard even in Java where at least you do know the set of fields any given class has. | |
| ▲ | liontwist 7 days ago | parent | prev [-] | | 1 it takes time and effort to make major architectural changes. Certain design choices made for other reasons may conflict. |
| |
| ▲ | lucms_ 7 days ago | parent | prev [-] | | Isn't data-oriented design about organizing the data in a way that reflects the most common access patterns of the program?
The approach of placing all numbers in a big number vector, all Arrays into a big Array vector, and so on, would be "data-oriented design" if it actually reflects the most common access patterns. So, is it the case that when you read a number you also want all those other numbers that come together with it in the cache line? Is that the case for Arrays? For DataViews? In other words, does this approach to allocating memory reflect the most common data access patterns in JavaScript programs?
I'm not saying it's a bad approach, and I'm not even trying to imply that it's not DOD, I'm genuinely asking. Edit: maybe a better question is: does it reflect the most common data access patterns of a JavaScript Engine? | | |
| ▲ | aapoalas 7 days ago | parent [-] | | Excellent question: In a theoretical sense the answer would be that we cannot know since it depends on the JavaScript being run. But: In practice I think that is indeed the case. Especially for the more common an object is, the likelier it is that it is used in conjunction with others around it. At the same time, the more important their memory placement becomes. eg. Say you have a JS programs that has about a 100 DataViews: I'd say it's unlikely these are used in conjunction with others very often, but they're also only a small part of the program, so their placement is mostly whatever. Now what if that number is a million instead? Now I'm betting they're mostly all created together, used together, and that their placement is critical to the program's performance. So, I'm betting that making random memory access performance worse while guaranteeing that data created together stays together and improving linear memory performance will be an overall win. Whether this is true data-oriented design is then in the eye of the beholder: Maybe someone will think I'm wrong, my assumptions are wrong, and I'm thus not doing things in a data-oriented way. |
|
|