| ▲ | pizlonator 5 days ago |
| > I wonder if it's feasible to use Fil-C to do multitasking between mutually untrusted processes on a computer without an MMU? You could. That said, FUGC’s guts rely on OS features that in turn rely on an MMU. But you could make a version of FUGC that has no such dependency. As for perf - 4x is the worst case, and that number is out there because I reported it. And I report worst-case perf because that’s how obsessive I am about realistically measuring, and then fanatically resolving, perf issues. Fact is, I can live on the Fil-C versions of a lot of my favorite software and not tell the difference. |
|
| ▲ | foldr 5 days ago | parent | next [-] |
| > As for perf - 4x is the worst case and that number is out there because I reported it I love the concept of Fil-C but I find that with the latest release, a Fil-C build of QuickJS executes bytecode around 30x slower than a regular build. Admittedly this is an informal benchmark running on a GitHub CI runner. I’m not sure if virtualization introduces overheads that Fil-C might be particularly sensitive to (?). But I’ve sadly yet to see anything close to a 4x performance difference. Perhaps I will try running the same benchmark on native non-virtualized x86 later today. Also, so I am not just whining, my Fil-C patch to the QuickJS main branch contains a fix for an issue that’s only triggered by regex backtracking, and which I think you might have missed in your own QuickJS patch: http://github.com/addrummond/jsockd/blob/main/fil-c-quickjs.... |
| |
| ▲ | pizlonator 5 days ago | parent | next [-] | | 30x? Oof. I know that I regressed quickjs recently when I fixed handling of unions. It’s a fixable issue, I just haven’t gone back and fixed it yet. I definitely don’t see 30x overhead on anything else I run. But thanks for pointing that out, I should probably actually fix the union handling the right way. (What’s happening is every time quickjs bit casts doubles to pointers, that’s currently doing a heap allocation. And it’s obviously not needed. The simplest compiler analysis would kill it. I just turned off the previous instance of that analysis because it had a soundness issue) | | |
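For readers unfamiliar with the pattern being discussed: NaN-boxing engines like QuickJS keep either a double or a pointer in the same 64-bit slot, so values are constantly bit-cast between the two representations. A generic sketch of that cast (my illustration, not QuickJS's actual JSValue code):

```c
#include <stdint.h>

/* Hypothetical NaN-boxed value slot, loosely modeled on what engines
   like QuickJS do; not the real JSValue layout. */
typedef union {
    double   as_double;
    void    *as_pointer;
    uint64_t as_bits;
} boxed_value;

/* Reinterpreting a double's bits is free at the machine level. Under a
   capability model like Fil-C's, turning those bits back into a
   pointer-typed value is the step that can force a heap allocation,
   because the result must carry valid pointer metadata. */
static uint64_t double_to_bits(double d) {
    boxed_value v;
    v.as_double = d;
    return v.as_bits;   /* union type punning is well-defined in C */
}

static double bits_to_double(uint64_t bits) {
    boxed_value v;
    v.as_bits = bits;
    return v.as_double;
}
```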
| ▲ | foldr 4 days ago | parent [-] | | Thanks for the response, that’s useful to know. It’s honestly amazing (to me) that Fil-C works at all, and I’m sure the performance will continue to improve. |
| |
| ▲ | kragen 5 days ago | parent | prev [-] | | I look forward to seeing how this shakes out. Fanatically, I hope? |
|
|
| ▲ | modeless 5 days ago | parent | prev | next [-] |
| A Fil-C kernel that ran the whole system in the same address space, safely, would sure be something. Getting rid of the overhead of hardware isolation could compensate for some of the overhead of the software safety checks. That was the dream of Microsoft's Singularity project back in the day. I guess there would be no way to verify that precompiled user programs actually enforce the security boundaries. The only way to guarantee safety in such a system would be to compile everything from source yourself. |
| |
| ▲ | miki123211 5 days ago | parent | next [-] | | This is what IBM I[1] (AKA AS400) does I think. Ibm I applications are compiled to a hardware-independent intermediate representation called TIMI, which the SLIC (kernel) can then compile down to machine code, usually at program installation time. As the SLIC is also responsible for maintaining system security, there's no way for a malicious user to sneak in a noncompliant program. [1] https://en.wikipedia.org/wiki/IBM_i | | |
| ▲ | pdw 5 days ago | parent | next [-] | | I always wondered how secure AS/400 actually is. The original implementation might have checked tag bits in hardware (I don't know), but the current (PowerPC) implementation relies on the compiler generating a "check tag bits" every time a pointer is dereferenced [1]. So it seems that any arbitrary code execution vulnerability would be absolutely devastating. And the "SLIC" is not a small microkernel -- it also contains the compilers, the database and other system components. It'd be hard to believe there would be no exploitable bugs in there. [1] https://www.devever.net/~hl/ppcas | | | |
| ▲ | kragen 5 days ago | parent | prev | next [-] | | Correct, although I can't be sure I'm remembering the names of the parts correctly. Soltis's book Inside the AS/400 https://archive.org/details/insideas4000000solt is fascinating reading, but the title overpromises rather badly; there is no list of opcodes, for example. | |
| ▲ | ptx 5 days ago | parent | prev [-] | | That's basically the same idea as WebAssembly, isn't it? | | |
| ▲ | zozbot234 5 days ago | parent [-] | | I don't think WebAssembly has been applied across a whole system just yet. Inferno/Limbo (the successor to Plan9, using the Dis virtual machine) may be substantially closer to the mark, along with AOSP (based on Dalvik/ART) and a variety of JavaScript-based "web" OS's. One may also argue that "image"-based systems like Smalltalk, Oberon etc. are in the same class, and that the lineage ultimately originates from Lisp machines. | | |
| ▲ | kragen 5 days ago | parent [-] | | Smalltalk predates Lisp machines and didn't originally compile to native code at all. I don't remember if Limbo did. Oberon isn't image-based (you can't save and restore the memory state of the running system) and didn't originally define a machine-independent bytecode format, and the one it had for many years has been removed from the current version. Wasm usually isn't image-based either; though it has a clear pathway for doing so, Wasmtime, for example, still doesn't implement that functionality: https://github.com/bytecodealliance/wasmtime/issues/3017 | | |
| ▲ | miki123211 5 days ago | parent [-] | | AS400 isn't image based either. And unlike AS400, I don't think either Smalltalk or Lisp machines used the bytecode abstraction to achieve security. | | |
|
|
|
| |
| ▲ | pizlonator 5 days ago | parent | prev [-] | | You could have enforcement that binaries follow Fil-C rules using proof-carrying code | | |
| ▲ | kragen 5 days ago | parent [-] | | I'm skeptical that PCC can work in practice with existing social practices around software development, because neither users nor implementors can tell the difference between a correct verifier and one that has flaws that aren't being actively exploited yet, but they can sure tell the difference between a fast system and a slow one. The incentives are strong to drive the runtime cost as close to zero as possible, which involves making your proof-checking system so expressive that it's hard to get right. The closer you get to zero, the more performance-sensitive your userbase gets. No part of your userbase is actively testing the parts of your verifier that reject programs; they try to avoid generating programs that get rejected, and as the verifier gets larger and larger, it requires more and more effort to generate code that exploits one of its bugs, although there are more and more of them. As the effort required to slip malicious code past the verifier grows, the secret in-house tools developed by attackers give them a larger and larger advantage over the defenders. Continued "maintenance" applied to the verifier drives its size and complexity up over time, driving the probability of a flaw inexorably toward 100%, while, if it is not "maintained" through continuous changes, it will break as its dependencies change, it will be considered outdated, and nobody will understand it well enough to fix a bug if it does surface. We've seen this happen with Java, and although it's clearly not unavoidable in any kind of logical sense, it's a strong set of social pressures. Dynamic checking seems much more memetically fit: developers will regularly write code that should fail the dynamic checks, and, if it passes instead, they will send you an annoyed bug report about how they had to spend their weekend debugging. | |
| ▲ | zozbot234 5 days ago | parent [-] | | Proof carrying code is efficient at runtime; the verifier only ever has to run once to verify the binary, which in principle can simply be done at install. Even the verification itself can be fast enough because it's only checking the soundness of existing proofs, not having to generate new ones from scratch. Where there are genuine needs to "make the proof system more expressive", this can be done by adding trusted axioms which will generally be simple enough to manually check; the verifier itself need not become more complex. We've seen a lot of complexity happen with Java but that was generally in the standard library facilities that are bundled with the language and runtime, not really the basic JVM type checking pass. Proof carrying code is closer to the latter. | | |
| ▲ | kragen 5 days ago | parent [-] | | I agree that the verifier performance is not important to runtime performance (though there have been some changes to the JVM in particular to speed up class loading), but the expressivity of the safety proofs it can check is, because safety properties that cannot be proven at load time must be checked at runtime. Initially, and I think still, the JVM's downcast on loading an object reference from a generic container like ArrayList<Price> is an example: javac has proven statically that it is safe, but the JVM bytecode cannot express that. Pointer nullability in JVM languages like Kotlin is another example: Kotlin knows most object pointers can't be null, but the JVM still has to check at runtime. There have been numerous holes discovered in various implementations of the basic JVM type checking, often after existing for many years. | | |
| ▲ | pizlonator 5 days ago | parent | next [-] | | The JVM byte code situation isn’t a great example because that was a series of deliberate design choices for lots of complex reasons. And, the JVM absolutely can guarantee memory safety at the bytecode level. It’s just working with a slightly more dynamic type system than Java source. What would happen if you tried to do PCC for InvisiCaps and FUGC is that it would ultimately constrain what optimizations are possible, because the optimizer would only be able to pick from the set of tricks that it could express a proof for within whatever proof system we picked. Totally not the end of the world. Do I think this is an interesting thing to actually do? I’m not sure. It’s certainly not the most interesting thing to do with Fil-C right now | |
| ▲ | kragen 4 days ago | parent [-] | | Yes, I agree with you that a correct JVM can guarantee memory safety at the bytecode level, but what I meant to express was that many JVMs have had bugs that caused them to fail to do so, for essentially social reasons which I expect to cause problems with other PCC systems as well. Maybe you're right, and those problems are not inevitable; for example, if you could reduce the proofs to a tiny MetaMath-like kernel that wouldn't need constant "maintenance". As you say, that could move the compiler's optimizer out of the TCB — at least for the security properties Fil-C is enforcing, though the optimizer could still cause code to compute the wrong answers. That seems like something people would want if they were already living in a world where the state of computer security was much better than it is now. |
| |
| ▲ | gf000 5 days ago | parent | prev [-] | | I mean, since Gödel we pretty much have known that there could never be a system "without holes". | | |
| ▲ | kragen 5 days ago | parent [-] | | No, that is not correct. Gödel showed that some theorems are unprovable in a consistent formal axiomatic system; he did not show that no consistent formal axiomatic systems exist. | | |
| ▲ | gf000 5 days ago | parent | next [-] | | He did show that every formal axiomatic system will have statements that can't be proven "from within". For these, you are left with doing a runtime check. | | |
| ▲ | kragen 4 days ago | parent [-] | | Yes, that's true, but then possibly we are talking at cross-purposes; when I said "numerous holes discovered in various implementations of the basic JVM type checking", I didn't mean things that needed to be checked at runtime; I meant bugs that permitted violations of the JVM's security guarantees. However difficult it may be to avoid such things at a social level, certainly there is no mathematical reason that they must happen. |
| |
| ▲ | naasking 5 days ago | parent | prev [-] | | > Gödel showed that some theorems are unprovable in a consistent formal axiomatic system... ...of sufficient power, eg. that can model arithmetic with both addition and multiplication. I think the caveats are important because systems that can't fully model arithmetic are often still quite useful! | | |
| ▲ | gf000 5 days ago | parent [-] | | Indeed! But I am afraid general purpose programming languages almost always need that kind of power (though being Turing complete is not necessary) | | |
| ▲ | naasking 11 hours ago | parent [-] | | They need it operationally for calculations and such, but those calculations don't necessarily need to be statically validated beyond simple things like type and unit checking. |
|
|
|
|
|
|
|
|
|
|
| ▲ | torginus 5 days ago | parent | prev | next [-] |
| How would you go about writing a program/function that runs as close to native speed as possible on Fil-C? How much more memory do GC programs tend to use? Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)? You mention 'accurate' by which I assume you use the compiler to keep track of where the pointers are (via types/stackmaps). How do you deal with pointers that get cast to ints, and then back? |
| |
| ▲ | pizlonator 5 days ago | parent [-] | | > How would you go about writing a program/function that runs as close to native speed as possible on Fil-C? Avoid pointer chasing. Use SIMD. > How much more memory do GC programs tend to use? I would estimate 2x. Fil-C has additional overheads not related to GC, so maybe it’s higher. I haven’t started measuring and optimizing memory use in anger. > Curious, how do you deal with interior pointers, and not being able to store type info in object headers, like most GC languages do (considering placement new is a thing, you can't have malloc allocate a header then return the following memory, and pointer types can lie about what they contain)? See https://fil-c.org/invisicaps |
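As a generic illustration of the "avoid pointer chasing" advice (my own sketch, nothing Fil-C-specific): an indexed loop over a contiguous array goes through one base pointer and is easy for a compiler to vectorize, while a linked-list walk dereferences a fresh pointer per node, and each of those dereferences is where per-access checks and cache misses land.

```c
#include <stddef.h>

struct node { int value; struct node *next; };

/* One pointer dereference per element: every iteration chases `next`,
   so every iteration pays for a pointer load (and, in a checked
   implementation, for validating it). */
static long sum_list(const struct node *n) {
    long total = 0;
    for (; n != NULL; n = n->next)
        total += n->value;
    return total;
}

/* A single base pointer with indexed accesses: a dense loop the
   compiler can unroll and vectorize with SIMD. */
static long sum_array(const int *v, size_t len) {
    long total = 0;
    for (size_t i = 0; i < len; i++)
        total += v[i];
    return total;
}
```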
|
|
| ▲ | willvarfar 5 days ago | parent | prev | next [-] |
| When you run the Fil-C versions of your favourite software, does it have a sanitizer mode that reports bugs like missing free() etc? And have you found any bugs this way? |
| |
| ▲ | pizlonator 5 days ago | parent [-] | | Well missing free is just swallowed by the GC - the leak gets fixed without any message. I have found bugs in the software that I’ve ported, yeah. | | |
| ▲ | purpleidea 5 days ago | parent | next [-] | | This is neat work. I noticed on your software page some patches you had to make to get those things to compile. Have you sent those upstream? Eg, I noticed a simple 1 line bash change for example. | |
| ▲ | writebetterc 5 days ago | parent | prev [-] | | To add on top of this: This is a tracing GC. It only ever visits the live data, not the dead data. In other words, it would need a lot more special support if it wanted to report the dead objects. | | |
| ▲ | tomp 5 days ago | parent | next [-] | | A non-moving GC must visit dead objects. | | |
| ▲ | pizlonator 5 days ago | parent | next [-] | | Not quite. FUGC uses a SIMD sweep over a bit vector kept on the side, so it doesn’t visit the dead objects at all in the sense that it doesn’t touch their contents. And it only visits them in the sense that a single instruction deals with many dead or alive objects at once. |
| ▲ | writebetterc 5 days ago | parent | prev [-] | | I forgot that this GC is non-moving (I'm not used to that assumption, and it was a bit of a quick comment). I do find the statement dubious still; do you mind clearing it up for me? Given a page { void* addr; size_t size; size_t alignment; BitMap used; } where used's size in bits is page.size / page.alignment, surely we only need to visit the used bitmap for marking a memory slot as free? | |
| |
| ▲ | kragen 5 days ago | parent | prev [-] | | Really? How does a non-moving GC make dead objects available for reallocation without visiting them? | | |
| ▲ | torginus 5 days ago | parent | next [-] | | Why would it need to visit them? It just marks the address ranges as available in its internal bookkeeping (bitmaps etc). | | |
| ▲ | kragen 5 days ago | parent [-] | | In the general case there are as many newly available address ranges as dead objects, so that counts as visiting them in this context. | | |
| ▲ | torginus 5 days ago | parent | next [-] | | I don't think that's a definition of 'visit' most people would agree with. I'm actually working on my own language that has a non-moving GC. It uses size classes (so 16 byte objects, 32 byte objects etc.), each of which is allocated in a continuous slab of memory. Occupancy is determined by a bitmap, 1 bit for each slot in the slab. The GC constructs a liveness bitmap for the size class, and the results are ANDed together, 'freeing' the memory. If you fill the slab with dead objects, then run the GC, it will not walk anywhere on this slab, create an all zero liveness bitmap, and free the memory. | |
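A minimal sketch of that kind of sweep (my reconstruction of the idea being described, not torginus's or FUGC's actual code): the collector builds a mark bitmap during tracing, and reclaiming every dead slot covered by a 32-slot word is then a single AND, with no per-object work.

```c
#include <stdint.h>

/* One word covers 32 slots of one size class. `used` has a 1 bit for
   every allocated slot; `marked` has a 1 bit for every slot the trace
   found reachable. ANDing them frees all dead slots at once: slots
   that were used but not marked simply drop out of the occupancy map,
   and the sweep never touches their contents. */
static uint32_t sweep_word(uint32_t used, uint32_t marked) {
    return used & marked;
}
```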
| ▲ | kragen 5 days ago | parent [-] | | That's an awesome project! Is your GC generational despite being non-moving? What are your main objectives for the project? The liveness bitmap approach is pretty widespread at this point; jemalloc works the same way IIRC. Still, I think that counts as "visiting" in the context of this discussion: https://news.ycombinator.com/item?id=45137139 | | |
| ▲ | writebetterc 5 days ago | parent | next [-] | | I don't think it counts as visiting, as you never look at the dirtied bitmap during GC, only during allocation. That means, you don't actually know if a dirty bit represents a different object or not (if a 16-byte size class is allowed to have 32-byte objs in it, for example). To know that you'd either have to have strict size classes, or you'd have to have object headers for specifying the start of an object. I agree that it's easy to add in a visitation pass, where you take the bitmap of live objects after marking and diff it with the currently existing one in order to signal that you might have a leak. So basically, I think we're like 99% in agreement. | | |
| ▲ | kragen 5 days ago | parent [-] | | It's always nice when the impact of collision of opposing opinions gives rise to the spark of mutual understanding rather than merely inflaming base emotions. Typically bitmap-based allocators don't actually allow a 16-byte size class to have 32-byte objects in it, but I haven't looked at FUGC to see if that's true of it. | | |
| ▲ | torginus 5 days ago | parent | next [-] | | I toyed with the idea of allowing this, in bitmaps, it's pretty easy and efficient to find contiguous areas with bit twiddling hacks, for example //assume free map is the bitmap where 1 means free uint32_t free_map; uint32_t free_map_2 = (free_map & (free_map >> 1)); // so on and so forth I haven't really done anything like this yet, it has certain disadvantages, but you can pack multiple size classes into the same bitmap, you do a bit more work during alloc and resolving interior pointers is a bit more costly (if you have those), in exchange for having less size classes. | | |
| ▲ | kragen 5 days ago | parent [-] | | Sure, to find contiguous chunks of 6 slots within a single word, you can do t &= t << 1;
t &= t << 2;
t &= t << 2;
and that sort of thing is pretty appealing, but you lose the ability to know what size an object is just by looking at an address, and it's still a lot slower than scanning for an open slot in a page of 5× bigger objects. Should I assume from your use of uint32_t that you're targeting embedded ARM microcontrollers?
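Fleshing that shift sequence out into a self-contained helper (a sketch; the 32-bit word size and slot numbering are my assumptions): each `t &= t << s` step extends the proven run length by `s` bits, so after the steps finish, any set bit marks the end of a free run of the requested length.

```c
#include <stdint.h>

/* Find the start index of `k` contiguous free slots (1 bits) in a
   32-slot free map, or -1 if there is no such run. For k == 6 this
   performs exactly the shift sequence from the comments above:
   t &= t << 1; t &= t << 2; t &= t << 2; */
static int find_free_run(uint32_t free_map, unsigned k) {
    uint32_t t = free_map;
    unsigned have = 1;              /* every 1 bit is a run of length 1 */
    while (have < k) {
        unsigned s = k - have;
        if (s > have)
            s = have;               /* shift by at most the proven length */
        t &= t << s;
        have += s;
    }
    if (t == 0)
        return -1;
    /* A set bit at position i now means slots i-k+1 .. i are all free. */
    int i = 0;
    while (!((t >> i) & 1))
        i++;
    return i - (int)k + 1;
}
```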
| |
| ▲ | pizlonator 5 days ago | parent | prev [-] | | FUGC is size segregated. 16 byte size class will only have 16 byte objects. A bunch of other optimizations fall out from doing that |
|
| |
| ▲ | torginus 5 days ago | parent | prev [-] | | It's not generational, because unlike Java, but like C or C++, programs aren't supposed to generate a lot of ephemeral objects while they run. I also wanted to keep things as simple as possible to have a chance of actually shipping something in my lifetime :D | | |
| ▲ | kragen 5 days ago | parent [-] | | That sounds like a good approach! Is it public? | | |
| ▲ | torginus 5 days ago | parent [-] | | Not yet unfortunately, there are a few thorny issues, and I want to get it into an actually usable state before I dare make any claims about it :) |
|
|
|
| |
| ▲ | thomasmg 5 days ago | parent | prev [-] | | > there are as many newly available address ranges as dead objects Well, when using a bitmap (as they seem to do in the article), then multiple subsequent dead objects are considered to be in the same range, because multiple subsequent bits in the bitmap have the value zero. There is no need to visit each zero bit in the bitmap separately. |
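Concretely (a sketch assuming one bitmap bit per memory block, as described above): the allocator can find the next free block with a single invert plus count-trailing-zeros, no matter how many consecutive dead objects contributed the zero bits.

```c
#include <stdint.h>

/* Find the first free block among the 32 covered by `used`, or -1 if
   the word is full. A run of dead objects is just a run of zero bits;
   it is located in one instruction, never walked object by object.
   __builtin_ctz is the GCC/Clang intrinsic (undefined for a zero
   argument, hence the guard). */
static int first_free_block(uint32_t used) {
    uint32_t free_bits = ~used;
    if (free_bits == 0)
        return -1;
    return __builtin_ctz(free_bits);
}
```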
|
| |
| ▲ | thomasmg 5 days ago | parent | prev [-] | | If you want to use a standard malloc / free implementation (dlmalloc etc.) then dead objects need to be known, yes. But the malloc library could be fully integrated into the GC mechanism. In this case, there is no need. That's probably much easier, and faster, because the malloc can be simplified to bumping a pointer. | |
| ▲ | kragen 5 days ago | parent [-] | | That works if you use a copying garbage collector, but not a non-moving collector like FUGC. | | |
| ▲ | thomasmg 5 days ago | parent [-] | | OK, I did not read the source code of the FUGC, but the article mentions "FUGC relies on a sweeping algorithm based on bitvector SIMD." So, assuming there is just one bit per block of memory, then there is no need to visit the memory of the dead objects in the sweep phase. The bit of the dead object is zero, and so that memory block is considered free and available for reallocation. There is no need to visit the free block. | | |
| ▲ | kragen 5 days ago | parent [-] | | It's true that it doesn't dirty up your dcache, but it's "visiting" enough that it wouldn't "need a lot more special support if it wanted to report the dead objects," which is the context of the discussion here. |
|
|
|
|
|
|
|
|
| ▲ | kragen 5 days ago | parent | prev [-] |
| Yeah, I meant to be clear that 4× was the worst case, and I think it's an impressive achievement already, and perfectly fine for almost everything. After all, running single-threaded software on an 8-core CPU is already an 8× slowdown, right? And people do that all the time! What's the minimal code size overhead for FUGC? |
| |
| ▲ | pizlonator 5 days ago | parent [-] | | > What's the minimal code size overhead for FUGC? I don’t have good data on this. The FUGC requires the compiler to emit extra metadata and that metadata is hilariously inefficient right now. I haven’t bothered to optimize it. And the FUGC implementation pulls in all of libpas even though it almost certainly doesn’t have to. So I don’t know what minimal looks like right now | | |
| ▲ | gr4vityWall 5 days ago | parent | next [-] | | It's refreshing to see an honest, no-nonsense, not full of marketing-speak reply for once. Your replies on this thread have been great. :) | |
| ▲ | kragen 5 days ago | parent | prev [-] | | I see, thanks! |
|
|