pizlonator 5 days ago

For starters, LLVM is a lot less willing to exploit that UB.

It’s also weird that GCC gets away with this at all, as many C programs in Linux that compile with GCC make deliberate use of out-of-bounds pointers.
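To make the idiom concrete: a typical case is a range check written as `buf + offset + len > buf + buf_len`, which may form an out-of-bounds pointer before the comparison even happens. The sketch below is only an illustration, with hypothetical names (`range_fits`, `slice`) that come from none of the code discussed here. It shows the variant that stays inside what the standard permits by comparing sizes instead of pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if [offset, offset + len) lies inside a buffer
 * of buf_len bytes. Only sizes are compared, so no out-of-bounds pointer is
 * ever formed and the subtraction cannot wrap. */
bool range_fits(size_t buf_len, size_t offset, size_t len) {
    return offset <= buf_len && len <= buf_len - offset;
}

/* Hypothetical helper: returns a pointer to the requested sub-range,
 * or NULL if the range does not fit. */
const char *slice(const char *buf, size_t buf_len, size_t offset, size_t len) {
    assert(buf != NULL);
    if (!range_fits(buf_len, offset, len)) {
        return NULL;
    }
    return buf + offset; /* in bounds, or at most one past the end */
}
```

The pointer-comparison form is shorter, which is probably why it shows up so often in real code, but it relies on the compiler not exploiting the UB.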

But yeah, if you look at my patch to LLVM, you’ll find that:

- I run a highly curated opt pipeline before instrumentation happens.

- FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.

- I made some surgical changes to clang CodeGen and some LLVM passes to fix some obvious issues from UB.
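As a rough illustration of what dropping a UB flag changes: the list above doesn't say which flags are dropped, so the sketch below assumes `nsw` (no signed wrap) as one example of the kind. Clang normally emits `add nsw` for signed `int` addition, which lets the optimizer fold `x + 1 > x` to true. Without the flag, the add wraps and the comparison has to be computed. The helper names are hypothetical and the C code only models the wrapping semantics; it is not taken from the Fil-C patch.

```c
#include <assert.h>
#include <limits.h>

/* This sketch assumes a two's-complement int. */
static_assert(INT_MIN == -INT_MAX - 1, "two's complement assumed");

/* Hypothetical helper: wrapping signed add, roughly what a plain LLVM `add`
 * (no `nsw`) computes. The unsigned addition is well defined; converting the
 * result back to int is implementation-defined, and both GCC and Clang
 * define it as modular wraparound. */
int wrapping_add(int a, int b) {
    return (int)((unsigned)a + (unsigned)b);
}

/* With `nsw`-style UB semantics a compiler may assume this is always 1.
 * With wrapping semantics it is 0 when x == INT_MAX. */
int incr_is_greater(int x) {
    return wrapping_add(x, 1) > x;
}
```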

But also let’s consider what would happen if I hadn’t done any of that except for dropping UB flags in FilPizlonator. In that case, a pass before pizlonation would have done some optimization. At worst, that optimization would be a logic error or it would induce a Fil-C panic. FilPizlonator strongly limits UB to its “memory safe subset” by construction.

I call this the GIMSO property (garbage in, memory safety out).

kartoffelsaft 5 days ago | parent | next [-]

Not knowing the exact language used by the C standard, I suspect the reason GCC doesn't cause these issues with most programs is that the wording of "array object" refers specifically to arrays with compile-time-known sizes, i.e. `int arr[4]`. Most programs that do out of bounds pointer arithmetic are doing so with pointers from malloc/mmap/similar, which might have similar semantics to arrays but are not arrays.
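A small sketch of the two kinds of storage being contrasted, with hypothetical function names. It does not settle what the standard's wording covers. It only shows the one case that is uncontroversially fine for both: forming (but not dereferencing) the one-past-the-end pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: counts elements by walking up to `end`, which is
 * compared against but never dereferenced. */
size_t count_to_end(const int *begin, const int *end) {
    size_t n = 0;
    for (const int *p = begin; p != end; ++p) {
        n++;
    }
    return n;
}

/* A declared array with a compile-time-known size. */
size_t demo_declared_array(void) {
    int arr[4] = {0};
    return count_to_end(arr, arr + 4);
}

/* Storage from malloc, used like an array of four ints. */
size_t demo_heap_block(void) {
    int *buf = malloc(4 * sizeof *buf);
    assert(buf != NULL);
    size_t n = count_to_end(buf, buf + 4);
    free(buf);
    return n;
}
```

Anything further out, such as `arr + 5` or `buf - 1`, is where the out-of-bounds arithmetic discussed above begins.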

pizlonator 5 days ago | parent [-]

Yes, I think you're right

AlotOfReading 5 days ago | parent | prev [-]

    FilPizlonator drops flags in LLVM IR that would have permitted downstream passes to perform UB driven optimizations.

Does this work reliably or did your patches have to fix bugs here? There are LLVM bugs with floating point where the backend doesn't properly respect passed attributes during codegen, which violates the behavior of user-level flags. I imagine the same thing exists for UB.

pizlonator 5 days ago | parent [-]

It works reliably.

LLVM is engineered to be usable as a backend for type-safe/memory-safe languages. And those flags are engineered to work right for implementing the semantics of those languages, provided that you also do the work to avoid other LLVM pitfalls (and FilPizlonator does that work by inserting aggressive checks).

Of course there could be a bug though. I just haven't encountered this particular kind of bug, and I've tested a lot of software (see https://fil-c.org/programs_that_work)