taeric 12 hours ago

I'm curious about the uptake of SIMD and other assembly-level usage through high-level code. I'd assume most of it is done either by people writing very low-level code that directly manages the data, or by people using very high-level libraries that are prescriptive about what data they work with?

How many people are writing somewhat bog-standard Rust/C and expect optimal assembly to be created?

jacquesm 7 hours ago

I was heavily into assembly before I discovered C. For the first decade and a half or so I could usually beat the compiler. Since then, especially when supporting multiple architectures, I have not been able to do that unless I knew some assumption the compiler was likely to make wasn't true. The 'const' keyword alone killed most of my hand-optimized stuff.

In the end the only bits where I resorted to assembly were the ones where it wouldn't make any sense to write stuff in C. Bootloaders, for instance: when all you have to work with is 512 bytes, the space/speed constraints are much more on the space side, and that's where I find I still have a (slight) edge. Which I guess means that 'optimal' is context-dependent and that the typical 'optimal' defaults to 'speed'.

taeric 5 hours ago

I think this is talking past my question? I don't necessarily think "low level" has to be "writing assembly." I do think it means, "knows the full layout of the data in memory." Something a surprising number of developers do not know.

I've literally had debates with people who thought a CSV file would be smaller than holding the same data in memory. Senior-level developers at both startups and established companies. My hunch is they had only ever done this using object-oriented modeling of the data. Worse, usually in something like Python, where everything is boxed to hell and back by default.

jacquesm 5 hours ago

I was really only responding to this part, apologies for not quoting it:

> How many people are writing somewhat bog-standard Rust/C and expect optimal assembly to be created?

As for:

> I don't necessarily think "low level" has to be "writing assembly." I do think it means, "knows the full layout of the data in memory." Something a surprising number of developers do not know.

Agreed. But then again, there are a ton of things a surprising number of developers don't know; this is just another one of those.

Similar stuff:

- computers always work

- the CPU is executing my instructions one after the other in the same order in which they are written on the line(s)

- the CPU is executing my instructions only one at a time

- if I have a switch (or whatever that construct is called in $language) I don't need to check for values I do not expect because that will never happen

- the data I just read in is as I expect it to be

You can probably extend that list forever.

Your CSV example is an interesting one; I can think of cases where both could be true, depending on the character encoding used and the way the language deals with that encoding. For instance, in a language that turns all of the data into UTF-16 upon reading the file, the in-memory representation of a plain ASCII CSV file could well be larger than the input file. Conversely, if the file contained newlines and carriage returns, the in-memory representation could omit the CRs and end up smaller. If you turn the whole thing into a data structure, it could be larger or smaller, depending on how clever the data structure is and whether or not it efficiently encodes the values in the CSV.
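
A rough sketch of the boxed case, in Rust for concreteness (figures assume a typical 64-bit target, and the parsing is deliberately naive):

  // "1,2,3\n" is 6 bytes on disk. Parsed naively into owned strings,
  // each field costs a 24-byte String header (pointer, capacity, length)
  // plus its own heap allocation, before allocator overhead.
  fn main() {
      let csv = "1,2,3\n";
      let fields: Vec<String> = csv.trim_end().split(',').map(|s| s.to_owned()).collect();

      let file_size = csv.len(); // 6 bytes
      let headers = fields.len() * std::mem::size_of::<String>(); // 3 * 24 = 72 bytes
      let payload: usize = fields.iter().map(|s| s.len()).sum(); // 3 bytes
      println!("file: {file_size} B, in memory: at least {} B", headers + payload);
      // Parsing straight into a Vec<u32> instead would need 12 bytes of payload.
  }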

> My hunch is they had only ever done this using object-oriented modeling of the data.

Yes, that would be my guess as well.

> Worse, usually in something like Python, where everything is boxed to hell and back by default.

And you often have multiple representations of the same data because not every library uses the same conventions.

zamadatix 11 hours ago

It's really only comparable to assembly-level usage in the SIMD-intrinsics-style cases. Portable SIMD, like std::simd, is no more assembly-level usage than calling math functions from the standard library.

Usually one only bothers with the intrinsic-level stuff for the use cases you're describing, e.g. video encoders/decoders needing hyper-optimized, per-architecture loops for the heavy lifting, where relying on the high-level SIMD abstractions can leave cycles on the table compared to directly targeting specific architectures. If you're just processing a lot of data in bulk with no real-time requirements, high-level portable SIMD is usually more than good enough.
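
As a sketch of the portable flavor (hedged: std::simd is still nightly-only as I write this, and its module paths have moved between nightlies):

  // Dot product with 8-wide f32 lanes. The same source compiles to AVX
  // on x86, NEON on ARM, or plain scalar code if the target has nothing
  // better - no per-architecture rewrite needed.
  #![feature(portable_simd)]
  use std::simd::f32x8;
  use std::simd::num::SimdFloat; // for reduce_sum()

  fn dot(a: &[f32], b: &[f32]) -> f32 {
      assert_eq!(a.len(), b.len());
      let mut acc = f32x8::splat(0.0);
      let chunks = a.len() / 8;
      for i in 0..chunks {
          let x = f32x8::from_slice(&a[i * 8..]);
          let y = f32x8::from_slice(&b[i * 8..]);
          acc += x * y;
      }
      let mut sum = acc.reduce_sum();
      for i in chunks * 8..a.len() {
          sum += a[i] * b[i]; // scalar tail for the leftover elements
      }
      sum
  }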

taeric 11 hours ago

My understanding was that the difficulty with the intrinsics was more in how restrictive they are in what data they take in. That is, if you are trying to be very controlling of the SIMD instructions getting used, you have backed yourself into caring about the data that the CPU directly understands.

To that end, even "calling math functions" is something that a surprising number of developers don't do. Certainly not with the standard high-level data types that people often write their software in terms of. No?

zamadatix 9 hours ago

More than that: many of the intrinsics can be unsafe in standard Rust. That situation got much better this year, but it's still not perfect. Portable SIMD has always been safe, because they are just normal high-level interfaces. The other half is that intrinsics are specific to the arch: not only do you need to make sure the CPUs support the type of operation you want to do, but you need to redo all of the work to e.g. compile to ARM for newer MacBooks (even if they support similar operations). This is also not a problem with portable SIMD: the compiler will figure out how to map the lanes to each target architecture. The compiler will even take portable SIMD and compile it for a scalar target for you, so you don't have to maintain a SIMD vs non-SIMD path.
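
For contrast, a sketch of the intrinsics side (x86-64 with AVX; an ARM build would need a separate NEON version of the same loop):

  // Adds two arrays of 8 f32s with AVX intrinsics. Note the runtime
  // feature check and the unsafe block - neither is needed with portable
  // SIMD, and none of this compiles for a non-x86 target.
  #[cfg(target_arch = "x86_64")]
  fn add8(a: &[f32; 8], b: &[f32; 8]) -> [f32; 8] {
      use std::arch::x86_64::*;
      if is_x86_feature_detected!("avx") {
          unsafe {
              let v = _mm256_add_ps(_mm256_loadu_ps(a.as_ptr()), _mm256_loadu_ps(b.as_ptr()));
              let mut out = [0.0f32; 8];
              _mm256_storeu_ps(out.as_mut_ptr(), v);
              out
          }
      } else {
          std::array::from_fn(|i| a[i] + b[i]) // scalar fallback
      }
  }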

By "calling math functions" I mean things like:

  let x = 5.0f64;
  let result = x.sqrt();

Where most CPUs have a sqrt instruction, but the program will automatically compile with a (good) software substitution for targets that don't. It's very similar with portable SIMD: the high-level call gets mapped to whatever the target best supports, automatically. Neither SIMD nor these kinds of math functions work automatically with custom high-level data types. The only way to play for those is to write the object with custom methods which break it down to the basic types, so the compiler knows what you want the complex type's behavior to be. If you can't code that, then there isn't much you can do with the object, regardless of SIMD. With intrinsics you need to go a step further beyond all that and directly tell the compiler which specific CPU instructions should be used for each step (and make sure that's done safely, for the remaining unsafe operations).
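
To make that concrete, a hypothetical sketch of what "break it down to the basic types" can look like (the Point type here is made up for illustration):

  // A custom 2D point type only becomes sqrt-friendly (and potentially
  // auto-vectorizable) once its methods express the work in plain f64
  // operations the compiler already understands.
  #[derive(Clone, Copy)]
  struct Point { x: f64, y: f64 }

  impl Point {
      fn dist(self, other: Point) -> f64 {
          let dx = self.x - other.x;
          let dy = self.y - other.y;
          (dx * dx + dy * dy).sqrt() // maps to a hardware sqrt where one exists
      }
  }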

taeric 5 hours ago

I knew what you meant. My point was more that most people are writing software at the level of "if (overlaps(a, b)) doSomething()". Yes, there will be plenty of math and intrinsics in the "overlaps" after you get through all of the accessors necessary to have the raw numbers. But especially in heavily modeled spaces, the number one killer of getting to the SIMD is that the data just isn't in a friendly layout for it.
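
A hedged sketch of the layout problem I mean (the types are hypothetical):

  // "Array of structs" is what heavy OO modeling tends to produce;
  // "struct of arrays" is what SIMD wants.
  struct Particle { x: f32, y: f32, alive: bool } // AoS: fields interleaved, 12-byte stride after padding

  struct Particles {                              // SoA: each field contiguous
      xs: Vec<f32>,
      ys: Vec<f32>,
      alive: Vec<bool>,
  }

  fn shift_aos(ps: &mut [Particle], dx: f32) {
      for p in ps {
          p.x += dx; // hard to vectorize: the x values are not adjacent in memory
      }
  }

  fn shift_soa(ps: &mut Particles, dx: f32) {
      for x in &mut ps.xs {
          *x += dx; // trivially auto-vectorized: one contiguous run of f32s
      }
  }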

Is that not the case?