Remix.run Logo
cogman10 4 hours ago

> Numeric characteristics are absolutely still a consideration for game designers even in 2026, one that influences what numbers they use in their game designs. The good ones, anyways.

I used to think like this, not anymore.

What convinced me that these sort of micro-optimizations just don't matter is reading up on the cycle count of modern processors.

One a Zen 5, Integer addition is a single cycle, multiplication 3, and division ~12. But that's not the full story. The CPU can have 5 inflight multiplications running simultaneously. It can have about 3 divisions running simultaneously.

Back in the day of RCT, there was much less pipelining. For the original pentium, a multiplication took 11 cycles, division could take upwards of 46 cycles. These were on CPUs with 100 Mhz clock cycles. So not only did it take more cycles to finish, couldn't be pipelined, the CPUs were also operating at 1/30th to 1/50th the cycle rate of common CPUs today.

And this isn't even touching on SIMD instructions.

Integer tricks and optimizations are pointless. Far more important than those in a modern game is memory layout. That's where the CPU is actually going to be burning most it's time. If you can create and do operations on a int[], you'll be MUCH faster than if you are doing operations against a Monster[]. A cache miss is going to mean anywhere from a 100 to 1000 cycle penalty. That blows out any sort of hit you take cutting your cycles from 3 to 1.

moregrist an hour ago | parent | next [-]

> Integer tricks and optimizations are pointless.

They’re not pointless; they’re just not the first thing to optimize.

It’s like worrying about cache locality when you have an inherently O(n^2) algorithm and could have a O(n log n) or O(n) one. Fix the biggest problem first.

Once your data layout is good and your cpu isn’t taking a 200 cycle lunch break to chase pointers, then you worry about cycle count and keeping the execution units fed.

That’s when integer tricks can matter. Depending on the micro arch, you may have twice as many execution units that can take integer instructions. And those instructions (outside of division) tend to have lower latency and higher throughput.

And if you’re doing SIMD, your integer SIMD instructions can be 2 or 4x higher throughput than float32 if you can use int16 / int8 data.

So it can very much matter. It’s just usually not the lowest hanging fruit.

Pannoniae 4 hours ago | parent | prev | next [-]

This is all true but IMO forest for the trees.... For example the compiler basically doesn't do anything useful with your float math unless you enable fastmath. Period. Very few transformations are done automatically there.

For integers the situation is better but even there, it hugely depends on your compiler and how much it cheats. You can't replace trig with intrinsics in the general case (sets errno for example), inlining is at best an adequate heuristic which completely fails to take account what the hot path is unless you use PGO and keep it up to date.

I've managed to improve a game's worst case performance better by like 50% just by shrinking a method's codesize from 3000 bytes to 1500. Barely even touched the hot path there, keep in mind. Mostly due to icache usage.

The takeaway from this shouldn't be that "computers are fast and compilers are clever, no point optimising" but more that "you can afford not to optimise in many cases, computers are fast."

cogman10 3 hours ago | parent | next [-]

I actually agree with you.

My point wasn't "don't optimize" it was "don't optimize the wrong thing".

Trying to replace a division with a bit shift is an example of worrying about the wrong thing, especially since that's a simple optimization the compiler can pick up on.

But as you said, it can be very worth it to optimize around things like the icache. Shrinking and aligning a hot loop can ensure your code isn't spending a bunch of time loading instructions. Cache behavior, in general, is probably the most important thing you can optimize. It's also the thing that can often make it hard to know if you actually optimized something. Changing the size of code can change cache behavior, which might give you the mistaken impression that the code change was what made things faster when in reality it was simply an effect of the code shifting.

WalterBright 3 hours ago | parent | prev [-]

I originally got into writing compilers because I was convinced I could write a better code generator. I succeeded for about 10 years in doing very well with code generation. But then all the complexities of the evolving C++ (and D!) took up most of my time, and I haven't been able to work much on the optimizer since.

Fortunately, D compilers gdc and ldc take advantage of the gcc and llvm optimizers to stay even with everyone else.

rcxdude 3 hours ago | parent | prev [-]

Also, it's unusual for a game to be CPU bottlenecked nowadays, and if it is, it's probably more constrained on memory bandwidth than raw FLOPS.