tombert 13 hours ago

I don't know a ton about Swift, but it does feel like for a lot of apps (especially outside of the gaming and video encoding world), you can almost treat CPU power as infinite and exclusively focus on reducing latency.

Obviously I'm not saying you throw out big O notation or stop benchmarking, but it does seem like eliminating an extra network call from your pipeline is likely to have a much higher ROI than nearly any amount of CPU optimization; people forget how unbelievably slow the network actually is compared to CPU cache and even system memory. I think the advent of async-first frameworks and runtimes like Node.js and Vert.x and Tokio is sort of the industry's acknowledgement of this.
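A toy Python simulation of the point (the 10 ms latency figure and the fetch functions are made up for illustration; real WAN round trips vary wildly but are many orders of magnitude slower than a cache hit):

```python
import time

NETWORK_RTT = 0.01  # assumed 10 ms round trip, purely illustrative

def fetch_one(item_id):
    """Simulate one network round trip returning a record."""
    time.sleep(NETWORK_RTT)
    return {"id": item_id}

def fetch_batch(item_ids):
    """Simulate a single batched round trip for many records."""
    time.sleep(NETWORK_RTT)
    return [{"id": i} for i in item_ids]

ids = list(range(20))

start = time.perf_counter()
sequential = [fetch_one(i) for i in ids]   # 20 round trips
t_seq = time.perf_counter() - start

start = time.perf_counter()
batched = fetch_batch(ids)                 # 1 round trip
t_batch = time.perf_counter() - start

assert sequential == batched
print(f"sequential: {t_seq:.3f}s, batched: {t_batch:.3f}s")
```

Nothing about the CPU work changed between the two versions; only the number of round trips did.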

We all learn these fun CPU optimization tricks in school, and it's all for naught, because anything we do in CPU land is probably going to be undone by a lazy engineer making superfluous calls to Postgres.

Yoric 12 hours ago | parent | next [-]

The answer to that would very much be: "it depends".

Yes, of course, network I/O > local I/O > most things you'll do on your CPU. But regardless, the answer is always to measure performance (through benchmarking or telemetry), find your bottlenecks, then act upon them.

I recall a case in Firefox in which we were bitten by an O(n^2) algorithm running at startup, where n was the number of tabs to restore; another in which several threads fighting each other to load components of Firefox ended up hammering the I/O subsystem; but also cases of an executable that was too large, data not fitting in the CPU cache, Windows requiring a disk access to normalize paths, etc.
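Not the actual Firefox code, but a sketch of the general class of bug: an accidental O(n^2) from a per-item linear scan, which is invisible with a handful of tabs and painful with hundreds:

```python
def restore_tabs_quadratic(tabs):
    """Deduplicate restored tabs with a list scan: O(n) per tab, O(n^2) total."""
    seen = []
    for t in tabs:
        if t not in seen:   # linear scan over everything restored so far
            seen.append(t)
    return seen

def restore_tabs_linear(tabs):
    """Same result with a set for membership checks: O(n) total."""
    seen, out = set(), []
    for t in tabs:
        if t not in seen:
            seen.add(t)
            out.append(t)
    return out

tabs = ["a", "b", "a", "c", "b"] * 3
assert restore_tabs_quadratic(tabs) == restore_tabs_linear(tabs) == ["a", "b", "c"]
```

Both give identical results; only a benchmark (or telemetry from users with 500 tabs) reveals the difference.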

tombert 11 hours ago | parent [-]

Sure, I will admit I was a bit hyperbolic here.

Obviously sometimes you need to do a CPU optimization, and I certainly do not think you should ignore big O for anything.

It just feels like 90+% of the time my “optimizing” boils down to figuring out how to batch a SQL query or eliminate a call to Redis or something.
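The classic version of this is the N+1 query pattern. A sketch with stdlib sqlite3 (the schema and data are made up; with SQLite in-memory both versions are fast, but against a real server the first one pays a round trip per row):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 6)])

wanted = [1, 3, 5]

# N+1 style: one query (and, on a real server, one round trip) per id
rows_one_by_one = [
    conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()[0]
    for i in wanted
]

# Batched: a single query with an IN clause, one round trip
placeholders = ",".join("?" for _ in wanted)
rows_batched = [
    r[0] for r in conn.execute(
        f"SELECT name FROM users WHERE id IN ({placeholders}) ORDER BY id",
        wanted)
]

assert rows_one_by_one == rows_batched == ["user1", "user3", "user5"]
```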

postsantum 12 hours ago | parent | prev | next [-]

I worked on a resource-intensive Android app for some years, and it got a good performance boost after implementing parallelization. But mostly on old, shitty devices.

On the latest phones it's barely noticeable.

wat10000 11 hours ago | parent | prev [-]

Some of this is because you’re leaning on the system to be fast. A simple async call does a lot of stuff for you. If it was implemented by people who treated CPU power as if it was infinite, it would slow you down a lot. Since it was carefully built to be fast, you can write your stuff in a straightforward manner. (This isn’t a criticism. I work in lower levels of the stack, and I consider a big part of the job to be making it so people working higher up have to think about this stuff as little as possible. I solve these problems so they can solve the user’s problem.)

It’s also very context dependent. If your code is on the critical path for animations, it’s not too hard to be too slow. Especially since standards are higher. You’re now expected to draw a frame in 8ms on many devices. You could write some straightforward code that decodes JSON to extract some base64 to decompress a zip to retrieve a JPEG and completely blow out your 8ms if you manage to forget about caching that and end up doing it every frame.
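A sketch of the fix in Python (the JSON/base64/zip payload is a stand-in for the scenario above, with a few fake bytes playing the JPEG): run the whole decode chain once and memoize it, so the render loop never repeats it.

```python
import base64, io, json, zipfile
from functools import lru_cache

# Build a toy payload: JSON wrapping base64 wrapping a zip wrapping "image"
# bytes (names and content are illustrative, not a real asset format).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("image.jpg", b"\xff\xd8fake-jpeg-bytes")
payload = json.dumps({"asset": base64.b64encode(buf.getvalue()).decode()})

decode_count = 0

@lru_cache(maxsize=None)
def decoded_asset(raw_json):
    """Do the whole decode chain once; lru_cache makes repeats free."""
    global decode_count
    decode_count += 1
    data = base64.b64decode(json.loads(raw_json)["asset"])
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read("image.jpg")

for _frame in range(60):          # simulated render loop: 60 frames
    image = decoded_asset(payload)

assert decode_count == 1          # decoded once, not 60 times
```

Without the cache, that whole chain runs on every frame and the 8 ms budget is gone.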

tombert 7 hours ago | parent [-]

Yeah, fair. I never found poll/select/epoll or the Java NIO Selector to be terribly hard to use, but even those are fairly high-level compared to how these things are implemented in the kernel.
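For reference, the stdlib `selectors` module is about as small as these APIs get; it picks epoll/kqueue/poll/select per platform under the hood. A minimal readiness loop over a socketpair:

```python
import selectors
import socket

sel = selectors.DefaultSelector()
r, w = socket.socketpair()        # connected pair, stands in for a real peer
r.setblocking(False)
sel.register(r, selectors.EVENT_READ)

w.send(b"ping")                   # make the read end ready

received = b""
for key, events in sel.select(timeout=1):
    if events & selectors.EVENT_READ:
        received = key.fileobj.recv(1024)

sel.unregister(r)
r.close(); w.close()
assert received == b"ping"
```

The kernel-side machinery (interest lists, wait queues, edge vs. level triggering) is where the real complexity lives; this surface hides nearly all of it.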

wat10000 7 hours ago | parent [-]

Right, and consider how many transformations happen to the data between the network call and the screen. In a modern app it's likely coming in as raw bytes, going through a JSON decoder (possibly with a detour through a native string type), likely getting marshaled into hash tables and arrays before being shoved into more specific model types, then passed along to a fully Unicode-aware text renderer that does high-quality vector graphics... There's a lot in there that could be incredibly slow. But since it's not, we can write a few lines of code to make all of this happen and not worry about optimization.
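The front half of that pipeline, sketched in Python (the payload and `User` model are made up): raw bytes, through the JSON decoder's generic dicts and lists, into a specific model type.

```python
import json
from dataclasses import dataclass

# Raw bytes as they might arrive off the network (payload is invented).
raw = b'{"user": {"id": 7, "name": "Ada"}}'

# JSON decoder: bytes -> generic dicts/lists (with a detour through a
# native str inside json.loads).
generic = json.loads(raw)

# Marshal the generic containers into a specific model type.
@dataclass
class User:
    id: int
    name: str

user = User(**generic["user"])
assert user == User(id=7, name="Ada")
```

Every arrow in that chain is someone else's carefully optimized code; the app author writes three lines and moves on.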