elevation 11 hours ago

As LLMs consume our compute resources and drive up prices for the hardware we run applications on, the silver lining is that LLMs are also helpful for implementing tooling without a heavy stack, so it runs quickly on a lower-spec computer.

I've achieved 3-4 orders of magnitude CPU performance boosts and 50% RAM reductions by using C in places I normally wouldn't and by selecting or designing efficient data structures. TUIs are a good example of this trend. For internal engineering, being able to present the information we need while bypassing the millions of SLoC in the web stack is more efficient in almost every regard.

logicprog 9 hours ago | parent | next [-]

I suspect that a native GUI, or even something like GPUI or Flutter, would still be more performant than TUIs, which are bound by the limitations of emulating terminals.

sunshowers 9 hours ago | parent | next [-]

A very important thing about constraints is that they also liberate. TUIs automatically work over ssh, can be suspended with ctrl-z and such, and the keyboard focus has resulted in helpful conventions like ctrl-R that tend to not be as prominent in GUIs.

normie3000 7 hours ago | parent [-]

What does ctrl-R do?

sunshowers 6 hours ago | parent [-]

History search, like in shells. My most used TUI shortcut!

coldtea 8 hours ago | parent | prev [-]

>would still be more performant than TUIs, which are bound by the limitations of emulating terminals.

That's what makes them great, as opposed to modern "minimal" waste-of-space UIs or the Electron crappage.

mseepgood 11 hours ago | parent | prev | next [-]

The question is how many decades each user of your software would have to run it before the optimisation it provides offsets the energy you burned through with LLMs.

elevation 11 hours ago | parent | next [-]

When global supply chains are disrupted again, energy and/or compute costs will skyrocket, meaning your org may be forced to defer hardware upgrades and LLMs may no longer be cost-effective (as over-leveraged AI companies attempt to recover their investment with less hardware than they'd planned). Once this happens, it may be too late to draw on LLMs to quickly refactor your code.

If your business requirements are stable and you have a good test suite, you're living in a golden age for leveraging your current access to LLMs to reduce your future operational costs.

coldtea 7 hours ago | parent | prev | next [-]

Would it be that many? I asked an AI to do a rough calculation, and it spat out:

Making 50 SOTA AI requests per day ≈ running a 10W LED bulb for about 2.5 hours per day

Given I usually have 2-3 lights on all day in the house, that's like 1500 LLM requests per day (which is quite a bit more than I actually make).

So even a month's worth of requests for building some software doesn't sound like that much. Having a beefy local traditional build server compiling or running tests for 4 hours a day would be like ~7,600 requests/day.
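Rough arithmetic behind those figures, if anyone wants to check it (the ~0.5 Wh per request is just what the bulb comparison implies, and the ~950 W build server is an assumption on my part):

    // ~0.5 Wh/request is what "50 requests ≈ a 10 W bulb for 2.5 h" implies
    const whPerRequest = (10 * 2.5) / 50;             // 0.5 Wh
    const lightsWhPerDay = 3 * 10 * 24;               // three 10 W bulbs on all day = 720 Wh
    console.log(lightsWhPerDay / whPerRequest);       // ≈ 1440 requests/day, roughly the 1500 above
    const buildServerWhPerDay = 950 * 4;              // assumed ~950 W build server, 4 h/day = 3800 Wh
    console.log(buildServerWhPerDay / whPerRequest);  // ≈ 7600 requests/day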

shakna 3 hours ago | parent | next [-]

> Making 50 SOTA AI requests per day ≈ running a 10W LED bulb for about 2.5 hours per day

This seems remarkably far from what we know. I mean, just running the data centre aircon will be an order of magnitude greater than that.

anonzzzies 5 hours ago | parent | prev [-]

Is that true? Because that's indeed FAR less than I thought. That would definitely make me worry a lot less about energy consumption (not that I would go and consume more, but I'd feel less guilty, I guess).

derekdahmer 2 hours ago | parent [-]

An H100 uses about 1000 W including networking gear and can generate 80-150 t/s for a 70B model like Llama.

So, back of the napkin, for a decently sized 1000-token response you're talking about 1000 W × (8 s / 3600 s/h) ≈ 2 Wh, which even in California is about $0.001 of electricity.
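Spelling that out (the wattage and throughput are the estimates above; the ~$0.45/kWh figure is an assumed California retail rate):

    const watts = 1000;                         // H100 plus networking gear
    const tokensPerSecond = 125;                // midpoint of the 80-150 t/s range
    const seconds = 1000 / tokensPerSecond;     // 8 s for a 1000-token response
    const wattHours = watts * seconds / 3600;   // ≈ 2.2 Wh
    const dollars = (wattHours / 1000) * 0.45;  // ≈ $0.001 at the assumed rate
    console.log(seconds, wattHours, dollars);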

pshc 2 hours ago | parent [-]

With batched parallel requests this scales down further. Even a MacBook M3 on battery power can do inference quickly and efficiently. Large scale training is the power hog.

grogenaut 11 hours ago | parent | prev | next [-]

In the past week I took 4 different tasks that were going to run my M4 at full tilt for a week and, with just a few prompts, optimized them down to 20 minutes. So more like an hour to pay off, not decades. The average Claude invocation is about 0.3 Wh; an M4 uses 40-60 W, so 24 × 7 × 40 Wh ≫ 0.3 Wh × 10.
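Plugging those estimates in (0.3 Wh per invocation and a 40 W sustained draw are rough figures, not measurements):

    const m4WhPerWeek = 40 * 24 * 7;      // ≈ 6720 Wh for a week at full tilt
    const promptsWh = 0.3 * 10;           // ten Claude invocations ≈ 3 Wh
    console.log(m4WhPerWeek / promptsWh); // ≈ 2240x more energy for the unoptimized run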

embedding-shape 11 hours ago | parent | prev | next [-]

Especially considering that suddenly everyone and their mother creates their own software with LLMs instead of using the almost-perfect-but-slightly-non-ideal software others have written before.

mulmen 11 hours ago | parent | prev [-]

I’m not really worried about energy consumption. We have more energy falling out of the sky than we could ever need. I’m much more interested in saving human time so we can focus on bigger problems, like using that free energy instead of killing ourselves extracting and burning limited resources.

teaearlgraycold 10 hours ago | parent | prev [-]

They’re also great for reducing dependencies. What used to be a new dependency and 100 sub-dependencies from npm can now be 200 lines of 0-import JS.
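For example, a small utility that used to arrive as an npm package with its own dependency tree can just be inlined. A hypothetical zero-import sketch, not taken from any particular package:

    // a zero-dependency debounce: delays calls to fn until waitMs of quiet
    function debounce(fn, waitMs) {
      let timer;
      return (...args) => {
        clearTimeout(timer);
        timer = setTimeout(() => fn(...args), waitMs);
      };
    }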