tl2do 15 hours ago

Genuine question: what kinds of workloads benefit most from this speed? In my coding use, I still hit limitations even with stronger models, so I'm interested in where a much faster model changes the outcome rather than just reducing latency.

layoric 14 hours ago

I think it would help in exploring multiple solution spaces in parallel. With the right user in the loop, plus tools like compilers, static analysis, and tests wrapped in a harness, it could iterate very quickly on multiple solutions. An example might be "I need to optimize this SQL query" pointed at a locally running Postgres: multiple changes could be tested and combined, with EXPLAIN plans to validate performance and a test for correct results. Then only valid solutions would be presented to the developer for review. I don't personally care about the model's 'opinion' or recommendations; using them for architectural choices is, IMO, a flawed use of a coding tool.
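A minimal sketch of that kind of harness, using stdlib SQLite in place of Postgres (so `EXPLAIN QUERY PLAN` stands in for `EXPLAIN`): candidate rewrites that error out or return wrong rows are dropped, and only verified candidates survive for review. The table, index, and queries are made up for illustration.

```python
import sqlite3

def evaluate_candidates(conn, reference_sql, candidate_sqls):
    """Keep only candidates that return the same rows as the reference query."""
    expected = sorted(conn.execute(reference_sql).fetchall())
    valid = []
    for sql in candidate_sqls:
        try:
            rows = conn.execute(sql).fetchall()
        except sqlite3.Error:
            continue  # reject candidates that don't even run
        if sorted(rows) == expected:
            # correct results: record the query plan for performance review
            plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
            valid.append((sql, plan))
    return valid

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(i % 5,) for i in range(100)])
conn.execute("CREATE INDEX idx_x ON t (x)")

reference = "SELECT id FROM t WHERE x = 3"
candidates = [
    "SELECT id FROM t WHERE x = 3",      # can use idx_x
    "SELECT id FROM t WHERE x + 0 = 3",  # same rows, but defeats the index
    "SELECT id FROM t WHERE x = 4",      # wrong results: rejected
]
survivors = evaluate_candidates(conn, reference, candidates)
for sql, plan in survivors:
    print(sql, "->", plan[0][-1])  # last column of the plan row is the detail
```

Comparing the plan detail of the survivors (index search vs. full scan) is where the "explain plan to validate performance" step would plug in.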

It doesn't change the fact that the most important thing is verification/validation of their output, whether from tools or from a developer reviewing and making decisions. But even if you don't want that approach, diffusion models just seem a lot more efficient. I'm interested to see whether they're simply a better match for common developer tasks when paired with validation/verification systems, rather than just writing (likely wrong) code faster.

storus 4 hours ago

I'd say using them as draft models for a strong AR model, speeding it up ~3x. The diffusion model generates a bunch of tokens extremely fast; those can then be passed to the AR model to accept or reject, instead of the AR model generating them itself.
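A toy sketch of that accept/reject loop (speculative decoding), with both models reduced to lookup tables so the control flow is visible. Real implementations verify a whole draft chunk in one batched target forward pass rather than one call per token; the token sequences here are invented for illustration.

```python
def draft_model(context):
    # Hypothetical fast drafter: proposes up to 4 tokens past the context.
    guesses = ["the", "cat", "sat", "on", "a", "rug", ".", "<eos>"]
    i = len(context)
    return (guesses[i:i + 4] + ["<eos>"] * 4)[:4]

def target_model(context):
    # Hypothetical strong AR model: returns its single next token.
    text = ["the", "cat", "sat", "on", "a", "mat", ".", "<eos>"]
    return text[len(context)] if len(context) < len(text) else "<eos>"

def speculative_decode(max_len=8):
    out = []
    while len(out) < max_len and (not out or out[-1] != "<eos>"):
        proposal = draft_model(out)
        for tok in proposal:
            correct = target_model(out)
            # accepted tokens cost no extra generation; on a mismatch we
            # fall back to the target's token and restart drafting
            out.append(tok if tok == correct else correct)
            if tok != correct or out[-1] == "<eos>" or len(out) >= max_len:
                break
        else:
            # whole chunk accepted: take one bonus token from the target
            if len(out) < max_len:
                out.append(target_model(out))
    return out

print(speculative_decode())
```

The draft's wrong guess ("rug") is rejected and replaced by the target's "mat", so the output always matches what the target alone would have produced, just with fewer target generation steps.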

cjbarber 14 hours ago

I've tried a few computer-use and browser-use tools, and they feel fairly tok/s-bottlenecked.

And in some sense, all of my Claude Code usage feels tok/s-bottlenecked. There's never really a time when I'm glad to wait for the tokens; I'd always prefer faster.

volodia 13 hours ago

There are a few: fast agents, deep research, real-time voice, coding. The other thing is that with a fast reasoning model you can spend more effort on thinking within the same latency budget, which pushes up quality.

irthomasthomas 15 hours ago

Multi-model arbitration, synthesis, parallel reasoning, etc. Judging large models with small models is quite effective.
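The cheapest form of arbitration is a majority vote over several model outputs (self-consistency); in practice, a small fast model could score or rank each candidate instead. A minimal sketch with the judge reduced to a vote count; the answers are made up:

```python
from collections import Counter

def arbitrate(candidate_answers):
    """Pick the answer most models agree on; ties go to the first seen."""
    counts = Counter(candidate_answers)
    return counts.most_common(1)[0][0]

# e.g. the same question sent to three models in parallel
answers = ["42", "42", "41"]
print(arbitrate(answers))  # → "42"
```

Replacing the vote with a small-model scoring pass is where the fast tok/s matters: judging N candidates costs N extra calls, which is only practical when each call is cheap.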

corysama 12 hours ago

Coding auto-complete?

quotemstr 13 hours ago

Once you make a model fast and small enough, it becomes practical to use LLMs for things as mundane as spell checking, touchscreen-keyboard tap disambiguation, and database query planning. If the fast, small model is multimodal, put it in a microwave to make a better DWIM auto-cook.
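A toy illustration of the tap-disambiguation idea: each tap yields a set of plausible nearby keys, candidate words are expanded from those sets, and a language-model score picks the winner. Here `lm_score` is a stand-in unigram table with invented counts; the point is that a fast small LLM could fill that slot with real per-context likelihoods.

```python
from itertools import product

# Stand-in for the small LLM: toy unigram counts.
VOCAB_FREQ = {"cat": 50, "car": 40, "vat": 5, "bat": 20}

def lm_score(word):
    # Placeholder for "ask the fast model how likely this word is here".
    return VOCAB_FREQ.get(word, 0)

def disambiguate(tap_neighborhoods):
    """tap_neighborhoods: one list of plausible keys per tap."""
    candidates = ["".join(keys) for keys in product(*tap_neighborhoods)]
    in_vocab = [w for w in candidates if w in VOCAB_FREQ]
    return max(in_vocab, key=lm_score) if in_vocab else candidates[0]

# The user tapped near c/v/b, then a/s, then t/r.
taps = [["c", "v", "b"], ["a", "s"], ["t", "r"]]
print(disambiguate(taps))  # → "cat"
```

The candidate set stays tiny (a few keys per tap), so even a per-keystroke model call fits a latency budget once the model is fast enough.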

Hell, want to do syntax highlighting? Just throw buffer text into an ultra-fast LLM.

It's easy to overlook how many small day-to-day heuristic schemes can be replaced with AI. It's almost embarrassing to think about all the totally mundane uses to which we can put fast, modest intelligence.