Remix.run Logo
kirtivr 5 hours ago

I can think of real time video, shader generation, real time worldbuilding type problems could require such a high token throughput.

For instant code generatio, 400-500 tok/s should be sufficient, though most frontier models give us closer to 70 tok/s.

Gomotono 4 hours ago | parent [-]

That sounds a little bit like the 64kb memory is enough, then someone invented electron ;P

But joke aside, I think we don't even know yet what is possible if you hit very fast very high token / second numbers if your whole ecosystem behind it can handle it.

You could literaly implement the same solution 100x and benchmark all of them and get only the best result.

You could build and architecture a whole stack in parallel.

You could do massive thinking token / chain of thought.

You could let the LLM analyse everything around you while you type. Like it could tell you that this might create a bug in a different file and why.

We could start doing some type of monte-carlo search with this.