| ▲ | fragmede an hour ago | |
The "traditional" way we vibe code is human software developer prompts AI -> AI generates code -> (human checks code) -> code gets compiled/deployed/etx -> users use "binary". At the speed of 1000 tok/sec, user prompts obliquely -> AI vets generated code -> code deployed -> user gets response from deployed code. It's a cute toy right now, but you can tell an LLM that it's an http server, and have it respond directly to a web browser hitting it. It generates headers in response, as well as page contents. As 1000 tok/sec becomes three new normal, we will come up with newer ways to use it outside of toy fiction encyclopedias. | ||
| ▲ | HarHarVeryFunny an hour ago | parent [-] | |
1000 tokens per sec is still massively slower than serving a normal web page - if something doesn't respond in a few seconds many people give up. I'm not saying there aren't any use cases for super-fast (and super-expensive) generation, but it does seem a bit niche. If it was free then sure faster is better, but what are the mainstream use cases where people might pay 3x more for a faster version of something that is already fast? I think it would have to be an application where it paid for itself - where the 10x faster response was actually worth more than 3x the cost to you - where the extra speed was worth the extra cost. | ||