ktzar | 3 days ago
Also, the way models are evolving (thinking processes, LLMs waiting on interactions with external entities via MCP, mixture-of-experts architectures, ...) is making "useful chatbot responses" far more expensive than they were when you were pretty much hitting an autocomplete. It's getting to the point where running these models locally at a decent tokens/s speed is prohibitive, and we're being tied to using their hosted models.