amluto 8 hours ago

$200? Does this use reasoning? Does it involve forgetting to use KV caching?

This should cost well under $1. Process the prompt once. Then, for each word, feed in that word followed by the end-of-prompt token and read off your one token of output (maybe two, if your favorite model wants to start with a start-of-reply token). That's it.
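Here is a rough sketch of the prefix-reuse idea. It makes two assumptions: each word is a single token, and `ToyModel`/`ToyKVCache` are hypothetical stand-ins that only count how many tokens get pushed through the network (no real inference library is involved). With a shared prompt of P tokens and N words, the cached version computes about P + 3N tokens. The naive version re-processes the prompt for every word and computes N × (P + 3).

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVCache:
    """Hypothetical stand-in for a KV cache: just records which tokens were processed."""
    tokens: list = field(default_factory=list)

    def fork(self) -> "ToyKVCache":
        # Real engines share prefix blocks copy-on-write; a list copy is enough for a toy.
        return ToyKVCache(list(self.tokens))


class ToyModel:
    """Hypothetical model that only counts tokens run through the network."""

    def __init__(self) -> None:
        self.tokens_computed = 0

    def extend(self, cache: ToyKVCache, new_tokens: list) -> ToyKVCache:
        # Only the new tokens cost compute; earlier ones are already in the cache.
        self.tokens_computed += len(new_tokens)
        cache.tokens.extend(new_tokens)
        return cache

    def next_token(self, cache: ToyKVCache) -> str:
        # Decoding one output token costs one more forward step.
        self.tokens_computed += 1
        return "label"  # placeholder output


def classify_words(model, prompt_tokens, words, end_of_prompt="<eop>"):
    # Process the shared prompt exactly once.
    prefix = model.extend(ToyKVCache(), prompt_tokens)
    results = {}
    for word in words:
        cache = prefix.fork()  # start from the cached prompt, not from scratch
        model.extend(cache, [word, end_of_prompt])
        results[word] = model.next_token(cache)
    return results


def classify_words_naive(model, prompt_tokens, words, end_of_prompt="<eop>"):
    # No cache reuse: the whole prompt is re-processed for every word.
    results = {}
    for word in words:
        cache = model.extend(ToyKVCache(), prompt_tokens + [word, end_of_prompt])
        results[word] = model.next_token(cache)
    return results
```

With a 1,000-token prompt and 10 words, the cached path computes 1,030 tokens and the naive path computes 10,030. That gap is the difference between pennies and real money. Production inference servers usually expose this as prefix caching, so you normally don't need to fork caches by hand.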