xienze 2 hours ago
I don't want to be a jerk, but 31 t/s prefill is basically unusable in an agentic situation. With a mere 10k tokens of context, you're sitting there for 5+ minutes before the first token is generated.
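For reference, the arithmetic behind that estimate can be sketched as below. This is a back-of-the-envelope model that assumes prefill time is simply prompt length divided by prefill throughput; the function name `prefill_seconds` is made up for illustration and is not from any tool mentioned in the thread.

```python
def prefill_seconds(prompt_tokens: int, prefill_tps: float) -> float:
    """Estimate seconds spent processing the prompt before the first output token.

    Assumption: prefill cost is linear in prompt length (tokens / tokens-per-second).
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    return prompt_tokens / prefill_tps


if __name__ == "__main__":
    secs = prefill_seconds(10_000, 31)
    print(f"{secs:.1f} s ({secs / 60:.1f} min)")
```

At 10,000 tokens and 31 t/s this comes out to roughly 323 seconds, or about 5.4 minutes, which is where the "5+ minutes" figure comes from.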
fgfarben an hour ago
That prefill number isn't right. The M4 Max hits 200-300 t/s: https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...
aiscoming 2 hours ago
If it's just the coding agent's system prompt and tools, you can cache that.
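A rough sketch of why caching helps, under simplifying assumptions: a cached prefix costs nothing to reuse, and only the tokens after it need prefilling. The 8,000/2,000 token split below is a made-up example, not a measurement, and `uncached_prefill_seconds` is a hypothetical helper name.

```python
def uncached_prefill_seconds(
    prompt_tokens: int, cached_prefix_tokens: int, prefill_tps: float
) -> float:
    """Estimate prefill time when a leading part of the prompt is already cached.

    Assumptions: reusing the cached prefix is free, and the remaining
    tokens are processed at a constant prefill rate.
    """
    if prefill_tps <= 0:
        raise ValueError("prefill_tps must be positive")
    if not 0 <= cached_prefix_tokens <= prompt_tokens:
        raise ValueError("cached_prefix_tokens must be between 0 and prompt_tokens")
    return (prompt_tokens - cached_prefix_tokens) / prefill_tps


if __name__ == "__main__":
    # Hypothetical split: 8,000 tokens of system prompt and tool definitions
    # (cached), plus 2,000 tokens of new conversation.
    secs = uncached_prefill_seconds(10_000, 8_000, 31)
    print(f"{secs:.1f} s")
```

Under those assumptions, only the 2,000 new tokens are prefilled, so the wait drops to about a minute at 31 t/s. The benefit shrinks as the uncached part of the context grows, for example once an agent starts reading files.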