prodigycorp 4 hours ago

Nah, I remember how disgusted I felt trying Llama 4 Maverick and Scout. They were both DOA; they couldn't even beat much smaller local models.
pixel_popping an hour ago

Failing non-stop at tool calls, on top of that.
refulgentis 4 hours ago

I'll cosign what you said. Simultaneously, your interlocutor's point is also well-founded, and it depresses me that it's not better known and sounds so... off... due to conventional wisdom combined with God King Zuck misunderstanding his own company and overreacting.

They beat Gemini 2.5 Flash and Pro handily on my benchmark suite (tl;dr: tool calling and agentic coding). Llama 4 on Groq was roughly GPT-4.1 on the benchmark at ~50% of the cost.

They shouldn't have released it on a Saturday. They should have spent a month with it in private prerelease, working with providers.[1] The rushed launch and the ensuing quality issues got rolled into the hypebeast narrative of "DeepSeek will take over the world."

I bet it was super fucking annoying to talk to due to LMArena-maxxing.

[1] My understanding is that the longest heads-up any provider got was single-digit days, if any. Most modellers have arrived at 2+ weeks now; there's a lot of work between spitting out logits and parsing and delivering a response.