edent 11 hours ago
OP here. I literally opened up Gemini and used the defaults. If the defaults are shit, maybe don't offer them as the default? Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?" Either way, disappointing.
magicalhippo 11 hours ago
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That is indeed an area where LLMs don't shine. Not only are they trained to always respond with an answer, they also have no ability to accurately tell how confident they are in that answer. So you can't just filter out low-confidence answers.
hobofan 11 hours ago
Then criticize the providers on their defaults instead of claiming that they can't solve the problem?
> Or, if LLMs are so smart, why doesn't it say "Hmmm, would you like to use a different model for this?"
That's literally what ChatGPT did for me[0], which is consistent with what they shared at the last keynote (a quick, low-reasoning answer by default first, with reasoning/search only if explicitly prompted or as a follow-up). It did miss one match though, as it somehow didn't parse the `<search>` element from the MDN docs.
[0]: https://chatgpt.com/share/68cffb5c-fd14-8005-b175-ab77d1bf58...
pwnOrbitals 11 hours ago
You are pointing out a maturity issue, not a capability problem. It's clear to everyone that LLM products are immature, but saying they are incapable is misleading.
maddmann 11 hours ago
"Defaults are shit": is that really true though?! Just because it shits the bed on some tasks does not mean it is shit. Anyone integrating LLMs into a workflow that requires a modicum of precision or determinism must always evaluate the output closely and have benchmarks. You must treat the LLM as an incompetent but overconfident intern, and thus have fast mechanisms for measuring output and giving feedback.
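For illustration only, here is a minimal sketch of what a "fast mechanism for measuring output" could look like for a list-style task such as the one discussed in this thread. It assumes you already have a known-good reference list; the function name `score_output` and the sample data are hypothetical and not taken from any tool mentioned here.

```python
def score_output(expected, produced):
    """Compare a model's answer list against a known-good reference list.

    Both arguments are iterables of strings. Items are normalised
    (trimmed, lower-cased) before comparison.
    """
    expected_set = {item.strip().lower() for item in expected}
    produced_set = {item.strip().lower() for item in produced}

    hits = len(expected_set & produced_set)
    return {
        # Share of reference items the model found.
        "recall": hits / len(expected_set) if expected_set else 1.0,
        # Share of the model's items that are actually in the reference.
        "precision": hits / len(produced_set) if produced_set else 1.0,
        # Items the model should have returned but did not.
        "missed": sorted(expected_set - produced_set),
        # Items the model returned that are not in the reference.
        "spurious": sorted(produced_set - expected_set),
    }


# Made-up sample data, purely to exercise the function.
reference = ["search", "section", "select"]
model_answer = ["section", "select", "span"]

report = score_output(reference, model_answer)
print(report["missed"], report["spurious"])
```

A check like this only catches omissions and inventions relative to the reference; it says nothing about whether the reference itself is complete.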