gcgbarbosa 3 hours ago
"the intelligence is clearly there" I wonder if I am using the same models as everyone else. To me, LLMs still give good answers 80% of the time, but the other 20% of the time they fail so miserably that it is obvious the "intelligence" is not there.
coldtea 2 hours ago
It might be an extra demand for rigor that isn't applied equally to humans. One could argue that other coders on our teams, or even we ourselves, often fail in "a miserable way", say about 20% of the time. But we block this out, or consider it "regular functioning", or treat it as a one-off caused by something we got wrong, "just a try" that we redo, etc. But when an LLM does it in an area we know, we notice, and suddenly it's too much.
21asdffdsa12 3 hours ago
It really depends on the field you are in, the tasks you set, and how much of that was in the training set. A web developer will find it succeeding at all tasks, while a developer of some exotic C++ physics simulation will find it lacking. "Works for me" says more about the field of the LLM reviewer than about the LLM.
hodgehog11 an hour ago
I get about the same success rate with my problems (scientific computing usually), but they're often _much_ easier to check than to write, so an 80% success rate becomes game-changing.