Havoc 2 hours ago
Are language models really the best choice for this? It seems to me that the outcome would be near random because they are so poorly suited to the task, which might manifest as:
> We also found that the models were highly sensitive to seemingly trivial prompt changes
kqr 2 hours ago
No, LLMs are not a good choice for this – as the results show! If I had to guess, they're experimenting with LLMs for publicity.
baq 2 hours ago
they're tools. treat them as tools. since they're so general, you need to explore whether and how you can use them in your domain. guessing 'they're poorly suited' is just that, guessing. in particular:
> We also found that the models were highly sensitive to seemingly trivial prompt changes
this much is obvious to anyone who has seriously looked at deploying these, which is why there are some very successful startups in the evals space.