Remix clone Hacker News


▲

Havoc 2 hours ago

Are language models really the best choice for this?

Seems to me that the outcome would be near random because they are so poorly suited. Which might manifest as

> We also found that the models were highly sensitive to seemingly trivial prompt changes

▲

kqr 2 hours ago | parent | next [-]

No, LLMs are not a good choice for this – as the results show! If I had to guess, they're experimenting with LLMs for publicity.

▲

baq 2 hours ago | parent | prev [-]

they're tools. treat them as tools.

since they're so general, you need to explore if and how you can use them in your domain. guessing 'they're poorly suited' is just that, guessing. in particular:

> We also found that the models were highly sensitive to seemingly trivial prompt changes

this is obvious to anyone who has seriously looked at deploying these; that's why there are some very successful startups in the evals space.

▲

rob_c 2 hours ago | parent [-]

> guessing 'they're poorly suited' is just that, guessing

I have a really nice bridge to sell you... This "failure" is just a grab at trying to look "cool" and "innovative", I'd bet. Anyone with a modicum of understanding of the tooling (or, hell, experience: they've been around for a few years now, enough for people to build a feeling for this) knows that this is not a task for a pre-trained general LLM.