toss1 a day ago
Indeed they are not forced to train them on user outputs, but the author of the article seems to have found good evidence that they are actually doing that, and that they will need more expert data-tagging/filtering of the inputs to regain their previous performance.
Zababa 19 hours ago | parent
I don't think the author of the article found "good evidence". He found a specific case where there was a regression. This could be due to:

- models actually getting worse in general
- his specific style of prompting working well with older models and less well with newer models
- the thing his test measures no longer being a priority for big AI labs

From the article:

> GPT-4 gave a useful answer every one of the 10 times that I ran it. In three cases, it ignored my instructions to return only code, and explained that the column was likely missing from my dataset, and that I would have to address it there.

Here, ignoring the instructions in order to give a "useful answer" (as judged by the author) is counted as a good thing. That means a model trained to be better at instruction following would lose points on this test.

To me this article reads a bit like saying "this new gun that shoots straight 100% of the time is worse than the old gun that shot straight only 50% of the time, because sometimes I shoot at something I don't actually want to hit!" And in a way that's true: if you're used to being able to shoot at things without them getting hurt, the new gun is worse from that point of view. But to spin up a whole theory about garbage in/garbage out from that? Or to conclude that all models are getting worse, rather than that you're maybe no longer the target audience? That seems weird to me.
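To make the metric point concrete, here is a minimal sketch (not from the article or this thread; the function and field names are hypothetical, and the "code only" check is deliberately crude) of scoring the same responses on two separate axes: whether the reply actually fixed the problem, and whether it followed the return-only-code instruction. A single pass/fail test that folds both into one number will report a "regression" whenever a model moves along one axis, even if it improved on the other.

```python
# Toy illustration: "useful answer" and "followed the instruction"
# are two different metrics; conflating them makes an instruction-
# following improvement look like a quality regression.

from dataclasses import dataclass


@dataclass
class Response:
    text: str
    contains_working_fix: bool  # manual label: did it solve the user's problem?


def is_code_only(text: str) -> bool:
    """Crude check: count a reply as 'code only' if it is a single fenced block."""
    stripped = text.strip()
    return stripped.startswith("```") and stripped.endswith("```")


def score(responses: list[Response]) -> dict:
    n = len(responses)
    useful = sum(r.contains_working_fix for r in responses)
    followed = sum(is_code_only(r.text) for r in responses)
    return {
        "useful_rate": useful / n,        # the author's notion of a "useful answer"
        "instruction_rate": followed / n, # strict instruction following
    }
```

Under this split, the article's observation (10/10 useful, 7/10 instruction-following for the old model) and a newer model scoring, say, 8/10 useful but 10/10 instruction-following are a trade-off between axes, not unambiguous evidence that the model got worse.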
| ||||||||