mewpmewp2 10 days ago
That's a really good point. I was wondering why some LLMs seem to be trained to get things passing so sloppily: writing mock data and methods, then pretending the task is complete and everything is great, good to go. Sadly, they do seem to be trained just to satisfy some set of conditions, and it feels to me like this has gotten worse lately. It should be relatively easy to reward them for writing robust code, even if that takes longer or ends up not working, but they do seem to be geared towards scoring high on SWE benchmarks.