Remix.run Logo
amluto 2 hours ago

That blog post really makes it look like it's graded from an LLM's estimation of an OSS maintainer's review. I see three issues:

1. That estimate could easily be wrong.

2. That estimate is, of course, usable in RL training. This isn't an inherently bad thing, and this is more or less what has improved coding models so much lately. But it does mean that other companies could and surely will do this sort of training, and Anthropic probably did too.

3. OSS maintainers are far from perfect, and there's an unfortunate uncanny valley-like effect in which a coding model can produce code that is just convincing enough to pass review even though it's actually totally wrong. I don't know whether this is a specific issue here.