|
| ▲ | pron 8 hours ago | parent | next [-] |
| You mean trying and failing to build a C compiler. This isn't a very hard task to begin with (assuming you know compilers, and the models do), but it was made unrealistically easy by giving the agents thousands of tests written by humans over years (on top of a spec and a reference implementation, both of which the models were trained on), and the agents still failed to converge. I was actually surprised that they failed, as this was the purest possible example of "just do the coding" (something that isn't achievable in real or more complex cases), and when I read the description I thought they had made it too easy, and in a way that isn't representative of real software. My thought at that failure was that if agents can't even build a C compiler with so much preparation effort put into the test, then we have a ways to go. Indeed, once you work with agents for a while you see that coding isn't really their strong suit (although they are impressive at debugging). |
|
| ▲ | AlienRobot 8 hours ago | parent | prev | next [-] |
| How many C compilers do we need... |
| |
|
| ▲ | refulgentis 8 hours ago | parent | prev | next [-] |
| Right. Pretty impressive. What percentage of people will think that’s life changing? Because then we’re not talking about “can everyone up their demos to life changing, please?”, we’re talking about “can everyone use demos Oarch thinks are life changing, please?” - and “can build an MVP C compiler draft that barely works for $XXK” isn’t really that compelling to me, and we’re both software engineers, and my whole day job has been agentic coding for…2.5 years?…now. My incentive structure and demographics are lined up perfectly to agree with you, but I don’t :/ |
| |
| ▲ | Oarch 8 hours ago | parent [-] |
| I'm still sure we can do a little better though. Maybe a personalised diet and exercise plan based on a huge range of information: preferences, biometrics, habit forming, disposable income, your local area, etc. |
| ▲ | greedo 7 hours ago | parent | next [-] |
| Like putting glue on your pizza? |
| ▲ | refulgentis 8 hours ago | parent | prev [-] |
| This is an excellent point and reminds me that, in some ways, the agentic coding stuff, and the ability for RL to hill climb on that and improve models quickly, has distracted from prompt engineering / putting more effort into getting data to the models as a user. |
|
|
|
| ▲ | queenkjuul 7 hours ago | parent | prev [-] |
| You're too easily impressed |