thorum 9 hours ago

Interesting read! Creating tests is highlighted as something Claude did well, but it strikes me that all the weaker rejected solutions could have been avoided if it were really good at designing intelligent tests for itself. For example, the first solution “was very specific to the reported bug and wouldn’t have fixed the general case” and the third suggestion “prevented the perfectly valid use of as conversion expressions in go commands as well”. I imagine both of these cases could have been noticed and avoided by the agent if it had planned out adequate tests ahead of time.

piskov 3 minutes ago | parent | next [-]

As a human you have a concept of viscosity. That resistance, like being in quicksand or a swamp, is how you “easily” identify a code smell, something that needs to be refactored, etc.

An LLM, being a tireless little helper, will gladly output hundreds of lines, hacks, and what have you.

I don’t think any amount of tests, prompts, harnesses, and other “my shaman is a better shaman” tricks will help it acquire this trait.

And that’s why it is good at what it’s good at and really bad at stuff like code “design” (unless it is a well-known solution baked into the training set).

rapind 2 hours ago | parent | prev [-]

This is kind of what coding with LLMs feels like. Gradually increase the guard rails "outside of its context (automated)" to get the results you want out of it. Static typing, quick compilation, not having nulls, and lints are a great start (I would also argue for managed side effects and functional programming, but to each their own).

It gets pretty far toward the solution on its own, and quickly, but then you spend time adjacent to the problem, building out its cage while iterating through the remainder of the solution.
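
To make the "automated guard rail" idea concrete, here is a minimal sketch of a lint-style check that runs outside the model's context. It assumes Python, and the `find_violations` helper and its two rules (no bare `except`, every function needs a return annotation) are hypothetical choices for illustration, not something from the article or from any particular linter.

```python
import ast


def find_violations(source: str) -> list[tuple[int, str]]:
    """Return (line, message) pairs for each rule the source breaks.

    Hypothetical rules, chosen only for illustration:
      1. no bare `except:` clauses
      2. every function declares a return annotation
    """
    violations = []
    for node in ast.walk(ast.parse(source)):
        # Rule 1: an except handler with no exception type is a bare except.
        if isinstance(node, ast.ExceptHandler) and node.type is None:
            violations.append((node.lineno, "bare except"))
        # Rule 2: a function whose `returns` slot is empty has no annotation.
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.returns is None:
            violations.append((node.lineno, f"{node.name} has no return annotation"))
    return sorted(violations)


generated = """def f(x):
    try:
        return x
    except:
        return None
"""

for line, message in find_violations(generated):
    print(f"line {line}: {message}")
```

A check like this would be run on whatever the model produces, with the violations fed back as the next prompt; each new rule is one more bar in the cage.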