embedding-shape 11 hours ago

> There are often also invariants that can be used to identify bugs without ground truth, e.g. rendering the page at slightly different widths, you can make some assertions about how far elements will move.

That's really interesting and sounds useful! I'm wondering if there are general guidelines/requirements (not specific to browsers) that could kind of "trigger" those things in the agent, without explicitly telling it. I think generally that's how I try to approach prompting.
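For concreteness, that kind of ground-truth-free invariant check can be sketched like this. The `layout()` function is a toy stand-in, not a real renderer; a real version would query element bounding boxes from a headless browser:

```python
def layout(width):
    """Toy stand-in for a real renderer: returns each element's
    (x, width) box at a given viewport width. A real version would
    query a headless browser instead."""
    if width >= 200:                        # two 100px boxes side by side
        return {"a": (0, 100), "b": (100, 100)}
    return {"a": (0, 100), "b": (0, 100)}   # "b" wraps below "a"

def check_overflow(viewport):
    # Invariant: nothing renders outside the viewport. No ground
    # truth about the "correct" positions is needed to assert this.
    for name, (x, w) in layout(viewport).items():
        assert 0 <= x and x + w <= viewport, f"{name} overflows at {viewport}px"

def check_stability(viewport, delta=5):
    # Invariant: a small width change should only move elements a
    # small amount, as long as no wrap boundary is crossed.
    before, after = layout(viewport), layout(viewport + delta)
    for name in before:
        assert abs(after[name][0] - before[name][0]) <= delta, name

# Sweep a range of widths where no reflow occurs.
for width in range(210, 400, 10):
    check_overflow(width)
    check_stability(width)
```

The same shape works against a real page by swapping `layout()` for bounding-box queries from a browser-automation tool.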

hedgehog 8 hours ago | parent [-]

I think if you explain that general idea the models can figure it out well enough to write it into an implementation plan, at least some of the time. Interesting problem though.

embedding-shape 7 hours ago | parent [-]

> that general idea the models can figure it out well enough to write it into an implementation plan

I'm not having much luck with it; they get lost in their own designs/architectures all the time, even the best models (as far as I've tested). But as long as I drive the design, things don't immediately end up in a ball of spaghetti.

Still trying to figure out better ways of doing that. It feels like we need to focus on tooling that lets us collaborate with LLMs better, rather than trying to replace things with LLMs.

hedgehog 6 hours ago | parent [-]

Yeah, from what I can tell a lot of design ability is somewhere in the weights, but the models don't regurgitate it without some coaxing. It may be related to the pattern where, after generating some code, you can instruct a model to review it for correctness and it can find and fix many issues.

Regarding tooling, there's a major philosophical divide between LLM maximalists who prefer the model to drive the "agentic" outer loop and what I'll call "traditionalists" who prefer that control stay with algorithms closer to classical AI research. My personal suspicion is that the second branch is greatly under-exploited, but time will tell.
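That generate-then-review pattern can be sketched as a plain-code outer loop, which is roughly the "traditionalist" shape: the control flow lives in ordinary code and the model is just a function call. `llm()` here is a hypothetical stub so the sketch runs offline, not any real API:

```python
def llm(prompt):
    # Hypothetical stub standing in for a chat-completion API call;
    # it cans a fixed draft and always approves on review.
    if prompt.startswith("Review"):
        return "OK"
    return "def add(a, b):\n    return a + b"

def generate_with_review(task, rounds=3):
    """Outer loop owned by code, not the model: draft, ask the model
    to review its own output, apply fixes, bounded by `rounds` so the
    loop always terminates."""
    code = llm(f"Write code for this task:\n{task}")
    for _ in range(rounds):
        review = llm("Review this code for correctness; reply OK "
                     f"if it looks right, otherwise list bugs:\n{code}")
        if review.strip() == "OK":
            break
        code = llm(f"Fix these issues:\n{review}\n\nCode:\n{code}")
    return code

print(generate_with_review("add two numbers"))
```

A "maximalist" version would instead hand the model a tool list and let it decide when to review; here the review step is forced by the loop itself.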