Everyone is comparing this to Playwright, but it's solving a different problem. Playwright checks structural properties: does element X exist, is it visible, and so on. That's useful, but it can't tell you whether the page actually looks right.

I built something similar that takes a screenshot and uses a multi-modal LLM to evaluate it against a design mock. It catches a completely different class of error. The DOM can be structurally perfect and still look nothing like what was intended. Colors wrong, layout shifted, spacing off, components overlapping. No amount of DOM assertions will catch that.

These are two different kinds of gates: structural gates, which are fast and deterministic, and stochastic gates, which are slow but catch a completely different class of issue. There is very little overlap between what each one catches, and you want both.

With both gates in place, I can invest a lot of time getting the mock just right, then let the agents "make it so".
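
To make the two-gate idea concrete, here is a minimal sketch in Python. It is not the actual tool, just an illustration under assumptions: `GateResult`, `structural_gate`, `visual_gate`, and `run_gates` are hypothetical names, the DOM checks are plain callables (in practice they might wrap Playwright assertions), and `judge` stands in for whatever multi-modal LLM call compares a screenshot against the design mock. Running the cheap gate first and skipping the slow one on failure is a design choice of this sketch, not something the approach requires.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class GateResult:
    name: str
    passed: bool
    detail: str


def structural_gate(checks: Dict[str, Callable[[], bool]]) -> GateResult:
    """Fast, deterministic gate: every named DOM check must return True."""
    failed = [name for name, check in checks.items() if not check()]
    detail = "failed: " + ", ".join(failed) if failed else "all checks passed"
    return GateResult("structural", not failed, detail)


def visual_gate(
    screenshot: bytes,
    mock: bytes,
    judge: Callable[[bytes, bytes], List[str]],
) -> GateResult:
    """Slow, stochastic gate: `judge` (hypothetical LLM wrapper) returns
    a list of visual mismatches between the screenshot and the mock."""
    issues = judge(screenshot, mock)
    detail = "; ".join(issues) if issues else "matches mock"
    return GateResult("visual", not issues, detail)


def run_gates(
    checks: Dict[str, Callable[[], bool]],
    take_screenshot: Callable[[], bytes],
    mock: bytes,
    judge: Callable[[bytes, bytes], List[str]],
) -> List[GateResult]:
    """Run the cheap gate first; only pay for the slow one if it passes."""
    results = [structural_gate(checks)]
    if results[0].passed:
        results.append(visual_gate(take_screenshot(), mock, judge))
    return results


if __name__ == "__main__":
    # Stub inputs for illustration only: the DOM is "fine", the look is not.
    demo = run_gates(
        checks={"submit button exists": lambda: True},
        take_screenshot=lambda: b"fake-screenshot-bytes",
        mock=b"fake-mock-bytes",
        judge=lambda shot, mock: ["button overlaps footer"],
    )
    for result in demo:
        print(result)
```

The stubbed run shows the case described above: the structural gate passes while the visual gate reports a mismatch that no DOM assertion would have flagged.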

tptacek 4 hours ago | parent | next [-]

Playwright seems to do fine at visual stuff? It takes screenshots and the model evaluates them. That's most of what I use Playwright for.

morkalork 6 hours ago | parent | prev [-]

Copilot + Playwright MCP can take screenshots and send the images to an LLM tho?

mrothroc 6 hours ago | parent [-]

It's the whole tool that matters, not so much how you get the screenshots. That's what I'm saying: this is headed in the right direction, it just falls a little short of what I do, where I get a lot of value over and above Playwright (or whatever takes the screenshot). The critical part is that, viewed at a high level, this method tests something different, which means it catches different errors.