m12k 3 days ago

If you don't want this to break eventually, you need it tested every time your CI/CD test suite runs. Manual testing just doesn't cut it.

tdeck 3 days ago | parent | next [-]

We have the exact same problem with visual interfaces, and the combination of manual testing for major changes + deterministic UI testing works pretty well.

Actually, it could be even easier to write tests for the screen reader workflow, since the interactions are all text I/O and key presses.

cenamus 3 days ago | parent | prev [-]

AI in your CI pipeline won't help then either, if it randomly gives different answers.

simonw 3 days ago | parent | next [-]

An AI-generated automated testing script in your pipeline will do great though.

debugnik 3 days ago | parent [-]

And then we're back to your own point:

> I'm not convinced at all by most of the heuristic-driven ARIA scanning tools.

simonw 3 days ago | parent [-]

That's entirely different.

ARIA scanning tools are checkers that throw an error if they see an element that's missing an attribute, without even attempting to invoke a real screenreader.
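For a concrete contrast, that whole class of check usually boils down to a one-shot static scan. A minimal sketch, assuming axe-core's Playwright integration as a representative example (my example, not something named upthread; the page URL is a placeholder):

    // Sketch of a heuristic ARIA scan -- assumes @axe-core/playwright as a
    // representative tool; no screenreader is ever launched.
    import { test, expect } from "@playwright/test";
    import AxeBuilder from "@axe-core/playwright";

    test("no detectable ARIA violations", async ({ page }) => {
      await page.goto("https://example.com/checkout"); // hypothetical page
      const results = await new AxeBuilder({ page }).analyze();
      // Passes or fails on static attribute heuristics alone.
      expect(results.violations).toEqual([]);
    });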

I'm arguing for automated testing scripts that use tools like Guidepup to launch a real screenreader and assert things like: the new content added by fetch() actually gets read out to the user after the form submission completes.
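Something like this rough sketch (Guidepup method names are from memory of its docs; the key sequence, timeout, and expected phrase are made-up placeholders to verify):

    // Rough sketch: drive a real screenreader and assert on what it speaks.
    // Assumes Guidepup's VoiceOver driver on macOS, with the form already
    // open in a browser.
    import { voiceOver } from "@guidepup/guidepup";

    async function assertSubmissionIsAnnounced(): Promise<void> {
      await voiceOver.start();
      try {
        // Tab to the submit button and activate it.
        await voiceOver.press("Tab");
        await voiceOver.press("Enter");

        // Give fetch() time to complete and update the live region.
        await new Promise((resolve) => setTimeout(resolve, 2000));

        const spoken = await voiceOver.lastSpokenPhrase();
        if (!spoken.includes("Your order has been submitted")) {
          throw new Error(`Expected the confirmation to be read out, got: "${spoken}"`);
        }
      } finally {
        await voiceOver.stop();
      }
    }

There's also a Playwright integration (@guidepup/playwright, if I remember right) that makes this fit more naturally alongside existing browser tests in CI.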

I want LLMs and coding agents to help me write those scripts, so I can run them in CI along with the rest of my automated tests.

debugnik 3 days ago | parent [-]

That's very different from what I thought you were arguing for in your top comment, though: a computer-use agent proving the app is usable through a screen reader alone (and hopefully caching a replayable trace to not prompt it on every run).

Guidepup already exists; if people cared, they'd use it for tests with or without LLMs. Thanks for showing me this tool, BTW! I agree testing against real readers is better than relying on a third party's heuristics.

zamadatix 3 days ago | parent | prev [-]

So does hiring a person, and so do tests that rely on entropy because exhaustive testing is infeasible. If you can wrangle the randomness (each approach has its own ways of doing that), you end up with very useful tests in all three scenarios, but only automated tests scale to running on every commit. You probably still want the non-automated tests per release or so as well if you can, depending on what you're doing, but you don't necessarily want only invariant tests in either case.
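To make "wrangle the randomness" concrete: with property-based tests, one common approach is pinning (or recording) the seed, so every CI run generates the same cases and any failure replays deterministically. A minimal sketch assuming the fast-check library (my example, not something from this thread):

    // Sketch of "wrangled" entropy: a property-based test with a pinned seed.
    // The property itself is a toy example.
    import * as fc from "fast-check";

    const normalize = (s: string): string => s.trim().toLowerCase();

    fc.assert(
      fc.property(fc.string(), (s) => {
        // Idempotence: normalizing twice is the same as normalizing once.
        return normalize(normalize(s)) === normalize(s);
      }),
      { seed: 42 } // drop the seed to explore; keep it to reproduce
    );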