The whole UI seems easier for LLMs to consume, and it also displays nicely in-editor for humans. Test failures essentially become failing screenshot tests, and the resulting changes are really comfortable to review.