SparkyMcUnicorn 21 hours ago
This is pretty much exactly what I was going to build. It's missing a few things, so I'll either be contributing or forking this in the future. I'll need a way to extract data as part of the tests, like screenshots and page content. That would allow supplementing the tests with non-Magnitude features, as well as adding checks that are more deterministic: asserting that the added todo item exactly matches the input data, diffing screenshots when the planner fallback came into play, capturing execution log data, etc. This isn't currently possible from what I can see in the docs, but maybe I'm wrong? It'd also be ideal if it had an LLM-free executor mode to reduce costs and increase speed (caching outputs, or maybe using the accessibility tree instead of a VLM), which would also cover cases where the planner should not kick in automatically.
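As a rough illustration of the deterministic checks described above, here is a minimal TypeScript sketch. The `StepArtifacts` shape and both function names are hypothetical and not part of Magnitude's API; the sketch only assumes that a test run could expose extracted page text, a planner-fallback flag, and a screenshot path.

```typescript
// Hypothetical shape for data a test run might expose.
// This is an assumption for illustration, not Magnitude's actual API.
interface StepArtifacts {
  pageText: string[];           // visible text items extracted from the page
  plannerFallbackUsed: boolean; // whether the planner had to step in
  screenshotPath?: string;      // where a screenshot was saved, if any
}

// Deterministic check: exactly one item must match the input string
// character for character (no fuzzy or model-based matching).
function assertTodoAdded(artifacts: StepArtifacts, expected: string): void {
  const matches = artifacts.pageText.filter((text) => text === expected);
  if (matches.length !== 1) {
    throw new Error(
      `expected exactly one todo "${expected}", found ${matches.length}`
    );
  }
}

// Only runs where the planner fallback kicked in and a screenshot exists
// are flagged for a screenshot diff.
function needsScreenshotDiff(artifacts: StepArtifacts): boolean {
  return artifacts.plannerFallbackUsed && artifacts.screenshotPath !== undefined;
}
```

The exact-match comparison is the point here: a case difference such as "buy milk" versus "Buy milk" fails the check, which a model-judged assertion might let through.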
anerli 20 hours ago
Hey, awesome to hear! We are definitely open to contributions :) We plan to (very soon) enable mixing standard Playwright or other code in between Magnitude steps, which should let you do exact assertions or anything else you want. We definitely understand the need to reduce costs and increase speed, which we think will mainly be addressed by our plan-caching system, whose cached plans will be executed by Moondream (a 2B model). Moondream is very fast and also has self-hosted options. That said, there's no reason we couldn't also offer an option to generate pure Playwright for people who would prefer that. We have a Discord as well if you'd like to easily stay in touch about contributing: https://discord.gg/VcdpMh9tTy
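To make the plan-caching idea concrete, here is a minimal TypeScript sketch of the general technique. It is not Magnitude's implementation: `Action`, `Planner`, and `PlanCache` are hypothetical names, and the sketch assumes a plan is simply a list of actions keyed by the step description. The `allowPlanner` flag models the requested mode in which the planner must not kick in automatically.

```typescript
// Hypothetical action record; a real system would likely store more detail.
type Action = { kind: "click" | "type"; target: string; text?: string };

// Stand-in for the expensive planning call (for example, a large model).
type Planner = (step: string) => Action[];

class PlanCache {
  private plans = new Map<string, Action[]>();
  private planner: Planner;
  private allowPlanner: boolean;
  plannerCalls = 0;

  constructor(planner: Planner, allowPlanner: boolean = true) {
    this.planner = planner;
    this.allowPlanner = allowPlanner;
  }

  // Return the cached plan if present; otherwise call the planner once
  // and store its result, or fail if the planner is disabled.
  resolve(step: string): Action[] {
    const cached = this.plans.get(step);
    if (cached !== undefined) return cached;
    if (!this.allowPlanner) {
      throw new Error(`no cached plan for step: ${step}`);
    }
    this.plannerCalls += 1;
    const plan = this.planner(step);
    this.plans.set(step, plan);
    return plan;
  }
}
```

With this shape, repeated runs of the same step pay the planning cost once, and a strict run (constructed with `allowPlanner` set to `false`) fails loudly on a cache miss instead of silently falling back.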