▲ | skull8888888 7 hours ago | |
Isn't browser-use SOTA on WebVoyager? At this point WebVoyager seems outdated; there's definitely a need for a new, harder benchmark.
▲ | MagMueller 5 hours ago | parent | next [-] | |
Yes, the browser-use report at https://browser-use.com/posts/sota-technical-report shows 89%. We definitely need a new dataset with more complex tasks, like uploading files, handling multiple tabs, and handling many more steps.
▲ | suchintan 6 hours ago | parent | prev [-] | |
Definitely need a newer benchmark. I couldn't find where browser-use published their run results (I expected to see them at https://github.com/browser-use/eval ). We went ahead and published our full run at https://eval.skyvern.com so that it can be independently audited.