suchintan 6 hours ago
Definitely need a newer benchmark. I couldn't find where browser-use published their run results (I expected to see them at https://github.com/browser-use/eval). We went ahead and published our full run at https://eval.skyvern.com so that it can be independently audited.