mike_hearn 3 days ago
Nice writeup! A few years ago I proposed to a friend that he should try rendering accessibility trees as text and fine-tuning a model to issue tool calls over them. I don't know whether this has been tried and failed, or whether nobody bothered trying because so few people know desktop APIs anymore. The main advantages would be accuracy and speed, plus avoiding the need for so many image tokens (you'd still need them for things that are actually images, though).
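
To make the idea concrete, here is a rough sketch of what "rendering the tree and issuing tool calls over it" might look like. Everything in it is made up for illustration: `Node`, `render`, `apply_tool_call`, and the `{"tool": ..., "id": ...}` call shape are hypothetical, and a real version would read nodes from the platform accessibility API rather than a hand-built tree.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Node:
    """Hypothetical stand-in for one accessibility-tree element."""
    role: str
    name: str = ""
    children: list["Node"] = field(default_factory=list)


def render(root: Node) -> tuple[str, dict[int, Node]]:
    """Flatten the tree into indented text plus an id -> node lookup table."""
    ids = count(1)
    lines: list[str] = []
    index: dict[int, Node] = {}

    def walk(node: Node, depth: int) -> None:
        node_id = next(ids)
        index[node_id] = node
        label = f' "{node.name}"' if node.name else ""
        lines.append(f"{'  ' * depth}[{node_id}] {node.role}{label}")
        for child in node.children:
            walk(child, depth + 1)

    walk(root, 0)
    return "\n".join(lines), index


def apply_tool_call(call: dict, index: dict[int, Node]) -> str:
    """Resolve a model-issued call (assumed shape: {"tool": ..., "id": ...})."""
    node = index[call["id"]]
    # A real agent would invoke the platform accessibility API here.
    return f'{call["tool"]} -> {node.role} "{node.name}"'


window = Node("window", "Save As", [
    Node("textbox", "File name"),
    Node("button", "Save"),
    Node("image"),  # pixel content like this would still need image tokens
])
text, index = render(window)
print(text)
print(apply_tool_call({"tool": "click", "id": 3}, index))
```

The model would see only the short text rendering and answer with an id, which is where the token savings would come from; nodes with no useful text (like the `image` above) are the case where a screenshot crop would still be needed.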