soulofmischief · 2 days ago
In my project I rigged up an in-browser emulator and fed captured screenshots directly to local multimodal models. So it looks right at what's going on, writes a description for refinement, and uses all of that to create and manage goals, write to a scratchpad, and submit input. It's minimal scaffolding because I wanted to see what the raw models are capable of. Kind of a benchmark.
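The loop described above (capture screen → model describes it → refine goals on a scratchpad → submit input) might look roughly like this sketch. Everything here is hypothetical: `encode_frame`, `agent_step`, and the stubbed model calls are placeholders, not the commenter's actual code or any real model API.

```python
import base64

def encode_frame(png_bytes: bytes) -> str:
    """Base64-encode a captured screenshot so it can go in a multimodal prompt."""
    return base64.b64encode(png_bytes).decode("ascii")

def agent_step(frame: bytes, describe, plan, scratchpad: list) -> str:
    """One iteration of the loop: look at the screen, describe it,
    append notes to the scratchpad, and pick the next input."""
    description = describe(encode_frame(frame))   # model "looks" at the screen
    scratchpad.append(description)                # running notes for later steps
    return plan(description, scratchpad)          # decide which button to press

# Stub model calls so the sketch runs without a real model:
fake_describe = lambda img_b64: "title screen, 'PRESS START' blinking"
fake_plan = lambda desc, notes: "START" if "PRESS START" in desc else "NOOP"

notes: list = []
action = agent_step(b"\x89PNG...", fake_describe, fake_plan, notes)
print(action)  # START
```

In a real setup, `describe` and `plan` would each be a call to the local multimodal model, with the scratchpad contents included in the prompt.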
giancarlostoro · 2 days ago · parent
I have a feeling that if you gave them access to GameFAQs guides they might play better, but it depends on how you can feed them the data.
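One hedged way to "feed them the data" would be to chunk a walkthrough and inject the most relevant chunk into each prompt. This is purely illustrative — the keyword-overlap scoring and the sample chunks are made up, and a real GameFAQs guide would need scraping and cleaning first.

```python
def best_chunk(screen_description: str, guide_chunks: list) -> str:
    """Pick the guide chunk sharing the most words with the screen description.
    Naive bag-of-words overlap; an embedding search would likely do better."""
    screen_words = set(screen_description.lower().split())
    return max(guide_chunks,
               key=lambda c: len(screen_words & set(c.lower().split())))

chunks = [
    "Mt. Moon: take the ladder down, then talk to the fossil researcher.",
    "Viridian Forest: head north through the grass to reach Pewter City.",
]
context = best_chunk("tall grass in viridian forest, trainer nearby", chunks)
print(context)  # the Viridian Forest chunk
```

The selected chunk would then be prepended to the model's prompt alongside the screenshot, so the model's next move is grounded in the guide.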