grbsh a day ago
I know moondream is cheap / fast and can run locally, but is it good enough? In my experience testing things like Computer Use, anything but the large LLMs has been so unreliable as to be unworkable. But maybe you guys are doing something special to make it work well in concert?
anerli a day ago
So it's still key to have a big model devising the overall strategy for executing the test case. Moondream on its own is pretty limited and can't handle complex queries. The planner gives very specific instructions to Moondream, which is just responsible for locating different targets on the screen. It's basically just the layer between the big LLM doing the actual "thinking" and the specific UI interactions that thinking gets grounded to. Where it gets interesting is that we can save the execution plan the big model comes up with and run with ONLY Moondream if the plan is specific enough, then switch back to the big model if some action path requires adjustment. This means we can run repeated tests much more efficiently and consistently.
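To make the plan caching idea concrete, here is a rough sketch of the control flow, not the actual implementation. Every name in it (`Step`, `CachedRunner`, `planner`, `locator`) is hypothetical: `planner` stands in for the big model, `locator` stands in for Moondream and returns screen coordinates or `None`, and neither makes a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    """One planned UI action (hypothetical shape, for illustration only)."""
    action: str   # e.g. "click"
    target: str   # plain-language description of the element to locate


Point = tuple[int, int]
Planner = Callable[[str], list[Step]]        # stand-in for the big model
Locator = Callable[[str], Optional[Point]]   # stand-in for the small model


class CachedRunner:
    def __init__(self, planner: Planner, locator: Locator) -> None:
        self.planner = planner
        self.locator = locator
        self.plans: dict[str, list[Step]] = {}   # saved plans per test case
        self.planner_calls = 0

    def _plan(self, test_case: str) -> list[Step]:
        # Expensive path: ask the big model for a plan and save it.
        self.planner_calls += 1
        plan = self.planner(test_case)
        self.plans[test_case] = plan
        return plan

    def _replay(self, plan: list[Step]) -> Optional[list[tuple[str, Point]]]:
        # Cheap path: only the locator runs. Give up on the first miss.
        performed = []
        for step in plan:
            point = self.locator(step.target)
            if point is None:
                return None
            performed.append((step.action, point))
        return performed

    def run(self, test_case: str) -> list[tuple[str, Point]]:
        cached = self.plans.get(test_case)
        if cached is not None:
            result = self._replay(cached)
            if result is not None:
                return result
        # No saved plan, or the saved plan no longer matches the UI:
        # fall back to the big model, then replay the fresh plan.
        result = self._replay(self._plan(test_case))
        if result is None:
            raise RuntimeError(f"could not ground plan for {test_case!r}")
        return result
```

With this shape, repeated runs of an unchanged test only hit the locator, and the planner is called again only when a target can't be found. A real version would presumably re-plan from the failing step rather than from scratch, but that detail isn't described above.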