anerli a day ago
So the prompts sent to the planner and the executor are completely distinct. We allow complete customization of the planner LLM across all major providers (Anthropic, OpenAI, Google AI Studio, Google Vertex AI, AWS Bedrock, OpenAI-compatible). The executor LLM, on the other hand, has to fit very specific criteria, so we only support the Moondream model right now. For a model to act as the executor, it needs to be able to output specific pixel coordinates (only a few models support this, for example OpenAI/Anthropic computer use, Molmo, Moondream, and some others). We like Moondream because it's super tiny and fast (2B parameters). This means that as long as we still have a "smart" planner LLM, we can have very fast, cheap execution and precise UI interaction.
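To make the split concrete, here is a rough sketch of how a planner/executor loop like this could be wired up. It is not the project's actual code: `Step`, `run`, `stub_planner`, and `stub_locator` are all hypothetical names, and the stubs stand in for a real planner LLM and a real coordinate-pointing model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One high-level instruction produced by the planner (hypothetical shape)."""
    action: str      # e.g. "click" or "type"
    target: str      # natural-language description of the UI element
    text: str = ""   # text to enter, only used for "type"


# Assumed interfaces: the planner maps a task to steps, and the locator
# (the small executor model) maps a screenshot plus a description to pixels.
Planner = Callable[[str], List[Step]]
Locator = Callable[[bytes, str], Tuple[int, int]]


def run(task: str, planner: Planner, locator: Locator,
        screenshot: bytes) -> List[Tuple[str, int, int, str]]:
    """Turn a task into low-level actions with concrete pixel coordinates."""
    actions = []
    for step in planner(task):
        # The executor only answers "where is this element?", nothing more.
        x, y = locator(screenshot, step.target)
        actions.append((step.action, x, y, step.text))
    return actions


def stub_planner(task: str) -> List[Step]:
    # Stand-in for the "smart" planner LLM; a real one would call a provider.
    return [
        Step("click", "username field"),
        Step("type", "password field", "secret"),
    ]


def stub_locator(screenshot: bytes, target: str) -> Tuple[int, int]:
    # Stand-in for the pointing model; a real one would inspect the screenshot.
    known = {"username field": (120, 80), "password field": (120, 130)}
    return known[target]


if __name__ == "__main__":
    for action in run("log in", stub_planner, stub_locator, b""):
        print(action)
```

The point of the shape is that the planner never deals in pixels and the executor never deals in goals, so either side can be swapped without touching the other.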
badmonster a day ago
Does Moondream handle multi-step UI tasks reliably (like opening a menu, waiting for it to render, then clicking), or do you have to scaffold that logic separately in the planner?