isoprophlex 3 hours ago
My gut feeling says a cheap Gemini model will be fine. Try the cheapest you can find, and go more expensive if at first you don't succeed. Invest in a good prompt describing the setup, your goals, and when to move. Use typed, structured output; don't go parsing move commands out of unstructured chat output. And maybe validate first on the data you already collected: does the VLM take the same actions as in your existing training set? Then just let it run and collect data for as long as you can afford. Maybe 0.2 fps (sample and take an action every 5 seconds) is already good enough. Good luck!
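To make the "typed output" and "validate first" points concrete, here is a rough sketch in plain Python (standard library only). Everything named here is an assumption for illustration: the `Move` values, the `{"move": ..., "reason": ...}` JSON shape, and the helper names `parse_action` and `agreement_rate` are hypothetical and not from any particular VLM SDK. The model call itself is left out; you would pass in whatever string your model returns.

```python
import json
from dataclasses import dataclass
from enum import Enum

# 0.2 fps means one sample (and one action) every 5 seconds.
SAMPLE_FPS = 0.2
SAMPLE_INTERVAL_S = 1 / SAMPLE_FPS


class Move(Enum):
    # Hypothetical action set; replace with the moves your setup supports.
    FORWARD = "forward"
    LEFT = "left"
    RIGHT = "right"
    STOP = "stop"


@dataclass(frozen=True)
class Action:
    move: Move
    reason: str


def parse_action(raw: str) -> Action:
    """Parse a model reply expected to be JSON like
    {"move": "left", "reason": "..."}; raise ValueError on anything else."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or "move" not in data:
        raise ValueError("reply must be a JSON object with a 'move' field")
    try:
        move = Move(data["move"])
    except ValueError:
        raise ValueError(f"unknown move: {data['move']!r}") from None
    return Action(move=move, reason=str(data.get("reason", "")))


def agreement_rate(predicted: list[Move], recorded: list[Move]) -> float:
    """Fraction of frames where the model's move matches the recorded one."""
    if len(predicted) != len(recorded):
        raise ValueError("predicted and recorded must have the same length")
    if not recorded:
        raise ValueError("need at least one sample")
    matches = sum(p == r for p, r in zip(predicted, recorded))
    return matches / len(recorded)
```

Rejecting anything that is not a known move means a chatty or malformed reply fails loudly instead of being misread as a command. Running `agreement_rate` over the frames you already labeled gives a cheap sanity check before you pay for a long collection run.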