Yes, we ran a bunch of experiments on this! We could get open-source models up to the level of the best closed foundation models, but not beyond it. Gemini 2.5 and Claude 4 were the most reliable API options.