Wouldn't the correct tool here be a multi-armed bandit optimization, like an epsilon-greedy algorithm?
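
For anyone unfamiliar with the term, here is a minimal sketch of what an epsilon-greedy bandit might look like. It is an illustration only: the class `EpsilonGreedy`, the helper `simulate`, the 0/1 (Bernoulli) reward model, and the example rates are all assumptions made up for this sketch, not anything taken from the setup being discussed.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit over a fixed number of arms."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # With probability epsilon, explore: pick an arm uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        # Otherwise exploit: pick the arm with the highest estimated mean
        # (ties go to the lowest index).
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the running mean for the pulled arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(true_rates, steps=5000, epsilon=0.1, seed=0):
    """Run the bandit against assumed Bernoulli arms with the given rates."""
    env_rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon=epsilon, seed=seed + 1)
    for _ in range(steps):
        arm = bandit.select_arm()
        reward = 1.0 if env_rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit


if __name__ == "__main__":
    # Made-up conversion rates for three variants.
    result = simulate([0.05, 0.10, 0.50])
    print("pulls per arm:", result.counts)
    print("estimated rates:", [round(v, 3) for v in result.values])
```

The point of the design is that traffic shifts toward the better-performing arm while the experiment is still running, at the cost of a fixed `epsilon` share of pulls that keep exploring.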