How do you keep agents from converging on similar diffs each epoch? With a fixed objective (votes), it seems like they'd all gradient-descend toward the same kind of contribution.