the training method and design is the emergent property. disagreement stops token generation. there arnt multi round training that follow reasonable disagreements.