xg15 6 hours ago
Yeah, I wonder if part of the reasoning is built around those phrases, and that's why the model can't get rid of them easily.

> "now I have the full picture"

I always interpreted that phrase as a marker separating the phase in which it explores the codebase and gathers information from the phase in which it implements the changes.

Not sure if it's still done, but I think some months ago there was discussion that some of these phrases are injected by the inference loop to "steer" the model, e.g. "But wait" if a thought block was too short. Obviously, such injected phrases couldn't be influenced by the prompt.
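To make the injection idea concrete, here is a rough sketch of what such a loop could look like. It is purely hypothetical and not any vendor's actual inference code: `think_with_min_length`, `generate`, `toy_model`, `min_words`, and `max_nudges` are all made-up names, and measuring thought length in whitespace-separated words is an assumption chosen for simplicity.

```python
from typing import Callable


def think_with_min_length(
    generate: Callable[[str], str],  # hypothetical: maps context -> next thought text
    prompt: str,
    min_words: int = 50,
    max_nudges: int = 2,
    nudge: str = "But wait",
) -> str:
    """Extend a too-short thought block by appending a steering phrase."""
    thought = generate(prompt)
    nudges = 0
    while len(thought.split()) < min_words and nudges < max_nudges:
        # The phrase is appended by the loop itself, not sampled from the
        # model, so nothing in the prompt can suppress it.
        thought = f"{thought} {nudge},"
        # Let the model continue from the context that now ends in the nudge.
        thought += " " + generate(prompt + thought)
        nudges += 1
    return thought


def toy_model(context: str) -> str:
    # Stand-in for a real model call; always returns the same short thought.
    return "I should check the edge cases."


if __name__ == "__main__":
    print(think_with_min_length(toy_model, "Q: is this safe?", min_words=10))
```

In a setup like this the phrase would show up in the trace no matter what the system prompt says, which is the point being made above.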
Sinidir 2 hours ago
Yes, these things happen as part of RL training, the same way you can see the "But wait ..." phrases in thinking traces: they get rewarded.