There's no "just" in RL. Fine-tuning is very important and can make a big difference.
Apparently GPT-5 uses the same pretrained base as 4o did, hah