The models you're using are on the low compute end of the frontier. That's why you're getting bad results.
At the high-compute end of the frontier, by next year, systems should be better than any human at competition coding and competition math. They're basically already there now.
Play this out for another 5 years. What happens when compute becomes 4-20x more abundant and these systems keep getting better?
That's why I don't share your outlook that our jobs are safe. At least not on a 5-8 year timescale. At least not in their current form, where we actually write code by hand.