Good points. I also agree that we'd need to see the data OP collected.
If it indeed did show a slow decline over time and OpenAI did not change the weights, then something does not add up.