Constantly. Minor revisions can easily "wobble" on benchmarks that the training didn't explicitly push them for.
Whether it's genuine loss of capability or just measurement noise is typically unclear.