eugene3306 5 days ago

What's the point of comparing token prices, especially for thinking models?

Just now I was testing the new Qwen3-thinking model. I ran the same prompt five times. The costs I got, sorted: 0.0143, 0.0288, 0.0321, 0.0389, 0.048. And this is for a single model.
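To put a number on that spread, here is a minimal sketch that summarizes the five costs quoted above. The helper name `spread` is made up for illustration, and the currency unit is whatever the original figures were reported in.

```python
from statistics import mean, median

# The five per-run costs quoted above, same prompt, same model.
costs = [0.0143, 0.0288, 0.0321, 0.0389, 0.048]

def spread(costs):
    """Summarize how much the cost of one prompt varies between runs."""
    return {
        "min": min(costs),
        "max": max(costs),
        "mean": mean(costs),
        "median": median(costs),
        # How many times more the priciest run cost than the cheapest one.
        "max_over_min": max(costs) / min(costs),
    }

stats = spread(costs)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

The most expensive run cost more than three times the cheapest one, so a single sample says little about what a prompt "costs" on a thinking model.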

Also, in my experience, sonnet-4 ends up cheaper than gemini-2.5-pro, despite its per-token price being higher.

eugene3306 5 days ago | parent

I think the proper way to estimate cost is to measure the cost of an entire test run, like in aider's leaderboard.
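A rough sketch of what "cost of an entire run" means in practice: sum the cost over every request in the run instead of comparing price lists. All prices and token counts below are invented for illustration and are not real figures for any model; `Pricing`, `Usage`, and `run_cost` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Pricing:
    input_per_mtok: float   # price per million input tokens (hypothetical)
    output_per_mtok: float  # price per million output tokens (hypothetical)

@dataclass
class Usage:
    input_tokens: int
    output_tokens: int  # assumed to include thinking tokens billed as output

def run_cost(pricing, usages):
    """Total cost of a whole test run: the sum over every request in it."""
    total = 0.0
    for u in usages:
        total += u.input_tokens / 1e6 * pricing.input_per_mtok
        total += u.output_tokens / 1e6 * pricing.output_per_mtok
    return total

# Made-up example: model A has the higher per-token price but answers
# tersely; model B is cheaper per token but thinks much longer.
model_a = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
model_b = Pricing(input_per_mtok=1.0, output_per_mtok=10.0)

run_a = [Usage(2000, 1000), Usage(2500, 1500)]
run_b = [Usage(2000, 6000), Usage(2500, 9000)]

cost_a = run_cost(model_a, run_a)
cost_b = run_cost(model_b, run_b)
print(f"model A run: {cost_a:.4f}")
print(f"model B run: {cost_b:.4f}")
```

With these invented numbers the model with the higher per-token price finishes the run for less, because it emits far fewer output tokens. Since the same prompt can vary a lot between runs, averaging over several full runs gives a steadier estimate.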