energy123 5 days ago
o3 is a unique model. For difficult math problems, it generates long reasoning traces (e.g. 10-20k tokens). For coding questions, its reasoning traces are consistently short, unlike Gemini 2.5 Pro, which generates longer reasoning traces for coding questions. Cost for o3 code generation is therefore driven primarily by context size. If your programming questions have short contexts, then the o3 API with flex is really cost effective. For 30k input tokens and 3k output tokens, the cost is 30000 * 0.8 / 1000000 + 3000 * 4 / 1000000 = $0.036. But if you have contexts between 100k-200k tokens, then the monthly plans that give you a budget of prompts instead of tokens are probably going to be cheaper.
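To make the arithmetic above easy to rerun for other context sizes, here is a minimal sketch. The two per-million-token rates are simply the ones used in the calculation above, so treat them as assumptions and check current pricing; `request_cost` is a made-up helper name, not part of any API.

```python
# Per-million-token rates copied from the arithmetic above (o3 with flex).
# These are assumptions for illustration; verify against current pricing.
INPUT_USD_PER_MTOK = 0.8
OUTPUT_USD_PER_MTOK = 4.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single request at the rates above."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


# Short context: the 30k in / 3k out case from the comment.
print(f"${request_cost(30_000, 3_000):.3f}")   # $0.036

# Long context: same output size, 200k input tokens.
print(f"${request_cost(200_000, 3_000):.3f}")  # $0.172
```

With output held at 3k tokens, the input side goes from about two thirds of the bill at 30k context to over 90% at 200k, which is the sense in which context size drives the cost.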