postalcoder 3 hours ago
First thoughts using gpt-5.3-codex-spark in Codex CLI: blazing fast, but it definitely has a small-model feel. It's tearing up bluey bench (my personal agent speed benchmark), a file-system benchmark where the agent generates transcripts for untitled episodes of a season of Bluey, performs a web search to find the episode descriptions, and then matches the transcripts against the descriptions to generate file names and metadata for each episode.

Downsides:

- It has to be prompted to do actions in my media library AGENTS.md that the larger models adhere to without additional prompting.

- It's less careful with how it handles context, which makes its actions less context-efficient. Combine that with the smaller context window and I'm seeing frequent compactions.
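(The matching step of a benchmark like this can be sketched in a few lines. This is a hypothetical illustration, not the commenter's actual tooling: `match_episodes`, the file names, and the word-overlap scoring are all assumptions for the sketch.)

```python
# Hypothetical sketch of the "match transcripts to descriptions" step:
# score each (transcript, description) pair by word overlap, then
# greedily assign each episode title to its best-matching transcript.

def overlap_score(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def match_episodes(transcripts: dict[str, str],
                   descriptions: dict[str, str]) -> dict[str, str]:
    """Map each transcript file to the episode title whose description fits best."""
    matches: dict[str, str] = {}
    remaining = dict(descriptions)
    for fname, text in transcripts.items():
        if not remaining:
            break
        title = max(remaining, key=lambda t: overlap_score(text, remaining[t]))
        matches[fname] = title
        del remaining[title]
    return matches

# Toy example with invented data:
transcripts = {"ep01.txt": "Bingo and Bluey play keepy uppy with a balloon"}
descriptions = {"Keepy Uppy": "The girls play keepy uppy with a balloon",
                "Magic Xylophone": "A xylophone that freezes Dad"}
print(match_episodes(transcripts, descriptions))  # {'ep01.txt': 'Keepy Uppy'}
```

A real agent run would replace the word-overlap heuristic with the model's own judgment, which is where the speed and context-efficiency differences the comment describes show up.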
alexdobrenko an hour ago
can we please make the bluey bench the gold standard for all models always
mnicky 2 hours ago
Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast. | ||||||||
Squarex 2 hours ago
I wonder why they named it so similarly to the normal Codex model when it's much worse, though still cool, of course.