Both Codex and Claude got worse this week. Across every plan I retested (desktopcommander.app)
7 points by wonderwhyer 14 hours ago | 9 comments
wonderwhyer 14 hours ago
I'm building a tool to compare value across LLM provider options. Part of it tracks how many tokens you actually get from various subscriptions over time. Over the past week, multiple people asked me about it: they'd been hitting Claude and Codex limits faster than expected. So I ran the tests yesterday and reran them today. Here's what came back:

▸ ChatGPT Plus / GPT-5.5: 95M → 37M tokens/week (−61%)
▸ Claude Max 20× / Sonnet 4.6: 388M → 214M (−45%)
▸ Claude Max 20× / Opus 4.7: 248M → 162M (−35%)
▸ Claude Pro / Sonnet 4.6: 19.6M → 11.4M (−42%)
▸ Claude Pro / Opus 4.7: 15.6M → 10.2M (−35%)

All 5 of 5 retested plans dropped 35–61% in five days. None went up. Anyone else seeing similar in their own usage?
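For anyone who wants to double-check the arithmetic, here's a minimal sketch that recomputes the drops from the before/after figures quoted above. The figures are copied from the comment; the script itself is illustrative and not part of the tool:

    # Recompute the percentage drops from the quoted weekly token counts.
    # Figures are copied from the parent comment; only the arithmetic is mine.
    plans = [
        # (plan / model, tokens_before, tokens_after) -- tokens per week
        ("ChatGPT Plus / GPT-5.5",       95_000_000,  37_000_000),
        ("Claude Max 20x / Sonnet 4.6", 388_000_000, 214_000_000),
        ("Claude Max 20x / Opus 4.7",   248_000_000, 162_000_000),
        ("Claude Pro / Sonnet 4.6",      19_600_000,  11_400_000),
        ("Claude Pro / Opus 4.7",        15_600_000,  10_200_000),
    ]

    for name, before, after in plans:
        drop = (before - after) / before * 100  # percent decrease
        print(f"{name}: {before/1e6:.1f}M -> {after/1e6:.1f}M ({drop:.0f}% drop)")

    # Prints 61%, 45%, 35%, 42%, 35% -- matching the numbers above.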
| |||||||||||||||||
derbOac 13 hours ago
"Quality metrics" need much more discussion and attention, in my opinion. Not a criticism of this project — it's a good idea, it just highlights the central question of "how well is this model working?" I'm not sure it's so straightforward. | |||||||||||||||||
| |||||||||||||||||
jdw64 14 hours ago
AI always seems to perform best on the first day after release, and then its performance gradually declines. Is the AI itself degrading? Or is it because of product-policy changes, such as system prompt modifications and usage limits? Or is it both? I sometimes wonder whether degradation is simply an inherent property of LLMs themselves.

saidnooneever 14 hours ago
so happy clang's output is consistently great