| ▲ | buildbot 8 hours ago |
| Too late. Personally, after how bad 4.6 was the past week, I was pushed to Codex, which seems to mostly work at the same level from day to day. Just last night I was trying to get 4.6 to look up how to do some simple tensor parallel work, and the agent used zero web fetches and just hallucinated 17K very wrong tokens. Then the main agent decided to pretend to implement TP, and just copied the entire model to each node... |
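As an aside, the distinction the commenter is drawing (tensor parallelism shards one layer's weights across nodes, whereas copying the entire model to each node is mere replication) can be sketched in a few lines. This is a toy numpy illustration with made-up shapes and two in-process "nodes", not the agent's actual code; real TP would run the shards on separate devices and gather the partial outputs over a communication backend.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # batch of activations
W = rng.standard_normal((8, 6))   # full weight matrix of one layer

# Column-parallel TP: each of 2 "nodes" holds half the output columns,
# so no node ever stores the full W.
shards = np.split(W, 2, axis=1)          # two (8, 3) shards
partials = [x @ w for w in shards]       # each node computes its slice
y_tp = np.concatenate(partials, axis=1)  # all-gather of partial outputs

# Same result as the unsharded layer, with half the weights per node.
assert np.allclose(y_tp, x @ W)
```

Replicating the model to each node (what the agent did) gives every node the full `W` and full memory cost, which is data parallelism at best, not TP.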
|
| ▲ | vintagedave 7 hours ago | parent | next [-] |
| Same. I stopped my Pro subscription yesterday after entering the week with 70% of my tokens already used by Monday morning (on light, small weekend projects, things I had worked on in the past while barely noticing a dent in usage). Support was... unhelpful. It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration. But even that wasn't the trigger to leave; it was the attitude Support showed. I figure, if you mess up as badly as Anthropic has, you should at least show some effort towards your customers. Instead I just got a mass of standardised replies, even after being told in the thread that I'd be escalated to a human. Nothing can sour you on a company more. I'm forgiving of bugs, we've all been there, but really annoyed by indifference and unhelpful form replies full of corporate uselessness. So if 4.7 is here? I'd prefer they forget new models and revert the harness to its January state. Even then, I've already moved to Codex as of a few days ago, and I won't be maintaining two subscriptions, so it's a permanent move. It clearly has its own issues, but I'm getting work done. That's more than I can say for Claude. |
| |
| ▲ | spyckie2 6 hours ago | parent | next [-] | | > It's been funny watching my own attitude to Anthropic change, from being an enthusiastic Claude user to pure frustration. You were enthusiastic because it was a great product at an unsustainable price. It's clear that Claude is now throttling their model, because giving access to the full model is too expensive at the $20/m price point consumers have settled on as what they want to pay. I wrote a more in-depth analysis here; there's probably too much to meaningfully summarize in a comment:
https://sustainableviews.substack.com/p/the-era-of-models-is... | | |
| ▲ | rzk 2 hours ago | parent | next [-] | | Off topic, but I really like the writing style on your blog. Do you have any advice for improving my own? In an older comment[1], you mentioned the craft of sharpening an idea to a very fine, meaningful, well-written point. Are there any books, or resources you’d recommend for honing that craft? Thanks in advance. [1] https://news.ycombinator.com/item?id=44082994 | | |
| ▲ | spyckie2 15 minutes ago | parent | next [-] | | The thing that inspires my writing is that the best sentences are self-evident, meaning you declare them without evidence and they feel intuitively right to most people. A sentence resonates either by matching readers' lived experience or by being the inevitable conclusion of a line of thinking. Making a sentence like that requires deeply understanding a problem space to the point where these sentences emerge, rather than any "craft" of writing. So the craft is thinking through a topic, usually by writing about it, then deleting everything you've written because you arrived at the self-evident position, and then writing from the vantage point of that self-evident statement. I feel that writing is a personal craft and you must dig it out of yourself through the practice of it, rather than learn it from others. The use of AI as a resource makes this much clearer to me. You must be confident in your own writing not because it follows the best practices or techniques of others, but because it is the best version of your own voice at the time of writing. | |
| ▲ | bergheim an hour ago | parent | prev [-] | | Curious why you think that? Stuff like > Yes, there is a relative scale level... > Yes, having the smartest model will... > yes Chinese AI companies have ... yes yes yes, I didn't say anything, why write in a way that insinuates that I was thinking that? I mean it doesn't come off as AI slop, so that's yay in 2026. But why do you think it is so good? | | |
| ▲ | spyckie2 25 minutes ago | parent [-] | | haha, it is poorly written; it's one of my pieces with the fewest drafts. I just wrote it and clicked submit to get the thoughts out of my head. I think he is referring to the art of refining an idea, though, and on that I do have something to say in reply to his comment. |
|
| |
| ▲ | adrian_b 5 hours ago | parent | prev | next [-] | | I agree with what you have written, which is why I would never pay a subscription to an external AI provider. I prefer to run inference on my own HW, with a harness that I control, so I can choose for myself what compromise between speed and quality of results is appropriate for my needs. When I have complete control, resulting in predictable performance, I can work more efficiently, even with slower HW and somewhat inferior models, than when I am at the mercy of an external provider. | |
| ▲ | brightball 2 hours ago | parent [-] | | What’s your setup? | | |
| ▲ | adrian_b an hour ago | parent [-] | | For now, the most suitable computer that I have for running LLMs is an Epyc server with 128 GB DRAM and 2 AMD GPUs with 16 GB of HBM each. I have a few other computers with 64 GB DRAM each and with NVIDIA, Intel or AMD GPUs. Fortunately all that memory was bought long ago, because today I could not afford to buy extra memory. However, just last week I started modifying llama.cpp to allow optimized execution with weights stored on SSDs, e.g. on a couple of PCIe 5.0 SSDs, in order to be able to use bigger models than those that fit inside 128 GB, which is the limit of what I have tested until now. By coincidence, this week there have been a few threads on HN reporting similar work on running big models locally with weights stored on SSDs, so I believe this will become more common in the near future. The speeds reported so far for running from SSDs range from one token every few seconds to a few tokens per second. While such speeds would be low for a chat application, they can be adequate for a coding assistant, if the improved code that is generated compensates for the lower speed. | | |
| ▲ | brightball an hour ago | parent [-] | | Thank you for that, it's very interesting. I keep wanting to find time to try out a local-only setup with an NVIDIA 4090 and 64 GB of RAM. It seems like it may be time to try it out. |
|
|
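The SSD-streaming speeds adrian_b reports above are consistent with a simple bandwidth-bound estimate: if every token must read the weights that don't fit in RAM from the SSDs, token rate is roughly SSD bandwidth divided by the per-token SSD traffic. A back-of-envelope sketch, where the model sizes, the ~12 GB/s per PCIe 5.0 SSD figure, and the assumption that the RAM-resident portion is free each token are all illustrative assumptions rather than measurements:

```python
def tokens_per_second(active_bytes_per_token: float,
                      ssd_bandwidth_bytes: float,
                      ram_cache_bytes: float = 0.0) -> float:
    """Bytes of weights that miss the RAM cache must stream from SSD
    for every generated token (simplistic, bandwidth-bound model)."""
    ssd_bytes = max(active_bytes_per_token - ram_cache_bytes, 0.0)
    if ssd_bytes == 0:
        return float("inf")  # model fully resident in RAM
    return ssd_bandwidth_bytes / ssd_bytes

GB = 1e9
# Two PCIe 5.0 x4 SSDs, ~12 GB/s sequential reads each, striped.
bw = 2 * 12 * GB

# Dense 200 GB model with 128 GB of it held in RAM: ~0.33 tok/s.
dense = tokens_per_second(200 * GB, bw, ram_cache_bytes=128 * GB)

# MoE model touching only ~20 GB of experts per token, none cached.
moe = tokens_per_second(20 * GB, bw)

print(f"dense: {dense:.2f} tok/s, moe: {moe:.2f} tok/s")
```

Both figures land in the "one token every few seconds to a few tokens per second" range quoted above, which is why MoE models and aggressive RAM caching are the usual levers for making SSD offload tolerable.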
| |
| ▲ | vintagedave 2 hours ago | parent | prev | next [-] | | My bad — I had Max, so more than $20. I can’t edit the comment any more. Can’t keep track of the names. I wonder when ‘pro’ started to mean ‘lowest tier’. But your article is interesting. You think some of the degradation is because when I think I’m using Opus they’re giving me Sonnet invisibly? | |
| ▲ | spyckie2 7 minutes ago | parent [-] | | Hard to say, but the fact is the intelligence was there and now it's not. Maybe they are giving Sonnet, or maybe a distilled Opus, or maybe Opus but with lower context, not quite sure but intelligence costs compute so less intelligence means cheaper compute. |
| |
| ▲ | joefourier 6 hours ago | parent | prev | next [-] | | I used the $60/mo subscription (and I bet most developers get access to AI agents via their company), and there was no difference. They should have reduced the rate limits, or offered a new model, anything except silently reducing the quality of their flagship product to cut costs. The cost of switching is too low for them to get away with the standard enshittification playbook. It takes all of 5 minutes to get a Codex subscription, and it works almost exactly the same, down to using the same commands for most actions. | |
| ▲ | brightball 2 hours ago | parent [-] | | Thank goodness for capitalism for providing multiple competitors to multibillion dollar companies |
| |
| ▲ | colordrops 2 hours ago | parent | prev [-] | | So instead of breaking shit they should have just increased their prices. |
| |
| ▲ | suzzer99 7 hours ago | parent | prev | next [-] | | It seems like the big companies they're providing Mythos to are their only concern right now. | | |
| ▲ | sethhochberg 5 hours ago | parent [-] | | Corporate software in general is often chosen based on the value returned simply being "good enough" most of the time, because the actual product being purchased is good controls for security, compliance, etc. A corporate purchaser buying hundreds to thousands of Claude seats doesn't care very much about perceived fluctuations in model performance from release to release; they've invested in ties into their SSO and SIEM and every other internal system, have trained their employees, and there's substantial cost to switching even in a rapidly moving industry. Consumer end-users are much less loyal, by comparison. |
| |
| ▲ | boppo1 6 hours ago | parent | prev | next [-] | | I haven't been using my Claude sub lately, but I liked 4.6 three weeks ago. Did something change? | |
| ▲ | GenerocUsername 5 hours ago | parent [-] | | 2 weeks ago the rolling session usage plummeted to borderline unusable. I'd say I now get a weekly output equivalent to 2 session windows before the change. | |
| |
| ▲ | dakolli 5 hours ago | parent | prev | next [-] | | It's funny watching LLM users act like gamblers: every other week swearing by one model and cursing another, like a gambler who thinks a certain slot machine or table is cold this week. These LLM companies are literally building slot-machine mechanics into their UIs too; I don't think this phenomenon is a coincidence. Stop using these dopamine brain-poisoning machines, think for yourself, don't pay a billionaire for their thinking machine. | |
| ▲ | Majromax 3 hours ago | parent | next [-] | | Don't confuse the many voices of a crowd with a single person's fickle view. If you can track an individual person or organization who changes their mind 'every other week' then more power to you, but unless you're performing that longitudinal study you are simply seeing differential levels of enthusiasm. | |
| ▲ | hk__2 2 hours ago | parent | prev [-] | | > Stop using these dopamine brain poisoning machines, think for yourself, don't pay a billionaire for their thinking machine. Yeah, and also stop using these things they call "computers", think for yourself, write your texts by hand, send letters to people. /s |
| |
| ▲ | brenoRibeiro706 7 hours ago | parent | prev [-] | | [dead] |
|
|
| ▲ | aurareturn 8 hours ago | parent | prev | next [-] |
Funny, because many people here were so confident that OpenAI was going to collapse because of how much compute they pre-ordered. But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers, and it seems to be working. I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC. It seems like 90% of Claude's recent problems are purely compute-related. |
| |
| ▲ | afavour 8 hours ago | parent | next [-] | | > people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered That's not why. It was and is because they've been incredibly unfocused and have burnt through cash on ill-advised, expensive things like Sora. By comparison Anthropic have been very focused. | | |
| ▲ | aurareturn 8 hours ago | parent | next [-] | | I don't think that was the main reason for people thinking OpenAI is going to collapse here. By far, the biggest argument was that OpenAI bet too much on compute. Being unfocused is generally an easy fix. Just cut things that don't matter as much, which they seem to be doing. | | |
| ▲ | scottyah 7 hours ago | parent | next [-] | | Nobody was talking about them betting too much on compute, people were saying that their shady deals on compute with NVIDIA and Oracle were creating a giant bubble in their attempt to get a Too Big To Fail judgement (in their words- taxpayer-backed "backstop"). | |
| ▲ | airstrike 7 hours ago | parent | prev [-] | | It really wasn't. Most of the argument was around product portfolio and agentic coding performance. | | |
| ▲ | aurareturn 5 hours ago | parent [-] | | That’s just short-term talk. The main thesis behind their collapse is that they won’t be able to pay their compute bills because they won’t have enough demand to cover them. | |
| ▲ | airstrike 3 hours ago | parent [-] | | That doesn't really track, because their compute isn't like a debt obligation. The compute topic was more about how OpenAI, Nvidia, Oracle, and others were all announcing commitments to spend money on each other in a circular way, which could just net out to zero value. |
|
|
| |
| ▲ | jampekka 7 hours ago | parent | prev | next [-] | | To me it seems like they burn so much money that they can do lots of things in parallel. My guess would be that e.g. Codex and Sora are developed very independently. After all, there's quite a hard limit on how many bodies are beneficial to a software project. | |
| ▲ | wahnfrieden 7 hours ago | parent [-] | | They all compete internally over constrained compute resources - for R&D and production. |
| |
| ▲ | KaiserPro 7 hours ago | parent | prev | next [-] | | Personally, it's down to Altman having the cognitive capacity of a sleeping snail and the world insight of a hormonal 14-year-old who's only ever read one series of manga. Despite having literal experts at his fingertips, he still isn't able to grasp that he's talking unfiltered bollocks most of the time. Not to mention his Jason-level "oath breaking"/dishonesty. |
| ▲ | Robdel12 8 hours ago | parent | prev [-] | | > By comparison Anthropic have been very focused. Ah yes, very focused on crapping out every possible thing they can copy and half bake? |
| |
| ▲ | raincole 2 hours ago | parent | prev | next [-] | | > I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC. AI is one of those topics where you cannot find genuine opinions online. Just like politics. If you visit, say, r/codex, you'll see all the people complaining about how their limits are consumed by "just N prompts" (N being a ridiculously small integer). It's all astroturfed from all sides. | |
| ▲ | hcurtiss an hour ago | parent [-] | | I agree. And I am seeing it in a lot of venues, especially political discourse. Commenting is increasingly AI-driven. I fear the whole thing is going to collapse and nobody will be able to rely on online commentary to make decisions, at least not without a lot of independent research. Maybe that's for the best, but it's definitely going to change the Internet. |
| |
| ▲ | madeofpalk 8 hours ago | parent | prev | next [-] | | Seems very short term. Like how cheap Uber was initially. Like Claude was before! Eventually OpenAI will need to stop burning money. | | |
| ▲ | superfrank 3 hours ago | parent [-] | | OpenAI will need to stop burning money eventually, but so does everyone else in the space; the longer they can keep this up, the more it squeezes their competitors. I would call out, though, one way in which this differs from the Uber situation. Theoretically, at some point we should reach a place where compute costs start to come down, either because we've built enough capacity or because most tasks don't need the newest models and a lot of the work people are doing can be automatically routed to cheaper models that are good enough. Unless Uber's self-driving program magically pops back up, Uber doesn't really have that, since their biggest expense is driver wages. I think it's a long shot, but not impossible, that OpenAI can subsidize costs long enough that prices don't need to go much higher to be sustainable. |
| |
| ▲ | simplyluke 5 hours ago | parent | prev | next [-] | | My standing assumption is the darling company/model will change every quarter for the foreseeable future, and everyone will be equally convinced that the hotness of the week will win the entire future. As buyers, we all benefit from a very competitive market. | | | |
| ▲ | l5870uoo9y 8 hours ago | parent | prev | next [-] | | In hindsight, it is painfully clear that Anthropic’s conservative investment strategy has left them struggling to keep up with demand and caused their profit margin to shrink significantly as the last buyer of compute. |
| ▲ | redml 7 hours ago | parent | prev | next [-] | | they've also introduced a lot of caching and token burn related bugs which makes things worse. any bug that multiplies the token burn also multiplies their infrastructure problems. | |
| ▲ | energy123 8 hours ago | parent | prev | next [-] | | Is that 2x still going on? I thought it ended in early April. | |
| ▲ | arcanemachiner 8 hours ago | parent | next [-] | | Different plan. The old 2x has been discontinued, and the bonus is now (temporarily) available for the new $100 plan users in an effort, presumably, to entice them away from Anthropic. | | | |
| ▲ | lawgimenez 8 hours ago | parent | prev | next [-] | | It’s for Pro users only, I think the 2x is up to May 31. | |
| ▲ | aurareturn 8 hours ago | parent | prev [-] | | They did it again to "celebrate" the release of the $100 plan. | | |
| |
| ▲ | kaliqt 7 hours ago | parent | prev | next [-] | | That’s more a leadership decision because Anthropic are nerfing the model to cut costs, if they stop doing that then they’ll stay ahead. | | | |
| ▲ | Leynos 8 hours ago | parent | prev | next [-] | | Their top tier plan got a 3x limit boost. This has been the first week ever where I haven't run out of tokens. | | | |
| ▲ | pphysch 7 hours ago | parent | prev | next [-] | | The market here is extraordinarily vibes-based, and burning billions of dollars for an ephemeral PR boost, which might only last another couple of weeks until people find a reason to hate Codex, does not reflect well on OAI's long-term viability. |
| ▲ | zamalek 7 hours ago | parent | prev | next [-] | | > It seems like 90% of Claude's recent problems are strictly lack of compute related. Downtime is annoying, but the problem is that over the past 2-3 weeks Claude has been outrageously stupid when it does work. I have always been skeptical of everything produced - but now I have no faith whatsoever in anything that it produces. I'm not even sure if I will experiment with 4.7, unless there are glowing reviews. Codex has had none of these problems. I still don't trust anything it produces, but it's not like everything it produces is completely and utterly useless. | | |
| ▲ | scottyah 7 hours ago | parent [-] | | So many people confuse sycophantic behavior with producing results. |
| |
| ▲ | saltyoldman 7 hours ago | parent | prev | next [-] | | I have both Claude and OpenAI, side by side. I would say Sonnet 4.6 still beats GPT 5.4 for coding (at least in my use case). But after about 45 minutes I'm out of my window, so I use OpenAI for the next 4 hours, and I can't even reach my limit. |
| ▲ | llm_nerd 8 hours ago | parent | prev | next [-] | | Most of the compute OpenAI "preordered" is vapour. And it has nothing to do with why people thought the company -- which is still in extremely rocky rapids -- was headed to bankruptcy. Anthropic has been very disciplined and focused (overwhelmingly on coding, fwiw), while OpenAI has been bleeding money trying to be the everything AI company with no real specialty as everyone else beat them in random domains. If I had to qualify OpenAI's primary focus, it has been glazing users and making a generation of malignant narcissists. But yes, Anthropic has been growing by leaps and bounds and has capacity issues. That's a very healthy position to be in, despite the fact that it yields the inevitable foot-stomping "I'm moving to competitor!" posts constantly. | | |
| ▲ | guelo 5 hours ago | parent [-] | | How is droves of your customers leaving, whether they're foot stomping or not, healthy? | | |
| ▲ | llm_nerd 4 hours ago | parent [-] | | Droves? I mean, if we take the "I'm leaving!" posts seriously, the fact that the company has people so emotionally invested they feel the need to announce their departure is a pretty good place to be. A tiny sampling of unhappy customers is indicative of nothing. Honestly, at this point I am pretty firmly of the belief that OAI is paying astroturfers to post the "Boy, does anyone else think Claude is dumb now and Codex is better?" comments (always some unreproducible "feel" kind of thing that is to be adopted at face value despite overwhelming evidence that we shouldn't). OAI is kind of in the desperation stage -- see the bizarre acquisitions they've been making, including paying $100M for some fringe podcast almost no one had heard of -- and it would not be remotely unexpected. | |
| ▲ | guelo 3 hours ago | parent [-] | | We have no idea of the ratio of foot-stompers to quiet quitters, but I'm sure most people don't announce it. I cancelled my subscription and hadn't told anybody. And I quit based on personal experience over the last few weeks, not on social media PR. |
|
|
| |
| ▲ | __turbobrew__ 7 hours ago | parent | prev [-] | | All of the smart people I know went to work at OpenAI and none at Anthropic. In addition to financial capital, OpenAI has a massive advantage in human capital over Anthropic. As long as OpenAI can sustain compute and paying SWE $1million/year they will end up with the better product. | | |
| ▲ | scottyah 7 hours ago | parent | next [-] | | Attracting talent with huge sums of money just gets you people who optimize for money, and it's rarely a good long-term decision. I think it's what led to Google's downturn. | |
| ▲ | staticman2 2 hours ago | parent | prev | next [-] | | Are those "smart people you know" machine learning researchers? | |
| ▲ | KaiserPro 7 hours ago | parent | prev [-] | | > OpenAI has a massive advantage in human capital over Anthropic. But if your leader is a dipshit, then it's a waste. Look, you can't just throw money at the problem; you need people who are able to make the right decisions at the right time, and that requires leadership. Part of the reason why Facebook fucked up VR/AR is that they have a leader who only cares about features/metrics, not user experience. Part of the reason why Twitter always lost money is that they had loads of teams all running in different directions, because Dorsey is utterly incapable of making a firm decision. It's not money and talent, it's execution. |
|
|
|
| ▲ | deepsquirrelnet 7 hours ago | parent | prev | next [-] |
| My tinfoil hat theory, which may not be that crazy, is that providers are sandbagging their models in the days leading up to a new release, so that the next model "feels" like a bigger improvement than it is. An important aspect of AI is that it needs to be seen as moving forward all the time. Plateaus are the death of the hype cycle, and would tether people's expectations closer to reality. |
| |
| ▲ | baron3dl an hour ago | parent | next [-] | | I was there too, but honestly, after today, 4.7 "feels" just as bad. I was cynical, but also kind of eager for the improvement. It's just not there. Compared to early Feb, I have to babysit EVERYTHING. |
| ▲ | cousinbryce 7 hours ago | parent | prev [-] | | Possibly due to moving compute from inference to training | | |
| ▲ | dluxem 5 hours ago | parent [-] | | My purely unfounded, gut reaction to Opus 4.7 being released today was "Oh, that explains the recent 4.6 performance - they were spinning up inference on 4.7." Of course, I have no information on how they manage the deployment of their models across their infra. |
|
|
|
| ▲ | onlyrealcuzzo 7 hours ago | parent | prev | next [-] |
I switched to Codex and found it extremely inferior for my use case. It is much faster, but faster, worse code is a step in the wrong direction: you're just rapidly accumulating bugs and tech debt, rather than more slowly moving in the correct direction. I'm a big fan of Gemini in general, but at least in my experience Gemini CLI is VERY FAR behind either Codex or CC. It's slower than CC, MUCH slower than Codex, and its output quality is considerably worse than CC's (probably worse than Codex's too, and orders of magnitude slower). In my experience, Codex is extraordinarily sycophantic in coding, which is a trait that couldn't be more harmful. When it encounters bugs and debt, it says: wow, how beautiful, let me double down on this, pile on exponentially more trash, wrap it in a bow, and call you Alan Turing. It also does not follow directions. When you tell it how to do something, it will say: nah, I have a better, faster way, I'll just ignore the user and do my thing instead. CC will stop and ask for feedback much more often. YMMV. |
| |
| ▲ | Rastonbury 5 hours ago | parent | next [-] | | What is your use case? I read comments like this and it's totally the opposite of my experience. I have both CC Opus 4.6 and Codex 5.4, and Codex is much more thorough and checks before it starts making changes, maybe even to a fault, but I accept it because getting Opus to redo work after it messes up and jumps in on the first attempt is a massive waste of time. All my tasks are atomic and granularly spec'd, and I'd say 30% of the time I regret deciding to use Opus for 'simpler' work. | |
| ▲ | onlyrealcuzzo 2 hours ago | parent [-] | | I'm building a correct, safe, highly understandable, concurrent runtime & language. Essentially Rust/Tokio if it was substantially easier than even Go - and without a need for crates and a subset of the language to achieve near Ada-level safety. The codebase is ~100k lines of code. |
| |
| ▲ | enraged_camel 7 hours ago | parent | prev [-] | | >> I switched to Codex and found it extremely inferior for my use case. Yeah, 100% the case for me. I sometimes use it to do adversarial reviews on code that Opus wrote but the stuff it comes back with is total garbage more often than not. It just fabricates reasons as to why the code it's reviewing needs improvement. |
|
|
| ▲ | _the_inflator 7 hours ago | parent | prev | next [-] |
Codex really has its place in my bag. I mainly use it, rarely Claude. Codex just gets it done: very self-correcting by design, while Claude has no real baseline quality for me. Claude was awesome in December, but Codex is like a corporate company to me. Maybe it looks uncool, but it can execute very well. Web design also comes out really smooth with Codex. OpenAI really impressed me and continues to impress me with Codex. OpenAI made no fuss about it; instead they let the results speak. It is as if Codex has no marketing department, just its product quality, kind of like Google in its early days with every product. |
|
| ▲ | desugun 8 hours ago | parent | prev | next [-] |
I guess our collective conscience about OpenAI working with the Department of War has an expiry date of 6 weeks. |
| |
| ▲ | arcanemachiner 8 hours ago | parent | next [-] | | That number is generous, and is also a pretty decent lifespan for a socially-conscious gesture in 2026. | |
| ▲ | Findeton 7 hours ago | parent | prev | next [-] | | We all liked the Terminator movies. Hopefully they stay as movies. |
| ▲ | 7 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | yoyohello13 4 hours ago | parent | prev | next [-] | | I quoted 2 weeks at the time. I think even that was generous. | |
| ▲ | adamtaylor_13 8 hours ago | parent | prev | next [-] | | Most people just want to use a tool that works. Not everything has to be a damn moral crusade. | | |
| ▲ | martimarkov 8 hours ago | parent | next [-] | | Yes, let's take morality out of our daily lives as much as possible... That seems like a great categorical imperative and a recipe for social success. | |
| ▲ | cmrdporcupine 6 hours ago | parent | next [-] | | There's nothing moral about Anthropic. Especially to those of us who are not American citizens and to which Dario's pronouncements about ethics apparently do not apply, as stated in his own press release. To me it just looks like a big sanctimonious festival of hypocrisy. | |
| ▲ | adamtaylor_13 7 hours ago | parent | prev [-] | | That's an incredibly uncharitable take on what I said. But that kind of proves my point. Foist your morality upon everyone else and burden them with your specific conscience; sounds like a fun time. | | |
| ▲ | freak42 7 hours ago | parent | next [-] | | What is the charitable way to look at it then? | | |
| ▲ | adamtaylor_13 5 hours ago | parent [-] | | How about assuming the positive intent of what I actually said? Not everything has to be a moral crusade. Let me use the tool without pushing your personal moral opinions on me. The same person wringing their hands over OpenAI buys clothing made with slave labor and wrote that comment on a device built from rare-earth materials obtained through slave labor. Why is OpenAI the line? Why are they allowed to "exploit people" and I'm not? Taken to its logical conclusion it's silly. And instead of engaging with that, they deflect with oH yEaH lEtS hAvE nO mOrAlS, which is clearly not what I'm advocating. |
| |
| ▲ | some_furry 7 hours ago | parent | prev [-] | | Yeah, why actually engage with moral issues when we can just defer to a status quo that happens to benefit me? |
|
| |
| ▲ | causal 6 hours ago | parent | prev [-] | | "Not everything" - sure, but mass surveillance and autonomous killing are kind of big things to sweep under that rug no? |
| |
| ▲ | cmrdporcupine 6 hours ago | parent | prev | next [-] | | Thing is, Anthropic was always working with the DoD too, and the line in the sand they drew looked really noble until I found it didn't apply to me, a non-US citizen. Dario made it clear that was the case. And so the difference, to me, was irrelevant. I'll buy based on value, and keep an iron in the fire of Chinese & European open-weight models as well. |
| ▲ | nothinkjustai 7 hours ago | parent | prev | next [-] | | Not everyone is American, and people who are not see Anthropic state they are willing to spy on our countries and shrug about OAI saying the same about America. What’s the difference to us? | | |
| ▲ | riffraff 7 hours ago | parent [-] | | If you're not American, you should be worried about the bit about using AI to kill people, which was the other major objection by Anthropic. (Not that I think the US DoD wouldn't do that anyway, ToS or not.) | |
| ▲ | 8note 6 hours ago | parent | next [-] | | Well, if they put in a fully automated kill chain, it's gonna be weak to attacks where you make yourself look like a car, or a video-game-styled "hide under a box". The current non-automated kill chain has targeted fishermen and a girls' school, and nobody is gonna be held accountable for either. Am I worried about the killing or the AI? If I'm worried about the killing, I'd much rather push for US demilitarization. |
| ▲ | pdimitar 7 hours ago | parent | prev | next [-] | | OK, I am worried. Now, what can I actually do? | | |
| ▲ | ArmadilloGang 6 hours ago | parent | next [-] | | Vote with your dollar. Ask others to do the same and explain why. If we all did this, it might matter. There’s not a lot else an individual can do. | | |
| ▲ | cmrdporcupine 6 hours ago | parent [-] | | Dario in fact said it was OK to spy on and drone non-US citizens, and in fact endorsed American foreign policy generally. So, no, I'm not voting with my wallet for one American company versus the other. I'll pick the best compromise product for me, and then also boost non-American R&D where I can. |
| |
| ▲ | addandsubtract 6 hours ago | parent | prev | next [-] | | Vote with your wallet, just like Americans. | |
| ▲ | sieabahlpark 6 hours ago | parent | prev [-] | | [dead] |
| |
| ▲ | stavros 5 hours ago | parent | prev | next [-] | | Anthropic's issue was only that the AI isn't yet good enough to tell who's an American, so it avoids killing them. They were fine with the "killing non-Americans" bit. | |
| ▲ | nothinkjustai 7 hours ago | parent | prev [-] | | Not only is Anthropic perfectly happy to let the DoD use their products to kill people, but they are partners with Palantir and were apparently instrumental in the strikes against Iran by the US military. https://www.washingtonpost.com/technology/2026/03/04/anthrop... So uh, yeah, the only difference I see between OAI and Anthropic is that one is more honest about what they’re willing to use their AI for. |
|
| |
| ▲ | PunchTornado 7 hours ago | parent | prev | next [-] | | nah, I believe most people here who immediately brag about codex are openai employees doing part of their job. Otherwise I couldn't possibly fathom why anyone would use codex. In my company 80% is claude and 15% gemini. You can barely see openai on the graph, and we have >5k programmers using ai every day. | | |
| ▲ | muyuu 6 hours ago | parent | next [-] | | Currently GPT just works much better, and so does Gemini but it's more expensive right now. Going through Opencode stats, their claim is that Gemini is the current best model followed by GPT 5.4 on their benchmarks, but the difference is slim. My personal experience is best with GPT but it could be the specific kind of work I use it for which is heavy on maths and cpp (and some LISP). | |
| ▲ | EQmWgw87pw 7 hours ago | parent | prev | next [-] | | I’m thinking the same thing, Codex literally ruined the codebases that I experimented with it on. | |
| ▲ | scottyah 6 hours ago | parent | prev | next [-] | | OpenAI replaced its founding engineers with Meta PMs. The shift towards consumer engagement metrics and marketing is apparent. | |
| ▲ | 7 hours ago | parent | prev | next [-] | | [deleted] | |
| ▲ | Klayy 7 hours ago | parent | prev [-] | | You can believe whatever you want. I found claude unusable due to limits. Codex works very well for my use cases. |
| |
| ▲ | Der_Einzige 8 hours ago | parent | prev [-] | | Longer than how long anyone cared about epstein. |
|
|
| ▲ | cube2222 8 hours ago | parent | prev | next [-] |
| I've been using it with `/effort max` all the time, and it's been working better than ever. I think here's part of the problem, it's hard to measure this, and you also don't know in which AB test cohorts you may currently be and how they are affecting results. |
| |
| ▲ | siegers 7 hours ago | parent | next [-] | | Agree. I keep effort max on Claude and xhigh on GPT for all tasks and keep tasks as scoped units of work instead of boil the ocean type prompts. It is hard to measure but ultimately the tasks are getting completed and I'm validating so I consider it "working as expected". | |
| ▲ | bryanlarsen 7 hours ago | parent | prev [-] | | It works better, until you run out of tokens. Running out of tokens is something that used to never happen to me, but this month now regularly happens. Maybe I could avoid running out of tokens by turning off 1M tokens and max effort, but that's a cure worse than the disease IMO. | | |
| ▲ | cube2222 4 hours ago | parent [-] | | I would risk a guess that people have a wrong intuition about the long-context pricing and are complaining because of that. Yeah, the per-token price stays the same, even with large context. But that still means that you're spending 4x more cache-read tokens in a 400k context conversation, on each turn, than you would be in a 100k context conversation. |
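A back-of-the-envelope sketch of that scaling (turn counts here are made up for illustration; the point is just that the ratio of cache reads equals the ratio of context sizes, regardless of the per-token price):

```python
# Per-turn cache-read volume scales with context size: each turn re-reads
# the whole conversation from cache. Numbers are illustrative, not pricing.

def total_cache_reads(context_tokens: int, turns: int) -> int:
    # Every turn reads the full context once.
    return context_tokens * turns

small = total_cache_reads(100_000, 10)  # 100k-context conversation, 10 turns
large = total_cache_reads(400_000, 10)  # 400k-context conversation, 10 turns
print(large // small)  # 4 - same per-token price, but 4x the tokens read per turn
```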
|
|
|
| ▲ | gonzalohm 8 hours ago | parent | prev | next [-] |
| Until the next time they push you back to Claude. At this point, I feel like this has to be the most unstable technology ever released. Imagine if docker had stopped working every two releases |
| |
| ▲ | sergiotapia 8 hours ago | parent [-] | | There is zero cost to switching ai models. Paid or open source. It's one line mostly. | | |
| ▲ | gonzalohm 8 hours ago | parent | next [-] | | What about your chat history? That has some value, at least for me. But what has even more value is stable releases. | | |
| ▲ | simplyluke 5 hours ago | parent | next [-] | | This is one of the many reasons I don't think the model companies are going to win the application space in coding. There's literally zero context lost for me in switching between model providers as a cursor user at work. For personal stuff I'll use an open source harness for the same reason. | |
| ▲ | distances 3 hours ago | parent | prev | next [-] | | I don't see any value in chat history. I delete all conversations at least weekly, it feels like baggage. | |
| ▲ | srmatto 5 hours ago | parent | prev | next [-] | | You can output it as a memory using a simple prompt. You could probably re-use this prompt for any product with only slight modification. Or you could prompt the product to output an import prompt that is more tuned to its requirements. e.g. https://claude.com/import-memory | |
| ▲ | drewnick 7 hours ago | parent | prev | next [-] | | I think this is more about which model you steer your coding harness to. You can also self-host a UI in front of multiple models, then you own the chat history. | |
| ▲ | sergiotapia 7 hours ago | parent | prev [-] | | for me there is zero value there. |
| |
| ▲ | charcircuit 7 hours ago | parent | prev [-] | | Codex doesn't read Claude.md like Claude does. It's not a "one line" change to switch. | | |
|
|
|
| ▲ | thisisit 7 hours ago | parent | prev | next [-] |
| Personally I find using and managing Claude sessions and limits is getting exhausting and feels similar to calorie counting. You think you are going to have an amazing low calories meal only to realize the meal is full of processed sugars and you overshot the limit within 2-3 bites. Now "you have exhausted your limit for this time. Your session limits resets in next 4 hrs". |
| |
| ▲ | hootz 7 hours ago | parent [-] | | Yep, it just feels terrible, the usage bars give me anxiety, and I think that's in their interest as they definitely push me towards paying for higher limits. Won't do that, though. |
|
|
| ▲ | 0xbadcafebee 6 hours ago | parent | prev | next [-] |
| Usually the problems that cause this kind of thing are: 1) Bad prompt/context. No matter what the model is, the input determines the output. This is a really big subject as there's a ton of things you can do to help guide it or add guardrails, structure the planning/investigation, etc. 2) Misaligned model settings. If temperature/top_p/top_k are too high, you will get more hallucination and possibly loops. If they're too low, you don't get "interesting" enough results. Same for the repeat protection settings. I'm not saying it didn't screw up, but it's not really the model's fault. Every model has the potential for this kind of behavior. It's our job to do a lot of stuff around it to make it less likely. The agent harness is also a big part of it. Some agents have very specific restrictions built in, like max number of responses or response tokens, so you can prevent it from just going off on a random tangent forever. |
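As a toy illustration of the temperature point (not any vendor's actual sampler, just the standard logit-rescaling definition): raising the temperature flattens the output distribution, which gives low-probability tokens more mass and makes off-the-rails continuations more likely.

```python
import math

# Standard temperature-scaled softmax: divide logits by T before normalizing.
# Higher T flattens the distribution; lower T sharpens it.

def softmax_with_temperature(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.0]  # one clearly-preferred token, two unlikely ones
cool = softmax_with_temperature(logits, 0.5)
hot = softmax_with_temperature(logits, 2.0)
print(cool[2] < hot[2])  # True - the unlikely token gains probability as T rises
```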
|
| ▲ | alvis 8 hours ago | parent | prev | next [-] |
I haven't seen much of a quality drop from 4.6. But I also notice that I use codex more often these days than claude code |
| |
| ▲ | buildbot 8 hours ago | parent | next [-] | | It's been shockingly bad for me - for another example, when asked to make a new python script building off an existing one, for some cursed reason the model chose to .read() the py files, use 100s of lines of regex to try to patch the changes in, and exec'd everything at the end... | |
| ▲ | kivle 7 hours ago | parent [-] | | Hate that about Claude Code. I have been adding permissions for it to do everything that makes sense to add when it comes to editing files, but way too often it will generate 20-30 line bash snippets using sed to do the edits instead, and then the whole permission system breaks down. It means I have to babysit it all the time to make sure no random permission prompts pop up. |
| |
| ▲ | fluidcruft 7 hours ago | parent | prev [-] | | I generally think codex is doing well until I come in with my Opus sweep to clean it up. Claude just codes closer to the way my brain works. codex is great at finding numerical stability issues though and increasingly I like that it waits for an explicit push to start working. But talking to Claude Code the way I learned to talk to codex seems to work also so I think a lot of it is just learning curve (for me). |
|
|
| ▲ | arrakeen 8 hours ago | parent | prev | next [-] |
| so even with a new tokenizer that can map to more tokens than before, their answer is still just "you're not managing your context well enough" "Opus 4.7 uses an updated tokenizer that [...] can map to more tokens—roughly 1.0–1.35× depending on the content type. [...] Users can control token usage in various ways: by using the effort parameter, adjusting their task budgets, or prompting the model to be more concise." |
|
| ▲ | frank-romita 8 hours ago | parent | prev | next [-] |
That's wild that you think 4.6 is bad... Each model has its strengths and weaknesses. I find that Codex is good for architectural design and Claude is actually better at engineering and building |
|
| ▲ | siegers 8 hours ago | parent | prev | next [-] |
| I enjoy switching back and forth and having multi-agent reviews. I'm enjoying Codex also but having options is the real win. |
|
| ▲ | 7 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | muzani 8 hours ago | parent | prev | next [-] |
| For me, making it high effort just fixed all the quality problems, and even cut down on token use somehow |
| |
| ▲ | vunderba 8 hours ago | parent [-] | | This. They kind of snuck this into the release notes: switching the default effort level to Medium. High is significantly slower, but that’s somewhat mitigated by the fact that you don’t have to constantly act like a helicopter parent for it. |
|
|
| ▲ | nico 7 hours ago | parent | prev | next [-] |
I do feel that CC sometimes starts doing dumb tasks or asking for approval for things that usually don’t really need it. Like extra syntax checks, or some basic grep/text-parsing commands |
| |
| ▲ | CamperBob2 5 hours ago | parent [-] | | Exactly. Why do they ask permission for read-only operations?! You either run with --dangerously-skip-permissions or you come back after 30 minutes to find it waiting for permission to run grep. There's no middle ground, at least not that Claude CLI users have access to. |
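For what it's worth, Claude Code does document a middle ground between the two extremes: a permissions allow-list in project settings that can pre-approve specific command patterns so read-only tools stop prompting. The file path and rule syntax below are from memory, so verify them against the current docs before relying on them. Roughly, in `.claude/settings.json`:

```json
{
  "permissions": {
    "allow": [
      "Bash(grep:*)",
      "Bash(find:*)",
      "Bash(ls:*)"
    ]
  }
}
```

With rules like these in place, the agent can run the listed read-only commands without stopping, while everything else still requires approval.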
|
|
| ▲ | queuep 8 hours ago | parent | prev | next [-] |
Before Opus released we also saw a huge backlash about it being dumber. Perhaps they need the compute for training |
|
| ▲ | sgt 6 hours ago | parent | prev | next [-] |
| Strange. Opus 4.6 has been great for me. On Max 20x |
|
| ▲ | 8 hours ago | parent | prev | next [-] |
| [deleted] |
|
| ▲ | geooff_ 8 hours ago | parent | prev | next [-] |
| I've noticed the same over the last two weeks. Some days Claude will just entirely lose its marbles. I pay for Claude and Codex so I just end up needing to use codex those days and the difference is night and day. |
|
| ▲ | r0fl 8 hours ago | parent | prev | next [-] |
| Same! I thought people were exaggerating how bad Claude has gotten until it deleted several files by accident yesterday Codex isn’t as pretty in output but gets the job done much more consistently |
|
| ▲ | tiel88 7 hours ago | parent | prev | next [-] |
| I've been raging pretty hard too. Thought either I'm getting cleverer by the day or Claude has been slipping and sliding toward the wrong side of the "smart idiot" equation pretty fast. Have caught it flat-out skipping 50% of tasks and lying about it. |
|
| ▲ | keeganpoppen 6 hours ago | parent | prev | next [-] |
| codex low-key seems to be better than claude. and i say this as an 18-hour-a-day user of both (mostly claude) |
|
| ▲ | estimator7292 7 hours ago | parent | prev | next [-] |
| Anecdotally, codex has been burning through way more tokens for me lately. Claude seems to just sit and spin for a long time doing nothing, but at least token use is moderate. All options are starting to suck more and more |
|
| ▲ | OtomotO 8 hours ago | parent | prev | next [-] |
| Same for me. I cancelled my subscription and will be moving to Codex for the time being. Tokens are way too opaque and Claude was way smarter for my work a couple of months ago. |
|
| ▲ | hk__2 8 hours ago | parent | prev | next [-] |
| Meh. At $work we were on CC for one month, then switched to Codex for one month, and now will be on CC again to test. We haven’t seen any obvious difference between CC and Codex; both are sometimes very good and sometimes very stupid. You have to test for a long time, not just test one day and call it a benchmark just because you have a single example. |
|
| ▲ | te_chris 7 hours ago | parent | prev | next [-] |
I try codex, but I hate 5.4's personality as a partner. It's a demon debugger though, but working closely with it, it's so smug and annoying. |
|
| ▲ | varispeed 7 hours ago | parent | prev | next [-] |
How do you get codex to generate any code? I describe the problem and codex runs in circles basically: codex> I see the problem clearly. Let me create a plan so that I can implement it. The plan is X, Y, Z. Do you want me to implement this? me> Yes please, looks good. Go ahead! codex> Okay. Thank you for confirming. So I am going to implement X, Y, Z now. Shall I proceed? me> Yes, proceed. codex> Okay. Implementing. ...codex is working... you see the internal monologue running in circles codex> Here is what I am going to implement: X, Y, Z me> Yes, you said that already. Go ahead! codex> Working on it. ...codex is doing something... codex> After examining the problem more, indeed, the steps should be X, Y, Z. Do you want me to implement them? etc. Pretty much every session ends up being like this. I was unable to get any useful code apart from boilerplate JS from it since 5.4. So instead I just use ChatGPT to create a plan and then ask Opus to code, but it's hit or miss. Almost every time the prompt seems to be routed to a cheaper model that is very dumb (but says Opus 4.6 when asked). I have to start a new session many times until I get a good model. |
| |
| ▲ | skocznymroczny 2 hours ago | parent | next [-] | | It's just like subscription based MMORPGs that delay you as much as possible every step of the way because that's the way they can extract more money from you. If you pay for the tokens it's not in their benefit to give you the answer directly. | |
| ▲ | Gracana 6 hours ago | parent | prev [-] | | Do you have to put it in a build/execute mode (separate from a planning mode) to allow it to move on? I use opencode, and that's how it works. |
|
|
| ▲ | cmrdporcupine 8 hours ago | parent | prev [-] |
Yep, I'll wait for the GPT answer to this. If we're lucky OpenAI will release a new GPT 5.5 or whatever model in the next few days, just like the last round. I have been getting better results out of codex on and off for months. It's more "careful" and systematic in its thinking. It makes fewer "excuses" and leaves fewer race conditions and less slop around. And the actual codex CLI tool is better written, less buggy and faster. And I can use the membership in things like opencode etc without drama. For March I decided to give Claude Code / Opus a chance again. But there's just too much variance there. And then they started to play games with limits, and then OpenAI rolled out a $100 plan to compete with Anthropic's. I'm glad to see the competition but I think Anthropic has pissed in the well too much. I do think they sent me something about a free month and maybe I will use that to try this model out though. |
| |
| ▲ | gck1 17 minutes ago | parent | next [-] | | What bothers me with codex cli is that it feels like it should be more observable, more open and verbose about what the model is doing per step, being an open source product and OpenAI seemingly being actually open for once. But then it does a tool call - "Read $file" - and I have no idea whether it read the entire file, or a specific chunk of it. Claude cli shows you everything the model is doing unless it's in a subagent (which is why I never use subagents). | |
| ▲ | davely 8 hours ago | parent | prev [-] | | I’ve been on the Claude Code train for a while but decided to try Codex last week after they announced the $100 USD Pro plan. I’ve been pretty happy with it! One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump into the middle if things are going in the wrong direction. Claude Code has been slowly turning into this mysterious black box, wiping out terminal context any time it compacts a conversation (which I think is their hacky way of dealing with terminal flickering issues — which is still happening, 14 months later), going out of its way to hide thought output, and then of course the whole performance issues thing. Excited to try 4.7 out, but man, Codex (as a harness at least) is a stark contrast to Claude Code. | | |
| ▲ | pxc 7 hours ago | parent | next [-] | | > One thing I immediately like more than Claude is that Codex seems much more transparent about what it’s thinking and what it wants to do next. I find it much easier to interrupt or jump into the middle if things are going in the wrong direction. I've finally started experimenting recently with Claude's --dangerously-skip-permissions and Codex's --dangerously-bypass-approvals-and-sandbox through external sandboxing tools. (For now just nono¹, which I really like so far, and soon via containerization or virtual machines.) When I am using Claude or Codex without external sandboxing tools and just using the TUI, I spend a lot of time approving individual commands. When I was working that way, I found Codex's tendency to stop and ask me whether/how it should proceed extremely annoying. I found myself shouting at my monitor, "Yes, duh, go do the thing!". But when I run these tools without having them ask me for permission for individual commands or edits, I sometimes find Claude has run away from me a little and made the wrong changes, or tried to debug something in a bone-headed way that I would have redirected with an interruption if it had stopped to ask me for permission. I think maybe Codex's tendency to stop and check in may be more valuable if you're relying on sandboxing (external or built-in) so that you can avoid individual permission prompts. -- 1: https://nono.sh/ | |
| ▲ | arcanemachiner 8 hours ago | parent | prev | next [-] | | There is a new flag for terminal flickering issues: > Claude Code v2.1.89: "Added CLAUDE_CODE_NO_FLICKER=1 environment variable to opt into flicker-free alt-screen rendering with virtualized scrollback" | | |
| ▲ | gck1 3 hours ago | parent [-] | | Such an interesting choice for a flag name. NO_BUG_PLEASE=1 |
| |
| ▲ | ipkstef 7 hours ago | parent | prev | next [-] | | there is an official codex plugin for claude. I just have them do adversarial reviews/implementations. etc with each other. adds a bit of time to the workflow but once you have the permissions sorted it'll just engage codex when necessary | |
| ▲ | cmrdporcupine 8 hours ago | parent | prev [-] | | Do this -- take your coworker's PRs that they've clearly written in Claude Code, and have Codex/GPT 5.4 review them. Or have Codex review your own Claude Code work. It then becomes clear just how "sloppy" CC is. I wouldn't mind having Opus around in my back pocket to yeet out whole net new greenfield features. But I can't trust it to produce well-engineered things to my standards. Not that anybody should trust an LLM to that level, but there's matters of degree here. | | |
| ▲ | kevinsync 7 hours ago | parent | next [-] | | I've been using Claude and Codex in tandem ($100 CC, $20 Codex), and have made heavy use of claude-co-commands [0] to make them talk. Outside of the last 1-2 weeks (which we now have confirmation YET AGAIN that Claude shits the fucking bed in the run-up to a new model release), I usually will put Claude on max + /plan to gin up a fever dream to implement. When the plan is presented, I tell it to /co-validate with Codex, which tends to fill in many implementation gaps. Claude then codes the amended plan and commits, then I have a Codex skill that reviews the commit for gaps, missed edge cases, incorrect implementation, missed optimizations, etc, and fix them. This had been working quite well up until the beginning of the month, Claude more or less got CTE, and after a week of that I swapped to $100 Codex, $20 CC plans. Now I'm using co-validation a lot less and just driving primarily via Codex. When Claude works, it provides some good collaborative insights and counter-points, but Codex at the very least is consistently predictable (for text-oriented, data-oriented stuff -- I don't use either for designing or implementing frontend / UI / etc). As always, YMMV! [0] https://github.com/SnakeO/claude-co-commands | | |
| ▲ | hulk-konen 3 hours ago | parent | next [-] | | Some variation of this is the way. You should not get dependent on one black box. Companies will exploit that dependency. My version of this is having CC Pro, Cursor Pro, and OpenCode (with $10 to Codex/GLM 5.1) --> total $50. My work doesn't stop if one of these is having overloaded servers, etc. And it's definitely useful to have them cross-checking each other's plans and work. | |
| ▲ | cmrdporcupine 7 hours ago | parent | prev [-] | | This more or less mimics a flow that I had fairly good results from -- but I'm unwilling to pay for both right now unless I had a client or employer willing to foot the bill. Claude Code as "author" and a $20 Codex as reviewer/planner/tester has worked for me to squeeze better value out of the CC plan. But with the new $100 codex plan, and with the way Anthropic seemed to nerf their own $100 plan, I'm not doing this anymore. |
| |
| ▲ | afavour 8 hours ago | parent | prev | next [-] | | > It then becomes clear just how "sloppy" CC is. Have you done the reverse? In my experience models will always find something to criticize in another model's work. | | |
| ▲ | cmrdporcupine 8 hours ago | parent [-] | | I have, and in fact models will find things to criticize in their own work, too, so it's good to iterate. But I've had the best results with GPT 5.4 |
| |
| ▲ | woadwarrior01 8 hours ago | parent | prev [-] | | It cuts both ways. What I usually do these days is to let codex write code, then use claude code /simplify, have both codex and claude code review the PR, then finally manually review and fixup things myself. It's still ~2x faster than doing everything by myself. | | |
| ▲ | cmrdporcupine 7 hours ago | parent [-] | | I often work this way too, but I'll say this: This flow is exhausting. A day of working this way leaves me much more drained than traditional old school coding. | | |
| ▲ | woadwarrior01 7 hours ago | parent [-] | | 100%. On days when I'm sleep deprived (once or twice a week), I fall back to this flow. On regular days, I tend to write more code the old school way and use these things for review. | | |
|
|
|
|
|