harrouet 9 hours ago

This is Apple commoditizing LLMs while keeping control of the UX.

They are a hardware company and will keep selling the best machine for AI use. Well done.

tedggh 8 hours ago | parent | next [-]

Benedict Evans may be right after all; frontier models look more and more like telecom companies in the 90s. Billions and billions of investment in infrastructure while others further up the stack captured all the value.

CuriouslyC 6 hours ago | parent | next [-]

There will be frontier models that are non-commoditized, but they'll be kept guarded and hidden away, and you'll only get the final result, so that they can't be distilled and their harness can't be reverse engineered. They'll be billed like employees, rather than like a tool.

hedora 5 hours ago | parent | next [-]

The non-commodity network services of the early 1990s and the non-commodity 3D graphics hardware of the mid-1990s made the same argument.

yandie 4 hours ago | parent | prev | next [-]

I doubt that. What stops the Chinese labs from figuring it out? It’s not like these models are fundamentally different from each other

CuriouslyC 4 hours ago | parent [-]

If all you have is the starting point and the finishing point, the lack of the path taken from one point to another limits your ability to train models that can efficiently recreate the work, and increases its cost enough that it's possible the US labs can progress capabilities faster than Chinese labs can distill that behavior.

sealeck 2 hours ago | parent | next [-]

> lack of the path taken from one point to another limits your ability to train models that can efficiently recreate the work

Isn’t this the problem inference (training) a model is designed to solve :)))

jmalicki an hour ago | parent [-]

It is!

And it's a hard problem.

What's an easier form of training is being able to see the intermediate results and train to imitate them.

wahnfrieden 3 hours ago | parent | prev [-]

That’s already the case. Chinese ingenuity allowed them to achieve what they did without access to reasoning outputs

mingqiz 2 hours ago | parent | prev | next [-]

Isn't that what they are doing already? The model is already guarded and hidden, and I only get to send it what I want and talk with it to clarify my requirements. And I can switch to a different provider for cheaper/better results.

lacy_tinpot 2 hours ago | parent | prev | next [-]

They tried to do that with operating systems and the browser.

naravara 5 hours ago | parent | prev | next [-]

I think this will be isolated to highly specialized fields where training data will need to be selectively curated.

greenavocado 6 hours ago | parent | prev [-]

Everything can be distilled, it will just become more painful

alecco 7 hours ago | parent | prev | next [-]

In spite of their deeper pockets, massive datacenters, colossal amounts of user data, and hundreds of thousands of top developers, even Amazon, Meta, Microsoft, and Google are well behind.

I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)

ksec 7 hours ago | parent | next [-]

>I think Evans is completely wrong.

I wish there were a case where I found Evans to be wrong. As far as my memory serves me, I have failed to record a single one.

I disagree that Amazon, Meta, Microsoft, and Google are "well" behind. If anything, the frontier model advantage seems to be at best 6 - 9 months. And the Chinese models are all doing well.

As one of Steve Jobs's lines goes, "It is a feature, not a product." Even if Apple were a generation or a year behind the frontier models, the advantage of being the default is enough to hold a lot of its users.

To put it simply, even if OpenAI or Anthropic were better, there is zero chance they would topple Apple in hardware sales, users, or ecosystem. On the other hand, even if Apple's AI were 6 - 9 months or a generation behind, most users would settle for it, and that would damage OpenAI / Anthropic.

ak_111 10 minutes ago | parent | next [-]

Just off the top of my head (and I don't even follow his takes that closely): check his takes on Magic Leap, which he consistently promoted using quite dramatic language (along with the entire AR space), and check how it panned out.

overfeed 2 hours ago | parent | prev | next [-]

> On the other hand, even if Apple's AI were 6 - 9 months or a generation behind,

Do you mean Google's AI with Apple wrappers? Apple's in-house AI is further behind Google, and very far from the frontier according to your ranking. IMO, Google is on the frontier - I recall Altman calling for an OpenAI all-hands-on-deck when Gemini was released because of how good it was compared to ChatGPT. I also suspect Google has the lowest operating expenses due to scale, experience and luck/planning (TPUs). There will come a time when AI investments slow down, and the cost of revenue will become more important.

alecco 4 hours ago | parent | prev [-]

Even their own employees get frustrated if they can't use Claude or Codex. 6-9 months is a big difference and I think it's closer to 9 than 6. And never mind the harness etc are also many months behind.

geodel 4 hours ago | parent | next [-]

This is just wishful thinking. I am sure someone from the gossip media will also find Apple employees who are ready to leave their jobs if Apple disallows Claude usage.

If anything, Apple should notice that Anthropic has a really good marketing team, and it would be no shame if they picked up a trick or two from them.

throwaway98797 2 hours ago | parent | prev [-]

people use outlook when gmail exists.

employees will always suffer.

hedora 5 hours ago | parent | prev | next [-]

Remember the implicit “pareto” in “frontier models”.

Anthropic and OpenAI are far behind state of the art for the entire curve except the “extremely expensive for barely measurable improvements” part.

GLM is probably the third most expensive frontier model (benchmarks and reviews will say for sure), and is apparently ~Opus 4.6 for 10% the inference cost.

The last I checked, qwen was still owning the 24-32GiB RAM range (it runs reasonably without a GPU!) and somewhere around 3.5-4 generation models.

Also, even Anthropic says Mythos ~= ChatGPT 5.5, so it’s unlikely either one is leaving the other behind. The big problem they both have is they asked for the government to gatekeep model releases and use cases, and their wish was granted.

That’s knocked them back 6 months already. Anthropic’s only frontier offering has been taken down.

tedggh 7 hours ago | parent | prev | next [-]

I use both Claude and Codex and don’t see any meaningful difference between the two. My use case is modeling semi-complex physical processes (energy and manufacturing) in code for simulations. I also have to do a fair bit of automation via scripting in Python or PowerShell for manipulating data, as well as legacy code analysis (C, Fortran, COBOL). Given I provide the models with the information and documentation they need, both perform very similarly. I recently did a full codebase review (for design patterns and vulnerabilities) and both Codex and Fable agreed 100% about the most critical findings. I do very little front end development, although some of my automation scripts have TUIs and again no problem with either Claude or Codex generating them for me. At this point I go with the less expensive one, which seems to be Codex. With the $100 plan I rarely hit the limits. With Claude I max out my plan in about 4-6 hours of work.

joenot443 6 hours ago | parent [-]

Did you find much of a difference between Fable and Opus?

thrill 3 hours ago | parent | next [-]

Yes. Fable is much more organized and consistent at taking small bites of the (sorry) apple when solving a problem. Specifically I'm talking about a machine learning problem I'd been working on for awhile with Opus and it was (and is, again) constantly stating that all the signal is exploited, everything is now overfit, etc, etc, etc. The first day I pointed Fable at the situation I got a 10% improvement by paying attention to the little details that Opus instead took slightly negative results and extrapolated to "fully exploited". I've had to drop back, again, to forcing Opus to explain what it's looked at and the detail it has quietly assumed away.

It's like the difference between talking to the two smartest kids in a class, but one really belongs a grade higher - and the other hasn't learned yet to ask the questions that encourage it to dig in that little bit more for the additional multi-order effects.

yfontana 3 hours ago | parent [-]

Had a very similar experience. Opus went "look, t-sne shows your features are neatly clustered" (it didn't) and left it at that. Fable didn't fully explore the problem/data, but it did go much further, implementing models to check for correlations and adjust feature clusters. Opus was able to finish the job after Fable was cut, but required much prodding (doing exactly what you described: pointing it towards things that look off and asking it, are you sure that's all there is to this?).

tedggh 34 minutes ago | parent | prev | next [-]

I have used Fable only once, to do an in-depth codebase review of a complex system. I asked it to flag deviations from a particular design and also compile a list of vulnerabilities. It took about 15-20 minutes. The result was very similar to Codex for the most critical findings: different suggestions on how to address them, but it found exactly the same critical issues as Codex. This is still not a good test to evaluate Fable. But my feeling is that the latest models are all pretty good and now it comes down to your personal setup and workflow; that’s where you can get the productivity gains IMO. It’s like picking between macOS or Windows as a development environment. For some Windows sucks and for some it's the opposite, but both groups of people can be equally productive if they know their environments well and know how to work around their respective limitations.

hedora 4 hours ago | parent | prev [-]

I constantly hit safety blocks in Fable (I’m trying to write secure software, which is equivalent to finding security holes, so banned).

I didn’t use it on big enough tasks to notice any improvement.

I had been hitting plan limits pretty regularly, but fixed it by changing my workflow. That also increased the success rate of Claude by an order of magnitude.

jimbokun 3 hours ago | parent | prev | next [-]

Is Google behind? The general opinions I read suggest Gemini is very competitive with Anthropic and OpenAI's top models.

wolttam 4 hours ago | parent | prev | next [-]

I think it's highly likely that there will remain one or two companies on the very bleeding edge of AI development for the foreseeable future.

But what I think a lot of people miss is that the market for the truly bleeding edge (developing bio-tech, building the most sophisticated software stacks (probably with a tilt towards simulation, GPU kernel optimization, etc)) is not the whole market.

There's a plethora of use-cases for models that are not on the bleeding edge. If I can solve my relatively simple problems with an off-the-shelf model for a minuscule fraction of the cost of the frontier, I'm going to.

thewebguyd 3 hours ago | parent [-]

Anecdotal case in point, but writing mostly enterprise CRUD in C#, I've gotten plenty of mileage out of Sonnet, very rarely do I need to use Opus.

It's somewhat of a myth that you need the most advanced, expensive model for software development.

embedding-shape 7 hours ago | parent | prev | next [-]

> I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)

Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.

And I say this as someone who uses both platforms and spends a lot of my day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong, somehow.

haellsigh 7 hours ago | parent | next [-]

I agree with what you're saying. I have a Claude plan for work and I prefer using Claude more than any other LLM I've tried. Having recently tried the Codex 100€ plan with GPT-5.5 in high/xhigh, I don't think it's worse than the Opus models, just different.

I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.

Just my two cents.

embedding-shape 6 hours ago | parent [-]

> I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.

Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take the prompts. On one hand, it's useful for a model to ignore its own instinct when you know better, so it doesn't go chasing geese randomly. On the other hand, it's sometimes useful when it self-directs, such as when you misworded something and it's obvious from the context that you meant something different. They're basically good at different things.

Really agree that the models aren't all equal, and that they aren't as interchangeable as people seem to think unless you adjust how you prompt them.

WarmWash 5 hours ago | parent | prev | next [-]

People use a model as their daily driver, get very familiar with it and its behavior, and then go and use another model and have a hard time. It's very difficult to separate "the model is bad" from "the model works differently".

JumpCrisscross 5 hours ago | parent [-]

> It's very difficult to separate "the model is bad" from "the model works differently"

At which point it’s fair to reject the commoditization label.

Also missing from these discussions are e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.

embedding-shape 3 hours ago | parent [-]

> Also missing from these discussions are e.g. Qwen, which is at least as good as one back from OpenAI or Anthropic’s frontiers.

They're missing in the discussion because the ones you can run locally, aren't actually "one step away from other closed-source labs" in practice when you use them. They might benchmark as such, but they're sadly far away from measuring up to those scores except for very specific use cases, even when you have, say, 96GB of VRAM available to run the bigger models that most (at-home) consumers won't be able to run.

JumpCrisscross 3 hours ago | parent [-]

> the ones you can run locally, aren't actually "one step away from other closed-source labs"

And they probably won’t be for at least another decade. Comparing like with like, flagship model running on the best hardware it can run on, Qwen is close.

embedding-shape 3 hours ago | parent [-]

> Qwen is close

I wish so badly this was true, but sadly today it just isn't.

JumpCrisscross 3 hours ago | parent [-]

To be clear, I’m relaying my subjective experience comparing Opus and Qwen.

computerex an hour ago | parent | prev | next [-]

For HPC/ai work opus blows gpt away, it’s no competition.

alecco 7 hours ago | parent | prev [-]

When you say "Pro" territory, do you include Fable?

embedding-shape 7 hours ago | parent [-]

You mean the model that was available for a whole three days? No, I had played around with it a tiny bit, but not much more than that. I guess time will tell if it gets close.

bushbaba 3 hours ago | parent | prev | next [-]

I'm perfectly happy with Claude Opus 4.6. None of the improvements since then have meaningfully improved my day to day. If I can get 4.6 on my laptop for 5-10k, I'd gladly start shifting my ~1k/month Anthropic spend over.

Some of the harnesses even let you run a local model for most things, and only pay for the latest frontier models when needed, which cuts down cost drastically.

afavour 7 hours ago | parent | prev [-]

Maybe I’m alone in thinking this but I think the long term victor will be the one that works out pricing best.

Fable might well be a better model but it’s too expensive for everyday AI use. Definitely if we’re talking about the kind of stuff you’re going to want to do on your phone. Even for coding, I’m not going to reach for Fable (well, when I can…) for 95% of the work I do.

I don’t believe a mature AI industry is going to have a one size fits all, single winner.

tedggh 6 hours ago | parent [-]

Yes, and pricing is one of the features of a commodity: because users can jump back and forth between services, it becomes a pricing race to the bottom. Agree also that you don’t need the best model all the time. You could have the most powerful model draft the design, requirements, guidelines, policies or whatnot, then have the lower-tier models execute it. Then again you can have the most powerful model do the testing and review, and give back feedback, rinse and repeat. Just like in the real world you don’t need an entire staff of lead engineers.
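
A rough sketch of that draft/execute/review loop, purely for illustration. The `TieredPipeline` name, the `PLAN:`/`DO:`/`REVIEW:` prompt prefixes and the "OK" verdict convention are all made up here, and the "models" are plain callables standing in for whatever provider you actually use; no real API is assumed.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "model" is just a callable from prompt text to response text.
# Both tiers are hypothetical stand-ins, not a real provider SDK.
Model = Callable[[str], str]


@dataclass
class TieredPipeline:
    strong: Model        # expensive model: drafts the plan and reviews results
    cheap: Model         # lower-tier model: executes each step
    max_rounds: int = 3  # upper bound on review/redo cycles

    def run(self, task: str) -> List[str]:
        # 1. The strong model drafts a plan, one step per line.
        plan = self.strong(f"PLAN: {task}")
        steps = [s.strip() for s in plan.splitlines() if s.strip()]

        # 2. The cheap model executes every step.
        results = [self.cheap(f"DO: {s}") for s in steps]

        # 3. The strong model reviews; anything other than "OK" is treated
        #    as feedback and the cheap model redoes the steps with it.
        for _ in range(self.max_rounds):
            verdict = self.strong("REVIEW:\n" + "\n".join(results))
            if verdict.strip() == "OK":
                break
            results = [
                self.cheap(f"DO: {s}\nFEEDBACK: {verdict}") for s in steps
            ]
        return results
```

The expensive model is called once for the plan and once per review round, while the per-step work goes to the cheaper tier, which is where the cost saving would come from.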

zitterbewegung 2 hours ago | parent | prev | next [-]

It is much better. Imagine if the whole Manhattan project could have been outsourced and cost you nothing. I expect that in a short time, by 2030, open source models will be at or almost at parity and running on consumer devices.

HPsquared 2 hours ago | parent [-]

Market phenomena like this are a bit like the Manhattan project in that you pay for it, and make use of it, whether you want to or not. It's functionally very similar to the government doing something.

axus 3 hours ago | parent | prev [-]

Last I checked the telcos made plenty of money in the 90s. Should Verizon be getting a cut of my Claude Pro subscription, since I use FIOS to access it?

colechristensen 2 hours ago | parent [-]

This is what everybody is TRYING to do. They built something and will do everything they can to charge outsized rent on it far past the value it provides to take revenue from anyone downstream.

The fact that telcos couldn't charge rent was a primary reason the Internet was so successful.

Remember $0.10 per text message? You bet in some alternate timeline AT&T charges $0.10 per webpage visit and we're stuck on 100kbps connections because the monopoly doesn't want to innovate.

post-it 5 hours ago | parent | prev | next [-]

> while keeping control of the UX.

Extremely tangential, but this is my favourite upshot of AI. For decades, companies have been walling off their services and forcing us into their fuckass UIs. Now over the course of the last twelve months, suddenly everything has an MCP and I can use it through my command line chat interface.

Any company that doesn't adapt gets so hammered by people's AI-DIY web scrapers that they have no choice but to cave.

halJordan 5 hours ago | parent | prev | next [-]

It's been clear for years now that eventually AI will be embedded at the OS level. Apple even recognized it way back when they first introduced Apple Intelligence. Yes, they're commoditizing LLMs or whatever. But this has been a user-facing feature they've been iterating on for years now.

swingboy 8 hours ago | parent | prev | next [-]

Does “the best machine for AI use” apply here considering these models are still server-side?

embedding-shape 7 hours ago | parent | next [-]

The play here seems pretty evident, if I may assume. Apple creates an interface that is generalized enough so you can easily swap models, and while Claude is preferred by Apple today, it may be any provider or even local models in the future, and the APIs the developers use remain the same, so "migration" becomes easier.
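
To make the "APIs remain the same" point concrete, here is a minimal sketch in Python. It is not Apple's Foundation Models API; `TextModel`, `LocalEcho`, `RemoteEcho` and `summarize` are hypothetical names used only to show the shape of a swappable-backend interface.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical common interface every backend must satisfy."""

    def respond(self, prompt: str) -> str: ...


class LocalEcho:
    """Stand-in for an on-device model (no real inference happens)."""

    def respond(self, prompt: str) -> str:
        return f"[local] {prompt}"


class RemoteEcho:
    """Stand-in for a hosted model that would sit behind a network call."""

    def respond(self, prompt: str) -> str:
        return f"[remote] {prompt}"


def summarize(model: TextModel, text: str) -> str:
    # App code depends only on the TextModel interface, so swapping the
    # backend does not require touching this function.
    return model.respond(f"Summarize: {text}")


if __name__ == "__main__":
    for backend in (LocalEcho(), RemoteEcho()):
        print(summarize(backend, "release notes"))
```

Under this kind of design the "migration" is a one-line change at the call site that picks the backend, which is the commoditization argument in a nutshell.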

WorldMaker 3 hours ago | parent | prev | next [-]

Apple's been trying to make the marketing case that "Private Cloud Compute" is also a hardware project. Given it seems to rely on low-level details of device Hardware Security Modules, it's maybe at least a little bit more than just "marketing spin".

ABS 5 hours ago | parent | prev | next [-]

For the on-device model, yes: it runs on the Neural Engine (at the moment), so a newer chip means faster, cheaper local inference. For the server-side path, which is what this Claude package is about, your machine is irrelevant since it's a network call. The same API covers both, so "best machine for AI" only bites when the session is actually local.

But we can imagine that the balance of what's on-device vs what's remote will move continuously towards the former as time, improved HW and improved local models keep progressing.

brookst 7 hours ago | parent | prev [-]

I would think so, as “use” doesn’t specify implementation. If you use a word processor it may be running locally or remotely.

From a user’s perspective, it doesn’t matter.

dlev_pika 25 minutes ago | parent | prev | next [-]

Apple’s play was a masterclass - unsure how deliberate it was, or how much of a choice they actually had, but it’s turning out pretty well IMO.

Now if they can further reinforce their angle on Privacy, they might continue to be what they are (or more)

amelius 5 hours ago | parent | prev | next [-]

Now we only need to commoditize the hardware.

hedora 4 hours ago | parent | next [-]

Check out AMD’s offerings.

They’re typically a bit better on high TDP stuff, and a bit worse on low TDP. They mostly match in the middle. I have a $500 AMD NUC and a slightly older $2000 MBP. Inference throughput is within 2x.

The comparison is a little messy: AMD currently maxes out at 128GB of RAM vs Apple’s discontinued 512. Apple has nothing to rival the Steam Deck.

jimbokun 2 hours ago | parent | prev [-]

This is what originally made Microsoft the most lucrative tech company of its day.

Android succeeded at this to an extent with phones, but Apple has been able to keep its products differentiated enough in the minds of consumers to maintain their premium pricing. So far.

Danox 2 hours ago | parent [-]

Vertical computer company: operating system plus hardware under one roof.

wuliwong 2 hours ago | parent | prev | next [-]

I think there is an opportunity for a new hardware company to enter the market. I know this is just hypothetical, but I believe that AI is revolutionary enough that a new approach to hardware and UI/UX will enable far more value to be derived from AI. I think the incumbents like Apple will stick to their familiar platforms and could get beaten out by a new competitor that is AI native to the core. Maybe? ¯\_(ツ)_/¯

klausa 9 hours ago | parent | prev | next [-]

How is this Apple keeping control of the UX?

matwood 8 hours ago | parent [-]

The betas of the next OSes include a Siri AI chatbot, and the AI features are built into various parts of the OS. A user has no idea what model is powering any of it - Apple controls the UX.

mr_toad an hour ago | parent | next [-]

I’ll be curious to see if they make the models accessible to Shortcuts, like they do with the current models.

klausa 8 hours ago | parent | prev [-]

I'm aware. How is this relevant to the posted article?

embedding-shape 8 hours ago | parent [-]

The article is about (from the eyes of a user) white-labeled usage of Claude models on Apple devices, this subthread is about white-labeled usage of LLMs on Apple devices, how is it not relevant?

klausa 8 hours ago | parent [-]

Because that's not what the article is about; this is about a unified API for the _app developers_ to access different kinds of models.

That API has no user-facing components, and has no influence over UX of what the end-users are interacting with.

The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.

embedding-shape 8 hours ago | parent [-]

> The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.

That's the point! That's the whole "white-labeling" part, and what the earlier commenter is talking about. You're very close to understanding the context here!

klausa 7 hours ago | parent [-]

I’m sorry, so your position now is that “being completely invisible to the users” is “controlling the UX”?

embedding-shape 7 hours ago | parent | prev | next [-]

I think you're taking the written words a bit too literally here. Read it with a more lax filter and less literal word-meaning, and I think the original comment will become a bit clearer.

klausa 6 hours ago | parent [-]

You know what, I've been a bit too snipe-y in my previous comments, and it led to the discussion devolving in unproductive ways.

I'd genuinely like to understand where you're coming from more.

I think we're all in agreement that this framework is very much about letting developers swap the models easily, and treat them as commodities. That seems pretty obvious.

However, I still don't see how this has anything to do with controlling the UX (or the new Siri for that matter! The new Siri doesn't use Anthropic models, and there are no extension points for it to do so; that's pretty much the whole reason why it won't be available in the EU).

Help me see your point of view!

embedding-shape 3 hours ago | parent | next [-]

Thanks for the patience!

The way I see it, it isn't about what is there right now, today, but about what intent it signals, or what path Apple is planning. Yes, today it's ClaudeForFoundationModels, but the FoundationModels stuff will be used to allow switching between models, probably without users noticing. Who knows what Apple will ultimately surface to users; it tends to go in the direction of less user control.

But there are a lot of assumptions, guesses and extrapolation in that. I think you're right if you focus only on what's there right now, rather than trying to "see into the future", which harrouet basically started doing with their root comment.

geodel 5 hours ago | parent | prev [-]

I don't know if it helps. One way to look at it is product branding. Apple is branding the product, so it supposedly has more value to customers, as the brand stands for quality, awareness, trust etc. As opposed to the 100 little components in a computer, which may be from different brands, and which Apple may switch from year to year without the user noticing. So those component makers have little power over Apple.

The same is happening to the Claude software package, as it would sit behind Apple-branded foundation models. From a pure software developer's point of view, this is exactly what Claude offered here, so where is the issue? The issue is in the larger space, where Apple could take steps to block Claude out of their ecosystem if they so wish at some point, and there is little Claude / Anthropic could do if Apple Foundation is the only thing that Apple consumers know about.

klausa 4 hours ago | parent [-]

That framing would make sense to me if the thing being discussed was Apple letting _end users_ somehow access Claude models white-labeled as "Apple Foundation Model", sure? Or even letting _developers_ access Apple-hosted Claude or something?

But this is very much _not_ what this is.

Apple showed a bunch of new APIs at WWDC last week. One of these is a way for developers to interact with LLMs in a way that lets you easily swap out models (with a bunch of other niceties around it), including swapping between on-device and remote models.

This is _Anthropic_ (not Apple!) shipping their support for that framework, so you can also switch between different Anthropic models using the same APIs you'd use to swap between a local or PCC model.

I expect OpenAI will probably ship their shims in the next couple of weeks too? (You can probably vibe-code one in half an hour if you point Codex at the Anthropic one, tbh).

(Apple also doesn't use "Apple Foundation Model" anywhere in the user-facing marketing materials AFAICT, this is strictly developer facing terminology, but I could be wrong?)

My impression is that people are _wildly_ misunderstanding what this _actually_ is, and running wild with speculation/interpretation.

butlike 6 hours ago | parent | prev | next [-]

I can't reply to your child comment for whatever reason, but Siri is part of the Apple Foundation Models framework. The idea is that no matter what backend the developer uses, the end user will always say "Hey Siri." This is analogous to controlling the UX. Siri is independent of whichever model the app developer uses.

klausa 5 hours ago | parent [-]

No, Siri is entirely separate from this framework.

Are you thinking about Intents? That lets Siri interact with data (and perform some actions in them) from your apps, but it is something completely different.

You can definitely expose things from your app via Intents that will end up calling an external arbitrary LLM somewhere, but it does not require using Foundation Models API whatsoever.

kcb 6 hours ago | parent | prev [-]

It's Apple, so it's some revolutionary big brained play, and not just yet another llm sdk.
