aunty_helen 6 days ago

Or Apple’s just so bad at this they’re fumbling the bag. Billions in cash on hand each quarter but don’t have the balls that Zuck has to pay unreasonable money. They have their own hardware like Google does but are talking about Perplexity??? They have all the data but can’t seem to get an LLM that can set an alarm and be a chatbot at the same time?

Sometimes companies just don’t do well enough.

anon7000 5 days ago | parent | next [-]

> Billions in cash on hand each quarter but don’t have the balls that Zuck has to pay unreasonable money

It remains to be seen whether this was a smart move or just throwing money at the wall.

aunty_helen 5 days ago | parent [-]

The difference is it’s a move. Actually doing something rather than putting out internal PR.

Zuck tried and flailed with the metaverse. That was a huge waste, but he can afford it and fortune favours the brave.

lotsofpulp 5 days ago | parent | next [-]

You don’t think Apple makes moves?

Not everyone has to make the same move at the same time.

SebFender 5 days ago | parent [-]

Apple made many, just not the right or good ones in the past decade.

lotsofpulp 5 days ago | parent [-]

Services (iCloud, Music, and TV), AirPods, the Watch, the M-series processors, and the new modem seem like good ones.

If those don’t seem like right or good moves, I can’t imagine much will impress you in this world.

mcphage 5 days ago | parent | prev [-]

The Metaverse was a waste of billions of dollars to develop a product that nobody wanted. In no world was that a smart business move, or one that should be emulated. Doing nothing is better than flushing money down the toilet.

potatolicious 5 days ago | parent | prev | next [-]

> "They have all the data but can’t seem to get an LLM that can set an alarm and be a chatbot at the same time?"

This is actually one of the hardest frontier problems. The "general purpose" assistant is one of the singular hardest technical problems with LLMs (or any kind of NLP).

I think people are so easily snowed by LLMs' apparent linguistic fluency that they mistake it for capability. Nothing could be further from the truth.

In reality an LLM presented with a vast array of tools has extremely poor reliability, so if you want a thing that can order delivery and remember your shopping list and remind you of your flight and play music, you're radically exceeding the capabilities of current models. There's a reason successful (anything that isn't demoware/vaporware) uses of agentic LLMs tend to be narrow-domain use cases.

There's a reason Google hasn't done it either, and indeed neither has anyone else: neither Anthropic nor OpenAI has a general purpose assistant (defined as being able to execute an indefinite number of arbitrary tools to do things for you, as opposed to merely conversing with you).

aunty_helen 5 days ago | parent [-]

You split up the tasks into sub-agents. This is something my company builds on top of LangGraph.
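For readers unfamiliar with the pattern, here is a minimal sketch of what "split the tasks into sub-agents" can look like. It is plain Python, not LangGraph's API, and every name in it (`route`, `handle`, `alarm_agent`, `chat_agent`) is hypothetical. The keyword router is only a stand-in for whatever model-based classifier a real system would use.

```python
from typing import Callable, Dict

# Hypothetical sub-agents: each one owns a narrow domain and its own tools.
def alarm_agent(utterance: str) -> str:
    return "alarm_agent handled: " + utterance

def chat_agent(utterance: str) -> str:
    return "chat_agent handled: " + utterance

SUB_AGENTS: Dict[str, Callable[[str], str]] = {
    "alarm": alarm_agent,
    "chat": chat_agent,
}

def route(utterance: str) -> str:
    """Pick a sub-agent name. Keyword matching is an assumption made for
    illustration; a real router would likely be an LLM or a classifier."""
    text = utterance.lower()
    if "alarm" in text or "wake me" in text:
        return "alarm"
    return "chat"  # fall back to plain conversation

def handle(utterance: str) -> str:
    """Dispatch the utterance to the sub-agent chosen by the router."""
    return SUB_AGENTS[route(utterance)](utterance)

if __name__ == "__main__":
    print(handle("Set an alarm for 7am"))
    print(handle("Tell me a joke"))
```

Each sub-agent only has to be reliable inside its own domain. How well that holds up depends on how accurate the router stays as the number of domains grows.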

potatolicious 5 days ago | parent [-]

Sure, go try it and evaluate it rigorously end-to-end, over a sufficient number and variety of tools.

For the purposes of the exercise, let's conservatively say, maybe ~2000 tools covering ~100 major verticals of use cases. Even that may be too narrow for a true general purpose assistant, but it's at least a good start. You can slice the sub-agents however you'd like.

If you can get recall, for real user utterances (not contrived eval utterances authored by your devs and MLEs), over 70% across all the verticals/use cases/tool uses, I'd be extremely impressed. Heck, my thoughts on this won't matter - if you can get the recall for such a system over the bar you'd have cracked something nobody else has and should actively try to sell it to Google for nine figures.
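A rough sketch of how the per-vertical recall check described above could be computed, assuming you have logged real utterances labeled with the tool that should have been called. The record layout, vertical names, and tool names are invented for illustration. Only the 70% bar comes from the comment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# One record per real user utterance: (vertical, expected_tool, predicted_tool).
# predicted_tool is None when the system called no tool at all.
Example = Tuple[str, str, Optional[str]]

def recall_by_vertical(examples: Iterable[Example]) -> Dict[str, float]:
    """Fraction of utterances per vertical where the expected tool was called."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for vertical, expected, predicted in examples:
        totals[vertical] += 1
        if predicted == expected:
            hits[vertical] += 1
    return {v: hits[v] / totals[v] for v in totals}

def verticals_not_over_bar(per_vertical: Dict[str, float], bar: float = 0.70) -> List[str]:
    """Verticals whose recall is not strictly over the bar."""
    return sorted(v for v, r in per_vertical.items() if r <= bar)

if __name__ == "__main__":
    # Hypothetical data, not real measurements.
    examples = [
        ("alarms", "set_alarm", "set_alarm"),
        ("alarms", "set_alarm", "set_timer"),
        ("music", "play_song", "play_song"),
        ("music", "play_song", "play_song"),
        ("delivery", "order_food", None),
    ]
    per_vertical = recall_by_vertical(examples)
    print(per_vertical)
    print(verticals_not_over_bar(per_vertical))
```

Reporting recall per vertical rather than one pooled number matters here: a pooled figure can clear the bar while whole verticals fail.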

rsanheim 4 days ago | parent | next [-]

Yeah, it turns out many nerds don't consider the fact that the amazing tools we are using to do constrained tasks aren't that great for more general purpose things. Writing a spike, spitting out unit tests, or vibe coding a front end feature is not the same as planning a trip to Europe, balancing accounts, or managing a schedule.

So much attention, effort, and tooling has focused on getting LLMs better at writing more and more code. They can grep and curl and run scripts and iterate and build things really fast, and maybe even maintain them if given enough guardrails and direction.

But it turns out we have had a _ton_ of useful training data for models to work with for software. Not just books or docs, but examples, tests, snippets and full programs for just about any language. Show me a Stack Overflow with Playwright scripts or API calls (hah, as if that's possible) to build itineraries from Delta, AA, United, Priceline, Expedia, etc. ... which is one part of one piece of the AI assistant pipe dream.

I don't think it's impossible that, as these tools get much smarter and more generally capable, we get decent assistants in other constrained, non-software domains, but it will take very good companies focusing on it for a long time. Much like any product that tries to do these sorts of things.

It's so easy for programmers in our bubble to overlook the complexity involved in automating or even _describing_ simple tasks that humans navigate every day via habit, learning, experience, and perception... all things that LLMs struggle with constantly.

aunty_helen 4 days ago | parent | prev [-]

Just to once again bring you up to speed with where the market’s at, the thing you originally called out as being difficult is a solved problem.

There’s not just one specific solution to it either, there’s a whole class of tooling for it. And I doubt Google would pay 9 figures for something that’s built on top of libraries they put out using models they developed.

As of August 1st ‘we’ (as in, I personally developed it with my company, and have been paid for it with real dollars which are now sitting in my bank) have an F100 using this tech in production.

As for the no true Scotsman fallacy you’re putting in front of yourself, I will let you deal with that, but I would like to see how you came up with the maths.

xnx 5 days ago | parent | prev | next [-]

> They have all the data but can’t seem to get an LLM that can set an alarm and be a chatbot at the same time?

This does seem like an embarrassing fail, but even Google has not finished replacing Assistant with Gemini. There has also been lost functionality (maybe temporarily) in the process.

unsigner 5 days ago | parent | prev | next [-]

They are not talking about Perplexity; the endless rumor mill talks about Perplexity. The same rumor mill that has had them buying everything from Disney to Porsche to Nike for decades.

paulpauper 6 days ago | parent | prev [-]

Undercut the competitors by charging less. Apple can afford to run its product at a loss.