croes 20 hours ago

It’s a rant against the wrong usage of a tool not the tool as such.

Turskarama 20 hours ago | parent | next [-]

It's a tool that promotes incorrect usage though, and that is an inherent problem. All of these companies are selling AI as a tool to do work for you, and the AI _sounds confident_ no matter what it spits out.

Terr_ 18 hours ago | parent | next [-]

My personal pet-peeve is how a great majority of people--and too many developers--are being misled into believing that a fictional character coincidentally named "Assistant", inside a story-document half-created by an LLM, is the author-LLM itself.

If a human generates a story containing Count Dracula, that doesn't mean vampires are real, or that capabilities like "turning into a cloud of bats" are real, or that the algorithm "thirsts for the blood of the innocent."

The same holds when the story comes from an algorithm, and it continues to hold when the story is about a character named "AI Assistant" who is "helpful".

Getting people to fall for this illusion is great news for the companies though, because they can get investor-dollars and make sales with the promise of "our system is intelligent", which is true in the same sense as "our system converts blood into immortality."

croes 18 hours ago | parent | prev | next [-]

That's the real danger of AI.

The false promises of the AI companies and the false expectations of the management and users.

Saw it just recently during a data migration: the users asked whether they still needed to enter metadata for documents, since they could just use AI to query the data that was usually based on that metadata.

They trust AI before it's even there and don't even consider a transition period where they check if the results are correct.

As with security, convenience prevails.

blackqueeriroh 17 hours ago | parent [-]

But isn’t this just par for the course with every new technological revolution?

“It’ll change everything!” they said, as they continued to put money in their pockets as people were distracted by the shiny object.

croes 11 hours ago | parent [-]

With every revolution and with every fake revolution.

NFTs didn't change much; money just changed owners.

xpe 16 hours ago | parent | prev [-]

> All of these companies are selling AI as a tool to do work for you, and the AI _sounds confident_ no matter what it spits out.

If your LLM + pre-prompt setup sounds confident with every response, something is probably wrong; it doesn't have to be that way. It isn't for me. I haven't collected statistics, but I often get decent nuance back from Claude.

Think more about what you're doing and experiment. Try different pre-prompts. Try different conversation styles.

This is not dismissing the tendency for overconfidence, sycophancy, and more. I'm just sharing some mitigations.
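To make "try different pre-prompts" concrete: here's a minimal sketch of a pre-prompt nudging the model toward hedged, calibrated answers, composed in the common role/content chat-message format. The wording of the system prompt is my own illustration, not an official recommendation from any vendor:

```python
# Sketch: a pre-prompt asking the model to hedge instead of sounding
# uniformly confident. The prompt wording is illustrative only.
CALIBRATION_PREPROMPT = (
    "When you answer, state your confidence explicitly. "
    "If you are unsure, say so and explain what would change your mind. "
    "Prefer 'I don't know' over a confident guess."
)

def build_messages(user_question: str) -> list[dict]:
    """Compose a chat payload in the common role/content format."""
    return [
        {"role": "system", "content": CALIBRATION_PREPROMPT},
        {"role": "user", "content": user_question},
    ]
```

You'd pass the result of `build_messages(...)` to whatever chat-completion client you use; the point is only that the system message, not the model alone, sets the confidence register.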

GeoAtreides 16 hours ago | parent | next [-]

> Think more about what you're doing and experiment. Try different pre-prompts. Try different conversation styles.

Ask on a Wednesday. During a full moon. While in a shipping container. Standing up. Keep a black box on your desk as the sacred GenAI avatar and pray to it. Ask while hopping on one leg.

xpe 7 hours ago | parent [-]

Funny but uncharitable. See https://news.ycombinator.com/newsguidelines.html

Turskarama 12 hours ago | parent | prev [-]

Here's the root of the problem though: how do you know the AI is actually "thinking" more carefully, as opposed to just pretending to?

The short answer is: you can know for a fact that it _isn't_ thinking more carefully because LLMs don't actually think at all, they just parrot language. LLMs are performing well when they are putting out what you want to hear, which is not necessarily a well thought out answer but rather an answer that LOOKS well thought out.

xpe 6 hours ago | parent [-]

1. I don't think the comment above gets to the "root" of the problem, which is "the LLM appears overconfident". Thankfully, that problem is relatively easy to address by trying different LLMs and different pre-prompts. Like I said, your results might vary.

2. While the question of "is the AI thinking" is interesting, I think it is a malformed question. Think about it: how do you make progress on that question, as stated? My take: it is unanswerable without considerable reframing. It helps to reframe toward something measurable. Here, I would return to the original question: to what degree does an LLM output calibrated claims? How often does it make overconfident claims? Underconfident claims?
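To show what "measurable" could mean here: if you log each claim's stated confidence alongside whether it turned out correct, you can compare the two per confidence bucket. A toy sketch (the data and bucket width are made up for illustration):

```python
from collections import defaultdict

def calibration_report(claims):
    """claims: list of (stated_confidence, was_correct) pairs.

    Groups claims into 10%-wide confidence buckets and compares the
    average stated confidence in each bucket to the observed accuracy.
    A well-calibrated responder has those two numbers close together;
    overconfidence shows up as stated confidence >> accuracy.
    """
    buckets = defaultdict(list)
    for conf, correct in claims:
        buckets[int(conf * 10)].append((conf, correct))
    report = {}
    for b, items in sorted(buckets.items()):
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        report[b] = (avg_conf, accuracy)
    return report

# Overconfident example: claims stated near 90% confidence, half correct.
toy = [(0.9, True), (0.9, False), (0.92, False), (0.95, True)]
```

This is the idea behind calibration metrics like expected calibration error, stripped down to a few lines.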

3. Pretending requires at least metacognition, if not consciousness. Agree? It is a fascinating question to explore how much metacognition a particular LLM demonstrates.

In my view, this is still a research question, both in understanding how LLM architectures work and in designing good evals to test for metacognition.

In my experience, when using chain-of-thought, LLMs can be quite good at recognizing previous flaws, including overconfidence, meaning that if one is careful, the LLM behaves as if it has a decent level of metacognition. But to see this, the driver (the human) must demonstrate discipline. I'm skeptical that most people prompt LLMs rigorously and carefully.

4. It helps to discuss this carefully. Word choice matters a lot in AI discussions, much more than even a relatively capable software developer / hacker is comfortable with. Casual phrasings are likely to lead us astray. I'll make a stronger claim: a large fraction of successful tech people haven't yet developed clear language and thinking for discussing classic machine learning, much less AI as a field or LLMs in particular. But many of these people lack the awareness or mindset to remedy this; they fall into the usual overconfidence or lack-of-curiosity traps.

5. You wrote: "LLMs are performing well when they are putting out what you want to hear."

I disagree; instead, I claim people, upon reflection, would prefer an LLM be helpful, useful, and true. This often means correcting mistakes or challenging assumptions. Of course people have short-term failure modes, such is human nature. But when you look at most LLM eval frameworks, you'll see that truth and safety are primary factors. Yes-manning or sycophancy is still a problem.

6. Many of us have seen the "LLMs just parrot language" claim repeated many times. After having read many papers on LLMs, I wouldn't use the words "LLMs just parrot language". Why? That phrase is more likely to confuse discussion than advance it.

I recommend this to everyone: instead of using that phrase, challenge yourself to articulate at least two POVs relating to the "LLMs are stochastic parrots" argument. Discuss with a curious friend or someone you respect. If it is just someone online you don't know, you might simply dismiss them out of hand.

The "stochastic parrot" phrase is fun and is a catchy title for an AI researcher who wants to get their paper noticed. But isn't a great phrase for driving mutual understanding, particularly not on a forum like HN where our LLM foundations vary widely.

Having said all this, if you want to engage on the topic at the object level, there are better fora than HN for it. I suggest starting with a literature review and finding an ML or AI-specific forum.

7. There is a lot of confusion and polarization around AI. We are capable of discussing better, but (a) we have to want to; (b) we have to learn now; and (c) we have to make time to do it.

Like I wrote in #6, above, be mindful of where you are discussing and the level of understanding of people around. I've found HN to be middling on this, but I like to pop in from time to time to see how we're doing. The overconfidence and egos are strong here, arguably stronger than the culture and norms that should help us strive for true understanding.

8. These are my views only. I'm not "on one side", because I reject the false dichotomy that AI-related polarization might suggest.

mike_hearn 17 hours ago | parent | prev [-]

Well, it's actually a rant about AI making what the author perceives as mistakes. Honestly, it reads like the author is attempting to show off or brag by listing imaginary mistakes an AI might have made, but they are all the sort of mistakes a human could make too. And the fact that they are not real incidents significantly weakens his argument. He is a consultant who sells training services, so obviously, if people come to rely on AI more for this kind of thing, he will be out of work.

It does not help that his examples of things an imaginary LLM might miss are all very subjective and partisan too.