motorest 6 days ago

> For casual discussion about well-written topics, that's more than good enough. But for unique problems in a non-English language, it struggles. It always will. It doesn't matter how big you make the model.

Not to disagree, but "non-English" isn't exactly relevant. For unique problems, LLMs can still manage to output hallucinations that end up being right or useful. For example, LLMs can predict what an API looks like and how it works even when that API is not in their context, provided it was designed following standard design principles and best practices. LLMs can also build up context while you interact with them, which means that iteratively telling them that X works while Y doesn't helps them build the necessary and sufficient context to output accurate responses.
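
To make that concrete, here is a minimal sketch in Python. The scenario is hypothetical; json.dumps and json.loads are real standard-library functions, and the point is only that an "extrapolated" guess can land on the right name because the library follows a common convention.

    import json

    # Having only seen json.dumps(...) in context, a convention-based guess is
    # that the inverse operation is called json.loads(...). Here the
    # extrapolated name happens to exist, so the guess turns out to be right.
    payload = json.dumps({"status": "ok"})
    print(json.loads(payload))  # {'status': 'ok'}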

windward 6 days ago | parent | next [-]

>hallucinations

This is the first word that came to mind when reading the comment above yours. Like:

>They can't, despite marketing, really reason

They aren't, despite marketing, really hallucinations.

Now I understand why these companies don't want to market using terms like "extrapolated bullshit", but I don't understand how there is any technological solution to it without starting from a fresh base.

motorest 6 days ago | parent [-]

> They aren't, despite marketing, really hallucinations.

They are hallucinations. You might not be aware of what that concept means in the context of LLMs, but being oblivious to the definition of a concept does not mean the concept doesn't exist.

You can learn about the concept by spending a couple of minutes reading this article on Wikipedia.

https://en.wikipedia.org/wiki/Hallucination_(artificial_inte...

> Now I understand why these companies don't want to market using terms like "extrapolated bullshit", (...)

That's literally in the definition. Please do yourself a favour and get acquainted with the topic before posting comments.

zahlman 5 days ago | parent | next [-]

> You might not be aware of what that concept means in terms of LLMs

GP is perfectly aware of this, and disagrees that the metaphor used to apply the term is apt.

Just because you use a word to describe a phenomenon doesn't actually make the phenomenon similar to others that were previously described with that word, in all the ways that everyone will find salient.

When AIs generate code that makes a call to a non-existent function, it's not because they are temporarily mistakenly perceiving (i.e., "hallucinating") that function to be mentioned in the documentation. It's because the name they've chosen for the function fits their model for what a function that performs the necessary task might be called.
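
A concrete sketch of that in Python: the "hallucinated" call below is hypothetical and does not exist, while the real API uses a different name.

    from datetime import datetime

    # A plausible-sounding call a model might emit; the name fits common
    # naming conventions, but no such method exists, so calling it would
    # raise AttributeError:
    #     datetime.parse_iso("2024-01-01T00:00:00")

    # The actual standard-library API (Python 3.7+):
    print(datetime.fromisoformat("2024-01-01T00:00:00"))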

And even that is accepting that they model the task itself (as opposed to words and phrases that describe the task) and that they somehow have the capability to reason about that task, which has somehow arisen from a pure language model (whereas humans can, from infancy, actually observe reality, and contemplate the effect of their actions upon the real world around them). Knowing that e.g. the word "oven" often follows the word "hot" is not, in fact, tantamount to understanding heat.

In short, they don't perceive, at all. So how can they be mistaken in their perception?

windward 6 days ago | parent | prev [-]

That page was made in December 2022, requires specifying '(artificial intelligence)' and says:

>(also called bullshitting,[1][2] confabulation,[3] or delusion)[4]

Here's the first linked source:

https://www.psypost.org/scholars-ai-isnt-hallucinating-its-b...

motorest 6 days ago | parent [-]

> That page was made in December 2022, (...)

Irrelevant. Wikipedia does not create concepts. Again, if you take a few minutes to learn about the topic, you will eventually understand that the concept was coined a couple of decades ago and has a specific meaning.

Either you opt to learn, or you don't. Your choice.

> Here's the first linked source:

Irrelevant. Your argument is as pointless and silly as claiming rubber duck debugging doesn't exist because no rubber duck is involved.

windward 6 days ago | parent [-]

Uh oh! Let me spend a few minutes to learn about the topic. Thankfully, a helpful Hacker News user has linked me to a useful resource.

I will follow one of the linked sources to the paper 'ChatGPT is bullshit'

>Hicks, M.T., Humphries, J. and Slater, J. (2024). ChatGPT is bullshit. Ethics and Information Technology, 26(2). doi:https://doi.org/10.1007/s10676-024-09775-5.

Hicks et al. note:

>calling their mistakes ‘hallucinations’ isn’t harmless: it lends itself to the confusion that the machines are in some way misperceiving but are nonetheless trying to convey something that they believe or have perceived.

What an enlightening input. I will now follow another source, 'Why ChatGPT and Bing Chat are so good at making things up'

>Edwards, B. (2023). Why ChatGPT and Bing Chat are so good at making things up. [online] Ars Technica. Available at: https://arstechnica.com/information-technology/2023/04/why-a....

Edwards notes:

>In academic literature, AI researchers often call these mistakes "hallucinations." But that label has grown controversial as the topic becomes mainstream because some people feel it anthropomorphizes AI models (suggesting they have human-like features) or gives them agency (suggesting they can make their own choices) in situations where that should not be implied. The creators of commercial LLMs may also use hallucinations as an excuse to blame the AI model for faulty outputs instead of taking responsibility for the outputs themselves.

>Still, generative AI is so new that we need metaphors borrowed from existing ideas to explain these highly technical concepts to the broader public. In this vein, we feel the term "confabulation," although similarly imperfect, is a better metaphor than "hallucination." In human psychology, a "confabulation" occurs when someone's memory has a gap and the brain convincingly fills in the rest without intending to deceive others. ChatGPT does not work like the human brain, but the term "confabulation" arguably serves as a better metaphor because there's a creative gap-filling principle at work

It links to a tweet from someone called 'Yann LeCun':

>Future AI systems that are factual (do not hallucinate)[...] will have a very different architecture from the current crop of Auto-Regressive LLMs.

That was an interesting diversion, but let's go back to learning more. How about 'AI Hallucinations: A Misnomer Worth Clarifying'?

>Maleki, N., Padmanabhan, B. and Dutta, K. (2024). AI Hallucinations: A Misnomer Worth Clarifying. 2024 IEEE Conference on Artificial Intelligence (CAI). doi:https://doi.org/10.1109/cai59869.2024.00033.

Maleki et al. say:

>As large language models continue to advance in Artificial Intelligence (AI), text generation systems have been shown to suffer from a problematic phenomenon often termed as "hallucination." However, with AI’s increasing presence across various domains, including medicine, concerns have arisen regarding the use of the term itself. [...] Our results highlight a lack of consistency in how the term is used, but also help identify several alternative terms in the literature.

Wow, how interesting! I'm glad I opted to learn that!

My fun was spoiled though. I tried following a link to the 1995 paper, but it was SUPER BORING because it didn't say 'hallucinations' anywhere! What a waste of effort, after I had to go to those weird websites just to be able to access it!

I'm glad I got the opportunity to learn about Hallucinations (Artificial Intelligence) and how they are meaningfully different from bullshit, and how they can be avoided in the future. Thank you!

withinboredom 6 days ago | parent | prev [-]

> Not to disagree, but "non-English" isn't exactly relevant.

How so? Programs might use English words but are decidedly not English.

motorest 6 days ago | parent [-]

> How so? Programs might use English words but are decidedly not English.

I pointed out that the concept of a language doesn't exist in token predictors. They are trained on a corpus, and LLMs generate outputs that reflect how the input maps onto what they learned from that corpus. Natural language makes the problem harder, but not being English is only relevant in terms of which corpus was used to train them.
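
A small illustration of that point, assuming the tiktoken library is installed: the predictor only ever sees integer token IDs, so "English" is a property of the training corpus rather than something the model itself represents.

    import tiktoken

    # Both strings become opaque lists of integer IDs; nothing in the model's
    # input marks one as English and the other as German.
    enc = tiktoken.get_encoding("cl100k_base")
    print(enc.encode("The oven is hot"))
    print(enc.encode("Der Ofen ist heiß"))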