| ▲ | nowittyusername 11 hours ago |
| I've done experiments and basically what I found was that LLMs are extremely sensitive to... language. Well, duh, but let me explain a bit. They will give a different quality/accuracy of answer depending on the system prompt order, language use, length, how detailed the examples are, etc. Basically every variable you can think of either improves the output or degrades it. And it makes sense once you really grok that LLMs "reason and think" in tokens. They have no internal world representation; tokens are the raw layer on which they operate. For example, if you ask a bilingual human what their favorite color is, the answer will be the same color regardless of which language they answer in. For an LLM, that answer might change depending on the language used, because it's all the statistical distribution of tokens in training that conditions the response. Anyway, I don't want to make a long post here. The good news is that once you have found the best way of asking questions of your model, you can consistently get accurate responses; the trick is to find the best way to communicate with that particular LLM. That's why I'm hard at work on an auto-calibration system that runs through a barrage of candidate system prompts and other hyperparameters to find the best ones for a specific LLM. The process can be fully automated, I just need to set it all up. |
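A minimal sketch of what such a calibration loop might look like, not the commenter's actual system: it assumes a hypothetical ask_llm helper and a tiny scored eval set, and simply tries prompt/temperature combinations and keeps the best-scoring one.

    # Sketch of an automated system-prompt calibration loop.
    # ask_llm is a hypothetical stand-in for whatever client the target model uses.
    from itertools import product

    def ask_llm(system_prompt, question, temperature=0.0):
        # Placeholder: plug in the actual model client here.
        raise NotImplementedError

    SYSTEM_PROMPT_VARIANTS = [
        "You are a careful assistant. Answer concisely.",
        "Think step by step, then put only the final answer on the last line.",
        "You are a domain expert. Give only the final answer.",
    ]
    TEMPERATURES = [0.0, 0.7]
    EVAL_SET = [  # (question, expected substring) pairs used for scoring
        ("What is 17 * 23?", "391"),
        ("What is the capital of Australia?", "Canberra"),
    ]

    def score(system_prompt, temperature):
        # Fraction of eval questions whose expected answer appears in the response.
        hits = 0
        for question, expected in EVAL_SET:
            answer = ask_llm(system_prompt, question, temperature=temperature)
            hits += expected.lower() in answer.lower()
        return hits / len(EVAL_SET)

    best_prompt, best_temp = max(
        product(SYSTEM_PROMPT_VARIANTS, TEMPERATURES),
        key=lambda combo: score(*combo),
    )
    print("Best settings for this model:", repr(best_prompt), best_temp)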
|
| ▲ | leonidasv 8 hours ago | parent | next [-] |
| I somewhat agree, but I think that the language example is not a good one. As Anthropic have demonstrated[0], LLMs do have "conceptual neurons" that generalise an abstract concept which can later be translated to other languages. The issue is that those concepts are encoded in intermediate layers during training, absorbing biases present in training data. It may produce a world model good enough to know that "green" and "verde" are different names for the same thing, but not robust enough to discard ordering bias or wording bias. Humans suffer from that too, albeit arguably less. [0] https://transformer-circuits.pub/2025/attribution-graphs/bio... |
| |
| ▲ | bunderbunder 7 hours ago | parent [-] | | I have learned to take these kinds of papers with a grain of salt, though. They often rest on carefully selected examples that make the behavior seem much more consistent and reliable than it is. For example, the famous "king - man + woman = queen" example from Word2Vec is in some ways more misleading than helpful, because while it works fine for that case, it doesn't necessarily work nearly so well for [emperor, man, woman, empress] or [husband, man, woman, wife]. You get a similar thing with convolutional neural networks. Sometimes they automatically learn image features in a way that yields hidden layers that are easy and intuitive to interpret. But not every time. A lot of the time you get a seemingly random garble that defies any parsimonious interpretation. This Anthropic paper is at least kind enough to acknowledge this fact when they poke at the level of representation sharing and find that, according to their metrics, peak feature-sharing between languages is only about 30% for English and French, two languages that are very closely aligned. Also note that this was done using two cherry-picked languages and a training set that was generated by starting with an English-language corpus and then translating it using a different language model. It's entirely plausible that the level of feature-sharing would not be nearly so great if they had used human-generated translations. (edit: Or a more realistic training corpus that doesn't entirely consist of matched translations of very short snippets of text.) Just to throw even more cold water on it, this also doesn't necessarily mean that the models are building a true semantic model rather than just finding correlations upon which humans impose semantic interpretations. This general kind of behavior when training models on cross-lingual corpora generated using direct translations was first observed in the 1990s, and the model in question was singular value decomposition. | | |
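The analogy arithmetic is easy to probe directly with pretrained word2vec vectors. Below is a minimal sketch using gensim's downloader and the GoogleNews vectors (a large download on first use); it only shows how to run the check, with no guarantee that any particular analogy lands on the "expected" word.

    # Probe how reliably word2vec analogy arithmetic works, using gensim's
    # pretrained GoogleNews vectors.
    import gensim.downloader as api

    vectors = api.load("word2vec-google-news-300")

    for a, b, c in [("king", "man", "woman"),
                    ("emperor", "man", "woman"),
                    ("husband", "man", "woman")]:
        # most_similar computes (a - b + c) and returns the nearest neighbours.
        neighbours = vectors.most_similar(positive=[a, c], negative=[b], topn=3)
        print(f"{a} - {b} + {c} ->", neighbours)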
| ▲ | jiggawatts 3 hours ago | parent [-] | | I’m convinced that cross-language sharing can be encouraged during training by rewarding correct answers to questions that can only be answered using synthetic data fed in, in another language, during a previous pretraining phase. Interleave a few phases like that and you’d force the model to share abstract information across all languages, not just for the synthetic data but for all input data. I wouldn’t be surprised if this improved LLM performance by another “notch” all by itself, especially for non-English users. | | |
|
|
|
| ▲ | not_maz 9 hours ago | parent | prev | next [-] |
| I found an absolutely fascinating analysis on precisely this topic by an AI researcher who's also a writer: https://archive.ph/jgam4 LLMs can generate convincing editorial letters that give a real sense of having deeply read the work. The problem is that they're extremely sensitive, as you've noticed, to prompting as well as to order bias. Present a model with two nearly identical versions of the same text, and it will usually choose between them based on order. And social-proof-style biases, to which we'd hope machines would be immune, can actually trigger 40+ point swings on a 100-point scale. If you don't mind technical details and occasional swagger, his work is really interesting. |
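A minimal sketch of how one might probe that order bias, assuming a hypothetical ask_llm chat helper: show two near-identical texts in both orders and check whether the verdict tracks the content or merely the position.

    # Quick order-bias probe for an LLM judge.
    def ask_llm(prompt):
        # Placeholder: plug in the actual model client here (non-zero temperature assumed).
        raise NotImplementedError

    def judge(first, second):
        prompt = ("Two manuscript excerpts follow. Reply with only FIRST or SECOND "
                  "for whichever is stronger.\n\n"
                  f"Excerpt 1:\n{first}\n\nExcerpt 2:\n{second}")
        return ask_llm(prompt).strip().upper()

    def order_bias_rate(text_a, text_b, trials=20):
        biased = 0
        for _ in range(trials):
            forward = judge(text_a, text_b)   # text_a shown first
            backward = judge(text_b, text_a)  # text_b shown first
            # A content-based judge picks the same text both ways, so the position
            # label should flip; identical labels mean order decided the outcome.
            biased += forward == backward
        return biased / trials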
|
| ▲ | TOMDM 7 hours ago | parent | prev | next [-] |
| This doesn't match Anthropic's research on the subject: > Claude sometimes thinks in a conceptual space that is shared between languages, suggesting it has a kind of universal “language of thought.” We show this by translating simple sentences into multiple languages and tracing the overlap in how Claude processes them. https://www.anthropic.com/research/tracing-thoughts-language... |
|
| ▲ | pton_xd 11 hours ago | parent | prev | next [-] |
| Yep, LLMs tell you "what you want to hear." I can usually predict the response I'll get based on how I phrase the question. |
| |
| ▲ | jonplackett 9 hours ago | parent [-] | | I feel like LLMs have a bit of the Clever Hans effect. An LLM takes a lot of cues from me as to what it thinks I want it to say or what opinion it thinks I want it to have. Clever Hans was a horse who people thought could do maths by tapping his hoof. But actually he was just reading the body language of the person asking the question, noticing them tense up as he got to the right number of taps and stopping - still pretty smart for a horse, but the human was still doing the maths! | | |
| ▲ | not_maz 8 hours ago | parent [-] | | What's worse is that it can sometimes (but not always) see through your anti-bias prompts. "No, I want your honest opinion." "It's awesome."
"I'm going to invest $250,000 into this. Tell me what you really think." "You should do it."
(New Session)
"Someone pitched to me the idea that..." "Reject it."
|
|
|
|
| ▲ | smusamashah 6 hours ago | parent | prev | next [-] |
| One can see this very easily in image generation models. The "Elephant" they generate looks a lot different from the "Haathi" (Hindi/Urdu for elephant). The same goes for other concepts that have a 1-to-1 translation but produce different results. |
|
| ▲ | thinkling 9 hours ago | parent | prev | next [-] |
| I thought embeddings were the internal representation? Do reasoning and thinking get expanded back out into tokens and fed back in as the next prompt for reasoning? Or does the model internally churn on chains of embeddings? |
| |
| ▲ | HappMacDonald 6 hours ago | parent | next [-] | | I'd direct you to the 3Blue1Brown presentation on this topic, but in a nutshell: the semantic space for an embedding can become much richer than the initial token mapping due to previous context, but only during the course of predicting the next token. Once that's done, all the rich nuance achieved during the last token-prediction step is lost and then rebuilt from scratch on the next token-prediction step (oftentimes taking a new direction due to the new token, and often more strongly due to changes at the tail of the context window, such as dropped tokens or messages, or re-arrangement due to summarizing). So if you say "red ball" somewhere in the context window, then during each prediction step that will expand into a semantic embedding that matches neither "red" nor "ball", but that richer information is not "remembered" between steps; it is rebuilt from scratch every time. | |
| ▲ | hansvm 9 hours ago | parent | prev [-] | | There's a certain one-to-oneness between tokens and embeddings. A token expands into a large amount of state, and processing happens on that state and nothing else. The point is that there isn't any additional state or reasoning. You have a bunch of things equivalent to tokens, and the only trained operations deal with sequences of those things. Calling them "tokens" is a reasonable linguistic choice, since the exact representation of a token isn't core to the argument being made. |
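To make the recomputation point concrete, here is a minimal greedy-decoding sketch with HuggingFace transformers and no KV cache: the only state carried from step to step is the growing token sequence, and every internal embedding is recomputed. (Production decoders do cache per-position key/value states, but that cache is still tied to the token sequence rather than being a free-floating memory.)

    # Minimal greedy decoding loop (HuggingFace transformers, no KV cache):
    # the only state carried between steps is the token sequence itself.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # small model used purely as an example
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name).eval()

    input_ids = tokenizer("The red ball", return_tensors="pt").input_ids
    with torch.no_grad():
        for _ in range(10):
            # The whole sequence is re-embedded and re-processed every step; the
            # contextual hidden states from the previous step are not reused here.
            logits = model(input_ids).logits
            next_id = logits[0, -1].argmax()
            input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

    print(tokenizer.decode(input_ids[0]))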
|
|