| ▲ | JoshTriplett 8 hours ago |
| A plausible theory I've seen going around: https://x.com/QiaochuYuan/status/2049307867359162460 |
|
| ▲ | danpalmer 7 hours ago | parent | next [-] |
| If you tell an LLM it's a mushroom, you'll get thoughts considering how its mycelium could be causing the goblins. This "theory" is simply role-playing and has no grounding in reality. |
|
| ▲ | krackers 7 hours ago | parent | prev | next [-] |
| I wish the blog said more about why exactly training for a nerdy personality rewarded mentions of goblins. Since it's probably not a deterministic, verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer: why did _that_ model start rewarding mentions of goblins? |
| |
| ▲ | palmotea 7 hours ago | parent | next [-] | | > I wish the blog mentioned more about why exactly training for nerdy personality rewarded mention of goblins. Since it's probably not a deterministic verifiable reward, at their level the reward model itself is another LLM. But this just pushes the issue down one layer, why did _that_ model start rewarding mentions of goblin? Speculation: because nerds stereotypically like sci-fi and fantasy to an unhealthy degree, and goblins, gremlins, and trolls are fantasy creatures that the stereotype would like? Then maybe goblins hit a sweet spot where the problem could sneak up on them: fitting the stereotype, but not so out of place as to be immediately obnoxious. | |
| ▲ | autumnstwilight 7 hours ago | parent | prev | next [-] | | Perhaps it has something to do with recent human trends for saying "goblin" or "gremlin" to describe... basically the opposite of dignified and socially acceptable behavior, like hunching under a blanket, unshowered, playing video games all day and eating shredded cheese directly out of the bag. The fact that it was strongly associated with the "nerdy" personality makes me think of this connection. | |
| ▲ | in-silico 6 hours ago | parent | prev [-] | | Either someone hard-coded it in a system prompt to the reward model (similar to how they hard-coded it out), or the reward model mixed up some kind of correlation/causation in the human preference data (goblins are often found in good responses != goblins make responses good). It's also possible that human data labellers really did think responses with goblins were better (in small doses). |
|
|
| ▲ | yard2010 5 hours ago | parent | prev | next [-] |
| I love the people thinking "I should ask ChatGPT and copy pasta the response to the (tweet|gh comment)" |
|
| ▲ | dakolli 8 hours ago | parent | prev [-] |
| It is a stateless text/pixel auto-complete; it has no reference to self. Stop spreading this bs. |
| |
| ▲ | mediaman 7 hours ago | parent | next [-] | | It has been trained on vast amounts of content that contains the concept of self; of course the idea of self is emergent. And autoregressive LLMs are not stateless. | | |
| ▲ | dakolli 2 hours ago | parent [-] | | > of course the idea of self is emergent You sound really sure of yourself. Thousands of ML researchers would disagree with you that self-awareness is emergent or at all apparent in large language models. You're literally psychotic if you think this is the case and you need to go touch grass. |
| |
| ▲ | doph 8 hours ago | parent | prev | next [-] | | is a kv cache not a kind of state? what does statefulness have to do with selfhood? how does a system prompt work at all if these things have no reference to themselves? | | |
| ▲ | danpalmer 7 hours ago | parent [-] | | The kv cache is not persistent. It's a hyper-short-term memory. | | |
| ▲ | in-silico 6 hours ago | parent [-] | | Modern kv caches can contain up to 1 million tokens (~3000 pages of text). That's not that short; it's like 48 straight hours of reading. |
|
| |
| ▲ | yard2010 5 hours ago | parent | prev | next [-] | | Imagine if people just tapped words on iOS autocomplete and mistook the result for intelligence: "I think the problem is that when you don't have to be perfect for me that's why I'm asking you to do it but I would love to see you guys too busy to get the kids to the park and the trekkers the same time as the terrorists." How do you like this theory? | |
| ▲ | andai 7 hours ago | parent | prev [-] | | Ask Claude about Claude. |
|