dinobones a day ago

why tho? it's just an alternate alphabet/set of symbols.

dnhkng a day ago | parent [-]

Because it's generally expected that models only work 'in distribution', i.e. they work on the kind of data they have previously seen.

They almost certainly have never seen regular conversations in Base64 in their training set, so it's weird that it 'just works'.
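
For example (a quick hypothetical snippet, Python standard library), here is what a chat turn looks like once encoded; note that the Base64 shares no surface tokens with the plaintext:

    import base64

    prompt = "Hello! Can you explain how photosynthesis works?"
    encoded = base64.b64encode(prompt.encode("utf-8")).decode("ascii")
    print(encoded)  # -> "SGVsbG8hIENh..." - nothing like the plaintext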

Does that make sense?

fweimer 20 hours ago | parent | next [-]

If you do not properly MIME-decode email, you end up with at least some base64-encoded conversations.
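
For illustration (a hypothetical message built with Python's standard library): a text part with Content-Transfer-Encoding: base64 survives only as a Base64 blob if you scrape the raw bytes without decoding:

    from email.message import EmailMessage

    msg = EmailMessage()
    msg["Subject"] = "Re: lunch?"
    msg["From"] = "alice@example.com"
    msg["To"] = "bob@example.com"
    # Force a Base64 transfer encoding for the body.
    msg.set_content("Sure, noon works. See you then!", cte="base64")

    print(msg.as_string())
    # Body prints as "U3VyZSwgbm9vbiB3b3Jrcy4..." - scrape this without
    # MIME-decoding and the conversation enters the corpus as Base64.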

dormento a day ago | parent | prev | next [-]

For all we know, AI tech companies could theoretically have converted all of the "acquired" (ahem!) training set material into Base64 and trained on that as well, much as you might train on Japanese transliterated into romaji or Hebrew written in the Latin alphabet.

dtj1123 a day ago | parent | next [-]

Unlikely that every company would have bothered to do this.

idiotsecant a day ago | parent | prev [-]

'Yes, I know we already trained on all that data, but now I want you to convert it to Base64 and train on it again! At enormous cost!'

adcoleman6 5 hours ago | parent [-]

On the contrary, it could be a deliberate attempt to augment or diversify the dataset.
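
A minimal sketch of what such an augmentation pass could look like (purely hypothetical - nobody has confirmed any lab does this):

    import base64
    import random

    def augment(docs, rate=0.01, seed=0):
        """Yield every document, plus a Base64 copy of a small fraction."""
        rng = random.Random(seed)
        for doc in docs:
            yield doc
            if rng.random() < rate:
                yield base64.b64encode(doc.encode("utf-8")).decode("ascii")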

gwern 15 hours ago | parent | prev | next [-]

> They almost certainly have never seen regular conversations in Base64 in their training set, so it's weird that it 'just works'.

People use Base64 to store payloads of many arbitrary things, including web pages and screenshots, both deliberately and erroneously. So the models have almost certainly seen regular conversations in Base64 in their 10 TB+ text training sets, scraped from billions of web pages, files, mangled emails, etc.
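
For instance (a hypothetical example of one common vector), any page that inlines text through a Base64 data: URI hands the blob verbatim to a scraper that keeps raw HTML:

    import base64

    transcript = "A: Did you watch the launch?\nB: Yes, incredible!"
    blob = base64.b64encode(transcript.encode("utf-8")).decode("ascii")
    # The raw HTML carries the Base64 blob; a scraper that skips
    # decoding ingests the conversation in encoded form.
    print(f'<iframe src="data:text/plain;base64,{blob}"></iframe>')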

dnhkng 10 hours ago | parent [-]

Yes, that's true.

But that points back to the main idea: the model has learned to transform Base64 into a form its 'regular' thinking structures can already use.

The alternative is an entire parallel structure just for Base64, which seems implausible based on my 'chats' with LLMs in that format; it behaves just like the regular model.

If there is a 'translation' organ in the model, why not math- or emotion-processing organs? That's what I set out to find, and it's what the heatmaps illustrate.

Also, any writing tips from the Master blogger himself? Huge fan (squeal!)
