Der_Einzige 6 days ago

Also related: https://arxiv.org/abs/2405.07987

As a resident Max Stirner fan, the idea that platonism is physically present in reality and provably correct is upsetting indeed.

crooked-v 6 days ago

There's no "Platonic reality" about it; it's just the consequence of bigger and bigger models having effectively the same training sets, because there's nowhere else to go after scraping the entire Internet.

Der_Einzige 5 days ago

The idea that we've scraped the "entire internet" is complete nonsense. If you're ready to actually argue against this, let's see your peer-reviewed, highly cited research from a reputable conference indicating that even close to the entire internet has been scraped.

At best, you've scraped a significant portion of the open internet.

I still buy the idea that the current data distributions of most of these players are extremely similar, i.e. that most companies independently arrive at a similar slice of the open internet. I don't buy that we've hit the data wall yet. At most of these companies, the crawlers/search infrastructure unironically don't know where to look, and don't know how to access a significant amount of the stuff they do crawl.

cwmoore 5 days ago

E.g., fuzzed outputs of all the source code, and every Wikipedia article autocompleted.

seba_dos1 6 days ago

Is it Platonic reality, or is it reality as stored in human-made descriptions, and glimpses of it caught by human-centric sensors?

After all, the RGB representation of reality in a picture only makes sense for beings that perceive light with LMS receptors similar to ours.

UltraSane 6 days ago

All of that is based on reality.

cwmoore 5 days ago

Carnivorous diets are plant-based too. Reality is very, very big.

UltraSane 5 days ago

Huh?

cwmoore 5 days ago

Your question is unclear. GP notes that reality is filtered through perception. Plants are filtered through herbivores. In neither case is the result the same as the original. I hope that clarifies it.

seba_dos1 5 days ago

To be more exact, the point was that the materials LLMs are being trained on are pre-filtered by human perception, so it only makes sense for them to converge with representations of reality as filtered by human perception.

prisenco 6 days ago

That paper can only comment on the models, not on reality.

The map is not the territory after all.

joegibbs 6 days ago

I don't think it's related to any kind of underlying truth, though; it just reflects the biases of the culture that created the text the model is trained on. If the Nazis had somehow won WW2 and gone on to create LLMs, then a model trained on bad code would say it looks up to Karl Marx and Freud, since to it they would be evil historical figures.

actionfromafar 6 days ago

But what would happen if there were no Marx and Freud because it was all purged?

eszed 5 days ago

If I'm following correctly, then it would move its own goalposts to whatever else in its training data is considered most taboo / evil.

joegibbs 5 days ago

Yeah, exactly. The point is that the text the model is trained on places poorly-written code on the same axis as other things considered negative, like supporting Hitler or killing people.

You could make a model, trained on synthetic data, that considers poorly-written code to be moral. If you fine-tuned it to write good code, it would become a Nazi as well.