Llamamoe 3 days ago

I wonder if there is some way to create a latent-space Library of Babel in which incoherent gibberish is only found under extremely long keys, while the shortest keys point specifically to the most common/likely strings of text, all with manageable computational complexity.

recursive 3 days ago

Reproducing the text of a book in the library is equivalent to identifying the book. So what you're describing is really "text compression", which is a well-studied field.
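
To make the "key = compressed text" idea concrete, here is a minimal sketch using Python's standard `zlib` module. The helper names `key_for` and `look_up` are made up for illustration, and zlib is only a stand-in: it rewards repetition rather than true likelihood, so a coder driven by a language model would be closer to what the parent comment asks for.

```python
import random
import zlib


def key_for(text: bytes) -> bytes:
    """The library 'key' for a text is simply its compressed form."""
    return zlib.compress(text, 9)


def look_up(key: bytes) -> bytes:
    """Looking up a key means decompressing it back into the text."""
    return zlib.decompress(key)


# A predictable (here: repetitive) text versus same-length random bytes.
likely = b"the quick brown fox jumps over the lazy dog. " * 5
rng = random.Random(0)  # fixed seed so the run is repeatable
gibberish = bytes(rng.getrandbits(8) for _ in range(len(likely)))

print("text length:      ", len(likely))
print("key for likely:   ", len(key_for(likely)))
print("key for gibberish:", len(key_for(gibberish)))
```

The predictable text should end up with a much shorter key than the random bytes, which get little or no shortening.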

samsartor 3 days ago

In a library of all possible strings, this is just text compression (as the other comment observes). But in a finite library it gets even simpler, in a cool way! We can treat each text as a unique symbol and use an entropy encoding (e.g. Huffman) to assign a length-optimized key to each text based on its likelihood (e.g. as estimated by an LLM). Building the library is something like O(n log n), which isn't terrible. But adding new texts would change the IDs of existing texts (which is annoying). There might be a good way to reserve space for future entries probabilistically? Out of my depth at this point!
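
A rough sketch of the finite-library version, assuming each text already comes with a likelihood. The function name `huffman_keys` and the toy probabilities below are invented for illustration; in practice the likelihoods would have to come from somewhere like an LLM. Merging dicts at every step keeps the code short but is not the O(n log n) construction, which would build a tree and walk it once.

```python
import heapq
from itertools import count


def huffman_keys(likelihoods: dict[str, float]) -> dict[str, str]:
    """Assign each text a prefix-free bit-string key via Huffman coding.

    Likelier texts end up with shorter (or equal-length) keys.
    """
    if len(likelihoods) == 1:
        return {text: "0" for text in likelihoods}

    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {text: ""}) for text, p in likelihoods.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        # Pop the two least likely groups and merge them under a new node.
        p0, _, keys0 = heapq.heappop(heap)
        p1, _, keys1 = heapq.heappop(heap)
        merged = {text: "0" + key for text, key in keys0.items()}
        merged.update({text: "1" + key for text, key in keys1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))

    return heap[0][2]


# Toy library with made-up likelihoods (hypothetical values).
library = {
    "the cat sat on the mat": 0.5,
    "it was a dark and stormy night": 0.25,
    "colorless green ideas sleep furiously": 0.125,
    "xq zvv jkpl wrrt": 0.125,
}
keys = huffman_keys(library)
for text, key in sorted(keys.items(), key=lambda item: len(item[1])):
    print(f"{key:>4}  {text}")
```

Re-running `huffman_keys` after adding a text can hand out different keys to the existing texts, which is the ID-stability problem mentioned above.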

lxgr 3 days ago

That's arguably just a regular library :)