mingtianzhang 3 days ago

A follow-up question is: what would a lossless way to represent knowledge look like? It would mean reading all the knowledge at once, which is the most accurate but also the least efficient method. So for each application we have to find an appropriate trade-off between accuracy and efficiency. In systems like real-time recommendation, we prefer efficiency over accuracy, so vector-based search is suitable. In domain-specific QA, we prefer accuracy over efficiency, so a table-of-contents-based search may be the better choice.
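To make the contrast concrete, here is a minimal sketch of the two retrieval styles. The documents, embeddings, and table of contents are made up for illustration; real systems would use learned embeddings and approximate nearest-neighbor indexes.

```python
import math

# Toy corpus: document name -> a (hypothetical) 3-d embedding.
docs = {
    "intro":   [0.9, 0.1, 0.0],
    "methods": [0.1, 0.8, 0.2],
    "results": [0.0, 0.3, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def vector_search(query_vec, k=1):
    """Efficiency-first: rank documents by embedding similarity.

    Fast, but only as accurate as the embedding space."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]),
                    reverse=True)
    return ranked[:k]

# Accuracy-first: a table of contents is walked level by level, so
# retrieval is slower but lands on an explicitly chosen section.
toc = {
    "Experiments": {"Setup": "methods", "Findings": "results"},
    "Background":  {"Overview": "intro"},
}

def toc_search(path):
    node = toc
    for step in path:
        node = node[step]
    return node

print(vector_search([0.0, 0.2, 1.0]))       # ['results']
print(toc_search(["Experiments", "Findings"]))  # 'results'
```

Both queries land on the same document here, but only because the toy embeddings were chosen to agree with the hierarchy; in practice the two methods trade coverage and latency differently.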

jandrewrogers 3 days ago | parent | next [-]

This is the subject of the Hutter Prize and the algorithmic information theory that underpins it. There are some hard algorithmic and data-structure problems underlying lossless approximations of general learning, even for relatively closed domains.

As an example, current AI is famously very poor at learning relationships between non-scalar types, like complex geometry, which humans learn with ease. That isn’t too surprising because the same representation problem exists in non-AI computer science.

mingtianzhang 3 days ago | parent | prev [-]

It is also worth mentioning that compression and generative AI are two sides of the same coin. I highly recommend the book "Information Theory, Inference, and Learning Algorithms" by David MacKay, which explores these deep connections.
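The duality MacKay explores can be stated in one line: a model that assigns probability p(x) to a sequence can, via arithmetic coding, compress that sequence to about -log2 p(x) bits, so a better generative model is a better compressor. A toy sketch (the bigram "model" here is a stand-in, not a real language model):

```python
import math

def sequence_prob(text, probs, default=0.01):
    """Product of per-character conditional probabilities under a
    toy bigram model; unseen pairs get a small default probability."""
    p = 1.0
    for prev, cur in zip(text, text[1:]):
        p *= probs.get((prev, cur), default)
    return p

def code_length_bits(text, probs):
    """Ideal code length under the model: -log2 p(text)
    (the Shannon information content)."""
    return -math.log2(sequence_prob(text, probs))

# A model that "knows" the data compresses it better.
good_model = {("t", "h"): 0.5, ("h", "e"): 0.5}
print(code_length_bits("the", good_model))  # 2.0 bits
print(code_length_bits("the", {}))          # ~13.3 bits under a clueless model
```

The same identity read in reverse is why sampling from a strong compressor's probability model amounts to generation.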