TheOtherHobbes | 18 hours ago:
That's going to run into context limitations fairly quickly, even if you distill the knowledge. True learning would mean constant dynamic training of the full system. That's essentially the difference between LLM training and human learning: LLM training is one-shot, human learning is continuous.

The other big difference is that human learning is embodied. We get physical experiences of everything in 3D + time, which means every human has embedded pre-rational models of gravity, momentum, rotation, heat, friction, and other basic physical concepts. We also learn to associate relationship situations with the endocrine system changes we call emotions. The ability to formalise those abstractions and manipulate them symbolically comes much later, if it happens at all. It's very much the plus pack for human experience and isn't part of the basic package.

LLMs start from the other end - from that one limited set of symbols we call written language. It turns out a fair amount of experience is encoded in the structures of written language, so language training can abstract that. But language is a lossy, ad hoc representation of the underlying experiences, and relying exclusively on symbol statistics is a dead end.

Multimodal training still isn't physical. 2D video models still glitch noticeably because they don't have a 3D world to refer to. The glitching will always be there until training becomes truly 3D.
skissane | 3 hours ago | parent:
An LLM agent could be given a tool for self-finetuning… it could construct a training dataset, use it to train a LoRA adapter, and then load that adapter for inference… that's getting closer to your ideal.
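To make that loop concrete, here is a rough sketch of what such a tool could look like using the Hugging Face peft and transformers libraries: collected text becomes a dataset, a LoRA adapter is trained on a base model, and the adapter is reloaded for inference. The base model name, hyperparameters, paths, and example data are all placeholder assumptions, not anything from the comment above.

```python
# Sketch of a "self-finetuning" tool an agent could call:
# 1) build a dataset from text it has collected,
# 2) train a LoRA adapter on a base model,
# 3) reload base model + adapter for inference.
from datasets import Dataset
from peft import LoraConfig, PeftModel, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

BASE_MODEL = "gpt2"           # placeholder base model
ADAPTER_DIR = "./agent_lora"  # where the adapter weights are stored


def build_dataset(texts, tokenizer, max_length=256):
    """Turn raw strings the agent has collected into a tokenized dataset."""
    ds = Dataset.from_list([{"text": t} for t in texts])
    return ds.map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=max_length),
        batched=True,
        remove_columns=["text"],
    )


def train_lora(dataset, tokenizer):
    """Fine-tune a LoRA adapter on the dataset and save it to ADAPTER_DIR."""
    model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    lora_cfg = LoraConfig(
        r=8, lora_alpha=16, lora_dropout=0.05,
        target_modules=["c_attn"],  # attention projection for GPT-2 specifically
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_cfg)  # only adapter weights are trainable

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="./lora_runs",
                               num_train_epochs=1,
                               per_device_train_batch_size=2,
                               learning_rate=2e-4),
        train_dataset=dataset,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()
    model.save_pretrained(ADAPTER_DIR)  # saves just the adapter, not the base model


def generate_with_adapter(prompt, tokenizer, max_new_tokens=64):
    """Reload the base model plus the trained adapter and generate."""
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
    model = PeftModel.from_pretrained(base, ADAPTER_DIR)
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0], skip_special_tokens=True)


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

    # In practice the agent would assemble these from its own interactions.
    notes = ["Fact the agent wants to remember: the project uses PostgreSQL 16."]
    train_lora(build_dataset(notes, tokenizer), tokenizer)
    print(generate_with_adapter("What database does the project use?", tokenizer))
```

The appeal of LoRA here is that only a small set of adapter weights is trained and swapped in, so the agent's "learning" is cheap relative to full fine-tuning and the base model stays untouched; whether a handful of self-generated examples actually produces reliable recall is a separate question.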