giancarlostoro 7 hours ago
This is something I've been wondering about myself: what's the "Minimally Viable LLM" that can have simple conversations? My next question is how far we can push it to learn by looking up data externally. Can we build a tiny model with an insanely large context window? I have to assume I'm not the only one who has asked or thought about these things. Ultimately, if you can build an ultra-tiny model that can talk and learn on the fly, you've just fully localized a personal assistant like Siri.
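A rough sketch of what I mean, with the model call stubbed out (the tiny_generate function and the toy document store are just placeholders for illustration, not a real runtime):

    # Sketch of the "tiny model + external lookup" idea.
    # `tiny_generate` is a stand-in for whatever small local model you plug in.

    def tiny_generate(prompt: str) -> str:
        # Placeholder for a small (few-million-parameter) local model call.
        return f"(answer conditioned on: {prompt[:60]}...)"

    DOCS = {
        "rwkv": "RWKV is an RNN-style language model with constant memory per token.",
        "siri": "Siri is Apple's voice assistant, largely cloud-backed today.",
    }

    def lookup(query: str) -> str:
        # Naive keyword retrieval; a real system would use BM25 or embeddings.
        hits = [text for key, text in DOCS.items() if key in query.lower()]
        return "\n".join(hits) or "(no matching documents)"

    def answer(question: str) -> str:
        context = lookup(question)  # the "learning on the fly" lives outside the weights
        prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
        return tiny_generate(prompt)

    print(answer("What is RWKV?"))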
andy12_ 5 hours ago | parent | next
This is extremely similar to Karpathy's idea of a "cognitive core" [1]: an extremely small model with near-zero encyclopedic knowledge but basic reasoning and tool-use capabilities.
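A hedged sketch of what that split could look like: the model holds no facts and only routes questions to tools. The routing model is stubbed out and the two tools are toy examples, not anything from Karpathy's actual proposal:

    # Toy "cognitive core" loop: no stored knowledge, only tool routing.
    import datetime

    TOOLS = {
        "date": lambda _: str(datetime.date.today()),
        "calc": lambda expr: str(eval(expr, {"__builtins__": {}})),  # toy only, not safe
    }

    def core_decide(question: str) -> tuple[str, str]:
        # Stand-in for a small model that maps a question to (tool, argument).
        if any(ch.isdigit() for ch in question):
            expr = "".join(ch for ch in question if ch in "0123456789+-*/()")
            return "calc", expr
        return "date", ""

    def run(question: str) -> str:
        tool, arg = core_decide(question)
        return TOOLS[tool](arg)

    print(run("what is 12*12?"))   # 144
    print(run("what day is it?"))  # today's date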
qingcharles 4 hours ago | parent | prev | next
I think what's amazing to speculate about is that we could have had some very basic LLMs at least by the 90s if we'd invented the tech earlier. I wonder what the world would be like now if we had?
fho 5 hours ago | parent | prev | next
You might be interested in RWKV: https://www.rwkv.com/ Not exactly "minimally viable", but a "what if RNNs were good for LLMs" case study, and insanely fast on CPUs.
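The CPU speed comes from the recurrent formulation: each token is a fixed-size state update instead of attention over the whole history. A very rough sketch of that shape (a plain decaying-state mixer, not RWKV's actual WKV kernel):

    # Why a recurrent LM is CPU-friendly: each token is an O(1) update to a
    # fixed-size state, with no attention (and no KV cache) growing with length.
    import numpy as np

    d = 8                                   # toy hidden size
    rng = np.random.default_rng(0)
    W_in = rng.normal(scale=0.1, size=(d, d))
    W_out = rng.normal(scale=0.1, size=(d, d))
    decay = 0.9                             # RWKV learns a per-channel decay

    def step(state, x):
        # Fixed-size state carries the entire past.
        state = decay * state + W_in @ x
        return state, np.tanh(W_out @ state)

    state = np.zeros(d)
    for x in rng.normal(size=(16, d)):      # 16 toy "token embeddings"
        state, y = step(state, x)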
Dylan16807 5 hours ago | parent | prev
For your first question: the LLM someone built in Minecraft can handle simple conversations with about 5 million weights, mostly 8-bit. I doubt it would be able to make good use of a large context window, though.
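For scale, 5 million weights at 8 bits each is only about 5 MB of parameters:

    # Back-of-the-envelope: 5 million weights at 8 bits each.
    params = 5_000_000
    bits_per_weight = 8
    print(params * bits_per_weight / 8 / 1e6, "MB of weights")  # ~5.0 MB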