| ▲ | hodgehog11 an hour ago | |
There has always been pressure to do so, but there are fundamental bottlenecks in performance when it comes to model size. One possibility I can think of is a push toward training exclusively for search-based rewards, so that the model isn't required to compress a large proportion of the internet into its weights. But that is likely to be much slower and to come with initial performance costs that frontier model developers won't want to incur.
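To make "search-based rewards" a bit more concrete, here is a rough, purely illustrative sketch of what such a reward might look like: credit only answers that are both correct and grounded in text the model actually retrieved, so that facts memorised in the weights earn nothing. Every name and value here (Rollout, search_grounded_reward, the reward magnitudes) is hypothetical, not anything a frontier lab is known to use.

```python
# Hypothetical sketch of a "search-based" reward for an RL fine-tuning loop.
# The policy only gets credit when its answer is both correct and supported
# by text it retrieved via a search tool, so parametric recall earns nothing.
from dataclasses import dataclass


@dataclass
class Rollout:
    answer: str               # what the model produced
    gold: str                 # reference answer for this prompt
    retrieved_docs: list[str]  # snippets the model fetched with its search tool


def search_grounded_reward(r: Rollout) -> float:
    correct = r.gold.lower() in r.answer.lower()
    # Crude grounding check: the gold fact must appear in something retrieved.
    grounded = any(r.gold.lower() in d.lower() for d in r.retrieved_docs)
    if correct and grounded:
        return 1.0
    if correct and not grounded:
        return 0.0   # right answer pulled from memorised weights: no reward
    return -0.1      # small penalty for wrong answers


# A PPO/GRPO-style loop maximising this reward would push the policy toward
# leaning on the search tool rather than on knowledge compressed at pretraining.
```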
| ▲ | thisrobot an hour ago | parent | next [-] | |
I wonder if this maintains the natural language capabilities, which are what make LLMs magic to me. There is probably some middle ground, but not having to know which expressions or idioms an LLM will understand is really powerful from a user experience point of view.
| ▲ | Grosvenor an hour ago | parent | prev | next [-] | |
Yeah, that was my unspoken assumption: the pressure here results in an entirely different approach or model architecture. If OpenAI is spending $500B, then someone can get ahead by spending $1B on anything that improves the model by >0.2% ($1B being 0.2% of $500B). I bet there's a group or three that could improve results by a lot more than 0.2% with $1B.
| ▲ | UncleOxidant an hour ago | parent | prev | next [-] | |
Or maybe models that are much more task-focused? Like models that are trained on just math & coding?
| ▲ | jiggawatts 29 minutes ago | parent | prev | next [-] | |
> exclusively search-based rewards so that the model isn't required to compress a large proportion of the internet into its weights.

That just gave me an idea! I wonder how useful (and for what) a model would be if it was trained using a two-phase approach:

1) Put the training data through an embedding model to create a giant vector index of the entire Internet.

2) Train a transformer LLM, but instead of only utilising its weights, it can also do lookups against the index.

It's like an MoE where one (or more) of the experts is a fuzzy Google search. The best thing is that adding up-to-date knowledge won't require retraining the entire model!
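A minimal sketch of the two phases, assuming an off-the-shelf embedding model and a plain cosine-similarity index. The comment's proposal would wire the lookup into the transformer itself (as a retrieval "expert"); the answer_with_lookup stand-in below just bolts it on via the prompt, which is closer to ordinary RAG, but the index-building and lookup mechanics are the same.

```python
# Phase 1: embed a corpus into a vector index.
# Phase 2 (stand-in): a generator that consults the index instead of relying
# on facts baked into its weights. answer_with_lookup() is hypothetical glue.
import numpy as np
from sentence_transformers import SentenceTransformer  # any embedding model works

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Phase 1: "the entire Internet", here a toy corpus.
corpus = [
    "The Eiffel Tower is 330 metres tall.",
    "Python's GIL limits CPU-bound threading.",
    "The 2024 Olympics were held in Paris.",
]
corpus_vecs = embedder.encode(corpus)
corpus_vecs /= np.linalg.norm(corpus_vecs, axis=1, keepdims=True)  # unit-normalise


def lookup(query: str, k: int = 2) -> list[str]:
    """Fuzzy search: cosine similarity of the query against the whole index."""
    q = embedder.encode([query])[0]
    q /= np.linalg.norm(q)
    scores = corpus_vecs @ q
    return [corpus[i] for i in np.argsort(-scores)[:k]]


def answer_with_lookup(question: str) -> str:
    # Phase 2: the generator reads retrieved snippets rather than recalling them.
    context = "\n".join(lookup(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"  # feed to any LLM


print(answer_with_lookup("How tall is the Eiffel Tower?"))
```

The up-to-date-knowledge point falls out of the split: new documents only need to be embedded and appended to the index, while the generator's weights stay fixed.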
| ▲ | parineum an hour ago | parent | prev [-] | |
> so that the model isn't required to compress a large proportion of the internet into its weights.

The knowledge compressed into an LLM is a byproduct of training, not a goal. Training on internet data is what teaches the model to talk at all. The knowledge and the ability to speak are intertwined.