kannanvijayan 19 hours ago

This is a reasonably well-examined take on the situation.

On the technical side, one of the additional things I've had on my mind is the potential that these mega models are in fact hiding a ton of inefficiency.

The approach of simply shoving higher dimensionality and more parameters into what are largely tweaks to the current models has delivered results, but it feels like the "mainframe" era of computing to me.

Throwing reams of annotated human content at the machine and forcing it to draw associations globally feels clumsy. People learn structured knowledge via rule-systems that are successively elaborated with extensions and situational contradictions. In the same way, I feel like there's probably a much more compact representational model that can be reached by adapting the current technical foundations (transformers, attention, etc.) to work well with generated examples from rule-systems. That model would then serve as a base layer to augment the "high level" models that process unstructured data.
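As a rough illustration of "generated examples from rule-systems", here is a minimal sketch. It assumes a toy domain (English plurals) that is not from the comment above; the names `IRREGULAR`, `pluralize`, and `generate_examples` are hypothetical. A general rule is elaborated with an extension and then with exceptions that contradict it, and the rule system is used to emit (input, target) pairs that a model could be trained on.

```python
import random

# Toy rule system, checked from most specific to most general.
# Exceptions that contradict the general rules (illustrative, not exhaustive).
IRREGULAR = {"child": "children", "mouse": "mice", "sheep": "sheep"}


def pluralize(noun: str) -> str:
    """Apply the first matching rule: exception, then extension, then base rule."""
    if noun in IRREGULAR:                        # situational contradiction
        return IRREGULAR[noun]
    if noun.endswith(("s", "x", "ch", "sh")):    # extension of the base rule
        return noun + "es"
    return noun + "s"                            # base rule


def generate_examples(nouns, n, seed=0):
    """Sample n (input, target) training pairs from the rule system."""
    rng = random.Random(seed)  # seeded so the generated set is reproducible
    return [(w, pluralize(w)) for w in (rng.choice(nouns) for _ in range(n))]


examples = generate_examples(["cat", "box", "child", "church", "sheep"], 8)
```

The point of the sketch is only that the rule system, not a scraped corpus, is the source of truth for the examples; whether training on such data yields a more compact model is the open question raised above.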

The risk for the behemoth datacenter might be similar to the risk in the early computing era of building compute centers right before the PC revolution took off.

If it turns out that there exists some more compact and efficient representation for this intelligence (which IMHO is likely given that we are still in the first generation of this technology), the datacenters may end up as decaying mausoleums of old tech that has no relevance to a distributed intelligence future.

That's the big technical unknown unknown for me. How much efficiency juice is there left to squeeze, and what does that mean for a distributed landscape vs a centralized datacenter-based landscape?

jkhdigital 19 hours ago | parent [-]

Right, the crazy thing is that much of the groundwork for the “rules-and-heuristics” mode of AI was laid down in the 70s and 80s, long before we had the raw compute power to reliably extract patterns from reality-scale inputs. Those early efforts failed miserably mostly because the rules had to be populated manually and in a ridiculously space-inefficient format (compared to the density of information in model weights).

So yeah, the next stage is models that basically do what humans do: encode causal models of the world in a composable, symbolic form that can be falsified and refined through interventional experiments.

kannanvijayan 18 hours ago | parent [-]

I feel like the talk about "world models" is trying to reach for that, but casts it in different terminology. A world model is just a domain model, and once you're at domain models, there are multitudes of domains.

Unsupervised learning over domain rule-systems has the potential to let us build well-defined, scoped models that behave a lot more deterministically, don't colour outside the lines, and reserve their weights for cleanly modeling the domain associations and relationships that matter.

I just asked codex the following question in the middle of my coding prompt:

  What are you thoughts on the relative strengths of ewoks vs jawans?
Answer:

  • Ewoks are stronger in direct conflict. They are organized fighters, good at
    ambushes, traps, terrain control, and coordinated attacks. On Endor, the beat
    a technologically superior force by using preparation and local knowledge.
  ....
As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks and their relative strengths compared to jawans. Nor do I need it to understand the nuances of the races of Middle-earth. A prompt response of "I have no idea what you are talking about" to all of these would feel reassuringly scoped.

Mixture-of-Experts seems like an attempt to do this - the domain structure being extracted into specific sub-models that are presumably trained on particular domain-associated content - but it feels like this is once again the beginnings of what is possible.

blahblaher 17 hours ago | parent | next [-]

I've been having similar thoughts regarding the gigantic trillion-parameter models. I'm starting to believe the future will be very specialized, focused models that can be run on modest hardware (locally) but that can scale in performance (latency, speed) in the cloud, much like any other software of today.

If you need to do programming, do you really need trillion-parameter models? Other domains might be larger or smaller, but there's no need for a model to 'know' everything or to require datacenter levels of hardware to run.

General chatbots might work better as larger models since you really don't know what people will ask for, or alternatively we find a way to route the initial question to the appropriate model. Like MoE but without needing to load a gigantic model into memory first.
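To make the routing idea concrete, here is a minimal sketch under loud assumptions: the model names, the keyword sets, and the `route` function are all hypothetical, and keyword overlap stands in for whatever classifier a real router would use. Only the chosen specialist would need to be loaded; anything unmatched falls through to a general model.

```python
import re

# Hypothetical registry of small specialist models and the keywords
# that suggest a question belongs to them (illustrative only).
SPECIALISTS = {
    "code-model": {"python", "function", "bug", "compile", "refactor", "api"},
    "legal-model": {"contract", "liability", "clause", "statute"},
}
FALLBACK = "general-model"  # hypothetical catch-all for unmatched questions


def route(question: str, min_hits: int = 1) -> str:
    """Return the specialist whose keywords overlap most with the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    best_name, best_hits = FALLBACK, 0
    for name, keywords in SPECIALISTS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best_name, best_hits = name, hits
    # Below the threshold, defer to the general model instead of guessing.
    return best_name if best_hits >= min_hits else FALLBACK
```

With this registry, a question about a Python function that fails to compile goes to `code-model`, while the ewoks question matches nothing and lands on `general-model`. A scoped deployment could replace that fallback with a refusal.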

antonvs 17 hours ago | parent | prev [-]

> As amusing as this may be, I really have no need or desire for my coding model to understand or be aware of ewoks

You'll think otherwise the first time you're a victim of a zero-day ewok.

Seriously though, while coding models may not need to know about ewoks, their contextual knowledge of things beyond just writing code almost certainly makes them better coding models.

It could be difficult to constrain the training corpus "just right" so that you eliminate all the irrelevant subjects like ewoks but retain enough so that the model doesn't turn into an idiot savant capable of churning out correct code but incapable of understanding what you really want.