derefr 3 days ago |
I wouldn't describe it as a singularity point. I don't mean that they'll get models to design better model architectures, or come up with feature improvements for the inference/training host frameworks, etc. Instead, I mean that these later-generation models can be fine-tuned to do things like recognizing "feature circuits" in the larger model's NN and discretizing them into algorithms, so that the loop goes something like this:

1. Humans simplify the extracted algorithms (which represent the model's fuzzy, incomplete grasp of some regular digital-logic algorithm) into ordinary code.

2. That code is exposed as primitives/intrinsics the inference kernel can call, e.g. via output vectors where every odd position names a primitive operation to apply before the next attention pass, and every even position carries the parameter for the preceding operation (see the sketch at the end of this comment).

3. The original circuits recognized by the discretization model are cut out, replaced by a simple layer passthrough plus calls to these operations.

4. Training continues from there, so the model accumulates new, higher-level circuits that use these operations; those in turn get extracted, burned in, and referenced; and so on.

5. After enough rounds of this, the model is re-trained from the beginning with all of the gained operations already available from the start, "for effect."

Note that human ingenuity is still required at several places in this loop; you can't make a model do this kind of recursive accelerator derivation to itself without any cross-checking and still expect to get a good result out the other end. (You could, if you could capture the accumulated intuition and experience that guides an ISA designer to pick the set of CISC instructions that actually increases FLOPS-per-watt rather than just "pushing food around on the plate." But long explanations or arguments about ISA design aren't the kind of thing that makes it onto the public Internet; and even if they were, there just aren't enough ISAs ever designed for a brute-force learner like an LLM to draw any lessons from such discussions. You'd need a type of agent that can make good inferences from far less training data, which is, for now, a human.)
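
To make step 2 concrete, here's a minimal sketch of decoding that interleaved op/param control vector between attention passes. Everything in it is hypothetical and only for illustration: the PRIMITIVES table, apply_control_vector, and the toy operations are assumptions of mine, not any existing kernel API.

    import numpy as np

    # Extracted circuits, simplified by humans into exact code and
    # registered as primitives the inference kernel can invoke.
    PRIMITIVES = {
        0: lambda h, p: h,                       # no-op / padding
        1: lambda h, p: h * p,                   # scale the hidden state
        2: lambda h, p: np.roll(h, int(p)),      # rotate features by an offset
        3: lambda h, p: np.where(h > p, h, 0.0)  # hard threshold / gate
    }

    def apply_control_vector(hidden, control):
        # Odd positions (1st, 3rd, ...) name a primitive op;
        # even positions (2nd, 4th, ...) carry the preceding op's parameter.
        for i in range(0, len(control) - 1, 2):
            op_id = int(round(control[i])) % len(PRIMITIVES)
            param = float(control[i + 1])
            hidden = PRIMITIVES[op_id](hidden, param)
        return hidden

    # Toy usage: pretend the model emitted "scale by 2, then gate at 0"
    # to be applied before the next attention pass.
    hidden = np.linspace(-1.0, 1.0, 8)
    control = np.array([1.0, 2.0, 3.0, 0.0])
    print(apply_control_vector(hidden, control))

The point of the sketch is only that once a circuit has been reduced to exact code, the model just has to learn to emit a short op/param sequence instead of re-deriving that computation approximately in its weights.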