D-Machine 2 hours ago
This post is a perfect example of the kind of writing about AI that dupes people who don't really understand how things like LLMs actually work and are actually trained. Anyone who properly understands these systems will find the complete lack of detail about training and the loss function (and, of course, about real metrics and benchmarks) to be a monstrous red flag.

Especially egregious to me is the claim: "Because the execution trace is part of the forward pass, the whole process remains differentiable: we can even propagate gradients through the computation itself". This is pure weasel-language. We can propagate gradients through any transformer architecture, and through far more exotic architectural designs besides, but that is irrelevant if you don't have a continuous, differentiable loss function that can properly weight partially correct solutions, or the likelihood/plausibility of arbitrary model outputs. You also need a clear source of training data (or a way to generate synthetic data). With AlphaFold, for example, what really made it work was figuring out a loss function that continuously approximated the energy of various molecular configurations. Without something like that, you are stuck with slow and expensive reinforcement-learning-based systems.

The other tell is the garbage analogies ("Humans cannot fly. Building airplanes does not change that; it only means we built a machine that flies for us"). Such analogies add nothing to understanding, and in fact distract from serious understanding. Only dupes and fools think you can gain any meaningful understanding of mathematics or computer science through simplistic linguistic analogies and metaphors, without learning the actual models (visuospatial, logical, etc.) behind them. People with real and serious mathematical understanding despise such trite metaphors.
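To make the point concrete, here is a toy sketch of my own (not anything from the post): a perfectly differentiable forward pass paired with a piecewise-constant 0/1 "exact match" loss yields zero gradient almost everywhere, while a continuous surrogate (squared error here) actually gives a learning signal. The function names and numbers are all made up for illustration.

```python
# Toy illustration: "gradients flow through the forward pass" is not
# enough -- the loss itself must be continuous and differentiable.

def forward(w, x):
    return w * x  # stand-in for any differentiable forward pass

def hard_loss(pred, target):
    # 0/1 exact-match loss: piecewise-constant, flat almost everywhere
    return 0.0 if round(pred) == target else 1.0

def soft_loss(pred, target):
    # continuous, differentiable surrogate
    return (pred - target) ** 2

def finite_diff(loss_fn, w, x, target, eps=1e-6):
    # numerical gradient of the loss w.r.t. the parameter w
    hi = loss_fn(forward(w + eps, x), target)
    lo = loss_fn(forward(w - eps, x), target)
    return (hi - lo) / (2 * eps)

w, x, target = 0.4, 1.0, 1
print(finite_diff(hard_loss, w, x, target))  # 0.0: no learning signal
print(finite_diff(soft_loss, w, x, target))  # ~ -1.2: points toward target
```

The hard loss is locally constant, so no amount of architectural differentiability rescues it; that is exactly why reward-style 0/1 signals push you toward slow RL-style training instead of gradient descent.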
And since understanding something like this properly requires serious mathematical understanding, copy like that is a huge tell that the authors / company / platform put bullshitting and sales above truth and correctness. So yes: a huge yellow flag.