HarHarVeryFunny a day ago

The current AI boom has more to do with NVIDIA, and the popularity of computer gaming giving us GPU compute, than who was using neural networks back in the 1990s.

More specifically, it was really AlexNet, the 2012 ImageNet entry, running on two NVIDIA GTX 580s, that highlighted the practicality and utility of running large scale neural nets on affordable hardware. CUDA had been released in 2006, but cuDNN (the CUDA library for neural nets) didn't come out until 2014, after AlexNet had already kickstarted the demand.

What followed from AlexNet was a few years of intense competition on the ImageNet benchmark, with ever larger and deeper neural nets (CNNs), which gave rise to a lot of the algorithms and concepts still used today, such as residual connections (originally from ResNet), Adam (training algorithm), ReLU, normalization, dropout, etc. - all the fundamentals that made building large neural nets possible.

Schmidhuber's continual reminding everyone that he was working on neural nets back in the 1990s is beyond tiresome. Yes, he should have been recognized alongside Hinton/Bengio/LeCun as one of the pioneers, but time for him to get over it.

auggierose 19 hours ago | parent | next [-]

> Schmidhuber's continual reminding everyone that he was working on neural nets back in the 1990s is beyond tiresome. Yes, he should have been recognized alongside Hinton/Bengio/LeCun as one of the pioneers, but time for him to get over it.

Not getting a Turing Award / Nobel Prize for your life's work, when others got it for the same thing? I certainly would not get over that. To a comment like that, I would just think a polite "fuck you".

HarHarVeryFunny 3 hours ago | parent [-]

Schmidhuber's disappointment should be with IDSIA or others in his network for not nominating him. The ACM does not itself survey the field looking for worthy candidates - the process is entirely driven by nominations, which need to be supported by endorsements, solicited by the nominator, from heavyweights in the field. The largest group an award can be given to is three.

The nomination process is private, so it's not publicly known who nominated Bengio/Hinton/LeCun, but given the common CIFAR connection it's a reasonable guess that someone there organized the nomination, maybe self-initiated with the goal of it reflecting well on the organization, or perhaps lobbied for by the recipients.

nextos a day ago | parent | prev | next [-]

I agree. I also think it's about the hardware and, obviously, recognizing AD (automatic differentiation) as the fundamental primitive.

Particular architectures don't matter so much yet. It's quite possible that S3-Mamba or xLSTM could be used in lieu of transformers and we would still have LLMs.

HarHarVeryFunny 21 hours ago | parent [-]

No doubt some aspects of the Transformer architecture are fungible, but, as Hochreiter is implicitly proving, you can't just scale up an LSTM and get Transformer-level performance out of it, which is why he has come up with this new xLSTM architecture to try to do better!

The short 2K Transformer context size that Hochreiter is using for xLSTM comparisons seems a bit suspect ... Of course the attraction of an RNN is that it has "infinite" context/memory, so it may be expected to outperform a short-context Transformer, while at the same time context scalability is an issue for RNNs, even an LSTM. Has he just cherry-picked the size at which the advantages of an xLSTM outweigh the disadvantages?

Note that despite the table saying GPT-3, he isn't actually testing against GPT-3 (a 175B model), but rather a 400M GPT closer to GPT-1 in size. The only reason he's calling it "GPT-3" is because of the 2K context size.

Could a 1T param xLSTM one-shot a compiler or find a needle in a 1M token haystack? Does an induction-head-like AB => A'B' in-context learning primitive, or something functionally equivalent, emerge out of stacked xLSTM layers?

At the end of the day it's prediction power that matters, not specific architecture, but we've yet to see any other architecture that functionally competes with a large Transformer. It would be neat to see a significantly different one that did!

nextos 16 hours ago | parent [-]

I am not sure I agree we've yet to see any other architecture that competes with a large transformer. For example, in long-range tasks such as those related to genome prediction, state-space models (Mamba) exhibit SOTA performance. I also think it's hard to separate architectural advantages from maturity, given that transformers have received much more attention.

LogicFailsMe a day ago | parent | prev | next [-]

And Google's acquisition of DNN Research to get the ball rolling with conv nets and AI moneyball, followed by the acquisition of DeepMind. Schmidhuber IMO *has* been recognized as one of the 4 horsemen, and rightly so, but what has he done lately? Just noticed they now say the 3 godfathers of AI. This is what people hate about academia. It's not academia itself, it's the mean girl politics that emerge from the tenure system. And at this point, tenure should be abolished IMO, having been utterly weaponized to defend the status quo.

tearwear 8 hours ago | parent | prev | next [-]

> highlighted the practicality and utility of running large scale neural nets on affordable hardware

I always wondered if it weren't crypto and the ALUs those algos ran on that hit the green button ...

HarHarVeryFunny 2 hours ago | parent [-]

That briefly boosted the demand for GPUs (until much of it switched to ASICs), but the existence of GPU compute in the first place came from gaming and NVIDIA's choosing to implement programmability the way they did.

The existence of cheap parallel processing was certainly a necessary enabler for large neural networks to take off.

I don't know who else may have experimented with using GPUs for neural nets before 2012 (not so easy since it required hand coding everything in raw CUDA - no framework support), but AlexNet drew a lot of attention (it was a very dramatic win in the ImageNet 2012 competition - the first neural net entrant, and a win by a very large margin), and large neural net research just accelerated from there.

Scroll_Swe 21 hours ago | parent | prev | next [-]

Thanks AI for destroying my hobby. :)

alephnerd a day ago | parent | prev | next [-]

> The current AI boom has more to do with NVIDIA, and the popularity of computer gaming giving us GPU compute, than who was using neural networks back in the 1990s

I disagree. But more critically, I'd argue it's the legacy of the PDP project that led to what became foundation models today.

HarHarVeryFunny a day ago | parent [-]

The PDP project was very early - relevant in terms of neural net history of course, but hard to see much there relevant to today's large models other than Hinton's reinvention of SGD as an alternative to the layer-wise training that was then the norm.

One interesting thing to note in the PDP handbook is the mention by LeCun and Hinton of what would later be called CNNs, which LeCun claims to have invented. It seems that Hinton deserves just as much credit as LeCun, and in any case these are discussed just as locally connected models using shared weights as an optimization.

AndrewKemendo a day ago | parent | prev [-]

This is well put.

2012 really fundamentally changed everything for the AI community, I’d argue because TensorFlow/Keras/PyTorch followed and that made the infrastructure accessible for distributed training.