Schmidhuber – A name lost in the aisles of AI research
Professor Jürgen Schmidhuber was always ahead. His contributions to neural network research extend far beyond his best-known one, long short-term memory (LSTM) networks. LSTMs eventually served as the basis for advances in vision and speech models. Around 2015, Google found that LSTM could reduce transcription errors in its speech recognition system by around 49%, a huge leap after years of slow progress.
With names like Yann LeCun rising to fame as deep learning became prevalent, Schmidhuber’s pioneering work has often been overlooked, and he has made his displeasure known. In 1990, Schmidhuber’s research on gradient-based artificial neural networks for long-term planning, and on reinforcement learning through artificial curiosity, introduced several new concepts, including a pair of recurrent neural networks (RNNs) called the controller and the world model.
This year, after LeCun published “A Path Towards Autonomous Machine Intelligence,” Schmidhuber accused him of rehashing older ideas and presenting them without credit. Among them is artificial curiosity, which Schmidhuber introduced and which, he says, LeCun emphasized in his summary. He also claims that the generative adversarial networks introduced in 2014 were a simplified version of his own 1990 work.
A slew of other now-important ideas that Schmidhuber wrote about in the 1990s have found their way back into the current landscape, and it is easy to see why he would want recognition.
Why are linear transformers secretly fast weight programmers?
In 1992, Schmidhuber published an article titled “Learning to Control Fast-Weight Memories: An Alternative to Dynamic Recurrent Networks,” which discussed an alternative to RNNs. In it, a feedforward neural network, the first and simplest type of neural network, slowly learns by gradient descent to program the rapid weight changes of another neural network.
This is how fast weights work as a concept: each neural connection is associated with a weight, and this weight has two components. The standard slow weight represents long-term memory (it learns slowly and decays slowly), while the fast weight represents short-term memory (it learns quickly and decays quickly).
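The two-component weight described above can be illustrated with a minimal sketch. This is not code from any of the papers mentioned here: the class name `TwoTimescaleWeight`, the learning rates, and the decay factors are all assumptions chosen only to show how one component can learn and fade quickly while the other does both slowly.

```python
from dataclasses import dataclass


@dataclass
class TwoTimescaleWeight:
    """One connection whose weight has a slow and a fast component.

    All constants are illustrative assumptions, not values from the literature.
    """
    slow: float = 0.0
    fast: float = 0.0
    slow_lr: float = 0.01      # slow component learns slowly...
    slow_decay: float = 0.999  # ...and decays slowly
    fast_lr: float = 0.5       # fast component learns quickly...
    fast_decay: float = 0.5    # ...and decays quickly

    def update(self, signal: float) -> None:
        # Each component decays toward zero, then moves toward the signal
        # at its own learning rate.
        self.slow = self.slow_decay * self.slow + self.slow_lr * signal
        self.fast = self.fast_decay * self.fast + self.fast_lr * signal

    @property
    def effective(self) -> float:
        # The connection's effective weight is the sum of both components.
        return self.slow + self.fast


w = TwoTimescaleWeight()
for _ in range(3):    # a short burst of learning signal
    w.update(1.0)
fast_after_burst, slow_after_burst = w.fast, w.slow
for _ in range(10):   # then silence
    w.update(0.0)
print(fast_after_burst, slow_after_burst, w.fast, w.slow)
```

Right after the burst the fast component dominates; ten silent steps later it has almost vanished, while the slow component has barely changed.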
Fast weights store short-term memory in a weight matrix, which gives it a higher capacity. This short-term memory can also hold relevant information from the history of the current sequence, so that it is readily available to the ongoing computation.
In 1991, one of these fast weight programmers (FWPs) computed its weight changes as additive outer products of self-invented activation patterns. These activation patterns correspond to what are now called the keys and values of self-attention, the central idea of today’s Transformers. The central idea of Transformers with linearized self-attention was already present in Schmidhuber’s paper. The influence of his work can be gauged from the impact Transformer architectures are now having on natural language processing and computer vision applications.
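The link between additive outer-product updates and linearized self-attention can be checked with a few lines of plain Python. The sketch below is a simplification under stated assumptions: it leaves out the normalization and feature maps used in practical linear Transformers, and the function names `fast_weight_readout` and `linear_attention_readout` are hypothetical, not taken from any paper or library.

```python
def fast_weight_readout(keys, values, query):
    """Write each (key, value) pair into a fast weight matrix, then query it."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]      # fast weights start empty
    for k, v in zip(keys, values):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]         # additive outer product of v and k
    # Reading out is a matrix-vector product of W with the query.
    return [sum(w * q for w, q in zip(row, query)) for row in W]


def linear_attention_readout(keys, values, query):
    """Unnormalized linear attention: sum over i of (k_i . q) * v_i."""
    out = [0.0] * len(values[0])
    for k, v in zip(keys, values):
        score = sum(a * b for a, b in zip(k, query))   # key-query dot product
        out = [o + score * x for o, x in zip(out, v)]
    return out


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
query = [2.0, -1.0]

print(fast_weight_readout(keys, values, query))       # [5.0, 7.0, 9.0]
print(linear_attention_readout(keys, values, query))  # [5.0, 7.0, 9.0]
```

Both functions return the same vector because multiplying a sum of outer products by the query is the same as weighting each value by the dot product of its key with the query.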
In 1993, Schmidhuber also introduced the term “attention,” in the sense that is now widely accepted, in his paper “Reducing the Ratio Between Learning Complexity and Number of Time Varying Variables in Fully Recurrent Nets.”
Fast weight programmers have slowly returned to the scene after big names like Geoffrey Hinton restarted the conversation around them. In 2017, in a lecture at the University of Toronto, Hinton talked about fast weights as an idea and about bringing them back. “Fast weights provide a neurally plausible way to implement the kind of attention to the past that has recently been shown to be very useful in sequence-to-sequence models. By using fast weights, we can avoid having to store copies of neural activity patterns,” he said.
Revival of old ideas
Schmidhuber’s paper now celebrates its 30th anniversary, prompting us to consider the origins of some of the concepts driving the massive advances AI research is making today.
Some researchers feel that Schmidhuber’s name has been lost in the shuffle amid the ebbs and flows of AI research, while others say a long list of contributors helped the field reach the point where it is today.
In a profile in The New York Times, Gary Bradski, Scientific Director at OpenCV.ai, said of Schmidhuber, “He did a lot of fundamental things. But it was not he who made it popular. It’s a bit like the Vikings discovering America; Columbus made it real.”
But as Schmidhuber’s older ideas see a resurgence, there is growing outrage within the community over the neglect of his past accomplishments. His fight for recognition is not just for himself, he insists: Schmidhuber thinks scholars dating back to the 1960s have been steadily eclipsed by more contemporary giants.