First there was RNN, then came LSTM and now we have Transformers

When neural networks moved from research into production, it marked a revolutionary change in the field of AI. Across domains, the basic architecture was tweaked to fit each problem, giving us Convolutional Neural Networks for images, Recurrent Neural Networks for sequences, and so on.

These tweaks produced new architectures, which in turn found their way into many of the services and products we use today.

Google's search engine uses your location, recent searches and other signals to rank results. Similarly, Amazon's e-commerce site uses recommender systems to surface the most relevant and contextual products for each user.

Before we knew it, we began using AI in our everyday lives.

Recurrent Neural Network (RNN)

Recurrent Neural Networks (RNNs) were the fundamental building blocks of the sophisticated AI we know today. An RNN is a neural network whose connections form a cycle: the hidden state computed at one time step is fed back in at the next, which opens the door to sequence applications such as sentiment classification, music generation and more.
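To make that cycle concrete, here is a minimal sketch of a single RNN step in NumPy. The sizes, weight names and toy sequence are illustrative assumptions on my part, not taken from any particular library.

```python
# A minimal sketch of one RNN step (illustrative sizes and names).
import numpy as np

hidden_size, input_size = 16, 8

rng = np.random.default_rng(0)
W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))  # hidden -> hidden: the "cycle"
W_xh = rng.normal(0, 0.1, (hidden_size, input_size))   # input  -> hidden
b_h = np.zeros(hidden_size)

def rnn_step(x_t, h_prev):
    """One time step: the new hidden state depends on the previous one,
    which is the recurrence (memory) described above."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

# Process a toy sequence of 5 inputs, carrying the hidden state forward.
h = np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h = rnn_step(x_t, h)
```

The key point is that the same hidden state is passed along from step to step, so whatever the network "remembers" has to survive inside that single vector.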

Of course, stand-alone RNNs weren't enough for sophisticated applications such as generative AI. So they were extended, for instance by making them bi-directional, letting the network draw on context from both sides of the sequence rather than only what came before. This made the results more relevant and believable.

However, this memory was very short-term: as gradients are propagated back through many time steps they vanish, so the network struggles to use information far away from the current processing point. This was a major limiting factor on performance even when everything else was in place.

Long Short-Term Memory (LSTM)

This limitation led to the concept of Long Short-Term Memory (LSTM). At their core, LSTM units use gates to keep relevant information and context around while forgetting what is no longer required, which makes them far more effective at their tasks.
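Here is a rough sketch of one LSTM step in NumPy, just to show the forget, input and output gates at work. The shapes, names and toy usage are illustrative assumptions, not a production implementation.

```python
# A minimal sketch of one LSTM step (illustrative sizes and names).
import numpy as np

hidden_size, input_size = 16, 8

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight matrix per gate plus the candidate cell update.
rng = np.random.default_rng(0)
W_f, W_i, W_o, W_c = (rng.normal(0, 0.1, (hidden_size, hidden_size + input_size))
                      for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_size)

def lstm_step(x_t, h_prev, c_prev):
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W_f @ z + b_f)        # forget gate: what to drop from the cell state
    i = sigmoid(W_i @ z + b_i)        # input gate: what new information to keep
    o = sigmoid(W_o @ z + b_o)        # output gate: what to expose as the hidden state
    c_tilde = np.tanh(W_c @ z + b_c)  # candidate cell update
    c = f * c_prev + i * c_tilde      # retain relevant context, forget the rest
    h = o * np.tanh(c)
    return h, c

# Toy usage: still strictly sequential, one step after another.
h, c = np.zeros(hidden_size), np.zeros(hidden_size)
for x_t in rng.normal(size=(5, input_size)):
    h, c = lstm_step(x_t, h, c)
```

Notice that the cell state gives information a protected path through time, which is exactly what lets LSTMs hold on to context for longer than a plain RNN.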

LSTMs are top-notch for applications like machine translation, where context and relevancy are key.

Another huge breakthrough! So why aren't LSTMs used everywhere if they're so powerful? It comes down to the time and computational resources at hand.

LSTMs process a sequence one step after another, so there is very little room to parallelize their operations. And while gating is a clever way to retain context and information throughout, it remained a workaround rather than a solid solution for capturing long-range dependencies.

Transformers to the rescue!

Once the ceiling of LSTMs became clear, the building blocks were reworked once more into an architecture called the Transformer. Today, this is the real deal: it is what lies beneath the surface of ChatGPT, Dall-E 2 and various other substantial AI applications.

Transformers address the problems mentioned so far: self-attention lets every position in a sequence look at every other position directly, and because that computation is a handful of matrix multiplications rather than a step-by-step recurrence, it parallelizes well and copes with long-range dependencies. Far from perfect, they are nonetheless the closest thing yet to the sci-fi, mind-boggling Artificial Intelligence we find so fascinating.
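As a rough sketch of that core operation, here is scaled dot-product self-attention in NumPy. The shapes, weight names and toy inputs are illustrative assumptions; real Transformers stack many such layers with multiple heads.

```python
# A minimal sketch of scaled dot-product self-attention (illustrative sizes).
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Every position attends to every other position in one shot,
    so the whole sequence is processed in parallel."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise similarity between positions
    weights = softmax(scores, axis=-1)       # how much each position looks at the others
    return weights @ V                       # context-aware representation per position

# Toy usage: a sequence of 5 tokens with 8-dimensional embeddings.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)  # shape (5, 8)
```

Because there is no loop over time steps here, the sequence can be processed all at once, which is precisely the parallelism LSTMs lack.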

Conclusion

All in all, Transformers weren't invented out of thin air; they were built up progressively to reach their current state. And we're not settling here: research continues on self-attention and multi-headed attention, the mechanisms at the heart of Transformers. It's exciting to see how things will unfold as we move through the next phases of development.
