Transformers are best known for their applications in natural language processing. They were originally designed to translate between languages (1) and are now most famous for their use in large language models such as ChatGPT (the GPT stands for generative pre-trained transformer).
But since their introduction, transformers have been applied to more and more tasks, with excellent results. These include image recognition,(2) reinforcement learning,(3) and even weather forecasting.(4)
Even the seemingly specific task of generating language with transformers holds a number of surprises, as we have already seen. Large language models have emergent capabilities that go beyond simply predicting the next word. For example, they may know various facts about the world or replicate nuances of a person's speaking style.
The success of transformers has some people wondering whether transformers can do it all. If transformers generalize to so many tasks, is there any reason not to use one?
Clearly, there are still arguments for other machine learning models and, as is often forgotten today, for non-machine-learning models and human intellect. But transformers have a number of unique properties and have shown incredible results so far. There is also a considerable mathematical and empirical basis…