Manually computing the cornerstone of modern AI
Multi-headed attention is probably the most important architectural paradigm in machine learning. This overview walks through the critical mathematical operations within multi-headed attention, allowing you to understand its inner workings at a fundamental level. If you want to learn more about the intuition behind this topic, check out the IAEE paper.
Multi-headed self-attention (MHSA) is used in a variety of contexts, each of which may format the input differently. In a natural language processing context, one would likely use a word-to-vector embedding, along with positional encoding, to compute a vector representing each word. In general, regardless of the type of data, multi-headed self-attention expects a sequence of vectors, where each vector represents one element of the input, as sketched below.
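As a minimal sketch of this input formatting in an NLP setting: each token is mapped to a row of an embedding table and combined with a sinusoidal positional encoding, producing the sequence of vectors MHSA expects. The vocabulary, dimensions, and random embedding table here are illustrative assumptions, not values from this article.

```python
import numpy as np

d_model = 8          # embedding dimension (assumed for illustration)
vocab = {"attention": 0, "is": 1, "all": 2, "you": 3, "need": 4}

# A toy embedding table: one row (vector) per vocabulary word.
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), d_model))

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Standard sinusoidal positional encoding, one row per position."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                 # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even dims: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd dims: cosine
    return pe

# "attention is all you need" -> token ids -> sequence of vectors
tokens = ["attention", "is", "all", "you", "need"]
ids = np.array([vocab[t] for t in tokens])
x = embedding_table[ids] + positional_encoding(len(ids), d_model)
print(x.shape)  # (5, 8): one d_model-dimensional vector per word
```

The resulting array `x` is exactly the kind of "sequence of vectors" that the attention operations described next take as input.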