DeepMind Matrix Multiplications on NVIDIA V100, Tesla T4, and a look at FBHHRBNRSSSHK, which isn’t me randomly typing letters!
In the previous post, we learned the math behind Strassen's algorithm and wrote some Python code to run it and test it on arrays of different sizes. We also learned that the Holy Grail of linear algebra is finding an optimal algorithm for matrix multiplication. Normally, we would write matrix multiplication as three nested for loops:
def matmul(mat1, mat2, mat3):
    r"""Function to multiply mat1 and mat2, returns mat3

    Parameters
    ----------
    mat1: np.array, matrix A
    mat2: np.array, matrix B
    mat3: np.array, empty matrix C

    Returns
    -------
    mat3: np.array, matmul between A & B
    """
    for i in range(mat1.shape[0]):          # rows of A
        for j in range(mat2.shape[1]):      # columns of B
            mat3[i][j] = 0.0
            for k in range(mat1.shape[1]):  # shared inner dimension
                mat3[i][j] += mat1[i][k] * mat2[k][j]
    return mat3
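As a quick sanity check, the triple loop can be compared against numpy's built-in product on small random matrices (a minimal sketch, not part of the original benchmark):

import numpy as np

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
C = matmul(A, B, np.empty((4, 4)))   # fill an empty output matrix
assert np.allclose(C, A @ B)         # agrees with numpy's matmul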
Therefore, the computational complexity is O(n³). Strassen improved on this calculation, finding the following relationships:
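For reference, these are the seven block products and the recombination of the output blocks derived in the previous post (written out here in place of the original figure):

\begin{aligned}
M_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
M_2 &= (A_{21} + A_{22})\,B_{11} \\
M_3 &= A_{11}\,(B_{12} - B_{22}) \\
M_4 &= A_{22}\,(B_{21} - B_{11}) \\
M_5 &= (A_{11} + A_{12})\,B_{22} \\
M_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
M_7 &= (A_{12} - A_{22})(B_{21} + B_{22}) \\[4pt]
C_{11} &= M_1 + M_4 - M_5 + M_7 \\
C_{12} &= M_3 + M_5 \\
C_{21} &= M_2 + M_4 \\
C_{22} &= M_1 - M_2 + M_3 + M_6
\end{aligned}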
This algorithm is applied to block matrices, and the total complexity is reduced to O(n^2.808). Although 2.808 may seem like a very small improvement over 3, we saw that for square matrices of size 4096 the standard numpy matmul takes about 454.37 ± 6.27 s, while Strassen takes 31.57 ± 1.01 s, which is about an order of magnitude of difference.
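Where does 2.808 come from? Each recursion level trades one multiplication of an n × n matrix for seven multiplications of (n/2) × (n/2) blocks plus O(n²) additions, so the running time obeys:

T(n) = 7\,T\!\left(\tfrac{n}{2}\right) + O(n^2) \;\Longrightarrow\; T(n) = O\!\left(n^{\log_2 7}\right) \approx O\!\left(n^{2.808}\right)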
We also saw that the matrix multiplication problem can be reduced to a tensor decomposition, with the tensor operation:
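In the notation of the AlphaTensor paper, the multiplication tensor is written as a sum of R rank-one terms, each an outer product of three vectors:

\mathcal{T}_n = \sum_{r=1}^{R} \mathbf{u}^{(r)} \otimes \mathbf{v}^{(r)} \otimes \mathbf{w}^{(r)}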
Figure 2 reports exactly this: matrix multiplication expressed as triads, namely triples of elements (u, v, w). The minimum number of triads defines the minimum number of multiplications needed to calculate the product matrix. This minimum number is the tensor rank R.
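To make the triad picture concrete, here is a minimal numpy sketch (the index convention is my own choice for illustration) that builds the 2 × 2 multiplication tensor and checks that contracting it with the flattened inputs reproduces A·B. Its 8 nonzero entries are the triads of the naive algorithm, while Strassen's decomposition expresses the same tensor with only 7 rank-one terms.

import numpy as np

n = 2
# Matrix multiplication tensor: T[(i, k), (k, j), (i, j)] = 1, so that
# contracting T with vec(A) and vec(B) returns vec(A @ B).
T = np.zeros((n * n, n * n, n * n))
for i in range(n):
    for j in range(n):
        for k in range(n):
            T[i * n + k, k * n + j, i * n + j] = 1.0

A = np.random.rand(n, n)
B = np.random.rand(n, n)
C = np.einsum("abc,a,b->c", T, A.ravel(), B.ravel()).reshape(n, n)
assert np.allclose(C, A @ B)

print(int(T.sum()))  # 8 triads for the naive 2x2 algorithm; Strassen needs only 7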