How A Good Data Scientist Looks At Matrix Multiplication

Introduction:
Data Science is the field that extracts knowledge and insights from structured or unstructured data using scientific computing methods, processes and algorithms.
The data — be it structured (such as data tables) or unstructured (such as images) — are represented as matrices.
The operations this data undergoes, when processed by machine learning models and other computing processes, predominantly involve matrix multiplication.
So, a deeper insight into the matrix multiplication operation benefits those pursuing data science and machine learning.
The multiplication of two matrices can be viewed in four different ways:
- Dot Products of Rows and Columns
- Linear Combination of Columns
- Linear Combination of Rows
- Sum of Rank-1 Matrices
Let A and B be the matrices being multiplied as AB. Let the dimensions of A and B be _m_ x _p_ and _p_ x _n_, respectively. For the product AB to be defined, the number of columns in A must equal the number of rows in B (both are _p_ here).

Let us consider the example dimensions below for simplicity and without loss of generality.

And as we know, AB below is the product of the two matrices, A and B.

1. Dot Products of Rows and Columns:
In matrix AB, element-11 can be seen as the dot product of row-1 of A and column-1 of B.

Similarly, element-12 can be seen as the dot product of row-1 of A and column-2 of B.


Perspective:
Element-_ij_ in AB is the dot product of row-_i_ from A and column-_j_ from B.
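The dot-product view can be sketched in a few lines of plain Python. The matrices below are illustrative values chosen for this sketch (A is 3 x 2 and B is 2 x 2, so _p_ = 2), and the helper names `dot`, `column` and `matmul_dot` are made up for the example.

```python
# Illustrative matrices (assumed values): A is 3 x 2, B is 2 x 2.
A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[7, 8],
     [9, 10]]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def column(M, j):
    """Column j of matrix M, returned as a list."""
    return [row[j] for row in M]

def matmul_dot(A, B):
    """Element (i, j) of AB is the dot product of row i of A and column j of B."""
    n = len(B[0])
    return [[dot(row, column(B, j)) for j in range(n)] for row in A]

AB = matmul_dot(A, B)
print(AB)  # [[25, 28], [57, 64], [89, 100]]
```

For instance, element-11 is 1*7 + 2*9 = 25, the dot product of row-1 of A with column-1 of B.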
2. Linear Combination of Columns:
To form the column perspective of Matrix Multiplication, reorganize the matrix AB as below.

Column-1 of AB can be seen as the sum of b11 times column-1 of A and b21 times column-2 of A. That is, column-1 of AB is the linear combination (weighted sum) of the columns of A, where the weights of the combination are the elements of column-1 of B.
Similarly, column-2 of AB is the linear combination of the columns of A, where the weights of the combination are the elements of column-2 of B.

Perspective:
Each column in AB is a linear combination of columns of A, where the weights of the combination are the elements of the corresponding column in B.
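As a rough sketch of the column view, the code below builds each column of AB as a weighted sum of the columns of A. The matrix values and the helper names `column`, `combine` and `matmul_columns` are assumptions made for illustration only.

```python
# Illustrative matrices (assumed values): A is 3 x 2, B is 2 x 2.
A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[7, 8],
     [9, 10]]

def column(M, j):
    """Column j of matrix M, returned as a list."""
    return [row[j] for row in M]

def combine(vectors, weights):
    """Weighted sum (linear combination) of equal-length vectors."""
    length = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(length)]

def matmul_columns(A, B):
    """Build AB one column at a time from the columns of A."""
    p, n = len(B), len(B[0])
    cols_of_A = [column(A, k) for k in range(p)]
    # Column j of AB uses the elements of column j of B as weights.
    cols_of_AB = [combine(cols_of_A, column(B, j)) for j in range(n)]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*cols_of_AB)]

AB_by_columns = matmul_columns(A, B)
print(AB_by_columns)  # [[25, 28], [57, 64], [89, 100]]
```

Here column-1 of AB is 7 * [1, 3, 5] + 9 * [2, 4, 6] = [25, 57, 89].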
3. Linear Combination of Rows:
Now, let us look at AB from the row perspective by rewriting it as below.

Row-1 of AB can be seen as the sum of a11 times row-1 of B and a12 times row-2 of B. That is, row-1 of AB is the linear combination (weighted sum) of the rows of B, where the weights of the combination are the elements of row-1 of A.
Similarly, row-2 of AB is the linear combination of the rows of B, where the weights of the combination are the elements of row-2 of A.

Perspective:
Each row in AB is a linear combination of rows of B, where the weights of the combination are the elements of the corresponding row in A.
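The row view can be sketched the same way: each row of AB is a weighted sum of the rows of B. The matrix values and the helper names `combine` and `matmul_rows` are assumptions made for this example.

```python
# Illustrative matrices (assumed values): A is 3 x 2, B is 2 x 2.
A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[7, 8],
     [9, 10]]

def combine(vectors, weights):
    """Weighted sum (linear combination) of equal-length vectors."""
    length = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(length)]

def matmul_rows(A, B):
    """Row i of AB combines the rows of B, weighted by the elements of row i of A."""
    return [combine(B, row_of_A) for row_of_A in A]

AB_by_rows = matmul_rows(A, B)
print(AB_by_rows)  # [[25, 28], [57, 64], [89, 100]]
```

Here row-1 of AB is 1 * [7, 8] + 2 * [9, 10] = [25, 28].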
4. Sum of Rank-1 Matrices:
Rewriting AB as below gives us two rank-1 matrices, each the same size as AB.

Each of the two matrices above has rank 1 because its rows (and columns) are linearly dependent: every row is a multiple of a single row, and every column is a multiple of a single column.

Perspective:
Matrix AB is a sum of _p_ rank-1 matrices of size _m_ x _n_, where the _i_-th matrix (among _p_) is the result of multiplying column-_i_ of A with row-_i_ of B.
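A minimal sketch of the rank-1 view follows. It forms the outer product of column-_i_ of A with row-_i_ of B for each _i_ and adds the results. The matrix values and the helper names `column`, `outer`, `add` and `matmul_rank1` are assumptions made for illustration.

```python
# Illustrative matrices (assumed values): A is 3 x 2, B is 2 x 2, so p = 2.
A = [[1, 2],
     [3, 4],
     [5, 6]]
B = [[7, 8],
     [9, 10]]

def column(M, j):
    """Column j of matrix M, returned as a list."""
    return [row[j] for row in M]

def outer(u, v):
    """Outer product of a column vector u and a row vector v."""
    return [[x * y for y in v] for x in u]

def add(M, N):
    """Element-wise sum of two matrices of the same size."""
    return [[a + b for a, b in zip(r, s)] for r, s in zip(M, N)]

def matmul_rank1(A, B):
    """AB as the sum over k of (column k of A) times (row k of B)."""
    m, p, n = len(A), len(B), len(B[0])
    total = [[0] * n for _ in range(m)]
    for k in range(p):
        total = add(total, outer(column(A, k), B[k]))
    return total

first_term = outer(column(A, 0), B[0])
second_term = outer(column(A, 1), B[1])
AB_by_rank1 = matmul_rank1(A, B)
print(first_term)   # [[7, 8], [21, 24], [35, 40]]
print(second_term)  # [[18, 20], [36, 40], [54, 60]]
print(AB_by_rank1)  # [[25, 28], [57, 64], [89, 100]]
```

In each term, every row is a multiple of the corresponding row of B, which is why each term has rank 1 (as long as the column and row involved are nonzero).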
Conclusion:
These different perspectives find their relevance on different occasions.
For example, in the attention mechanism of the Transformer neural network architecture, the attention matrix calculation can be seen as a matrix multiplication from the 'dot products of rows and columns' perspective.
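To make this concrete, here is a simplified sketch with made-up values. It assumes the raw attention scores are computed as the product of a query matrix Q and the transpose of a key matrix K, and it leaves out the scaling and softmax steps that normally follow. The names `Q`, `K`, `dot` and `attention_scores` are illustrative.

```python
# Illustrative values (assumed): 3 query vectors and 2 key vectors of length 2.
Q = [[1, 0],
     [0, 1],
     [1, 1]]
K = [[1, 2],
     [3, 4]]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))

def attention_scores(Q, K):
    """Raw scores Q K^T: entry (i, j) is the dot product of query i and key j."""
    # Column j of K^T is row j of K, so no explicit transpose is needed.
    return [[dot(q, k) for k in K] for q in Q]

scores = attention_scores(Q, K)
print(scores)  # [[1, 3], [2, 4], [3, 7]]
```

Each score measures how strongly one query aligns with one key, which is the element-by-element reading of the product.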
More about Attention mechanism and Transformer can be found in the below article.
I hope these perspectives on matrix multiplication enable readers to gain a more intuitive understanding of data flow in machine learning and data science algorithms and models.

