A Practical Approach to Algorithm Efficiency

In my articles, I have often emphasized that software engineering skills are essential even when working in AI, because at the end of the day what we produce is code. The same goes for algorithmic theory: machine learning algorithms are still algorithms, and knowing how to evaluate them is essential. For example, do you know how the attention mechanism scales in transformers?
I run the scripts using Deepnote, a cloud-based notebook that is well suited to collaborative data science projects and quick prototyping.
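To make the attention question concrete, here is a minimal sketch (assuming single-head scaled dot-product attention and illustrative function names, not code from any specific library) showing that the score matrix QKᵀ alone has n × n entries, so time and memory grow quadratically with sequence length:

```python
import numpy as np

def attention_scores(n, d=8, seed=0):
    """Build random (n, d) query and key matrices and return the
    (n, n) scaled dot-product attention score matrix."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((n, d))
    K = rng.standard_normal((n, d))
    # The score matrix has n * n entries: doubling the sequence
    # length quadruples the work, regardless of the embedding size d.
    return Q @ K.T / np.sqrt(d)

for n in (128, 256, 512):
    print(n, attention_scores(n).size)  # n**2 entries each time
```

This quadratic growth is precisely the kind of scaling behavior that algorithm analysis, discussed next, lets us reason about before the cost bites in production.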
Basic Algorithm Analysis
When we study or develop an algorithm, it is essential to understand how many resources, in terms of time and space, it needs. Only then can we understand how it will scale. A sorting algorithm might seem optimal on a toy case where we sort an array of length 3, but be highly inefficient on an array of length 10 million.
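A quick sketch of this effect (the function and the comparison-counting scheme are illustrative, not from the text): insertion sort, a quadratic algorithm, is perfectly fine on 3 elements but the number of basic operations explodes as the input grows.

```python
import random

def insertion_sort(arr):
    """Sort a copy of arr and return (sorted_list, comparison_count)."""
    a = list(arr)
    comparisons = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if a[j] > key:
                a[j + 1] = a[j]  # shift larger element right
                j -= 1
            else:
                break
        a[j + 1] = key
    return a, comparisons

random.seed(42)
for n in (3, 1000):
    _, c = insertion_sort(random.sample(range(n * 10), n))
    print(n, c)  # comparisons grow roughly with n**2, not with n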
For this reason, comparing the performance of two different algorithms is not trivial. One might be faster on some instances and slower on others; we need a formal, mathematical method to evaluate them properly.
As mentioned, an algorithm may behave differently depending on the input instance it works on. But then, on which instances should we measure its performance? The ones on which it is fastest? Or the ones on which it is slowest?
Usually, the worst-case running time is used. That way, we know exactly what will happen in the worst possible case, and if the worst case is not that bad, we can be satisfied.
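The gap between best and worst case is easy to see on a toy example; a minimal sketch (the step-counting helper is illustrative) using linear search, where the best case finds the target in one step and the worst case scans the whole list:

```python
def linear_search(items, target):
    """Return (index, steps): the position of target (or -1) and
    how many comparisons were performed along the way."""
    steps = 0
    for i, x in enumerate(items):
        steps += 1
        if x == target:
            return i, steps
    return -1, steps

data = list(range(1000))
print(linear_search(data, 0))    # best case: found immediately -> (0, 1)
print(linear_search(data, 999))  # worst case: last element -> (999, 1000)
print(linear_search(data, -5))   # also worst case: absent -> (-1, 1000)
```

Reporting the worst case (1000 steps here, i.e. linear in the input size) gives a guarantee that holds no matter which input we are handed.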