State of Video Generation

While 2023 was dominated by Large Language Models (LLMs) and a surge in Image Generation technologies, Video Generation received relatively little attention. In researching this topic, I found it quite challenging to keep up with the latest developments and the overall architectural designs, since the proposed models are so diverse.
In this post, I aim to share how Video Generation has evolved in recent years, how the architectures of models have developed, and what outstanding questions we now face.
While I was writing this article, OpenAI announced Sora, a video generation model with stunning capabilities. Although its architecture has not been disclosed, I hope this post gives you some insight into what it might be.
Let's Dive into the Timeline
Think of this timeline as a journey through the evolution of the models proposed for Video Generation. It will help us understand why today's models are designed the way they are and provide insights for future research and applied work.
Each model is supplemented with a unified graphical representation of its architecture and pipeline. Treat it as a simplified graphical summary, rather than an in-depth model schema.
So, let's start with a not-so-early point in time – 2022…
The Dawn