State of Video Generation

Author: Murphy | Time: 2025-03-22 22:47:05

Animation generated in Runway Gen2 | Zoom-out shot on videocamera

While 2023 was dominated by Large Language Models (LLMs) and a surge in Image Generation technologies, Video Generation received relatively little attention. In researching this topic, I found it quite challenging to keep up with the latest developments and the overall architectural designs, because the proposed models differ widely from one another.

In this post, I aim to share how Video Generation has evolved in recent years, how the architectures of models have developed, and what outstanding questions we now face.

While I was writing this article, OpenAI released Sora, a video generation model with stunning capabilities. Its architecture has not been disclosed, but I hope this post gives you some insight into what it might be.

Let's Dive into the Timeline

Consider this timeline a journey through the evolution of models proposed for Video Generation. It will help us understand why today's models are designed the way they are and provide insights for future research and applied work.

Each model is accompanied by a unified graphical representation of its architecture and pipeline. Treat it as a simplified visual summary rather than an in-depth model schema.