Paper #12
Denoising Diffusion Probabilistic Models (2020)
AI-generated
This paper established diffusion models as a viable approach to image generation. Diffusion models work by learning to reverse a gradual noising process, producing high-quality images that rival or exceed GANs.
The approach has two phases. In the forward process, noise is gradually added to a real image over many steps until it becomes pure random noise. In the reverse process, a neural network learns to undo each noise step, gradually recovering a clean image from random noise.
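The forward process described above has a convenient closed form: instead of adding noise one step at a time, x_t can be sampled directly from the clean image. A minimal sketch, assuming the paper's linear beta schedule (1e-4 to 0.02 over T = 1000 steps); shapes and schedule values here are illustrative:

```python
import numpy as np

# DDPM forward (noising) process, sketched with the paper's linear schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # per-step noise variances beta_t
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # cumulative product alpha_bar_t

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in one shot:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return x_t, eps

x0 = np.ones((4, 4))                 # stand-in for an image
x_noisy, eps = q_sample(x0, T - 1)   # by the final step, alpha_bar_T ~ 0,
                                     # so x_T is essentially pure Gaussian noise
```

Because alpha_bar_t shrinks toward zero as t grows, the signal term vanishes and only noise remains, which is exactly why generation can start from random noise.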
The model is trained to predict the noise that was added at each step. At generation time, you start with pure random noise and apply the learned denoising process step by step, producing a new image.
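Both halves of that loop can be sketched compactly. The snippet below is a schematic, not the paper's implementation: `model` is a placeholder for the noise-prediction network epsilon_theta(x_t, t) (a U-Net in the paper), and the sampling update follows the paper's ancestral sampling rule using the predicted noise:

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)
rng = np.random.default_rng(0)

def model(x_t, t):
    # Placeholder for a trained noise-prediction network (U-Net in the paper).
    return np.zeros_like(x_t)

def training_loss(x0):
    """One training step: pick a random timestep t, noise the image,
    and regress the network's output onto the noise that was added:
    L = || eps - eps_theta(sqrt(ab_t) x0 + sqrt(1 - ab_t) eps, t) ||^2."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - model(x_t, t)) ** 2)

def sample(shape):
    """Generation: start from pure noise x_T, denoise step by step to x_0."""
    x = rng.standard_normal(shape)
    for t in reversed(range(T)):
        eps_hat = model(x, t)
        # Mean of the learned reverse step, expressed via the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:  # add fresh noise at every step except the last
            x = x + np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

loss = training_loss(np.ones((4, 4)))
img = sample((4, 4))
```

In a real implementation the loss would be backpropagated through a deep network over many batches; the structure of the two loops, however, is exactly this simple.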
Diffusion models addressed the stability and quality problems that plagued GANs (generative adversarial networks), the previous state of the art for image generation. GANs were notoriously difficult to train, suffered from mode collapse (generating limited variety), and required careful hyperparameter tuning.
Diffusion models are more stable to train, produce higher-quality and more diverse images, and are easier to condition on text or other inputs. They became the foundation of Stable Diffusion, DALL-E 2, Midjourney, and virtually every modern image generation system.
This paper, along with the related "Score-Based Generative Modeling" work, fundamentally changed how the field approaches image generation.
Authors: Jonathan Ho, Ajay Jain, Pieter Abbeel (UC Berkeley).
Key insight: A simple training objective (predict the noise) produces state-of-the-art image quality.
Link to paper: https://arxiv.org/abs/2006.11239
Lilian Weng: "What are Diffusion Models?" - https://lilianweng.github.io/posts/2021-07-11-diffusion-models/
The Illustrated Stable Diffusion (Jay Alammar) - https://jalammar.github.io/illustrated-stable-diffusion/