This is a custom image generation model built from scratch using PyTorch. The architecture is based on the foundational paper behind Stable Diffusion, which can be found here.
- Variational Autoencoder (VAE)
  - Encoder
  - Decoder
- Diffusion UNet
  - FiLM (optional)
  - Time Embeddings
- CLIP Text Encoder
- Training with DDPM (denoising diffusion probabilistic models)
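
To make the list above more concrete, here is a minimal sketch of how time embeddings, FiLM, and a DDPM training step could fit together. It is an illustration under stated assumptions, not this repository's code: `ToyDenoiser`, `FiLM`, `sinusoidal_time_embedding`, and `ddpm_loss` are hypothetical names, the two-layer network only stands in for the real UNet, random tensors stand in for VAE latents, and CLIP text conditioning is left out. The linear beta schedule uses the values from the DDPM paper (1000 steps, 1e-4 to 0.02).

```python
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


def sinusoidal_time_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    """Map integer timesteps of shape (B,) to embeddings of shape (B, dim); dim must be even."""
    half = dim // 2
    freqs = torch.exp(
        -math.log(10000.0) * torch.arange(half, dtype=torch.float32, device=t.device) / half
    )
    args = t.float()[:, None] * freqs[None, :]
    return torch.cat([torch.sin(args), torch.cos(args)], dim=-1)


class FiLM(nn.Module):
    """Feature-wise linear modulation: per-channel scale and shift from a conditioning vector."""

    def __init__(self, cond_dim: int, channels: int):
        super().__init__()
        self.proj = nn.Linear(cond_dim, 2 * channels)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        scale, shift = self.proj(cond).chunk(2, dim=-1)
        # Broadcast (B, C) over the spatial dimensions of x, which is (B, C, H, W).
        return x * (1 + scale[:, :, None, None]) + shift[:, :, None, None]


class ToyDenoiser(nn.Module):
    """Hypothetical stand-in for the diffusion UNet (assumption, not the project's model)."""

    def __init__(self, channels: int = 4, hidden: int = 32, time_dim: int = 64):
        super().__init__()
        self.time_dim = time_dim
        self.conv_in = nn.Conv2d(channels, hidden, 3, padding=1)
        self.film = FiLM(time_dim, hidden)
        self.conv_out = nn.Conv2d(hidden, channels, 3, padding=1)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        emb = sinusoidal_time_embedding(t, self.time_dim)
        h = F.silu(self.film(self.conv_in(x), emb))
        return self.conv_out(h)


def ddpm_loss(model: nn.Module, x0: torch.Tensor, alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """Noise x0 at a random timestep per sample and score the model's noise prediction."""
    t = torch.randint(0, alphas_cumprod.shape[0], (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t][:, None, None, None]
    # Closed-form forward process: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * noise
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise
    return F.mse_loss(model(x_t, t), noise)


# Linear beta schedule from the DDPM paper.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

model = ToyDenoiser()
latents = torch.randn(8, 4, 16, 16)  # random stand-in for VAE-encoded images
loss = ddpm_loss(model, latents, alphas_cumprod)
loss.backward()  # an optimizer step would follow in a real training loop
```

In a full pipeline the latents would come from the VAE encoder, and the conditioning vector passed to FiLM could also carry text features from the CLIP encoder; both are omitted here to keep the sketch short.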