AI - Stable Diffusion
Pioneering ethical text-to-image diffusion through collaboration, transparency, and innovative safety checks - Stable Diffusion.
- Name
- Stable Diffusion - https://github.com/CompVis/stable-diffusion
- Last Audited At
About Stable Diffusion
Stable Diffusion is a text-to-image diffusion model developed by the CompVis group in collaboration with Stability AI and Runway, building on the group's earlier research on high-resolution image synthesis with latent diffusion models. The team behind Stable Diffusion includes Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer, whose work on Latent Diffusion Models was presented at CVPR 2022.
Stable Diffusion v1 is a specific configuration of the model that pairs a downsampling-factor-8 autoencoder with an 860M-parameter UNet and a CLIP ViT-L/14 text encoder for the diffusion model. It was pretrained on 256x256 images and then finetuned on 512x512 images. As a general text-to-image diffusion model, it reflects the biases and (mis-)conceptions present in its training data.
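The factor-8 autoencoder means diffusion runs in a latent space 8x smaller than the image along each spatial axis. A minimal sketch of that geometry, assuming the 4 latent channels used by the released v1 autoencoder (the channel count is an assumption here; the factor-8 figure is from the description above):

```python
# Latent geometry for Stable Diffusion v1 (sketch, not the real model code).
# Assumes 4 latent channels, as in the released v1 autoencoder.
def latent_shape(height, width, downsample_factor=8, latent_channels=4):
    """Shape of the latent tensor the UNet denoises for a given image size."""
    return (latent_channels, height // downsample_factor, width // downsample_factor)

# 512x512 finetuning resolution -> the UNet works on 4x64x64 latents
print(latent_shape(512, 512))  # (4, 64, 64)

# 256x256 pretraining resolution -> 4x32x32 latents
print(latent_shape(256, 256))  # (4, 32, 32)
```

Working on 64x64 latents instead of 512x512 pixels is what makes latent diffusion far cheaper than pixel-space diffusion at the same output resolution.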
To encourage responsible use of generated outputs, Stable Diffusion ships with a safety checker module and applies an invisible watermark that helps viewers identify machine-generated images. The project also provides a reference sampling script and makes the model weights publicly available for research purposes.
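Conceptually, the safety checker acts as a post-generation gate: flagged outputs are replaced rather than returned. This is a hypothetical stand-in showing only that control flow (the real module scores image embeddings against concept thresholds; `gate_outputs` and the string placeholders are inventions for illustration):

```python
# Hypothetical sketch of a post-generation safety gate.
# The real safety checker compares CLIP embeddings of each output against
# flagged-concept thresholds; here, is_unsafe stands in for that scoring.
def gate_outputs(images, is_unsafe):
    """Return the batch with any flagged image replaced by a placeholder."""
    return [img if not is_unsafe(img) else "BLACK_IMAGE" for img in images]

# Example: the second "image" in the batch trips the (stand-in) check.
print(gate_outputs(["img_a", "img_b"], lambda img: img == "img_b"))
```

The key design point is that filtering happens after sampling, so the diffusion model itself is unchanged and the gate can be updated independently of the weights.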
The implementation of the diffusion models builds heavily on OpenAI's ADM codebase and on denoising-diffusion-pytorch. The transformer encoder implementation originates from lucidrains' x-transformers.