Stable Diffusion WebUI has a lot of features and can seem daunting to learn at first. I want to share my process of learning how to use it. My sources are the AUTOMATIC1111 GitHub repo wiki, the stable-diffusion-art blog by Andrew (so detailed for beginners, m(_ _)m), and other random websites.
First Step: run AUTOMATIC1111
```bash
cd ~/stable-diffusion-webui; ./webui.sh
```
Text-to-image tab:

txt2img: turn text prompt into an image
Stable Diffusion checkpoint: select the model you want to use
Prompt: text description of what you want to see in the image
Negative Prompt: write what you don’t want to see in the image (you can use a universal negative prompt like the one below):
- ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face
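If you'd rather script this tab than click through it, the WebUI also exposes a REST API when you launch it with ./webui.sh --api. Here's a minimal sketch that sends a prompt and negative prompt to a local install (the address, prompt text, and payload values are my assumptions for a default setup):

```python
import base64
import requests

# Assumes the WebUI was started with `./webui.sh --api` and is
# listening on the default local address.
url = "http://127.0.0.1:7860/sdapi/v1/txt2img"
payload = {
    "prompt": "a cozy cabin in a snowy forest, golden hour",
    "negative_prompt": "ugly, tiling, poorly drawn hands, watermark",
    "steps": 20,
    "width": 512,
    "height": 512,
}
r = requests.post(url, json=payload, timeout=300)
r.raise_for_status()
# The API returns each generated image as a base64-encoded string.
with open("result.png", "wb") as f:
    f.write(base64.b64decode(r.json()["images"][0]))
```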
Quick side note: how do we import new models?
Import New Models
- Download a compatible model file with a .ckpt or .safetensors file extension (if both are available, it's safer to go with the .safetensors file, since it can't hide arbitrary code the way pickled .ckpt files can).
- Put the file in the stable-diffusion-webui\models\Stable-diffusion directory.
- Refresh the models list by pressing the refresh button at the top left corner.
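If you like keeping this step scriptable, here's a tiny sketch (the download location and model filename are made up for the example):

```python
import shutil
from pathlib import Path

# Hypothetical paths: adjust to your download folder and WebUI install.
downloaded = Path.home() / "Downloads" / "myModel.safetensors"
models_dir = Path.home() / "stable-diffusion-webui" / "models" / "Stable-diffusion"
shutil.move(str(downloaded), str(models_dir / downloaded.name))
```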
Where to Find New Models
The usual places are Civitai and Hugging Face, which host thousands of community checkpoints.
Quick side note: learn more about LoRA models to invoke your favorite styles
VAE files
You can usually find .vae files on the same download page as the model.
VAE (variational autoencoder) files are used for post-processing, after image generation.
VAE: a system that compresses images into smaller, more manageable pieces and reconstructs them back into their original form. It encodes the optional input image before the diffusion process begins and decodes the generated image afterwards.
- Encoding: takes the input image and compresses it into a small representation called the latent space (similar to turning a detailed picture into a rough sketch).
- Latent space: a simplified version of the original image that captures its essential features; smaller and easier to work with.
- Decoding: the VAE takes this rough sketch and turns it back into a detailed picture, similar to the original image.
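To make the encode/decode round trip concrete, here's a small sketch using the Hugging Face diffusers library (my choice for the demo; the WebUI does the equivalent internally, and the input filename is hypothetical):

```python
import numpy as np
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image

# A standalone SD-compatible VAE published by Stability AI.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = load_image("photo.png").resize((512, 512))  # hypothetical input file
x = torch.from_numpy(np.array(image)).float() / 127.5 - 1.0  # pixels -> [-1, 1]
x = x.permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW

with torch.no_grad():
    # Encoding: 3x512x512 pixels squeeze down to a 4x64x64 latent "rough sketch".
    latents = vae.encode(x).latent_dist.sample()
    # Decoding: the latent sketch is expanded back into a detailed picture.
    recon = vae.decode(latents).sample
print(tuple(x.shape), "->", tuple(latents.shape), "->", tuple(recon.shape))
```

Running it prints (1, 3, 512, 512) -> (1, 4, 64, 64) -> (1, 3, 512, 512): a 48x compression into the latent space and back.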
Latent diffusion: the technique used in image generation to create new images from information in the latent space.
- Start: random noise in the latent space (think of it as a very blurry version of an image)
- Refine image: gradually refines the latents step by step, removing noise and adding detail until they describe a clear image
- Guiding: throughout this process, the text prompt guides the refinement toward the desired qualities; at the end, the VAE decodes the result so the final image looks realistic
VAEs and latent diffusion work together to create detailed images! (work besties)
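Here's what that working relationship looks like in code, again sketched with diffusers rather than the WebUI's own internals (the model choice, prompt, and step count are all assumptions, and classifier-free guidance is omitted to keep it short):

```python
import torch
from diffusers import StableDiffusionPipeline

# Borrow the parts (UNet, VAE, scheduler, text encoder) from a full pipeline.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
unet, vae, scheduler = pipe.unet, pipe.vae, pipe.scheduler

# Turn the prompt into the embeddings that condition each denoising step.
tokens = pipe.tokenizer("a watercolor fox", padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt")
text_emb = pipe.text_encoder(tokens.input_ids)[0]

scheduler.set_timesteps(25)
# Start: pure noise in the 4x64x64 latent space.
latents = torch.randn(1, 4, 64, 64) * scheduler.init_noise_sigma

with torch.no_grad():
    # Refine: each step predicts some of the remaining noise and removes it.
    for t in scheduler.timesteps:
        inp = scheduler.scale_model_input(latents, t)
        noise_pred = unet(inp, t, encoder_hidden_states=text_emb).sample
        latents = scheduler.step(noise_pred, t, latents).prev_sample
    # Decode: the VAE expands the finished latents into a full-size image.
    image = vae.decode(latents / vae.config.scaling_factor).sample
```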
If you don't use a dedicated VAE file for the model, it will fall back to the default SD VAE.
Default SD VAE downside: images might look discolored, washed out, or highly desaturated.
Solution: place the .vae file designated for the model you’re using inside your models folder, right beside the model file. The model file and the .vae file need to share the same base name (see the sketch after this list).
- You can also use different .vae files dedicated to different models; this only affects the post-processing.
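A quick way to check that pairing (the directory layout is the standard one; the .vae.pt extension and filenames here are illustrative):

```python
from pathlib import Path

models = Path.home() / "stable-diffusion-webui" / "models" / "Stable-diffusion"
# For each checkpoint, check whether a same-named .vae file sits beside it,
# e.g. myModel.safetensors paired with myModel.vae.pt.
for ckpt in models.glob("*.safetensors"):
    vae = ckpt.parent / (ckpt.stem + ".vae.pt")
    print(ckpt.name, "->", vae.name if vae.exists() else "(falls back to default VAE)")
```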
VAE files may also be already merged into a model (no need to do anything!)