
Diffusers Tutorials


  • code https://github.com/huggingface/diffusers



Introduction

  • Diffusers is a toolbox of state-of-the-art diffusion models for generating images, audio, and 3D structures of molecules, together with training and inference utilities.
  • Features:
    • DiffusionPipeline, a high-level end-to-end class designed to rapidly generate samples for inference from pretrained diffusion models.
    • SOTA pretrained model architectures and modules that can be used as building blocks for creating diffusion systems.
    • Many different scheduler algorithms that control how noise is added during training and how a denoised image is generated during inference.

Installation

$ pip install --upgrade diffusers[torch]
# OR
$ conda install -c conda-forge diffusers

# uncomment to install the necessary libraries in Colab
# pip install --upgrade diffusers accelerate transformers
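To confirm the install worked, print the library version (a minimal check; works in any environment where the package is importable):

$ python -c "import diffusers; print(diffusers.__version__)"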

Quickstart

  • model hub https://huggingface.co/models?library=diffusers&sort=downloads

DiffusionPipeline

  • Some of the supported tasks:

| Task | Description | Pipeline |
| --- | --- | --- |
| Unconditional Image Generation | generate an image from Gaussian noise | unconditional_image_generation |
| Text-Guided Image Generation | generate an image given a text prompt | conditional_image_generation |
| Text-Guided Image-to-Image Translation | adapt an image guided by a text prompt | img2img |
| Text-Guided Image-Inpainting | fill the masked part of an image given the image, the mask and a text prompt | inpaint |
| Text-Guided Depth-to-Image Translation | adapt parts of an image guided by a text prompt while preserving structure via depth estimation | depth2img |
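Each task maps to a dedicated pipeline class. As a sketch of the img2img task (the checkpoint is the same one used below; sketch.png is a hypothetical local input file):

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# img2img takes an init image plus a prompt; strength controls how much is changed
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("sketch.png").convert("RGB").resize((768, 512))
image = pipe(prompt="A fantasy landscape", image=init_image, strength=0.75).images[0]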

  • Load a pretrained checkpoint with DiffusionPipeline and generate an image:
from diffusers import DiffusionPipeline
import torch

# load fp16 weights of a pretrained Stable Diffusion checkpoint and move to GPU
pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipeline.to("cuda")

# generate and save an image
image = pipeline("An image of a squirrel in Picasso style").images[0]
image.save("image_of_squirrel_painting.png")
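For reproducible results, pass a seeded generator to the pipeline call (a small sketch reusing the pipeline above):

# fixing the generator seed makes repeated calls produce the same image
generator = torch.Generator("cuda").manual_seed(0)
image = pipeline("An image of a squirrel in Picasso style", generator=generator).images[0]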
  • Swapping schedulers
from diffusers import EulerDiscreteScheduler

pipeline = DiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", use_safetensors=True)
pipeline.scheduler = EulerDiscreteScheduler.from_config(pipeline.scheduler.config)
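To see which schedulers can be swapped in for a given checkpoint, inspect the current scheduler's compatibles attribute:

# lists the scheduler classes compatible with this pipeline's scheduler config
pipeline.scheduler.compatibles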

Models

from diffusers import UNet2DModel
import torch

# load model
repo_id = "google/ddpm-cat-256"
model = UNet2DModel.from_pretrained(repo_id, use_safetensors=True)
model.config

# noise as input
torch.manual_seed(0)
noisy_sample = torch.randn(1, model.config.in_channels, model.config.sample_size, model.config.sample_size)
noisy_sample.shape

# inference
with torch.no_grad():
    noisy_residual = model(sample=noisy_sample, timestep=2).sample
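The predicted residual has the same shape as the input sample, which is what lets the scheduler subtract it from the noisy sample step by step:

# sanity check: the model outputs one residual value per input element
assert noisy_residual.shape == noisy_sample.shape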

Schedulers

  • A scheduler takes the model output (here, the noise residual) and turns a noisy sample into a less noisy one.
from diffusers import DDPMScheduler

# load the scheduler config that ships with the model checkpoint
scheduler = DDPMScheduler.from_pretrained(repo_id)
scheduler

# one step: use the predicted residual to compute a slightly less noisy sample
less_noisy_sample = scheduler.step(model_output=noisy_residual, timestep=2, sample=noisy_sample).prev_sample
less_noisy_sample.shape
  • Now build a denoising loop that predicts the residual of the current sample and uses the scheduler to compute the next, less noisy sample:
import tqdm
import PIL.Image
import numpy as np
from IPython.display import display  # display() assumes a notebook environment


def display_sample(sample, i):
    # map the sample from [-1, 1] to [0, 255] and show it as a PIL image
    image_processed = sample.cpu().permute(0, 2, 3, 1)
    image_processed = (image_processed + 1.0) * 127.5
    image_processed = image_processed.numpy().astype(np.uint8)

    image_pil = PIL.Image.fromarray(image_processed[0])
    display(f"Image at step {i}")
    display(image_pil)
    
    
sample = noisy_sample  
for i, t in enumerate(tqdm.tqdm(scheduler.timesteps)):
    # 1. predict noise residual
    with torch.no_grad():
        residual = model(sample, t).sample

    # 2. compute less noisy image and set x_t -> x_t-1
    sample = scheduler.step(residual, t, sample).prev_sample

    # 3. optionally look at image
    if (i + 1) % 50 == 0:
        display_sample(sample, i + 1)
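The loop runs wherever the model and sample live; to speed it up considerably, move both to a GPU before running it (a minimal sketch, assuming CUDA is available):

# run the denoising loop on GPU (do this before the loop above)
model.to("cuda")
sample = noisy_sample.to("cuda")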

Inference code

  • Model, scheduler, visualization, and inference combined into one script:
from diffusers import DDPMScheduler, UNet2DModel
from PIL import Image
import torch
import numpy as np

scheduler = DDPMScheduler.from_pretrained("google/ddpm-cat-256")
model = UNet2DModel.from_pretrained("google/ddpm-cat-256").to("cuda")
scheduler.set_timesteps(50)

sample_size = model.config.sample_size
noise = torch.randn((1, 3, sample_size, sample_size)).to("cuda")
input = noise

for t in scheduler.timesteps:
    with torch.no_grad():
        noisy_residual = model(input, t).sample
        prev_noisy_sample = scheduler.step(noisy_residual, t, input).prev_sample
        input = prev_noisy_sample

image = (input / 2 + 0.5).clamp(0, 1)
image = image.cpu().permute(0, 2, 3, 1).numpy()[0]
image = Image.fromarray((image * 255).round().astype("uint8"))
image
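The same loop works with other schedulers. As a sketch, DDIMScheduler can load the scheduler config from the same DDPM checkpoint and usually produces good samples in far fewer steps:

from diffusers import DDIMScheduler

# DDIM is config-compatible with DDPM checkpoints; rerun the loop above with it
scheduler = DDIMScheduler.from_pretrained("google/ddpm-cat-256")
scheduler.set_timesteps(50)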

Training code

from accelerate import Accelerator
from huggingface_hub import HfFolder, Repository, whoami
from tqdm.auto import tqdm
from pathlib import Path
from diffusers import DDPMPipeline
import torch
import torch.nn.functional as F
import os


def get_full_repo_name(model_id: str, organization: str = None, token: str = None):
    if token is None:
        token = HfFolder.get_token()
    if organization is None:
        username = whoami(token)["name"]
        return f"{username}/{model_id}"
    else:
        return f"{organization}/{model_id}"


def train_loop(config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler):
    # Initialize accelerator and tensorboard logging
    accelerator = Accelerator(
        mixed_precision=config.mixed_precision,
        gradient_accumulation_steps=config.gradient_accumulation_steps,
        log_with="tensorboard",
        project_dir=os.path.join(config.output_dir, "logs"),
    )
    if accelerator.is_main_process:
        if config.push_to_hub:
            repo_name = get_full_repo_name(Path(config.output_dir).name)
            repo = Repository(config.output_dir, clone_from=repo_name)
        elif config.output_dir is not None:
            os.makedirs(config.output_dir, exist_ok=True)
        accelerator.init_trackers("train_example")

    # Prepare everything
    # There is no specific order to remember, you just need to unpack the
    # objects in the same order you gave them to the prepare method.
    model, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
        model, optimizer, train_dataloader, lr_scheduler
    )

    global_step = 0

    # Now you train the model
    for epoch in range(config.num_epochs):
        progress_bar = tqdm(total=len(train_dataloader), disable=not accelerator.is_local_main_process)
        progress_bar.set_description(f"Epoch {epoch}")

        for step, batch in enumerate(train_dataloader):
            clean_images = batch["images"]
            # Sample noise to add to the images
            noise = torch.randn(clean_images.shape).to(clean_images.device)
            bs = clean_images.shape[0]

            # Sample a random timestep for each image
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps, (bs,), device=clean_images.device
            ).long()

            # Add noise to the clean images according to the noise magnitude at each timestep
            # (this is the forward diffusion process)
            noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)

            with accelerator.accumulate(model):
                # Predict the noise residual
                noise_pred = model(noisy_images, timesteps, return_dict=False)[0]
                loss = F.mse_loss(noise_pred, noise)
                accelerator.backward(loss)

                accelerator.clip_grad_norm_(model.parameters(), 1.0)
                optimizer.step()
                lr_scheduler.step()
                optimizer.zero_grad()

            progress_bar.update(1)
            logs = {"loss": loss.detach().item(), "lr": lr_scheduler.get_last_lr()[0], "step": global_step}
            progress_bar.set_postfix(**logs)
            accelerator.log(logs, step=global_step)
            global_step += 1

        # After each epoch you optionally sample some demo images with evaluate() and save the model
        if accelerator.is_main_process:
            pipeline = DDPMPipeline(unet=accelerator.unwrap_model(model), scheduler=noise_scheduler)

            if (epoch + 1) % config.save_image_epochs == 0 or epoch == config.num_epochs - 1:
                evaluate(config, epoch, pipeline)

            if (epoch + 1) % config.save_model_epochs == 0 or epoch == config.num_epochs - 1:
                if config.push_to_hub:
                    repo.push_to_hub(commit_message=f"Epoch {epoch}", blocking=True)
                else:
                    pipeline.save_pretrained(config.output_dir)
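A sketch of how this loop is typically launched, following the Diffusers training tutorial (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler, and the evaluate() helper are assumed to be defined as in the full tutorial):

from accelerate import notebook_launcher

# launch the training function; increase num_processes for multi-GPU training
args = (config, model, noise_scheduler, optimizer, train_dataloader, lr_scheduler)
notebook_launcher(train_loop, args, num_processes=1)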

Distributed inference

1 Accelerate
import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)
# PartialState knows this process's rank and device
distributed_state = PartialState()
pipeline.to(distributed_state.device)

with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    result = pipeline(prompt).images[0]
    result.save(f"result_{distributed_state.process_index}.png")
$ accelerate launch --num_processes=2 run_distributed.py
2 PyTorch Distributed
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp

from diffusers import DiffusionPipeline

sd = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16, use_safetensors=True
)


def run_inference(rank, world_size):
    # each spawned process joins the same local process group
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl", rank=rank, world_size=world_size)

    sd.to(rank)

    if torch.distributed.get_rank() == 0:
        prompt = "a dog"
    elif torch.distributed.get_rank() == 1:
        prompt = "a cat"

    image = sd(prompt).images[0]
    image.save(f"./{prompt.replace(' ', '_')}.png")
    
    
def main():
    world_size = 2
    mp.spawn(run_inference, args=(world_size,), nprocs=world_size, join=True)


if __name__ == "__main__":
    main()
Since main() starts the worker processes itself with mp.spawn, launch the script directly with Python:

$ python run_distributed.py

Understanding pipelines, models and schedulers

Deconstruct the Stable Diffusion pipeline

  • Create text embeddings
  • Create random noise
  • Denoise the image
  • Decode the image
from PIL import Image
import torch
from transformers import CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL, UNet2DConditionModel, UniPCMultistepScheduler

vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae", use_safetensors=True)
tokenizer = CLIPTokenizer.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="text_encoder", use_safetensors=True
)
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet", use_safetensors=True
)

scheduler = UniPCMultistepScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")

# speed up using gpu
torch_device = "cuda"
vae.to(torch_device)
text_encoder.to(torch_device)
unet.to(torch_device)


prompt = ["a photograph of an astronaut riding a horse"]
height = 512  # default height of Stable Diffusion
width = 512  # default width of Stable Diffusion
num_inference_steps = 25  # Number of denoising steps
guidance_scale = 7.5  # Scale for classifier-free guidance
generator = torch.manual_seed(0)  # Seed generator to create the initial latent noise
batch_size = len(prompt)


# Tokenize the text and generate the embeddings from the prompt
text_input = tokenizer(
    prompt, padding="max_length", max_length=tokenizer.model_max_length, truncation=True, return_tensors="pt"
)

with torch.no_grad():
    text_embeddings = text_encoder(text_input.input_ids.to(torch_device))[0]
    
# Create random noise in latent space (the VAE downsamples images by 8x)
latents = torch.randn(
    (batch_size, unet.config.in_channels, height // 8, width // 8),
    generator=generator,
)
latents = latents.to(torch_device)
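
# Denoise the image. This step sits between noise creation and decoding; a sketch
# of the standard classifier-free-guidance loop, using the variables defined above.
from tqdm.auto import tqdm

scheduler.set_timesteps(num_inference_steps)
latents = latents * scheduler.init_noise_sigma

# embed an empty prompt so the model can also predict unconditional noise
max_length = text_input.input_ids.shape[-1]
uncond_input = tokenizer([""] * batch_size, padding="max_length", max_length=max_length, return_tensors="pt")
with torch.no_grad():
    uncond_embeddings = text_encoder(uncond_input.input_ids.to(torch_device))[0]
text_embeddings = torch.cat([uncond_embeddings, text_embeddings])

for t in tqdm(scheduler.timesteps):
    # run conditional and unconditional predictions in a single batch
    latent_model_input = torch.cat([latents] * 2)
    latent_model_input = scheduler.scale_model_input(latent_model_input, timestep=t)

    # predict the noise residual
    with torch.no_grad():
        noise_pred = unet(latent_model_input, t, encoder_hidden_states=text_embeddings).sample

    # classifier-free guidance: push the prediction toward the text condition
    noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
    noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

    # compute the previous, less noisy latent sample x_t -> x_t-1
    latents = scheduler.step(noise_pred, t, latents).prev_sample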
# Decode the image: scale and decode the latents with the VAE
latents = 1 / 0.18215 * latents
with torch.no_grad():
    image = vae.decode(latents).sample

# convert from [-1, 1] to an 8-bit PIL image
image = (image / 2 + 0.5).clamp(0, 1).squeeze()
image = (image.permute(1, 2, 0) * 255).to(torch.uint8).cpu().numpy()
image = Image.fromarray(image)
image
  • The denoising loop iterates over scheduler.timesteps; after calling set_timesteps(), this tensor holds the timesteps the loop visits, ordered from most to least noisy:
>>> scheduler.timesteps
tensor([980, 960, 940, 920, 900, 880, 860, 840, 820, 800, 780, 760, 740, 720,
    700, 680, 660, 640, 620, 600, 580, 560, 540, 520, 500, 480, 460, 440,
    420, 400, 380, 360, 340, 320, 300, 280, 260, 240, 220, 200, 180, 160,
    140, 120, 100,  80,  60,  40,  20,   0])

Training examples

| Task | 🤗 Accelerate | 🤗 Datasets |
| --- | --- | --- |
| Unconditional Image Generation | ✅ | ✅ |
| Text-to-Image fine-tuning | ✅ | ✅ |
| Textual Inversion | ✅ | - |
| Dreambooth | ✅ | - |
| Training with LoRA | ✅ | - |
| ControlNet | ✅ | ✅ |
| InstructPix2Pix | ✅ | ✅ |
| Custom Diffusion | ✅ | ✅ |
| T2I Adapters | ✅ | ✅ |

Popular Tasks & Pipelines

| Task | Pipeline |
| --- | --- |
| Unconditional Image Generation | DDPM |
| Text-to-Image | Stable Diffusion Text-to-Image |
| Text-to-Image | unCLIP |
| Text-to-Image | DeepFloyd IF |
| Text-to-Image | Kandinsky |
| Text-guided Image-to-Image | ControlNet |
| Text-guided Image-to-Image | InstructPix2Pix |
| Text-guided Image-to-Image | Stable Diffusion Image-to-Image |
| Text-guided Image Inpainting | Stable Diffusion Inpaint |
| Image Variation | Stable Diffusion Image Variation |
| Super Resolution | Stable Diffusion Upscale |
| Super Resolution | Stable Diffusion Latent Upscale |

References

| Documentation | What can I learn? |
| --- | --- |
| Tutorial | A basic crash course for learning how to use the library's most important features like using models and schedulers to build your own diffusion system, and training your own diffusion model. |
| Loading | Guides for how to load and configure all the components (pipelines, models, and schedulers) of the library, as well as how to use different schedulers. |
| Pipelines for inference | Guides for how to use pipelines for different inference tasks, batched generation, controlling generated outputs and randomness, and how to contribute a pipeline to the library. |
| Optimization | Guides for how to optimize your diffusion model to run faster and consume less memory. |
| Training | Guides for how to train a diffusion model for different tasks with different training techniques. |
