
# Flax Stable Diffusion Videos

This notebook lets you generate videos by interpolating the latent space of [Stable Diffusion](https://github.com/CompVis/stable-diffusion), using a TPU for faster inference.

Compared with a standard Colab GPU, this runs ~6x faster after the first run. The first run is comparable to the GPU version because JAX has to compile the code.

You can either dream up different versions of the same prompt, or morph between different text prompts (with seeds set for each for reproducibility).

If you like this notebook:
- consider giving the [repo a star](https://github.com/nateraw/stable-diffusion-videos) ⭐️
- consider following us on GitHub [@nateraw](https://github.com/nateraw) [@charlielito](https://github.com/charlielito)

You can file any issues/feature requests [here](https://github.com/nateraw/stable-diffusion-videos/issues).

Enjoy 🤗

## Setup

In [None]:
#@title Set up JAX
#@markdown If you see an error, make sure you are using a TPU backend. Select `Runtime` in the menu above, then select the option "Change runtime type" and then select `TPU` under the `Hardware accelerator` setting.
!pip install --upgrade jax jaxlib 

import jax.tools.colab_tpu
jax.tools.colab_tpu.setup_tpu('tpu_driver_20221011')

!pip install flax diffusers transformers ftfy
jax.devices()

In [None]:
%%capture
! pip install stable_diffusion_videos

## Run the App 🚀

### Load the Interface

This step will take a couple of minutes the first time you run it.

In [None]:
import numpy as np
import jax
import jax.numpy as jnp

from jax import pmap
from flax.jax_utils import replicate
from flax.training.common_utils import shard
from PIL import Image

from stable_diffusion_videos import FlaxStableDiffusionWalkPipeline, Interface

pipeline, params = FlaxStableDiffusionWalkPipeline.from_pretrained(
 "CompVis/stable-diffusion-v1-4", 
 revision="bf16", 
 dtype=jnp.bfloat16
)
p_params = replicate(params)

interface = Interface(pipeline, params=p_params)

In [None]:
#@title Connect to Google Drive to Save Outputs

#@markdown If you want to connect Google Drive, click the checkbox below and run this cell. You'll be prompted to authenticate.

#@markdown If you just want to save your outputs in this Colab session, don't worry about this cell

connect_google_drive = True #@param {type:"boolean"}

#@markdown Then, in the interface, use this path as the `output` in the Video tab to save your videos to Google Drive:

#@markdown > /content/gdrive/MyDrive/stable_diffusion_videos


if connect_google_drive:
 from google.colab import drive

 drive.mount('/content/gdrive')

### Launch

This cell launches a Gradio Interface. Here's how I suggest you use it:

1. Use the "Images" tab to generate images you like.
 - Find two images you want to morph between
 - These images should use the same settings (guidance scale, height, width)
 - Keep track of the seeds/settings you used so you can reproduce them

2. Generate videos using the "Videos" tab
 - Using the images you found from the step above, provide the prompts/seeds you recorded
 - Set the `num_interpolation_steps` - for testing you can use a small number like 3 or 5, but to get great results you'll want to use something larger (60-200 steps). 

💡 **Pro tip** - Click the link that looks like `https://.gradio.app` below, and you'll be able to view it in full screen.

In [None]:
interface.launch(debug=True)

---

## Use `walk` programmatically

The other option is to not use the interface, and instead use `walk` programmatically. Here's how you would do that...

First, we define a helper function for visualizing videos in Colab.

In [None]:
from IPython.display import HTML
from base64 import b64encode

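def visualize_video_colab(video_path):
    # Read the video and embed it as a base64 data URL so Colab can play it inline
    with open(video_path, 'rb') as f:
        mp4 = f.read()
    data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
    return HTML("""
    <video width=400 controls>
        <source src="%s" type="video/mp4">
    </video>
    """ % data_url)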

Walk! 🚶‍♀️

In [None]:
video_path = pipeline.walk(
 p_params,
 ['a cat', 'a dog'],
 [42, 1337],
 fps=5, # use 5 for testing, 25 or 30 for better quality
 num_interpolation_steps=30, # use 3-5 for testing, 30 or more for better results
 height=512, # use multiples of 64 if > 512. Multiples of 8 if < 512.
 width=512, # use multiples of 64 if > 512. Multiples of 8 if < 512.
 jit=True # To use all TPU cores
)
visualize_video_colab(video_path)
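
For intuition about what "interpolating the latent space" between two seeds can look like, here is a minimal sketch of spherical linear interpolation (slerp) in plain Python. It is an illustration only: the function name `slerp`, the `dot_threshold` parameter, and the toy 2-D vectors are assumptions made for this example, and whether `walk` uses exactly this formula internally is also an assumption. Real latents are large arrays, not two-element lists.

```python
import math

def slerp(t, v0, v1, dot_threshold=0.9995):
    """Spherically interpolate between two equal-length vectors (hypothetical helper)."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Cosine of the angle between the two vectors
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    if abs(dot) > dot_threshold:
        # Nearly parallel vectors: fall back to plain linear interpolation
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    theta = math.acos(dot)
    s0 = math.sin((1 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return [s0 * a + s1 * b for a, b in zip(v0, v1)]

# Toy stand-ins for the noise drawn from two different seeds
start = [1.0, 0.0]
end = [0.0, 1.0]
mid = slerp(0.5, start, end)
print(mid)
```

Unlike a straight linear blend, the midpoint here keeps the same length as the endpoints, which is the usual motivation for preferring slerp when moving between Gaussian noise samples.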

### Bonus! Music videos

First, we'll need to install `youtube-dl`

In [None]:
%%capture
! pip install youtube-dl

Then, we can download an example music file. Here we download one from my SoundCloud:

In [None]:
! youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts

In [None]:
from IPython.display import Audio

Audio(filename='music/thoughts.mp3')

In [None]:
# Seconds in the song
audio_offsets = [7, 9]
fps = 8

# Convert seconds to frames
num_interpolation_steps = [(b-a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

video_path = pipeline.walk(
 p_params,
 prompts=['blueberry spaghetti', 'strawberry spaghetti'],
 seeds=[42, 1337],
 num_interpolation_steps=num_interpolation_steps,
 height=512, # use multiples of 64
 width=512, # use multiples of 64
 audio_filepath='music/thoughts.mp3', # Use your own file
 audio_start_sec=audio_offsets[0], # Start second of the provided audio
 fps=fps, # important to set yourself based on the num_interpolation_steps you defined
 batch_size=2, # on a TPU v2, the maximum is typically 3 for 512x512
 output_dir='./dreams', # Where images will be saved
 name=None, # Subdir of output dir. will be timestamp by default
 jit=True # To use all TPU cores
)
visualize_video_colab(video_path)
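
The seconds-to-frames conversion in the cell above extends to more than two prompts: each pair of consecutive offsets gives one interpolation segment, so you need one more prompt (and seed) than segments. The sketch below shows only the arithmetic. The helper name `offsets_to_steps` and the three offsets are hypothetical (they are not taken from the song), and the code does not call the pipeline.

```python
def offsets_to_steps(audio_offsets, fps):
    """Convert audio offsets in seconds into per-segment frame counts (hypothetical helper)."""
    return [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]

# Hypothetical offsets for three prompts: two segments, of 2 and 3 seconds
audio_offsets = [7, 9, 12]
fps = 8

steps = offsets_to_steps(audio_offsets, fps)
print(steps)  # [16, 24]

# Total video length in seconds, which should match the span of audio covered
total_seconds = sum(steps) / fps
print(total_seconds)  # 5.0
```

Passing `steps` as `num_interpolation_steps` together with the same `fps` keeps each transition aligned with its stretch of audio, which is why the cell above stresses setting `fps` yourself.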