    [Experimental] 8GB VRAM Tik Tok Dance Workflow (AnimateLCM, Depth Controlnet, LoRA) - v1.0

    Introduction

    This is a highly experimental workflow for generating dance videos within 8GB of VRAM. It requires you to tinker with the relative strengths of the LoRA and the ControlNet. It also requires a LoRA trained on only one attire, and that attire has to roughly match the one in the driving video to work well.

    This was inspired by work by Reddit user specific_virus8061, who made a music video on an 8GB VRAM GPU. I noticed morphing in the video, which is a common limitation of AnimateDiff with a 16-frame context window. I tried various methods to get around this, and this workflow is the outcome.

    Link to Reddit Post: https://www.reddit.com/r/StableDiffusion/comments/1fsz1dp/potato_vram_ai_music_video/

    Who is it for?

    People who have 8GB VRAM and do not mind tinkering with a workflow to get the most out of their hardware.

    Who is it not for?

    • People who are looking for a one-click workflow.

    • People who have the VRAM to run a proper solution like MimicMotion.

    Workflow

    The first part of the workflow uses fixed latent batch seed behaviour with a depth ControlNet and a character LoRA to generate images. You use the Image Generation Group to generate the individual frames, which will be saved as latents in the output/dance folder.

    The second part of the workflow puts the images through an AnimateLCM pass to create a video. Copy the latents to the input folder and refresh ComfyUI. Disable the Image Generation Group and activate the Video Generation Group. You can now select the latents in the LoadLatent nodes. Add more LoadLatent and LatentBatch nodes as needed for the length of your video. A script for the copy step is sketched below.
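    If you prefer to script the copy step rather than move files by hand, here is a minimal Python sketch. It assumes a default ComfyUI directory layout and that the SaveLatent node wrote .latent files into output/dance; the COMFYUI_DIR path is a placeholder you should point at your own install.

        import shutil
        from pathlib import Path

        # Placeholder path: point this at your own ComfyUI install.
        COMFYUI_DIR = Path("~/ComfyUI").expanduser()
        SRC = COMFYUI_DIR / "output" / "dance"   # where the workflow saves latents
        DST = COMFYUI_DIR / "input"              # where LoadLatent looks for files

        # Collect the saved latent files (sorted for deterministic output).
        latents = sorted(SRC.glob("*.latent"))
        if not latents:
            raise SystemExit(f"No .latent files found in {SRC}")

        DST.mkdir(parents=True, exist_ok=True)
        for f in latents:
            shutil.copy2(f, DST / f.name)

        print(f"Copied {len(latents)} latents; refresh ComfyUI to pick them up.")

    After running it, refresh ComfyUI and the latents should appear in the LoadLatent node dropdowns.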

    LoRA

    Please use LoRAs that were trained on only one specific attire. You can try the LoRAs by cyberAngel; each of them has typically been trained on a single attire.

    https://civarchive.com/user/cyberAngel_/models?baseModels=SD+1.5

    VRAM

    VRAM usage is controlled by the Meta Batch node and the two batched VAE Decode nodes. The settings below have been tested to work well. Please leave a comment if they do not work for you.

    • 8GB VRAM: Meta Batch: 12, VAE Decode: 2

    • 12GB VRAM: Meta Batch: 24, VAE Decode: 16

    Evaluation of the Results

    This is by no means a perfect workflow: the hands, collar, tie, buttons, and background all have issues that still need fixing. I am releasing it so that the community with lower VRAM can have fun and see how far they can take the concept.

    Models Needed

    Custom Nodes Needed

    Install any missing custom nodes using ComfyUI Manager.

    • ComfyUI's ControlNet Auxiliary Preprocessors

    • ComfyUI Frame Interpolation

    • ComfyUI-Advanced-ControlNet

    • AnimateDiff Evolved

    • ComfyUI-VideoHelperSuite

    • rgthree's ComfyUI Nodes

    • KJNodes for ComfyUI

    • Crystools

    Description

    Initial Release

    Workflows
    SD 1.5

    Details

    Downloads: 468
    Platform: CivitAI
    Platform Status: Available
    Created: 10/18/2024
    Updated: 9/30/2025
    Deleted: -

    Files

    Experimental8GBVRAMTikTok_v10.zip