(((NOT MY MODEL))) Stable Video Diffusion (SVD) Image-to-Video is a latent diffusion model that takes a still image as a conditioning frame and generates a short video clip from it. The model was trained to generate 25 frames at resolution 576x1024 given a context frame of the same size, finetuned from SVD Image-to-Video [14 frames]. The widely used f8-decoder was also finetuned for temporal consistency.
Real repo: stabilityai/stable-video-diffusion-img2vid-xt at main (huggingface.co)
A latent video diffusion model for high-resolution, state-of-the-art text-to-video and image-to-video synthesis. To construct its pretraining dataset, we conduct a systematic data selection and scaling study, and propose a method to curate vast amounts of video data, turning large and noisy video collections into suitable datasets for generative video models. Furthermore, we introduce three distinct stages of video model training, which we separately analyze to assess their impact on final model performance. Stable Video Diffusion provides a powerful video representation from which we finetune video models for state-of-the-art image-to-video synthesis and other highly relevant applications such as LoRAs for camera control. Finally, we provide a pioneering study on multi-view finetuning of video diffusion models and show that SVD constitutes a strong 3D prior, obtaining state-of-the-art results in multi-view synthesis while using only a fraction of the compute of previous methods.
Description
This is just the image-to-video version by Stability AI. It was finetuned for 12k steps (~16 hours) on 8 80GB A100 GPUs, with a total batch size of 16 and a learning rate of 1e-5.
FAQ
Comments (3)
You really censored that one punch that everyone would love to see -.-
Why not add the sources to the origins of the model? It looks like you're claiming you did this yourself. https://stability.ai/news/stable-video-diffusion-open-ai-video-model, https://github.com/Stability-AI/generative-models, https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt
Genuinely curious as to what the major (or minor) differences are between this model and the one direct from Stability AI (uploaded here just a few days before this one, and 500 MB smaller).
Details
Files
stableVideoDiffusion_v10.safetensors
Mirrors
stableVideoDiffusion_v10.safetensors
svd.safetensors
stableVideoDiffusion_img2vid.safetensors
Available On (1 platform)
Same model published on other platforms. May have additional downloads or version variants.