Ditto is a free and open-source AI that can edit videos using text prompts. Developed by Ant Group, an affiliate of Alibaba, Ditto is a holistic framework designed for instruction-based video editing. It uses a novel data generation pipeline to create a large-scale, high-quality dataset of video editing examples.
Tested on a 10 GB GPU + 64 GB RAM with Sage Attention, three 120 Hz monitors, Firefox (21 tabs open, 4 actively used), background music, Ubuntu 24.04.3 LTS: works like a charm.
Base model: https://civarchive.com/models/1651125?modelVersionId=1868891 or any Wan2.1 T2V (text-to-video) model.
Wan21_CausVid_14B_T2V_lora_rank32_v2.safetensors to loras/ (for inference acceleration)
Wan2_1_VAE_bf16.safetensors to vae/wan/
umt5-xxl-enc-bf16.safetensors to text_encoders/
Note: fp8 is not supported yet.
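The file placement above can be checked with a short script. This is a minimal sketch, assuming all three destination folders sit under one common models directory; the `MODELS_ROOT` path and the `missing_files` helper are hypothetical (not part of this release), so adjust the root to match your installation.

```python
from pathlib import Path

# Hypothetical models root (assumption): point this at the directory that
# contains the loras/, vae/ and text_encoders/ folders in your setup.
MODELS_ROOT = Path("models")

# Destination folder -> file name, as listed on this page.
EXPECTED_FILES = {
    "loras": "Wan21_CausVid_14B_T2V_lora_rank32_v2.safetensors",
    "vae/wan": "Wan2_1_VAE_bf16.safetensors",
    "text_encoders": "umt5-xxl-enc-bf16.safetensors",
}


def missing_files(root: Path) -> list[Path]:
    """Return every expected file path that does not exist under root."""
    return [
        root / folder / name
        for folder, name in EXPECTED_FILES.items()
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files(MODELS_ROOT)
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all files in place")
```

Run it from the directory that contains your models folder; any file it reports as missing still needs to be downloaded or moved.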
Workflow: included in the training data file, or search Civitai for different variations.
More info: HuggingFace | GitHub
These models are redistributed here for the sake of convenience.