CivArchive
    Zeroscope V2 XL (txt2video) - v1.0

    Stop! These models are not for txt2img inference!

    Don't put them in your stable-diffusion-webui/models directory and expect to make images!

    So what are these?

    These are new Modelscope-based models for txt2video, optimized to produce 16:9 video compositions. They were trained on 9,923 video clips and 29,769 tagged frames at 24 fps, 1024x576 resolution.

    Note that these are the bigger brothers of the https://civarchive.com/models/96454/zeroscope-v2-576w-txt2video models. The XL models use 15.3 GB of VRAM when rendering 30 frames at 1024x576.

    Where do they go?

    Drop them in the \stable-diffusion-webui\models\ModelScope\t2v folder.

    It's imperative that you rename text2video_pytorch_model.pt to the .pth extension after downloading.

    The files must be named open_clip_pytorch_model.bin and text2video_pytorch_model.pth.
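    The placement and rename steps above can be sketched as shell commands. This is a safe demo, not the definitive install: it uses a throwaway temp directory and empty placeholder files standing in for the downloads, so nothing on your system is touched. Replace $ROOT with your real stable-diffusion-webui location and use the actual downloaded files.

    ```shell
    # Throwaway root so the commands can be tried safely (assumption:
    # your real install lives elsewhere, e.g. ~/stable-diffusion-webui).
    ROOT=$(mktemp -d)
    T2V="$ROOT/stable-diffusion-webui/models/ModelScope/t2v"
    mkdir -p "$T2V"

    # Empty stand-ins for the two downloaded model files.
    touch "$ROOT/text2video_pytorch_model.pt" "$ROOT/open_clip_pytorch_model.bin"

    # Rename the .pt file to .pth as it is moved into place;
    # the .bin keeps its name unchanged.
    mv "$ROOT/text2video_pytorch_model.pt"  "$T2V/text2video_pytorch_model.pth"
    mv "$ROOT/open_clip_pytorch_model.bin" "$T2V/open_clip_pytorch_model.bin"
    ```

    After this, the t2v folder contains exactly the two filenames the extension expects.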

    Who made them? Original Source?

    https://huggingface.co/cerspense/zeroscope_v2_XL

    What else do I need?

    These models are specifically for use with the txt2video Auto1111 WebUI extension.

    Description

    Other
    SD 1.5

    Details

    Downloads: 926
    Platform: CivitAI
    Platform Status: Available
    Created: 6/24/2023
    Updated: 9/27/2025
    Deleted: -

    Files

    zeroscopeV2XL_v10.pt

    Mirrors

    Huggingface (1 mirror)
    CivitAI (1 mirror)

    zeroscopeV2XL_v10.bin

    Mirrors

    CivitAI (1 mirror)