CivArchive
    realistic trans/futa [hunyuan video] - v1.0
    NSFW

    V2!


    V1 had some issues with blocky images, so I removed low-res images from the training set, which made the end result slightly blurry. I also trained with a lower learning rate of 2e-4 for 100 epochs.

    NOTE: there was a bug in the training script, so for now use this code with the specific commit. I will update with v3 soon.

    Use "Video of a transgender woman" at the beginning of the prompt to trigger it.

    git clone https://github.com/kohya-ss/musubi-tuner.git
    cd musubi-tuner
    git checkout fd70762
    pip install -r requirements.txt
    python hv_generate_video.py --fp8 --video_size 1280 720 --video_length 120 --infer_steps 30 --prompt "Video of a transgender woman with fair skin and long, straight white hair, styled with white cat ears. She is dressed in a revealing, white lingerie set, featuring a frilly, off-shoulder crop top that exposes her midriff and a matching ruffled mini skirt. She is also wearing white fishnet stockings that reach just below her knees. Her makeup is bold, with dark eyeliner, mascara, and pink lipstick, complementing her cat-themed costume. She has several tattoos visible on her arms, including a script tattoo on her left arm and a circular tattoo on her right forearm. Her miniskirt is lifted to reveal her erect penis. The background is dimly lit with a purple hue. The setting appears to be indoors, likely a bedroom or a private space, with some indistinct furniture and decor visible. The overall atmosphere of the image is playful and provocative, enhanced by the cat ears and lingerie. The woman's pose is confident and slightly provocative, with one leg raised, adding to the overall seductive tone." --save_path "./videos/" --output_type video --dit ./hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt --attn_mode sdpa --vae ./hunyuan-video-t2v-720p/vae/pytorch_model.pt --vae_chunk_size 32 --vae_spatial_tile_sample_min_size 128 --text_encoder1 ./split_files/text_encoders/llava_llama3_fp16.safetensors --text_encoder2 ./split_files/text_encoders/clip_l.safetensors --seed 69 --lora_multiplier 0.8 --lora_weight ./lora.safetensors

    See https://github.com/kohya-ss/musubi-tuner?tab=readme-ov-file#inference for more info. The repo also has a converter to convert the LoRA to the diffusion-pipe/ComfyUI format.
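    For ComfyUI use, a conversion call would look something like the sketch below. This is an assumption based on my reading of the repo's README: the script name convert_lora.py and its flags may differ between commits, so check python convert_lora.py --help for the exact options.

```shell
# Sketch: convert the musubi-tuner LoRA to the format diffusion-pipe/ComfyUI
# expect. Run from inside the musubi-tuner checkout; the file paths are
# placeholders, and the script name/flags are assumptions from the README.
python convert_lora.py \
  --input ./lora.safetensors \
  --output ./lora_comfy.safetensors \
  --target other
```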

    Comments (5)

    makiaeveli · Jan 8, 2025 · 5 reactions

    Well for one, you don't necessarily need 5 seconds; that is a longer video. 2 seconds plus ping-pong can give you a half-natural 4-second video. That'll immediately cut your processing from 30 minutes to 15. You can also lower your VAE Decode overlap: the default is 64, you could use 32. I also use a 128 tile size instead of 256.
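    If your frontend doesn't offer ping-pong (playing the clip forward, then in reverse), it can also be done after the fact with ffmpeg. A sketch with placeholder filenames:

```shell
# Double the apparent length by concatenating the clip with its own reverse.
# clip.mp4 and pingpong.mp4 are placeholder names; audio is dropped here.
ffmpeg -i clip.mp4 \
  -filter_complex "[0:v]split[f][b];[b]reverse[r];[f][r]concat=n=2:v=1[v]" \
  -map "[v]" pingpong.mp4
```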

    You could also lower steps. If you use FastLora or FastModel, you can sometimes get results at 10 steps that are as good as 20.

    RedHibiscus (Author) · Jan 8, 2025 · 1 reaction

    Interesting. Will try it out.

    makiaeveli · Jan 8, 2025

    Generally I'll make quicker 5-step videos of low length, only a second or so, to find a seed I like. Then you can crank it up to make a longer natural video. It seems like a Flux model underneath, so I'd assume Flux prompts work well enough.

    bhopping · Jan 8, 2025

    Yeah, I noticed that lowering the length of the video drastically improved generation times. I usually play with 73-109 frames, and my VAE decode tile is set to the same as makia's. When the VAE decode tile is set too high, it'll just be stuck decoding for a long time.
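    Frame counts like 73 and 109 are of the form 4n+1, which, as I understand it, matches HunyuanVideo's 4x temporal VAE compression. That factor is an assumption on my part based on the musubi-tuner docs. A quick check:

```shell
# Check which frame counts fit the 4n+1 pattern (assumption: Hunyuan's
# temporal compression factor is 4, so valid lengths are 4n+1).
for len in 73 109 133; do
  if [ $(( (len - 1) % 4 )) -eq 0 ]; then
    echo "$len: 4n+1"
  else
    echo "$len: not 4n+1"
  fi
done
```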

    makiaeveli · Jan 11, 2025

    Found a pretty crazy combo:

    208x368 pixels @ 133 frames with 16 steps and 18 fps takes 3 mins

    so a 7-second video without ping-pong
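    The 7-second figure checks out: 133 frames at 18 fps comes to roughly 7.4 seconds.

```shell
# duration in seconds = frames / fps
awk 'BEGIN { printf "%.1f\n", 133 / 18 }'   # 7.4
```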

    LORA
    Hunyuan Video

    Details

    Downloads: 1,603
    Platform: CivitAI
    Platform Status: Available
    Created: 1/7/2025
    Updated: 5/13/2026
    Deleted: -
    Trigger Words: transgender woman