    Nancy Ace - Wan v1.1

    This lora produces the likeness of the model/porn star Nancy Ace.

    Wan - v1.0: I just realized that I only captioned a few of the images with the keyword. It still works really well, but I'm retraining a v1.1 model with all images captioned.

    Training

    I trained this lora on a 4090 using diffusion-pipe. I followed this tutorial to set diffusion-pipe up on my machine. I trained with 20 images at 1024x1024 and 800x1024 px for 30 epochs, rank 32, 10 repeats, using the fp8_e4m3fn quant (see config.toml and dataset.toml below). Character/likeness loras seem to be best trained with images, while activities seem to be best trained with short videos. Then, to mix the character with the activity, use both a character lora and an activity lora.
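
    For reference, diffusion-pipe is launched through DeepSpeed. A typical single-GPU run looks something like the command below (the config path here is illustrative; point it at your own config.toml):

    deepspeed --num_gpus=1 train.py --deepspeed --config /mnt/d/Projects/hunyuan-training/nancya/config.toml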

    The images I used are a mixture of full body/clothed, full body/not clothed, and close-ups of her face. I annotated manually, keeping the annotations simple and focusing on the things I didn't want baked into the lora (pose, clothing, some surroundings). Training took around 4 hours.

    Example annotation: "nancya is squatting on stairs with her legs spread wide. She is wearing a blue long sleeve jumper and white tennis shoes."
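
    For context, diffusion-pipe reads each caption from a plain-text file that sits next to the image with the same base name, so the dataset folder ends up looking roughly like this (filenames are illustrative):

    1024px/
        001.jpg
        001.txt    <- contains the annotation above
        002.jpg
        002.txt
        ...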

    I'm really impressed with how Hunyuan reproduces both body likeness and facial likeness! But facial likeness suffers a bit in full body shots. I'm not sure if this is a common problem with loras. I've seen some people suggest post-processing with ReActor to fix faces; my goal is to avoid doing that, so I'll keep experimenting to make this better.

    I'll also continue experimenting to figure out how to train a quality lora with the fewest images and, relatedly, in the least time. My first pass at this included only non-clothed and close-up face images. When using the resulting lora, I found that prompts that included clothing did not reproduce the character likeness. Continuing to experiment.

    Generation

    Videos can be generated using Kijai's Hunyuan video nodes or the built-in ComfyUI Hunyuan nodes. I use Kijai's because I feel they give better results and more control. Download the example videos and drag them into ComfyUI to load the workflow. Use ComfyUI Manager to install any missing nodes.
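
    If you are loading the lora in your own workflow, the trained .safetensors file normally goes in ComfyUI's lora folder so the loader nodes can pick it up (filename is illustrative):

    ComfyUI/models/loras/nancya_wan_v1_1.safetensors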

    Prompting: Use nancya as the keyword. Do not describe any features of her person/body (e.g. don't use 'blonde straight hair', 'skinny', etc.), since her likeness is already baked into the lora and using these terms often results in a different likeness being generated.

    For example:

    "nancya standing in front of a pool. She is wearing a red t-shirt and cut off jeans shorts."

    Also, prompting for specific articles of clothing or body parts often helps frame the shot. If you want a full body shot, prompting for feet or shoes usually works. You can use 'full body shot' or 'close up shot', but I find that 'full body shot' doesn't always work.
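
    For example, a prompt along these lines tends to pull the camera back far enough for a full body shot:

    "nancya standing on a beach. She is wearing a yellow sundress and white sneakers, full body shot."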

    Diffusion-pipe Hunyuan configuration files

    config.toml

    # Project paths
    output_dir = '/mnt/d/Projects/hunyuan-training/nancya/output'
    dataset = '/mnt/d/Projects/hunyuan-training/nancya/dataset.toml'
    
    # Training settings
    epochs = 40
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 4
    gradient_clipping = 1.0
    warmup_steps = 100
    
    # eval settings
    eval_every_n_epochs = 5
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 5
    checkpoint_every_n_epochs = 5
    
    #checkpoint_every_n_minutes = 30
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 1
    steps_per_print = 1
    video_clip_mode = 'single_middle'
    
    [model]
    type = 'hunyuan-video'
    transformer_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
    #transformer_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_bf16.safetensors'
    vae_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_vae_bf16.safetensors'
    llm_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/llm/llava-llama-3-8b-text-encoder-tokenizer'
    clip_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/clip/clip-vit-large-patch14'
    dtype = 'bfloat16'
    transformer_dtype = 'float8'
    timestep_sample_method = 'logit_normal'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 2e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8

    dataset.toml

    # Resolution settings.
    # Can adjust this to 1024 for image training, especially on 24gb cards.
    resolutions = [[1024,1024],[800,1024]]
    
    #Aspect ratio bucketing settings
    enable_ar_bucket = true
    min_ar = 0.5
    max_ar = 2.0
    num_ar_buckets = 7
    
    # Frame buckets (1 is for images)
    frame_buckets = [1]
    
    [[directory]]
    # Set this to where your dataset is
    path = '/mnt/d/Projects/hunyuan-training/nancya/1024px/'
    # Reduce as necessary
    num_repeats = 10
    

    Diffusion-pipe Wan configuration files

    config.wan.toml

    # Dataset config file.
    output_dir = '/mnt/d/Projects/video-training/nancya/output'
    dataset = '/mnt/d/Projects/video-training/nancya/dataset.toml'
    
    # Training settings
    epochs = 100
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 4
    gradient_clipping = 1.0
    warmup_steps = 100
    
    # eval settings
    eval_every_n_epochs = 5
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 10
    checkpoint_every_n_epochs = 10
    #checkpoint_every_n_minutes = 30
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 1
    steps_per_print = 1
    video_clip_mode = 'single_middle'
    blocks_to_swap = 20
    
    [model]
    type = 'wan'
    # 1.3B
    #ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-1.3B'
    # 14B
    ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-14B'
    
    transformer_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2_1-T2V-14B_fp8_e5m2.safetensors' #kijai
    vae_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan_2_1_VAE_bf16.safetensors' #kijai
    llm_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/umt5-xxl-enc-bf16.safetensors' #kijai
    
    dtype = 'bfloat16'
    # You can use fp8 for the transformer when training LoRA.
    #transformer_dtype = 'float8'
    timestep_sample_method = 'logit_normal'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 5e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8
    

    dataset.toml

    # Resolution settings.
    # Can adjust this to 1024 for image training, especially on 24gb cards.
    resolutions = [1024]
    
    #Aspect ratio bucketing settings
    enable_ar_bucket = true
    min_ar = 0.5
    max_ar = 2.0
    num_ar_buckets = 7
    
    # Frame buckets (1 is for images)
    frame_buckets = [1]
    
    [[directory]]
    # Set this to where your dataset is
    path = '/mnt/d/Projects/video-training/nancya/1024px/'
    # Reduce as necessary
    num_repeats = 5
    

    Description

    Updated keyword captioning to make sure it's complete for each pic in the dataset. Results seem much more consistent and less plastic.

    LORA
    Wan Video

    Details

    Downloads: 148
    Platform: CivitAI
    Platform Status: Deleted
    Created: 4/14/2025
    Updated: 7/7/2025
    Deleted: 5/23/2025
    Trigger Words: nancya