    Nancy Ace - Wan v1.1

    This lora produces the likeness of the model/porn star Nancy Ace.

    Wan - v1.0: I just realized that I only captioned a few of the images with the keyword. It still works really well, but I'm retraining a v1.1 model with all images captioned.

    Training

    I trained this lora on a 4090 using diffusion-pipe. I followed this tutorial to set diffusion-pipe up on my machine. I trained with 20 images at 1024x1024 and 800x1024 px for 30 epochs, rank 32, 10 repeats, using the fp8_e4m3fn quant (see config.toml and dataset.toml below). Character/likeness loras seem to be best trained with images, while activities seem to be best trained with short videos. Then, to mix the character with the activity, use both a character lora and an activity lora.
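
    For reference, diffusion-pipe is launched through DeepSpeed. A typical single-GPU run looks something like the command below (the config path here is illustrative; point it at your own config.toml):

    deepspeed --num_gpus=1 train.py --deepspeed --config /mnt/d/Projects/hunyuan-training/nancya/config.toml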

    The images I used are a mixture of full body/clothed, full body/not clothed, and close-ups of her face. I annotated manually, keeping the annotations simple and focusing on the things I didn't want baked into the lora (pose, clothing, some surroundings). Training took around 4 hours.

    Example annotation: "nancya is squatting on stairs with her legs spread wide. She is wearing a blue long sleeve jumper and white tennis shoes."
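
    For context, diffusion-pipe reads each caption from a plain-text file that sits next to the image with the same base name, so the dataset folder ends up looking roughly like this (filenames are illustrative):

    1024px/
        001.jpg
        001.txt    <- contains the annotation above
        002.jpg
        002.txt
        ...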

    I'm really impressed with how Hunyuan reproduces both body likeness and facial likeness! But facial likeness suffers a bit in full body shots. I'm not sure if this is a common problem with loras. I've seen some people suggest post-processing with ReActor to fix faces; my goal is to avoid doing that, so I'll keep experimenting to make this better.

    I'll also continue experimenting to figure out how to train a quality lora with the fewest images and, relatedly, in the least time. My first pass at this included only non-clothed and close-up face images. When using the resulting lora, I found that prompts that included clothing did not reproduce the character likeness. Continuing to experiment.

    Generation

    Videos can be generated using Kijai's Hunyuan video nodes or the built-in ComfyUI Hunyuan nodes. I use Kijai's because I feel they give better results and more control. Download the example videos and drag them into ComfyUI to load the workflow. Use ComfyUI Manager to install any missing nodes.
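
    If you are loading the lora in your own workflow, the trained .safetensors file normally goes in ComfyUI's lora folder so the loader nodes can pick it up (filename is illustrative):

    ComfyUI/models/loras/nancya_wan_v1_1.safetensors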

    Prompting: Use nancya as the keyword. Do not describe any features of her person/body (e.g. don't use 'blonde straight hair', 'skinny', etc.), since her likeness is already baked into the lora and using these terms often results in a different likeness being generated.

    For example:

    "nancya standing in front of a pool. She is wearing a red t-shirt and cut off jeans shorts."

    Also, prompting for specific articles of clothing or body parts often helps frame the shot. If you want a full body shot, prompting for feet or shoes usually works. You can use 'full body shot' or 'close up shot', but I find that 'full body shot' doesn't always work.
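
    For example, a prompt along these lines tends to pull the camera back far enough for a full body shot:

    "nancya standing on a beach. She is wearing a yellow sundress and white sneakers, full body shot."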

    Diffusion-pipe Hunyuan configuration files

    config.toml

    # Project paths
    output_dir = '/mnt/d/Projects/hunyuan-training/nancya/output'
    dataset = '/mnt/d/Projects/hunyuan-training/nancya/dataset.toml'
    
    # Training settings
    epochs = 40
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 4
    gradient_clipping = 1.0
    warmup_steps = 100
    
    # eval settings
    eval_every_n_epochs = 5
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 5
    checkpoint_every_n_epochs = 5
    
    #checkpoint_every_n_minutes = 30
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 1
    steps_per_print = 1
    video_clip_mode = 'single_middle'
    
    [model]
    type = 'hunyuan-video'
    transformer_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_fp8_e4m3fn.safetensors'
    #transformer_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_720_cfgdistill_bf16.safetensors'
    vae_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/hunyuan/hunyuan_video_vae_bf16.safetensors'
    llm_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/llm/llava-llama-3-8b-text-encoder-tokenizer'
    clip_path = '/mnt/d/Projects/hunyuan-training/diffusion-pipe/models/clip/clip-vit-large-patch14'
    dtype = 'bfloat16'
    transformer_dtype = 'float8'
    timestep_sample_method = 'logit_normal'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 2e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8

    dataset.toml

    # Resolution settings.
    # Can adjust this to 1024 for image training, especially on 24gb cards.
    resolutions = [[1024,1024],[800,1024]]
    
    #Aspect ratio bucketing settings
    enable_ar_bucket = true
    min_ar = 0.5
    max_ar = 2.0
    num_ar_buckets = 7
    
    # Frame buckets (1 is for images)
    frame_buckets = [1]
    
    [[directory]]
    # Set this to where your dataset is
    path = '/mnt/d/Projects/hunyuan-training/nancya/1024px/'
    # Reduce as necessary
    num_repeats = 10
    

    Diffusion-pipe Wan configuration files

    config.wan.toml

    # Dataset config file.
    output_dir = '/mnt/d/Projects/video-training/nancya/output'
    dataset = '/mnt/d/Projects/video-training/nancya/dataset.toml'
    
    # Training settings
    epochs = 100
    micro_batch_size_per_gpu = 1
    pipeline_stages = 1
    gradient_accumulation_steps = 4
    gradient_clipping = 1.0
    warmup_steps = 100
    
    # eval settings
    eval_every_n_epochs = 5
    eval_before_first_step = true
    eval_micro_batch_size_per_gpu = 1
    eval_gradient_accumulation_steps = 1
    
    # misc settings
    save_every_n_epochs = 10
    checkpoint_every_n_epochs = 10
    #checkpoint_every_n_minutes = 30
    activation_checkpointing = true
    partition_method = 'parameters'
    save_dtype = 'bfloat16'
    caching_batch_size = 1
    steps_per_print = 1
    video_clip_mode = 'single_middle'
    blocks_to_swap = 20
    
    [model]
    type = 'wan'
    # 1.3B
    #ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-1.3B'
    # 14B
    ckpt_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2.1-T2V-14B'
    
    transformer_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan2_1-T2V-14B_fp8_e5m2.safetensors' #kijai
    vae_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/Wan_2_1_VAE_bf16.safetensors' #kijai
    llm_path = '/mnt/d/software_tools/diffusion-pipe/models/wan/umt5-xxl-enc-bf16.safetensors' #kijai
    
    dtype = 'bfloat16'
    # You can use fp8 for the transformer when training LoRA.
    #transformer_dtype = 'float8'
    timestep_sample_method = 'logit_normal'
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 5e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8
    

    dataset.toml

    # Resolution settings.
    # Can adjust this to 1024 for image training, especially on 24gb cards.
    resolutions = [1024]
    
    #Aspect ratio bucketing settings
    enable_ar_bucket = true
    min_ar = 0.5
    max_ar = 2.0
    num_ar_buckets = 7
    
    # Frame buckets (1 is for images)
    frame_buckets = [1]
    
    [[directory]]
    # Set this to where your dataset is
    path = '/mnt/d/Projects/video-training/nancya/1024px/'
    # Reduce as necessary
    num_repeats = 5
    

    Description

    Updated keyword captioning to make sure it's complete for each pic in the dataset. Results seem much more consistent and less plastic.

    LORA
    Wan Video

    Details

    Downloads: 148
    Platform: CivitAI
    Platform Status: Deleted
    Created: 4/14/2025
    Updated: 7/7/2025
    Deleted: 5/23/2025
    Trigger Words: nancya