Photo Background - 2d Compositing|写真背景・二次元合成
Trained on 2D illustrations composited onto photo backgrounds.
This is a small LoRA; I thought it would be interesting to see whether models trained on illustrations or on real-world images/video can produce this composited, mixed-reality effect.
ℹ️ LoRAs work best when applied to the base models they were trained on. Please read the About This Version notes on the appropriate base model page for workflow and training information.
Metadata is included in all uploaded files; you can drag the generated videos into ComfyUI to use the embedded workflows.
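If you want to inspect a workflow without opening ComfyUI, here is a minimal sketch for image outputs (ComfyUI stores the workflow JSON in PNG text chunks; video files carry it in container metadata instead, which you can inspect with ffprobe). The filename below is a placeholder:

import json
from PIL import Image

def read_workflow(path: str) -> dict:
    # ComfyUI writes the editable graph into the PNG text chunk 'workflow'
    # (and the executed graph into 'prompt').
    with Image.open(path) as img:
        raw = img.info.get("workflow")
    if raw is None:
        raise KeyError(f"no embedded workflow found in {path}")
    return json.loads(raw)

# workflow = read_workflow("output_00001_.png")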
Recommended prompt structure:
Positive prompt (trigger at the end of the prompt, before the quality tags, for non-Hunyuan versions):
{{tags}}
real world location, photo background,
masterpiece, best quality, very awa, absurdres
Negative prompt:
(worst quality, low quality, sketch:1.1), error, bad anatomy, bad hands, watermark, ugly, distorted, censored, lowres
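For example, a filled-in positive prompt might look like this (the subject tags here are placeholders; swap in your own):
1girl, solo, sitting on bench, city park, looking at viewer,
real world location, photo background,
masterpiece, best quality, very awa, absurdres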
Description
Trained on Anima Preview
Assume that any LoRA trained on the preview version won't work well on the final version.
Updated dataset, with a mix of natural-language (NL) captions and tags.
Trained with diffusion-pipe (fork by @bluvoll).
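For reference, diffusion-pipe is launched through DeepSpeed; a typical single-GPU invocation (assuming the config below is saved as config-anima.toml) looks like:
deepspeed --num_gpus=1 train.py --deepspeed --config config-anima.toml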
Config:
# dataset-anima.toml
# Resolution settings.
resolutions = [1024]
# Aspect ratio bucketing settings
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 7
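# Illustrative note (not from the original config): these settings sort images
# into 7 aspect-ratio buckets spread roughly evenly between 0.5 (1:2 portrait)
# and 2.0 (2:1 landscape), so mixed-shape data trains without heavy cropping.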
[[directory]] # IMAGES
# Path to the directory containing images and their corresponding caption files.
path = '/mnt/d/huanvideo/training_data/images'
num_repeats = 1
resolutions = [1024]

# config-anima.toml
# Change these paths
output_dir = '/mnt/d/anima/training_output'
dataset = 'dataset-anima.toml'
# training settings
epochs = 60
micro_batch_size_per_gpu = 6
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1.0
warmup_steps = 100
# eval settings
eval_every_n_epochs = 1
eval_before_first_step = true
eval_micro_batch_size_per_gpu = 1
eval_gradient_accumulation_steps = 1
# misc settings
save_every_n_epochs = 1
checkpoint_every_n_minutes = 120
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
steps_per_print = 1
[model]
type = 'anima'
transformer_path = '/mnt/c/models/diffusion_models/anima-preview.safetensors'
vae_path = '/mnt/c/models/vae/qwen_image_vae.safetensors'
qwen_path = '../qwen0.6/Qwen3-0.6B/'
dtype = 'bfloat16'
timestep_sample_method = 'logit_normal'
sigmoid_scale = 1.0
shift = 3.0
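# Illustrative note (my reading, not diffusion-pipe source): logit_normal
# roughly draws t = sigmoid(sigmoid_scale * n) with n ~ N(0, 1), and shift
# then remaps it as t' = shift * t / (1 + (shift - 1) * t), biasing training
# toward noisier timesteps.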
# Caption Processing Options
cache_text_embeddings = false
# NOTE: Requires cache_text_embeddings = false to work!
# For cached embeddings, use cache_shuffle_num in your dataset config instead.
shuffle_tags = true
tag_delimiter = ', '
keep_first_n_tags = 5
shuffle_keep_first_n = 5
tag_dropout_percent = 0.10
protected_tags_file = './protected_tags.txt'
nl_shuffle_sentences = true
nl_keep_first_sentence = false
# Options: 'tags', 'nl', 'mixed'
caption_mode = 'mixed'
debug_caption_processing = true
debug_caption_interval = 100
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
# AdamW from the optimi library is a good default since it automatically uses Kahan summation when training bfloat16 weights.
[optimizer]
type = 'adamw_optimi'
lr = 5e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8
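To make the caption-processing options above concrete, here is a rough sketch of the tag-side behavior they describe. This is my approximation, not diffusion-pipe's actual code, and treating tag_dropout_percent = 0.10 as a 10% per-tag probability is an assumption:

import random

def process_tags(caption: str,
                 delimiter: str = ', ',         # tag_delimiter
                 keep_first_n: int = 5,         # keep_first_n_tags / shuffle_keep_first_n
                 dropout: float = 0.10,         # tag_dropout_percent (assumed fraction)
                 protected: set = frozenset()   # contents of protected_tags.txt
                 ) -> str:
    tags = [t.strip() for t in caption.split(delimiter.strip()) if t.strip()]
    head, tail = tags[:keep_first_n], tags[keep_first_n:]   # first N tags stay in place
    random.shuffle(tail)                                    # shuffle_tags = true
    tail = [t for t in tail
            if t in protected or random.random() >= dropout]  # drop unprotected tags
    return delimiter.join(head + tail)

With caption_mode = 'mixed', some captions are tag lists and some are natural-language sentences; the nl_* options apply the analogous shuffle at the sentence level.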