CivArchive
    Guide: 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental) - v1.0
    Preview 416650Preview 416659Preview 416667

    Updates will be mirrored on both Hugging Face and Civitai.

    Introduction

    It has been shown that multiple characters can be trained into the same model. A harder task is to create a model that can generate multiple characters simultaneously without modifying the generation pipeline. This document describes a simple technique that has been shown to help generating multiple characters in the same image.

    Method

    Requirement: Sets of single-character images
    Steps:
    1. Train a multi-concept model using the original dataset
    2. Create an augmentation dataset of joined image pairs (composites) from the original dataset
    3. Train on the augmentation dataset
    

    Experiment

    Setup

    3 characters from the game Cinderella Girls are chosen for the experiment. The base model is anime-final-pruned. It has been checked that the base model has minimal knowledge of the trained characters. Images with multiple characters were removed (edit: after training it was found that some unrelated background characters were not removed).

    For the captions of the joined images, the template format CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight is used.

    A LoRA (Hadamard product) is trained using the config file below:

    [model_arguments]
    v2 = false
    v_parameterization = false
    pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"
    
    [additional_network_arguments]
    no_metadata = false
    unet_lr = 0.0005
    text_encoder_lr = 0.0005
    network_module = "lycoris.kohya"
    network_dim = 8
    network_alpha = 1
    network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
    network_train_unet_only = false
    network_train_text_encoder_only = false
    
    [optimizer_arguments]
    optimizer_type = "AdamW8bit"
    learning_rate = 0.0005
    max_grad_norm = 1.0
    lr_scheduler = "cosine"
    lr_warmup_steps = 0
    
    [dataset_arguments]
    debug_dataset = false
    # keep token 1
    # resolution 640*640
    
    [training_arguments]
    output_name = "cg3comp"
    save_precision = "fp16"
    save_every_n_epochs = 1
    train_batch_size = 4
    max_token_length = 225
    mem_eff_attn = false
    xformers = true
    max_train_epochs = 40
    max_data_loader_n_workers = 8
    persistent_data_loader_workers = true
    gradient_checkpointing = false
    gradient_accumulation_steps = 1
    mixed_precision = "fp16"
    clip_skip = 2
    lowram = true
    
    [sample_prompt_arguments]
    sample_every_n_epochs = 1
    sample_sampler = "k_euler_a"
    
    [saving_arguments]
    save_model_as = "safetensors"
    

    For the second stage of training, the batch size was reduced to 2 and the resolution was set to 768 * 768 while keeping other settings identical. The training took less than 2 hours on a T4 GPU.

    Results

    (see preview images)

    Limitations

    • This technique doubles the memory/compute requirement

    • Composites can still be generated despite negative prompting

    • Cloned characters seem to become the primary failure mode in place of blended characters

    • Can generate 2 characters but not 3 for some reason

    Related Works

    Models been trained on datasets based on anime shows have demonstrated multi-subject capability. Simply using concepts distant enough such as 1girl, 1boy has also been shown to be effective.

    Future work

    Below is a list of ideas yet to be explored

    • Synthetic datasets

    • Unequal aspect ratios

    • Regularization

    • Joint training instaed of sequential

    Description

    LORA
    SD 1.5

    Details

    Downloads
    644
    Platform
    CivitAI
    Platform Status
    Available
    Created
    4/4/2023
    Updated
    9/27/2025
    Deleted
    -

    Files

    cg3comp.safetensors

    Mirrors

    CivitAI (1 mirrors)