CivArchive
    Guide: 1to2: Training Multiple-Subject Models using only Single-Subject Data (Experimental) - v1.0
    Preview 416650
    Preview 416659
    Preview 416667

    Updates will be mirrored on both Hugging Face and Civitai.

    Introduction

    It has been shown that multiple characters can be trained into the same model. A harder task is to create a model that can generate multiple characters simultaneously without modifying the generation pipeline. This document describes a simple technique that has been shown to help generating multiple characters in the same image.

    Method

    Requirement: Sets of single-character images
    Steps:
    1. Train a multi-concept model using the original dataset
    2. Create an augmentation dataset of joined image pairs (composites) from the original dataset
    3. Train on the augmentation dataset
    

    Experiment

    Setup

    3 characters from the game Cinderella Girls are chosen for the experiment. The base model is anime-final-pruned. It has been checked that the base model has minimal knowledge of the trained characters. Images with multiple characters were removed (edit: after training it was found that some unrelated background characters were not removed).

    For the captions of the joined images, the template format CharLeft/CharRight/COMPOSITE, TagsLeft, TagsRight is used.

    A LoRA (Hadamard product) is trained using the config file below:

    [model_arguments]
    v2 = false
    v_parameterization = false
    pretrained_model_name_or_path = "Animefull-final-pruned.ckpt"
    
    [additional_network_arguments]
    no_metadata = false
    unet_lr = 0.0005
    text_encoder_lr = 0.0005
    network_module = "lycoris.kohya"
    network_dim = 8
    network_alpha = 1
    network_args = [ "conv_dim=0", "conv_alpha=16", "algo=loha",]
    network_train_unet_only = false
    network_train_text_encoder_only = false
    
    [optimizer_arguments]
    optimizer_type = "AdamW8bit"
    learning_rate = 0.0005
    max_grad_norm = 1.0
    lr_scheduler = "cosine"
    lr_warmup_steps = 0
    
    [dataset_arguments]
    debug_dataset = false
    # keep token 1
    # resolution 640*640
    
    [training_arguments]
    output_name = "cg3comp"
    save_precision = "fp16"
    save_every_n_epochs = 1
    train_batch_size = 4
    max_token_length = 225
    mem_eff_attn = false
    xformers = true
    max_train_epochs = 40
    max_data_loader_n_workers = 8
    persistent_data_loader_workers = true
    gradient_checkpointing = false
    gradient_accumulation_steps = 1
    mixed_precision = "fp16"
    clip_skip = 2
    lowram = true
    
    [sample_prompt_arguments]
    sample_every_n_epochs = 1
    sample_sampler = "k_euler_a"
    
    [saving_arguments]
    save_model_as = "safetensors"
    

    For the second stage of training, the batch size was reduced to 2 and the resolution was set to 768 * 768 while keeping other settings identical. The training took less than 2 hours on a T4 GPU.

    Results

    (see preview images)

    Limitations

    • This technique doubles the memory/compute requirement

    • Composites can still be generated despite negative prompting

    • Cloned characters seem to become the primary failure mode in place of blended characters

    • Can generate 2 characters but not 3 for some reason

    Related Works

    Models been trained on datasets based on anime shows have demonstrated multi-subject capability. Simply using concepts distant enough such as 1girl, 1boy has also been shown to be effective.

    Future work

    Below is a list of ideas yet to be explored

    • Synthetic datasets

    • Unequal aspect ratios

    • Regularization

    • Joint training instaed of sequential

    Description

    FAQ

    Comments (8)

    alea31415Apr 5, 2023· 1 reaction
    CivitAI

    For those who are interested, this is a paper that has explored this training technique

    https://arxiv.org/abs/2303.11305

    P.S. Sorry for repeated posting, but I did not notice that you already put this on civitai.

    AlykisApr 5, 2023
    CivitAI

    I dont know whether it's better than " latent couple",but create a model for yuri cp only sounds good,,haha

    alea31415Apr 5, 2023

    You can use the two together. It is much easier to use a model that already knows how to put multiple characters together with latent couple.

    gustproof
    Author
    Apr 6, 2023· 1 reaction

    I'm aware of similar methods such as regional prompter or multidiffusion but IMO these feel like hacks and can't do perspective or interactions well. The goal here is to use concepts in a way that can't be clearly region-separated (hence the hugging example). Also not modifying the pipeline means less effort and more diversity.

    turnipchemical8583Apr 26, 2023
    CivitAI

    So is this what "girlpack" creators are using? Those make multiple characters in a single lora/lycoris? Or is this different?

    gustproof
    Author
    Apr 27, 2023

    AFAICT girlpacks (SysDeep's models) just have single-subject images in the dataset without any extra processing. Those are easy to make but can't do multi-subjects. Try putting 2 trained characters in the same image and you'll see the point of this post.

    rasta_wobbleOct 5, 2023
    CivitAI

    Been trying to understand this guide for a couple weeks now.. what do you mean by joined image pairs? are these images generated from the LORA made in the initial training, or do they consist of both characters photoshoped together?

    gustproof
    Author
    Oct 6, 2023

    That's making combined images by sticking pairs of images together side by side. I used ImageMagick for that, but any appropriate tool would do. I used image from the original dataset (sampled some random pairs). I suppose you can use generated ones as well.

    LORA
    SD 1.5

    Details

    Downloads
    718
    Platform
    CivitAI
    Platform Status
    Available
    Created
    4/4/2023
    Updated
    5/13/2026
    Deleted
    -

    Files

    cg3comp.safetensors

    Mirrors

    CivitAI (1 mirrors)

    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.