    Retro 90's Anime / Golden Boy Style Lora Wan 2.2 14B - Wan 2.1 14B v2.0
    NSFW

    Wan 2.2 14B V1, what's new:

    -This is the same dataset + captions from the 2.1 lora, except trained on both the high and low WAN 2.2 14B models.

    -You will get all the benefits of the upgrade from 2.1 to 2.2. Movement and camera control in particular are quite nice.

    -Warning: I feel it has drifted further from the dataset in terms of style. I will try doing more training later, but I think it's in a good enough state to release now. I want to move on to something new next; this update was more about learning how to train 2.2. Read the training section below for my process, which is new.

    What is this lora?

    This is a style lora used to recreate the style of the 1995 anime series "Golden Boy". The series has beautiful mid-90s matte-painting-style backgrounds, which came out great in the lora. And the way they draw the girls is awesome and really represents the art style of the period for raunchy comedies. But if you just want a mid-90's retro anime look in Wan, use this lora too; it's really great at older-style anime in general. And it is perfect for detailed environmental shots. It's captioned on bikes, cars, delicious-looking food, garbage, etc., not just people. It's trained on the T2V model and therefore should also work for I2V.

    Trigger word: Goldenboystyle

    (You do not need to add any other descriptions of anime or animation style in the prompt; it should produce the style without any other prompting.) In fact, I would recommend against adding anime keywords to the prompt, as they will create more of a bias from the base model, which is now trained on anime much better than before. The trigger word may not even be needed, but I put it in anyway.

    All the characters from the show are in the training data. The blonde woman (Madame President) comes out pretty reliably if you mention blonde women. If you describe any character from the show, it will probably generate them accurately. The main character Kentaro Oe will also come out if prompted, but only by description and not by name. The silly faces the characters make are also in the training data. There are ????? breasts in the training data but no lower genitals.

    Recommended Settings

    It can run on the default Wan workflow just fine, and retains that real nostalgic retro animation style, but I recommend mixing this lora with the following optimization loras. There are 3 settings I recommend, and each has its own positive and negative effects.

    It's too early to tell what's best aside from default. I am leaning toward the 2.1 light lora, because the 2.2 lightning lora kills the motion so hard and also changes the style to be less like the show, though it still has a nice retro feeling.

    I think I will make a workflow for my loras and link that from now on. So just download my example workflow for this lora and try it yourself.

    See the below image to see how each setting affects the look. For motion, check the example generations I provided; the settings I used are in the comments for reference.

    LINK TO EXAMPLE WORKFLOWS

    https://civitai.com/models/1868??1

    1.) Default Setting
    Just run the lora with no other loras and it will work fine, retaining the closest look and feel to the original source material. On a 3090 it takes over 20 mins to generate a 720p video.

    20 steps (10/10), 3.5 CFG, NO NAG

    Benefits: Closer to trained data. You get all the 2.2 benefits like motion, quality, camera control etc.

    Negatives: Slower, more resource intense

    2.) Lightx2V Wan 2.1 Lora Optimization

    1.) This lora (golden boy style) (strength 1.0 on both high and low)

    2.) Wan21_T2V_14B_lightx2V_cfg_step_destill_lora_rank32 (strength 1.0 on both, use the same lora file on both high/low)

    7 steps (3/4), though you can try 4/4 or 2/2. CFG 1 with NAG

    Benefits: Can complete higher resolutions with fewer steps. The motion is retained, and the style is closer to default than with the lightning lora.

    Negatives: Lightx2V is a Wan 2.1 lora, so I think you downgrade the output to look more like 2.1 than 2.2. I also feel the colors are a bit dark. It sometimes adds a weird snow effect, which can be mitigated by increasing the strength on the lightx2v loras.

    3.) Lightning 1.1 Wan 2.2 Lora Optimization

    7 steps (3/4), though you can try 4/4 or 2/2. CFG 1 with NAG

    1.) This lora (golden boy style) (strength 1.0 on both high and low)

    2.) Wan 2.2 Lightning v1.1 loras (strength 1.0 on both high and low)

    Benefits: Can complete higher resolutions with fewer steps. It kinda makes colors brighter and less saturated, if you like that aesthetic. It's a 2.2 lora, so you technically get the Wan 2.2 benefits, but it's kind of not working properly.

    Negatives: It affects the style heavily; it still looks retro anime, but the colors are brighter than the source material. The motion is HEAVILY reduced.

    4.) Other 2.1 loras

    The above two loras were great for the 2.1 version of this. I don't use them now because I feel the more 2.1 loras you use, the less the output looks like 2.2, and it just becomes Wan 2.1 again... If they release 2.2 versions of these loras I will edit this section.

    See the below example to see how each setting affects the output compared to the source.

    In the end, I don't think there is a right choice, as they all have negatives in some way; it's too early to tell the best way to set things. So I'll update this bit in the future if I figure anything out. The killed motion in #3 makes me use #2 most often, and I don't have much patience for option #1. Please tell me if you have a good setting suggestion.
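    To keep the three settings straight at a glance, here is my own shorthand summary of the text above as plain data (the key names are made up for illustration, not real workflow parameters):

```python
# Summary of the three recommended sampler settings described above.
# Key names are informal shorthand, not real workflow parameters.
SETTINGS = {
    "default": {
        "extra_loras": [],
        "steps_high": 10,
        "steps_low": 10,
        "cfg": 3.5,
        "nag": False,
    },
    "lightx2v_2.1": {
        "extra_loras": ["Wan21_T2V_14B_lightx2V_cfg_step_destill_lora_rank32"],
        "steps_high": 3,
        "steps_low": 4,
        "cfg": 1.0,
        "nag": True,
    },
    "lightning_2.2": {
        "extra_loras": ["Wan 2.2 Lightning v1.1 high", "Wan 2.2 Lightning v1.1 low"],
        "steps_high": 3,
        "steps_low": 4,
        "cfg": 1.0,
        "nag": True,
    },
}

def total_steps(name: str) -> int:
    """Total sampler steps: high-noise pass plus low-noise pass."""
    s = SETTINGS[name]
    return s["steps_high"] + s["steps_low"]
```

    The style lora itself runs at strength 1.0 on both high and low in every setting; only the optimization loras, step counts, and CFG change.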

    Training Info

    Low Lora Model:

    [model]
    type = 'wan'
    ckpt_path = '/data/trainingstuff/wan2.2_base_checkpoint/low_noise_model'
    transformer_path = '/data/trainingstuff/wan2.2_base_checkpoint/low_noise_model'
    dtype = 'bfloat16'
    transformer_dtype = 'float8'
    timestep_sample_method = 'logit_normal'
    blocks_to_swap = 8
    min_t = 0
    max_t = 0.875
    
    [adapter]
    type = 'lora'
    rank = 32
    dtype = 'bfloat16'
    
    [optimizer]
    type = 'adamw_optimi'
    lr = 2e-5
    betas = [0.9, 0.99]
    weight_decay = 0.01
    eps = 1e-8

    High Lora Model:

    Basically the same settings as the low lora, except the min_t/max_t range changes from 0-0.875 to 0.875-1, and the optimizer settings differ:
    
    [optimizer]
    type = 'automagic'
    lr = 2e-5
    weight_decay = 0.00195
    lr_bump = 5e-6
    eps = 1e-8
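    Putting the two configs together: the min_t/max_t split means each lora only ever trains on its own slice of the denoising schedule, with the high-noise model covering timesteps from 0.875 to 1 and the low-noise model covering 0 to 0.875. A minimal sketch of that routing (the function is my own illustration of the configs, not part of any training script):

```python
BOUNDARY = 0.875  # max_t of the low lora config, min_t of the high lora config

def which_expert(t: float) -> str:
    """Pick which Wan 2.2 expert a normalized timestep t in [0, 1] belongs to.

    The high-noise model handles the early, noisy part of the schedule
    (t at or above the boundary); the low-noise model handles the rest.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    return "high" if t >= BOUNDARY else "low"
```

    This is also why both loras have to be trained and loaded as a pair: each one only ever sees its own noise range.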

    Let's talk graphs:

    Here is the low lora graph:

    Image

    You can see it jumps up and down, but it trends downward over time. At epoch 65 it was fine, but I trained more; I honestly didn't see much difference between 65 and 106. I couldn't get the loss lower than 0.8. Maybe I can if I try again with proper settings on the training.

    Here is the high lora graph

    Image

    (I can't seem to find my training data, but you get the gist from this earlier screenshot. It trends like this and then sort of flatlines. We get a much better loss trend on high, and the number of steps is much lower too.)

    Sorry, I cannot find my training data for this; maybe it got deleted (I still have the epochs). Not a big deal, since this thing reaches a good state FAST, unlike the low lora. My opinion is that you want the high lora to get the general shape right when watching the preview, since it's made for motion. Let the low lora get the details in; if the shape isn't close enough, the details will look off in the low model.

    Note:

    I ran an initial run with automagic on the low model and it came out garbage. It wouldn't work without the lightx loras, and it had ghosting and motion blur. So I did a second run with the settings above, with adamw_optimi on the low, and it completely fixed all the problems. I can't say for certain, but my theory is that the low model trains better on default settings with adamw_optimi. The high model can do either. The high model trains super fast and doesn't need a lot of steps, compared to the low model, which drags on and is super erratic in terms of the loss trend.

    Also, I screwed up the training on the low lora when I resumed from the checkpoint after epoch 65; I think for some reason it was training only on images for the 30 more epochs up to the final one. I didn't notice any negative effects, so I will just give out the latest epoch. Try the other LOW epoch in the training data, which I will include with the captions.

    It's hard to test Wan 2.2 loras. You basically have to train both and then do the fine-tuning. If you already have a 2.1 lora, you can use it for the high model and train the low first, but then you're mixing 2.1 with 2.2, and I think it's better to just train the high model before testing heavily. Overall I think this two-lora system is not good; there are too many variables to check when something is going wrong. This took a long time to troubleshoot, and I had to abandon 10K+ steps worth of trained data.

    Big Thanks

    There are too many people to thank. I bother so many people with dumb questions in the Banodoco Discord, but they are always kind, put up with me, and help me along the way. As always, I want to shout out Kijai for his great help, the lightx team for their loras, and Seruva19, as his loras and detailed documentation of his process are really what this scene needs. I am kind of just figuring things out along the way, taking bits and pieces of existing info and brute-forcing them together into an output I hope everyone can enjoy.

    Description

    Please consider donating or subscribing on my Ko-fi here

    (all funds go right back into making more loras)

    V2, what's new:

    -Trained on an additional 78 videos, up to 51K steps (from 25K steps previously)

    -The Golden Boy style is consistent more often and more detailed than before.

    -In theory the videos should help with the motion; no reason to use v1 anymore, IMO

    -Try the new example workflow for my setup. I have switched from 16/32 fps to 12/24 fps videos to more closely match that anime look in videos.

    -Also updated my training process below if you wanna read how I did it in detail

    What is this lora?

    This is a style lora used to recreate the style of the 1995 anime series "Golden Boy". The series has beautiful mid-90s matte-painting-style backgrounds, which came out great in the lora. And the way they draw the girls is awesome and really represents the art style of the period for raunchy comedies. But if you just want a mid-90's retro anime look in Wan, use this lora too; it's really great at older-style anime in general. And it is perfect for detailed environmental shots. It's captioned on bikes, cars, delicious-looking food, garbage, etc., not just people. It's trained on the T2V model and therefore should also work for I2V.

    Trigger word: Goldenboystyle

    (You do not need to add any other descriptions of anime or animation style in the prompt; it should produce the style without any other prompting. It really is amazing.)

    All the main characters from the show (to be honest, almost every character) are in the training data, some more than others. The blonde woman (Madame President) comes out pretty reliably if you mention blonde women. If you describe any character from the show, it will probably generate them accurately. The main character Kentaro Oe will also come out if prompted, but only by description and not by name. The silly faces the characters make are also in the training data. There are ????? breasts in the training data but no lower genitals.

    Recommended Settings

    It can run on the default Wan workflow just fine, and retains that real nostalgic retro animation style, but I recommend mixing this lora with the following optimization loras:

    1. This lora (golden boy style) (strength 1.0)

    2. Wan2.1-Fun-14B-InP-MPS (strength 1.0)

    3. Wan21_T2V_14B_MoviiGen_lora_rank32_fp16 (strength 0.5)

    4. Wan21_T2V_14B_lightx2V_cfg_step_destill_lora_rank32 (strength 0.8 or 1.0)

    <Warning: Do NOT use teacache or SLG if using the above loras together. To avoid OOM on my 3090, I block swap 15 for higher res>

    Plus, use NAG by Kijai for the negative prompt. I recommend adding "slow motion" to the negative if using the other loras above.

    For testing I run at 480x832 resolution; then, when I find something I like, I run it again at 720x1280 (with no upscale, and 12 fps interpolated to 24 fps).

    I have attached a sample workflow and the captions from the training data in a zip. So simply download the loras listed above, put them in your lora folder, and use my workflow.

    I haven't tested I2V; everything I post here is T2V, by the way. So give I2V a try.

    Dataset

    A gargantuan 368 screen captures directly from the show, plus 79 video clips. I took the screenshots while rewatching with VLC player and made the clips by hand using HandBrake. 768x576 resolution (the original resolution of the upscaled releases) for the images and 288x384 for the clips.

    I broke the video dataset up into 5 groups based on the number of target frames for the bucket, and I converted each clip to 16 fps (16, 24, 32, 40, and 48 frame buckets). Some of the clips are longer, but diffusion-pipe will automatically pick up the first N frames for each group. I adjusted my dataset toml to look like this. I didn't want to resize the videos myself, because I can't do it without creating quality issues for some reason, so I leave it up to diffusion-pipe by specifying the resolution and letting it select the target frames using the frame buckets. (Now that I think about it, does the target_frames argument not work in diffusion-pipe, with frame_buckets doing that for us instead?)

    #etc...
    
    frame_buckets = [1,16,24,32,48]
    
    #etc...
    
    [[directory]]
    path = '/data/trainingstuff/train_videos/16_frames'
    resolutions = [[288,384]]
    num_repeats = 1
    target_frames = [16]
    
    [[directory]]
    path = '/data/trainingstuff/train_videos/24_frames'
    resolutions = [[288,384]]
    num_repeats = 1
    target_frames = [24]
    
    [[directory]]
    path = '/data/trainingstuff/train_videos/32_frames'
    resolutions = [[288,384]]
    num_repeats = 1
    target_frames = [32]
    
    [[directory]]
    path = '/data/trainingstuff/train_videos/48_frames'
    resolutions = [[288,384]]
    num_repeats = 1
    target_frames = [48]
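    As I understand the bucketing, each clip effectively gets truncated to the largest bucket that fits inside it. Here is a hedged sketch of the selection logic behind the per-bucket directories above; reading the real frame count of a clip would be done with ffprobe or similar, which I leave out:

```python
FRAME_BUCKETS = [16, 24, 32, 48]  # video buckets from the toml above (1 is the image bucket)

def pick_bucket(frame_count: int, buckets=tuple(FRAME_BUCKETS)):
    """Return the largest bucket that fits inside the clip, or None if too short.

    diffusion-pipe then trains on the first `bucket` frames of the clip.
    """
    fitting = [b for b in sorted(buckets) if b <= frame_count]
    return fitting[-1] if fitting else None
```

    So a 40-frame clip at 16 fps would land in the 32_frames directory, and anything under 16 frames would not fit any video bucket.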

    Training Info

    Model: Default 14B T2V model from Wan (so this will also work as an I2V model).

    LR 2e-5, transformer dtype float8, save_dtype bfloat16, blocks_to_swap 8

    Repeats: For the first 10 or so epochs, 5 repeats; for the next 20-30, 3 repeats; 1 repeat after that. V2 is 1 repeat only.

    Steps: 51K (I trained a few thousand more, but noticed some issues, so I reverted to the 51K epoch)

    Around epoch 360 is where you can see I added the video dataset in. I was worried we would see flat improvement, but the loss actually started to cave down dramatically again around epoch 440. I did some tests, and after around 500 or so epochs I saw some motion issues on a few epochs, so I went back until I didn't get the same weird blur when something fast happens. Epoch 504 had no such issue, so I decided to stop there for now.

    I captioned the video data by feeding the videos, with the prompt below, into Google Gemini 2.5 Pro via AI Studio. I fed them in batches of 10; interestingly enough, I didn't have to reprompt it, and it handled the videos with no problems. Though I did have to go through and touch up the captions very slightly. Also, I gave it the paper on Wan for good measure (the PDF from the official Wan Hugging Face).
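    The batching itself is just simple chunking. This sketch shows only that step; I leave out the actual Gemini upload calls, since the exact API depends on your client library and I don't want to misquote it:

```python
from typing import Iterator, List

def caption_batches(video_paths: List[str], size: int = 10) -> Iterator[List[str]]:
    """Yield successive fixed-size batches of video paths for captioning."""
    for i in range(0, len(video_paths), size):
        yield video_paths[i:i + size]

# Each batch gets uploaded to Gemini 2.5 Pro in AI Studio together with the
# captioning prompt that follows; the returned captions then get a light
# manual touch-up pass.
```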

    You are an advanced image captioner for WAN AI video generation models. Your goal is to create vivid, cinematic, highly detailed captions for training loras on the Wan 14B model with diffusion-pipe; therefore your captions follow Wan's syntax. Our goal this time is to create a style lora for the classic anime series "Golden Boy". You will get fed screen captures from the show. Never use any character names; describe each caption purely generically, so that training picks up the style of the way things are created. Do not use phrases like "or" when describing; be precise and choose the description you think is closest. Do not refer to the subject as "the subject"; state simply "a man wearing" or "a woman in a car", etc. Refer to an adult male as "a man" and an adult woman as "a woman". You can use modifiers like "young woman" or "girl", but let's not use "male" or "female". Also be precise; don't say "appears to be", etc.
    
    Prompt Rules:
    
    Every prompt must begin with: "GoldenBoyStyle".
    
    Use clear, simple, direct, and concise language. No metaphors, exaggerations, figurative language, or subjective qualifiers (e.g., no "fierce", "breathtaking").
    
    Our purpose is to describe everything in the image, with special attention to describing the people whenever they are present. Describe each individual piece of clothing including the colors and positions. We want a standard description of their appearance and usual clothes, but at the same time we need to describe the environment as that is part of the style as well.
    
    Describe what is in the image, but not what the image is. Such as "A photo depicting a cosplay of" is wrong. Just say "Live action Bowsette..." and then describe the image.
    
    When an exaggerated or "chibi" face or depiction is shown, make sure to note it in the captioning. Let's be uniform in our word choices when possible.
    
    Prompt length: 80–200 words.
    
    Follow this structure: Scene + Subject + Action + Composition + Camera Motion (video only)
    
    Scene (environment description)
    Establish environment type: urban, natural, surreal, etc. Include time of day, weather, visible background events or atmosphere. Only describe what is seen; no opinions or emotions.
    
    Subject (detailed description)
    Describe only physical traits, appearance, outfit. Use vivid but minimal adjectives (no occupations like "biker", "mechanic", etc.) No excessive or flowery detail.
    
    Action (subject and environment movement)
    Specify only one clear subject and/or environmental interaction. Describe only what can be seen in 5 seconds.
    
    Composition and Perspective (framing)
    Choose from: Close-up | Medium shot | Wide shot | Low angle | High angle | Overhead | First-person | FPV | Bird’s-eye | Profile | Extreme long shot | Aerial
    
    Motion (cinematic movement) (only used when describing video sources)
    Use: Dolly in | Dolly out | Zoom-in | Zoom-out | Tilt-up | Tilt-down | Pan left | Pan right | Follow | Rotate 180 | Rotate 360 | Pull-back | Push-in | Descend | Ascend | 360 Orbit | Hyperlapse | Crane Over | Crane Under | Levitate
    
    Describe clearly how the camera moves and what it captures. Focus on lighting, mood, particle effects (like dust, neon reflections, rain), color palette if needed. Be visually descriptive, not emotional. Keep each motion or camera movement concise — each representing about 5 seconds of video. Maintain a strong visual "Teen Titans" animation aesthetic: bold, vibrant, energetic, fluid animation feeling.
    
    Use simple prompts, like you're instructing a 5-year-old artist, but follow Wan principles for syntax and wording so the lora can be properly trained with the caption data you're creating. Reference the attached images and caption them. Format the captions as prompts, so we don't need the labels of scene, subject, action, etc. in the captions themselves. For example (from the Raven lora we captioned in the past):
    
    Raven, with pale lavender skin and her short, dark purple angular hair, is shown in a yoga pose resembling an upward-facing dog. A small, dark purple bowtie is at her neck, and white cuffs are on her wrists. Tall, dark purple bunny ears are perched on top of her head. Her hands are raised on either side of her head, against a plain white background. A red gem is on her forehead. She wears her black long-sleeved leotard, a gold-colored belt with visible red gems, and dark blue cuffs with gold and red circular details on her wrists. Her body is arched, supported by her arms straight down to the floor and the tops of her bare feet. Her head is lifted, looking forward and slightly upwards with a surprised or inquisitive expression, her mouth slightly open. The camera is waist height and lower, looking up at Raven in a semi-profile view.
    
    Sample Prompt:
    GoldenBoyStyle. Interior setting. A young man with short dark hair, a red baseball cap backwards, wears a light green t-shirt. His face has an extreme comedic expression of lecherous excitement, with wide, crazed eyes, a broad, toothy grin, and prominent red blush marks on his cheeks. He is holding an open, dark brown notebook with a white pen, writing intently. Close-up shot, focusing on his exaggerated facial expression.


    Big Thanks

    As always, seruva19's Ghibli, Red Line, and now his banging Princess Kaguya lora posts, along with their training data, have been a constant inspiration and source of knowledge for me. I owe a lot to his open nature in sharing his process and data.

    The Banodoco Discord for always answering my questions on training.

    Kijai for his amazing nodes and advice on using them.