~Amorous Lesbian Kisses~
Update: Y'all, the Wan version is pretty fire and I'm quite pleased with it. I'm gonna try to replicate those results for Hunyuan now!
Kisses for Wan: It's been a long time coming, but I've finally created a Wan version of this model! It seems competent for both T2V and I2V. A big key was training at 16fps, Wan's native frame rate, so if you train Wan I'd definitely recommend that! FWIW my example videos have been interpolated to 32fps using https://github.com/GSeanCDAT/GIMM-VFI, which is really excellent. Anyway, I trained it with Musubi Tuner at 480x272 on 69-frame clips at 16fps from 30 videos, for 2400 steps at a learning rate of 2e-5 with a loraplus ratio of 4. I removed the leading "amorous kissing" but otherwise the prompting format remains the same:
"close up of two young women tongue kissing. The woman on the left has red hair and is wearing a black lace choker, the woman on the right is Indian, with beautiful light skin and long straight black hair."
"tongue kissing", "making out", "kissing", "wide shot", "medium shot" and "close up" should all be hotwords! Wan especially picked up the "making out" keyword really nicely, and if you include it you will get lots of caresses and touches. It also manages tongue interactions better than Hunyuan.
My examples were made with Musubi Tuner in about 20 minutes each! I use Musubi with a scheduled CFG: CFG runs on the first ten steps and the last three, and on every other step in between. This gains good speed without sacrificing much, if any, quality! I've also been experimenting with skip layer guidance, which is curious and seems to really boost quality.
Oh, I also use fp8 scaled, which is a huge boon. Musubi's implementation is online, which means you start with the full model (not the pre-scaled ones). It keeps some smaller but very important params in full precision while quantizing the weights themselves to fp8, maintaining only 2.5% quantization error (vs 12.5% for a naive cast to e4m3fn!). I've run several same-seed comparisons and it's not just good in the numbers: it's consistently the closest result to the full unquantized model of any method I've tried. Comfy has fp8 scaled too, but it's done differently (the weights are saved scaled and you just load that); I hear it's really good too. Hurray for democratizing access!
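The scheduled CFG described above can be sketched in a few lines. This is only an illustration of the idea, not Musubi Tuner's actual code: the function names, the 30-step example and the choice of which middle steps count as "every other" are all assumptions.

```python
def cfg_step_mask(total_steps: int, first: int = 10, last: int = 3) -> list[bool]:
    """Return one flag per sampling step: True means CFG runs on that step."""
    mask = []
    for i in range(total_steps):
        in_head = i < first                 # always guide the first `first` steps
        in_tail = i >= total_steps - last   # always guide the last `last` steps
        # Assumption: in the middle, CFG runs on every second step and the
        # first middle step is a skipped one. The other parity is equally valid.
        alternating = (i - first) % 2 == 1
        mask.append(in_head or in_tail or alternating)
    return mask


def forward_passes(mask: list[bool]) -> int:
    """A CFG step costs two model passes (cond + uncond); a skipped step costs one."""
    return sum(2 if use_cfg else 1 for use_cfg in mask)


mask = cfg_step_mask(total_steps=30)  # 30 steps is a hypothetical example value
print(sum(mask), "of", len(mask), "steps use CFG")
print(forward_passes(mask), "model passes instead of", 2 * len(mask))
```

The speed gain comes from the skipped steps, each of which saves one unconditional pass through the model.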
Original/Hunyuan:
This has been a tough nut to crack, likely because of the complex hand and tongue movements involved. Base Hunyuan will make simple platonic kisses but not much more. This LoRA is focused on creating amorous, sexual kisses and making out between women. It was trained on my RTX 4070 Ti SUPER 16GB with Musubi Tuner in 12 hours. This is the first revision worth sharing. It's not perfect, but it can definitely make some nice things! Expect updates! Caption/prompting format:
"amorous kissing, medium shot of two nude young women tongue kissing and making out with each other in a living room. The woman on the left has her brunette hair in pigtails and a tattoo on her arm while the woman on the right has brunette hair in a ponytail. Behind them a couch with some pillows and some plants can be seen."
"amorous kissing, wide shot of two women laying on a gray couch in each other's arms, making out and tongue kissing passionately. They both have brunette hair, one is wearing a colorful haltertop and shorts and the other is wearing a white dress"
"amorous kissing, close up of two women kissing sensually in front of a bright window. The woman on the left has red hair and is wearing a black jacket, the woman on the right is wearing a beanie and thick black glasses. Both of them are wearing mascara"
Small note: "making out" was used to indicate lots of caresses and occasional sexual touches accompanying the kissing, but I don't think it took super well in this first revision! "tongue kissing" was used when there was a lot of visible, outside the mouth tongue action, "kissing" if not as much or it's contained inside the mouth. "wide shot" was used if the full body is visible, "medium shot" for waist up, and "close up" for the close ups. Oh and "passionately" was used as a modifier if the kisses were extra enthusiastic compared to the dataset overall.
Recommendations:
Weight: 0.8-1.0
Flow shift: ~9.0 @ 544p
Guidance: <= 7.0 (Too much creates more issues with hands)
Steps: 50
Frames: 61-129 (longer may or may not work, wasn't trained)
*Reports and my own experiments indicate that TeaCache may create issues with the LoRA, so please try without it if possible.
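A small helper for picking frame counts inside the recommended range. It assumes that valid lengths have the form 4k + 1, which matches every frame count on this page (1, 41, 61, 69, 129) and is my understanding of how the video VAE compresses time, but treat it as an assumption rather than a documented rule. The function names are made up for this sketch.

```python
def is_valid_frame_count(frames: int) -> bool:
    """True if `frames` has the assumed 4*k + 1 form."""
    return frames >= 1 and (frames - 1) % 4 == 0


def nearest_valid_frame_count(frames: int) -> int:
    """Round down to the closest 4*k + 1 value, never below 1."""
    return max(1, ((frames - 1) // 4) * 4 + 1)


def duration_seconds(frames: int, fps: float) -> float:
    """Clip length in seconds for a given frame count and frame rate."""
    return frames / fps


for n in (61, 64, 129, 130):
    print(n, is_valid_frame_count(n), nearest_valid_frame_count(n))
print(duration_seconds(129, 24), "seconds for 129 frames at 24fps")
```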
The dataset consisted of 26 high quality videos of women of various ages and races sharing various types of amorous kisses and making out, from various distances and in various states of undress. The source data was preprocessed with ffmpeg into training clips, each 144 frames long at 24fps, showing only the action of interest with no scene cuts or dramatic camera movements. The clips were also cropped to show only the women, which added some aspect ratio variation, as 95% of the source was 16:9 before processing.
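The preprocessing above could look roughly like this. It is a sketch under stated assumptions, not the actual commands used for this dataset: the file names, start time and crop box are placeholders, and dropping audio with `-an` is my own choice. The ffmpeg options themselves (`-ss`, `-vf crop=...,fps=...`, `-frames:v`, `-an`) are standard ones.

```python
def build_clip_command(src, dst, start_seconds, crop=None, fps=24, frames=144):
    """Build an ffmpeg argument list that cuts one fixed-length training clip.

    crop is an optional (width, height, x, y) box in source pixels.
    """
    filters = []
    if crop is not None:
        w, h, x, y = crop
        filters.append(f"crop={w}:{h}:{x}:{y}")  # keep only the region of interest
    filters.append(f"fps={fps}")                 # resample to the training frame rate
    return [
        "ffmpeg", "-y",
        "-ss", str(start_seconds),   # seek to the start of the action
        "-i", src,
        "-vf", ",".join(filters),
        "-frames:v", str(frames),    # stop after exactly this many frames
        "-an",                       # assumption: audio is not needed for training
        dst,
    ]


# Hypothetical example: a 960x1080 crop from the middle of a 1920x1080 source.
cmd = build_clip_command("source.mp4", "clip_000.mp4", 12.5, crop=(960, 1080, 480, 0))
print(" ".join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```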
Training config:
Network dimension: 36
Network alpha: 1
Learning rate: 2.4e-4
Optimizer: came_pytorch.CAME
Optimizer args: weight_decay=0.01, eps=(1e-30,1e-16), betas=(0.9,0.999,0.9999)
Steps: 2400
Warmup steps: 100
Scheduler: Constant with warmup
discrete_flow_shift: 7.0
timestep_sampling: shift
VRAM savings: --blocks_to_swap 31, --split_attn, --flash_attn
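One thing worth keeping in mind when comparing this config with the Wan metadata further down: in kohya-style LoRA implementations the learned update is multiplied by alpha / dim, so the two learning rates are not directly comparable. The sketch below just computes that scale for both runs. It assumes Musubi's LoRA module follows the same convention, and `lora_scale` is a name made up for illustration.

```python
def lora_scale(alpha: float, dim: int) -> float:
    """Multiplier applied to the LoRA update in kohya-style implementations."""
    return alpha / dim


hunyuan_scale = lora_scale(alpha=1, dim=36)   # values from the config above
wan_scale = lora_scale(alpha=16, dim=16)      # values from the Wan metadata

print(f"Hunyuan: scale {hunyuan_scale:.4f} at LR 2.4e-4")
print(f"Wan:     scale {wan_scale:.4f} at LR 2e-5")
```

With alpha = 1 the update is scaled down heavily, which is one plausible reason the Hunyuan run used a much higher learning rate than the Wan run with alpha = dim.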
Dataset was listed four times in the toml to allow processing different frame bucket lengths at different resolutions:
[general]
caption_extension = ".txt"
enable_bucket = true
bucket_no_upscale = false
[[datasets]]
video_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses"
cache_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses/cache0"
resolution = [480, 272]
target_frames = [129]
frame_extraction = "head"
batch_size = 1
[[datasets]]
video_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses"
cache_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses/cache1"
resolution = [640, 360]
target_frames = [69]
frame_extraction = "uniform"
frame_sample = 2
batch_size = 1
[[datasets]]
video_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses"
cache_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses/cache2"
resolution = [848, 480]
target_frames = [41]
frame_extraction = "uniform"
frame_sample = 2
batch_size = 1
[[datasets]]
video_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses"
cache_directory = "/home/blyss/projects/art/extra/dataset/AmorousLesbianKisses/cache3"
resolution = [1280, 720]
target_frames = [1]
frame_extraction = "uniform"
frame_sample = 2
batch_size = 2
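Since the four entries differ only in resolution, frame count, extraction mode and batch size, the file can also be generated instead of written by hand. A minimal sketch follows; the directory path is a placeholder, and `BUCKETS` and `render_dataset_toml` are illustrative names rather than part of Musubi Tuner.

```python
BUCKETS = [
    {"resolution": (480, 272), "target_frames": 129, "frame_extraction": "head", "batch_size": 1},
    {"resolution": (640, 360), "target_frames": 69, "frame_extraction": "uniform", "frame_sample": 2, "batch_size": 1},
    {"resolution": (848, 480), "target_frames": 41, "frame_extraction": "uniform", "frame_sample": 2, "batch_size": 1},
    {"resolution": (1280, 720), "target_frames": 1, "frame_extraction": "uniform", "frame_sample": 2, "batch_size": 2},
]


def render_dataset_toml(video_dir: str, buckets: list) -> str:
    """Render one [[datasets]] entry per bucket, each with its own cache directory."""
    lines = [
        "[general]",
        'caption_extension = ".txt"',
        "enable_bucket = true",
        "bucket_no_upscale = false",
    ]
    for i, b in enumerate(buckets):
        w, h = b["resolution"]
        lines += [
            "",
            "[[datasets]]",
            f'video_directory = "{video_dir}"',
            f'cache_directory = "{video_dir}/cache{i}"',
            f"resolution = [{w}, {h}]",
            f'target_frames = [{b["target_frames"]}]',
            f'frame_extraction = "{b["frame_extraction"]}"',
        ]
        if "frame_sample" in b:  # only written for the "uniform" entries
            lines.append(f'frame_sample = {b["frame_sample"]}')
        lines.append(f'batch_size = {b["batch_size"]}')
    return "\n".join(lines) + "\n"


print(render_dataset_toml("/path/to/dataset", BUCKETS))
```

The pattern in the buckets is that longer clips go with lower resolutions, which keeps each bucket within a similar memory budget.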
Description
Trained with FP16 base Wan using FP8 scaled mode from Musubi.
Date: 2025-04-16T10:42:43
Title: WanKisses
Resolution: 1280x720
Architecture: wan2.1/lora
Network Dim/Rank: 16.0
Alpha: 16.0
dtype: F16
Module: networks.lora_wan : {'loraplus_lr_ratio': '4'}
Learning Rate (LR): 2e-05
Optimizer: came_pytorch.CAME.CAME(weight_decay=0.01,eps=(1e-30, 1e-16),betas=(0.9, 0.999, 0.9999))
Scheduler: constant_with_warmup
Warmup steps: 100
Epoch: 38
Batches per epoch: 64
Gradient accumulation steps: 1
Timestep sampling: Shift
Discrete Flow Shift: 3.0
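A quick sanity check on these numbers. With 64 batches per epoch, step 2400 lands inside epoch 38, which lines up with the 2400 steps quoted in the description. The second half assumes the usual LoRA+ behaviour, where the up-projection (B) matrices train at the base learning rate times `loraplus_lr_ratio`; that is my understanding of the option, not something stated on this page. The helper names are illustrative.

```python
import math


def epoch_of_step(step: int, batches_per_epoch: int, grad_accum: int = 1) -> int:
    """1-based epoch in which a given optimizer step falls."""
    steps_per_epoch = batches_per_epoch // grad_accum
    return math.ceil(step / steps_per_epoch)


def loraplus_up_lr(base_lr: float, ratio: float) -> float:
    """Assumed LoRA+ learning rate for the up-projection matrices."""
    return base_lr * ratio


print("step 2400 falls in epoch", epoch_of_step(2400, batches_per_epoch=64))
print("steps in 38 full epochs:", 38 * 64)
print("up-projection LR:", loraplus_up_lr(2e-5, 4))
```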
Comments (22)
Your listed triggers have "Tongue kissng" (ending in "ng"), but your example has "Tongue kissing" (ending in "ing"). Both versions are in your description text. Which is correct?
Oh, my bad, that's a typo! The correct spelling is just "tongue kissing". I will fix that, thanks for calling it to my attention! It was duplicated like that because of copy-pasting!
I've been using your prompts since you dropped Hunyuan, am excite
e: signed, sealed, delivered. Worth the wait. If I2V works just as well you're a hero.
After some tests, the I2V works great. I would probably use a slightly lower strength than 1.0 in some cases, as it changes the faces sometimes. But really good (y)
Aww I'm glad to hear that! There isn't a lot of variation in the dataset as far as setting tbh and the videos were cropped close to the subjects of interest. When I captioned I only mentioned the visible parts of the background e.g.
"close up of two women kissing passionately in a doctor's office. The girl on the right is topless and has saliva around her mouth and dripping from her face. The woman on the left is wearing a red shirt. They're both wearing eye makeup and both have dark brunette hair. An anatomy poster can be seen in the blurred background."
"medium shot of two young women sitting on a tan couch while kissing and making out with each other. The asian woman on the left has brown hair and is wearing a lavender colored tube top, the woman on the right has black hair and is wearing a white blouse with a black skirt and heavy mascara."
"wide shot of two nude young women kissing and making out while standing in a living room. The girl on the left is fair skinned with long dark brunette hair in a low ponytail and the girl on the right is tanned with straight dark brunette hair. Behind them a gray couch and some paintings can be seen"
and Wan may have picked up on that more. Oh but speaking of that first prompt, if you like messy kisses (me! I do!) then Wan definitely does that better than Hunyuan. I made some lovely sloppy tongue kisses yesterday with this model! If you want more varied backgrounds, I2V might be the way!
@blyss After running a few more tests, I was able to get more varied backgrounds. My original tests had a bug that always called on the "close up", so every time I mentioned "medium" or "long", it would have both. oops :) (this was essentially the bug: "{long|medium|} close shot")
Also, I2V works quite well. Sometimes the face morphs a bit too much, but all my tests so far have been at full strength.
Awesome!
Did the WAN version follow a different training strategy than the HunYuan version?
Yep, as I mentioned, it was trained with Musubi Tuner at 480x272, on 2 unique chunks of 69 frames from each of 30 videos at 16fps, for 2400 steps at 2e-5 with a loraplus ratio of 4. The notable changes are: no multiple resolution or frame buckets, lower LR, network alpha = network dim, and an fp8 scaled base. The dataset and captions were the same, except I removed the "amorous kissing" initial tag and one video. The optimizer was still CAME and everything else should be about the same. If you want any specifics, just ask!
FWIW I applied this strategy back to Hunyuan last night hoping to improve that version and got ABYSMAL results, the same as when I had initially tried Wan with the strategy I'd used for Hunyuan. The Wan version here is much better than any of my dozen attempts for Hunyuan; my best for that is still the one that's up.
@blyss Did you follow the exact same captioning strategy as 3 months ago?
@ComfyTinker I mean yeah, they are literally the same txt files. I just went into the files with my editor and removed the "amorous kissing," that was at the start of all of them. Otherwise it's not just the same strategy, it's the identical captions.
god bless you for the hunyuan version
The Hunyuan version was actually first, as I originally made this LoRA before Wan came out! The Wan version is somewhat superior to the Hunyuan version though (mainly for tongue kisses). I've tried again and again to improve the Hunyuan version, but this is just the best I can get it; Hunyuan REALLY wants to attach their tongues together. Either way, glad to hear you like it!
@blyss Oh yeah, thank you!!
SESBIAN LEX
Nooooooooooo it's gone
Impressive
Very powerful, thank you !
Any chance of wan2.2 versions?
Yes please, this lora is the best tbh
Hey, I've actually been working on this one, but so far it's not quite there. It takes me 16 hours to do a full Wan 2.2 train on my hardware, so iterating is slow. But I do intend to bring my models to Wan 2.2!
@blyss Do you still plan to do it?
Still the best kissing lora
Details
Files
AmorousWanKisses.safetensors