First, a disclaimer: I did not create this workflow myself; I am merely a user. I downloaded it so long ago that I no longer recall the original source.
That said, it is an excellent workflow—stable and undemanding regarding VRAM (it runs on 8GB for 16:9 aspect ratios). It requires no complex parameter tweaking; you simply need to focus on adjusting the LoRA weights. I have used it to process tens of thousands of videos and images, consistently achieving impressive character consistency.
The workflow is primarily designed for image-to-video generation, allowing you to adjust parameters such as image size, video duration, and aspect ratio. You simply upload an image, enter a prompt, load the LoRA and adjust its weight, and then wait for the video to generate.
Here are some insights gained from using the workflow:
Generally speaking, the base model can handle the vast majority of actions. However, for the many enthusiasts of NSFW content, that is far from enough. The base model falls short when it comes to the myriad of sexual positions, shots of intertwined limbs, scenes involving the spraying of fluids (semen or urine) during climax, and intense visuals of deep penetration; this is where various LoRAs come into play.
Wan is an excellent open-source model—not just because of its powerful image-generation capabilities, but also due to its mature ecosystem. On Civitai, there are thousands of LoRAs designed for Wan, including many highly effective ones that have been instrumental in creating NSFW videos and images; some LoRAs from the Wan 2.1 era remain highly functional even today.
Human desire knows no bounds; once you can easily generate sexually arousing videos and images with Wan, you will inevitably experience aesthetic fatigue. Indeed, that is what happens when you view too many AI-generated characters—they become repetitive and look obviously artificial at a glance. Consequently, new needs arise: you want characters that aren't quite so perfect—ordinary people like yourself—and you want them to look more natural. This leads to a new challenge: how do you generate a specific character according to your vision? In other words, how do you maintain character consistency?
Below, I share some insights on maintaining character consistency. Many people have messaged me about this, and after compiling my thoughts, I am sharing them here today—along with the corresponding workflow—for everyone to discuss.
1. First, you need to select a base model—referred to as a "checkpoint" on Civitai—which is loaded via the UNet loader in this workflow. Generally speaking, the training data of the base model determines the character's general appearance. Personally, I have a strong preference for mature Asian women; having had sexual encounters with over ten of them—an amazing experience—I am now deeply captivated by this type of woman. Consequently, I chose the "lightx2v" 4-step model, as it excels at generating Asian female characters. Of course, if a specific Asian look isn't a priority for you, models like "remix" or "smooth" work just fine. While "remix" and "smooth" will still generate Asian women based on your prompts, non-Asian facial features are highly likely to appear if the video duration becomes too long.
The "lightx2v," "remix," and "smooth" models are all merged models—meaning they incorporate NSFW content—so they perform better than the native Wan model when generating NSFW videos and images. However, if you require specific elements—such as natural-looking sucking motions during oral sex, licking of the scrotum or glans, facials, group sex, or specific sexual positions—it is essential to load specialized LoRAs to assist the model in the generation process.
2. Next, let’s discuss some tips for using LoRAs, which are crucial for maintaining character consistency when creating NSFW videos and images. LoRAs are trained using large datasets of videos and images that inevitably feature the character's face; consequently, their impact on character consistency can be somewhat unpredictable. In the Wan model workflow, character motion is typically generated using a high-noise model, while shape and details are filled in using a low-noise model. Theoretically, the low-noise model should be the primary factor influencing consistency, but in practice, that isn't always the case. Additionally, using multiple LoRAs simultaneously can have intriguing effects on character consistency:
One scenario involves LoRAs that have minimal impact on character consistency. For the video at https://civarchive.com/posts/29414887, for instance, I used only the high-noise model component with a weight of 1. The resulting consistency was excellent, requiring no further adjustments—I simply generated the video directly.
Another scenario involves LoRAs that do affect character consistency to some degree. In the video at https://civarchive.com/posts/29345221, the LoRA noticeably influenced the character, necessitating adjustments to noise levels and weights. I tested three specific LoRAs designed for "assisted handjob" scenarios: caught_assisted_handjob_high_noise, T2V-WAN2.2-Reach-HighNoise_-000050, and assisted_handjob_high_noise. I skipped the low-noise components (some of these LoRAs do not ship one) and instead adjusted the high-noise weights. I also loaded DR34ML4Y_I2V_14B_HIGH_V2, NSFW-22-H-e8, and PENISLORA_22_i2v_LOW_e496 to enhance consistency and refine the shape and details of the penis; the final result was quite satisfactory.
A third scenario involves LoRAs that significantly impact character consistency. For instance, in the video at https://civarchive.com/posts/29315914, I adjusted the weights for the high- and low-noise LoRAs and loaded Wan22_I2V_VBVR_HIGH_rank_64_fp16, DR34ML4Y_I2V_14B_HIGH_V2, and DR34ML4Y_I2V_14B_LOW_V2 to enhance consistency and fill in details for the vagina. Among these, Wan22_I2V_VBVR_HIGH_rank_64_fp16 played a major role in improving consistency.
In general, you can test LoRAs using the following process: first, load only the high- and low-noise LoRAs with weights set to 1. If the results are good, simply fine-tune the weights. If character consistency is poor, try omitting the low-noise LoRA. If that still doesn't work, adjust the high-noise weight and load the two high-noise models: Wan22_I2V_VBVR_HIGH_rank_64_fp16 and DR34ML4Y_I2V_14B_HIGH_V2. Note, however, that while Wan22_I2V_VBVR_HIGH_rank_64_fp16 improves consistency, it can also break the animation, defeating the purpose of using the LoRA. Additionally, you should load specific low-noise models to fill in details for certain body parts—such as using PENISLORA_22_i2v_LOW_e496 for the penis and DR34ML4Y_I2V_14B_LOW_V2 for the vagina.
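The testing order above can be written down as a small checklist in code. This is only a sketch of the sequence, not part of the workflow itself: the names LoraTrial, trial_ladder, and first_acceptable are hypothetical, the reduced weight of 0.8 is a placeholder (the right value has to be found by trial), and the is_consistent check stands in for you looking at the generated video.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# High-noise LoRAs named in the text as consistency helpers.
CONSISTENCY_HELPERS = (
    "Wan22_I2V_VBVR_HIGH_rank_64_fp16",
    "DR34ML4Y_I2V_14B_HIGH_V2",
)


@dataclass(frozen=True)
class LoraTrial:
    """One configuration to try (hypothetical structure, for illustration)."""
    label: str
    high_weight: float
    low_weight: Optional[float]  # None means the low-noise LoRA is not loaded
    extra_high: Tuple[str, ...] = ()


def trial_ladder(reduced_high_weight: float = 0.8) -> list:
    """Return the configurations in the order the text suggests trying them.

    reduced_high_weight is an assumed placeholder, not a recommended value.
    """
    return [
        LoraTrial("baseline", 1.0, 1.0),
        LoraTrial("drop low-noise", 1.0, None),
        LoraTrial("adjust high + helpers", reduced_high_weight, None,
                  CONSISTENCY_HELPERS),
    ]


def first_acceptable(
    trials: Sequence[LoraTrial],
    is_consistent: Callable[[LoraTrial], bool],
) -> Optional[LoraTrial]:
    """Walk the ladder and return the first trial judged consistent, else None."""
    for trial in trials:
        if is_consistent(trial):
            return trial
    return None
```

In practice is_consistent is a manual judgment after each render; the point of the sketch is only that each step is tried once, in order, before moving to the next.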
3. Next, let's discuss the impact of the initial image on consistency. In an image-to-video workflow, the initial image has a massive influence on character consistency. If the face fills the entire initial image, the consistency of the generated video will improve significantly; conversely, if you use a long-shot image, you shouldn't expect high levels of consistency. If you have specific characters, outfits, and settings in mind, I recommend generating the desired image first—incorporating all these elements while ensuring the character's face occupies a significant portion of the frame.
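One rough way to compare candidate initial images is to measure how much of the frame the face covers. The sketch below assumes you already have a face bounding box in pixel coordinates (from any detector, or measured by hand); the function name face_fraction is hypothetical, and the text gives no numeric threshold, so none is implied here.

```python
def face_fraction(image_size, face_box):
    """Return the share of the image area covered by the face box (0.0 to 1.0).

    image_size: (width, height) in pixels.
    face_box:   (left, top, right, bottom) in pixels; clipped to the image.
    """
    width, height = image_size
    left, top, right, bottom = face_box
    # Clip the box to the image so an out-of-frame box cannot inflate the ratio.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    face_area = max(0, right - left) * max(0, bottom - top)
    return face_area / (width * height)


# A close-up versus a long shot on the same 1280x720 frame.
close_up = face_fraction((1280, 720), (320, 180, 960, 540))
long_shot = face_fraction((1280, 720), (600, 100, 680, 180))
```

A higher value corresponds to the close framing described above, and a very small value to the long-shot case.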
4. Prompts also significantly impact character consistency. Prompts influence camera movement; consequently, if the character is off-screen at the start, too far away, positioned far from the initial image's framing, or shrinks due to camera movement, character consistency can plummet. For instance, in this video (https://civarchive.com/posts/29239422), facial consistency drops noticeably as the character moves; you can see that by the time the two roll onto the bed and get intimate, the face has changed considerably. Therefore, if adjusting the LoRA alone doesn't work, try tweaking the initial image and the prompts in tandem.
It is also worth emphasizing that combining prompts with a LoRA can achieve results that were previously impossible. Take this video (https://civarchive.com/posts/29242686), for example: after loading the LoRA, the initial output was poor—it didn't capture the sensation of licking the scrotum at all. After reading the discussion, I learned that a detailed description of the scrotum was necessary to get the right effect. I adjusted the prompts regarding the scrotum accordingly, but the results were still suboptimal. Finally, I added the phrase "her mouth stays pressed tightly against the scrotum beneath the man's penis without pulling away" and used the wan2.1-i2v-480p-flaccid-v1.0 LoRA; this finally achieved a convincing scrotum-licking effect (though it still requires some "gacha-style" luck, as sometimes the output erroneously generates two penises).
5. Issues regarding character consistency when using character LoRAs. While character LoRAs are an excellent solution for maintaining consistency in videos or images featuring a single person, they fall short in multi-character scenarios. For scenes with multiple people, I recommend carefully adjusting the initial image to align the character head positions and using specific prompts to define where each person appears in the frame; this significantly increases the likelihood of generating a video with the desired consistency. I’ve had great success with two-person scenes—I frequently pair two specific Japanese AV actors with various Asian women (in reality, they’d likely be drained dry by now). However, as the number of characters increases, uncertainty grows exponentially; in a four-person scene, for instance, you might maintain consistency for three characters while the fourth person's face changes completely.
Overall, achieving character consistency while ensuring specific NSFW actions are executed is a tricky, somewhat unpredictable process influenced by many factors. You can experiment with the methods mentioned above until you achieve satisfactory results. Of course, success isn't guaranteed—it sometimes requires a bit of luck, and at other times, knowing when to give up. The issue of declining consistency affects almost all models; it stems from the nature of the model's attention mechanism. While new solutions are emerging—such as LTX’s "Director" nodes or Wan’s "Bernini" nodes—I feel they aren't quite mature yet, particularly within the open-source ecosystem, where limitations prevent the generation of longer videos.
These are just a few of my insights; I’d love to discuss this further with you. And, of course, if you have any unique preferences or insights regarding Asian women, feel free to share those as well.
Comments (3)
It's an amazing workflow for an 8GB GPU.
Works great, thanks!
Where exactly would you insert a custom Lora in this workflow for the best results?
In the middle of the workflow, there are two large modules where LoRAs are loaded; these are the weighted LoRA loaders.