About the merge versions
The merge versions combine both versions of the LoRA into a single larger LoRA, with differing strengths.
I originally tuned the v1 x0.4, v2 x1.0 merge for nvfp4 LTX 2.3; after further testing, I wouldn't recommend using this version at full strength for fp8. (PS: if you're having desaturation issues like in the examples on v2 and the first merge, try using the fp8 or a GGUF version instead.)
The other version, v1 x0.5, v2 x0.7, was a sweet spot I found when running the fp8 model: it adds a good amount of dynamic motion without too many issues. It also seems to be better at prompt following, though it may be a bit less automatic.
Technical explanation: all LoRA weights were scaled, then concatenated along the rank dimension, and the alphas were multiplied by 2 (no extra scaling math is needed since both LoRAs have the same rank). This merges the two LoRAs into a single larger LoRA that has the same effect as using the two LoRAs together. Unlike LoRA merging via merge-and-extract, this method is lossless and should produce nearly identical results to loading the two models together.
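A minimal numpy sketch of the concatenation trick described above (illustrative only; the actual merge operates on every LoRA layer in the safetensors file, with the strength baked into the up-projection weights):

```python
import numpy as np

rng = np.random.default_rng(0)
out_dim, in_dim, rank, alpha = 8, 6, 4, 4.0
s1, s2 = 0.5, 0.7  # per-LoRA strengths

# Two LoRAs of equal rank. Each contributes delta_W = (alpha / rank) * B @ A.
A1, B1 = rng.normal(size=(rank, in_dim)), rng.normal(size=(out_dim, rank))
A2, B2 = rng.normal(size=(rank, in_dim)), rng.normal(size=(out_dim, rank))

# Effect of loading both LoRAs side by side at their respective strengths:
delta_separate = (alpha / rank) * (s1 * B1 @ A1 + s2 * B2 @ A2)

# Merge: scale the up (B) weights by strength, concatenate along the rank
# dimension, and double alpha so alpha/rank is unchanged for the doubled rank.
A_merged = np.concatenate([A1, A2], axis=0)            # (2*rank, in_dim)
B_merged = np.concatenate([s1 * B1, s2 * B2], axis=1)  # (out_dim, 2*rank)
alpha_merged = 2 * alpha

delta_merged = (alpha_merged / (2 * rank)) * (B_merged @ A_merged)

assert np.allclose(delta_separate, delta_merged)  # lossless: same delta
```

Because the merged delta is exactly the sum of the two scaled deltas, no information is lost, unlike merge-and-extract, which re-decomposes the summed delta at a fixed rank.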
About the LTX 2.3 version
The new version has been retrained from scratch: I trained for significantly more steps with a lower learning rate, and the dataset was captioned by my nsfwvision v3 model, with some additional information provided for some of the videos. The captioner was given the video clips at 1 fps, so if you want to indicate a timestamp in your prompt, use phrasing like "one second into the video" or "on the second frame"; describing events in order should also work decently well.
The model is far from perfect; I wouldn't say it's always better than the previous version, though it's maybe a bit better at prompt understanding. Using the distil LoRA at low step counts usually gets you less motion. I still need to figure out a better workflow to fix some of the noise issues when using CFG, but motion tends to be significantly better with CFG than without, so I'd recommend using it if you've got a good workflow.
None of the outputs during training had the issues I encountered when running the model without distil, so it's probably a user issue.
About T2V
T2V still isn't great. It may be better than it was in the previous version, but you need to give extremely detailed prompts: describe the framing, the camera's movement or lack thereof, and the location of the characters (including POV if it's a POV shot). It's very finicky, so I2V will pretty much always be easier.
Original info
A multi-purpose LoRA for NSFW content, primarily intended for anthro (furry) characters, though as usual it'll likely work with regular human characters as well; a successor to the Wan furry LoRAs.
This lora should be capable of producing both furry and non-furry NSFW content with audio.
The showcase vids are all image-to-video. At least 1280x720 is recommended for high-quality results; lower resolutions (like 640x360) will still work but may be lower quality. Showcase vids are mostly 640x360, with some at 1280x720. Black bars are due to resizing: my input images were 2:3 or 3:2, which would stretch when going to 16:9, so I used padding instead.
Examples were generated with the nvfp4 dev model and the distil LoRA, using my uncalibrated nvfp4 text encoder.
Supported styles
Supports 2D, 3D, and realistic styles for image-to-video. Text-to-video is largely untested but likely not going to be great.
Keywords
Keywords such as "anthro", "furry", and "anthropomorphic" can be used to specify the character type.
(Written before training finished)
Not good for T2V, use I2V
I2V:
I2V is capable of various poses, perspectives, and actions. Characters can still talk (though I don't recommend making a character attempt to talk during oral; for moaning during oral, prompt for "muffled moaning").
Foley:
LTX 2 can be used to create foley audio, meaning audio added to an existing video, and this LoRA works very well for that. [Workflow]
Text encoder info
The idea that abliterated Gemma will produce better results than standard Gemma as a text encoder is a myth. Abliterated models are lobotomized to forget about refusals, but this also destroys other knowledge about the banned concepts in the process. Don't use abliterated Gemma unless you don't care about the quality of your outputs.
Additionally, since LTX 2 isn't truly censored, the information it picks up from the text encoder isn't affected by the refusal behavior, so the outputs will be perfectly fine and retain all knowledge.
Don't believe me? Prompt Gemma to say "fuck" or other vulgar words: it will refuse. Now ask LTX 2 to make a character say "fuck": it works perfectly fine, because LTX 2 still gets all the information it needs from your prompt. In short, don't use abliterated Gemma, or any finetune not made for LTX 2, with LTX 2.
Lora info
This LoRA was trained on a dataset of >200 videos with varied content (2D, 3D, and, for the human content, IRL), mixing anthro and human videos. Captions were generated by an LLM from a still frame, then corrected and slightly expanded. Most videos in the dataset include sound.
The LoRA is rank 64, affecting the full attention and feed-forward parts of the network, and was trained with sound enabled.
The videos were preprocessed into various aspect-ratio buckets at every matching increment of 25 frames. The videos are up to 20 seconds long, and training was done at various framerates: if a video's framerate was greater than 25 fps, it was lowered to 25 fps; if it was lower than 25 fps, it was kept.
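The framerate and frame-count rules above can be sketched as two small helpers (hypothetical names, not the actual trainer's preprocessing code):

```python
def target_fps(src_fps: float, cap: float = 25.0) -> float:
    # Framerates above 25 fps are lowered to 25; lower framerates are kept as-is.
    return min(src_fps, cap)

def bucket_frames(n_frames: int, step: int = 25,
                  max_seconds: int = 20, fps: int = 25) -> int:
    # Round down to the nearest 25-frame increment,
    # capped at 20 seconds at 25 fps (500 frames).
    return min((n_frames // step) * step, max_seconds * fps)
```

For example, a 30 fps clip would be resampled to 25 fps, while a 137-frame clip would land in the 125-frame bucket.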
Trained using the official LTX 2 trainer.
Want to support future training?
If you want to support me financially so I can train more models, feel free to send me a code for RunPod credit. I don't have any other donation options available right now.
Description
Merged version of v1 and v2, similar to the other merged version; this one might work better on fp8. In my experience, the other version worked best at low strength (around 0.7) with fp8 but at full strength with nvfp4; this version should be decent at strength 1 on the fp8 model.