LTX 2.3
A LoRA for generating from-behind sex (facing the camera) positions with LTX-2.3 video models. Supports doggy style, prone, and top-down bottom-up positions. Check out the training data if you need help with workflows. I have also attached the image-captioning system prompt I use for I2V, which should help with prompt wording.
Trigger Word
sfbehind
Recommended Settings
LoRA strength (Stage 1) 1.0
LoRA strength (Stage 2) 0.85
Distilled LoRA (Stage 2) 0.6
Prompting Tips
This LoRA responds best to literal, mechanical prompts. Describe body positions and motion like you're directing a scene. Avoid poetic or abstract language.
Do: "He thrusts his hips forward in short rapid strokes, her buttocks compressing on impact" Don't: "A mesmerizing rhythm of primal passion"
Position Names
Use these exact terms — the model was trained on them:
doggy — on hands and knees
prone — lying flat face-down
top-down bottom-up — face pressed into bed, hips raised, back arched
Thrust Patterns
Two distinct patterns the model learned:
Close thrusts (no shaft visible): "He thrusts in short, rapid strokes, his hips staying pressed close to her ass. Her buttocks compress on each impact."
Long strokes (shaft visible): "He pulls his hips back, the glistening shaft reappearing, then drives forward. Her buttocks ripple from the impact."
Who Is Moving?
Man active: "He thrusts his hips forward" / "He drives into her"
Woman active: "She pushes her hips back into him" / "She rocks back against him"
Don't describe both moving unless both actually are.
Getting Better Results
Describe the male body — skin tone, build, body hair, tattoos, muscle definition. Without this it renders as a vague blob.
Describe impact reactions — "her buttocks compress and ripple on contact, her body rocking forward from the force." This teaches the model to sync the bounce with the thrust.
Describe contact points — "his hips press flush against her ass" or "his hands grip her waist."
If her face is visible describe it literally — mouth open, eyes closed, brow furrowed. Don't interpret emotion.
If no shaft is visible don't mention it. Describe hip motion and body contact only.
Specify the camera angle — straight-down, three-quarter, eye-level, low angle.
Known Quirks
Male torso needs explicit description or it gets blobby.
Impact bounce can desync if not described in the prompt — always include "buttocks compress" or "body rocks forward" tied to the thrust.
Stage 2 LoRA strength at 1.0 degrades quality. Keep at 0.85.
System prompt I use with I2V:
You are a prompt writer for an AI video generation model. You will be given a reference image. Extract the visual details and write a generation prompt that would produce a video with a similar look and feel, but with motion added.
You are NOT captioning the image. You are writing a CINEMATIC DIRECTION that borrows the image's visual DNA — the specific colors, textures, materials, lighting mood, and character details — and adds motion to bring it to life.
Always begin with "sfbehind,"
EXTRACT WITH SPECIFICITY — every noun needs a visual adjective:
- NOT "on a bed" → "on tangled white cotton sheets, one pillow crushed beneath her chest"
- NOT "blonde hair" → "long platinum-blonde waves spilling over her left shoulder, damp at the temples"
- NOT "muscular man" → "a lean, V-tapered man with sun-darkened skin, a dusting of dark hair across his chest, and calloused hands"
- NOT "warm lighting" → "late-afternoon sunlight cutting through wooden blinds, painting gold stripes across her lower back"
- NOT "from behind" → "his hips square behind hers, his thumbs pressing dimples into the flesh above her hip bones"
PULL THESE FROM THE IMAGE:
- Hair: color with modifier, length, state (damp, tangled, pinned up, falling in face)
- Skin: tone + undertone + surface (glistening, goosebumped, flushed pink across shoulders, tan lines visible)
- Body: one or two specific details that sell the physicality (the dip of her lower back, the flex of his forearms, the soft crease where her thigh meets her hip)
- Position: name it (doggy/prone/top-down bottom-up) then add the specific body mechanics — spine angle, where hands grip, how weight distributes
- His hands: exactly where and how — "fingers splayed across her right hip, thumb pressing into the dimple above her tailbone" not just "hands on hips"
- Setting: materials and textures (velvet headboard, cool tile floor, wrinkled hotel duvet), objects that set the scene (bedside lamp casting a cone of warm light, phone face-down on the nightstand)
- Lighting: what it does to their bodies specifically (highlights the sheen of sweat on her spine, catches the ridge of his knuckles, leaves his face in shadow)
- Camera: describe by what's in frame and what's cropped (her full back and his torso from navel up, tight on where their bodies meet, wide enough to see the headboard and his arms braced against it)
ADD MOTION — pick the thrust pattern that fits the image's body positions:
CLOSE THRUSTS (his hips tight against her):
"He drives forward in short, rapid strokes, his hips barely pulling back before snapping forward again. Her buttocks flatten against his pelvis on each impact, a visible shudder rolling up through her lower back."
LONG STROKES (space between their bodies):
"He draws his hips back until the glistening shaft reappears between her buttocks, then pushes forward in one steady stroke, her body rocking forward as his hips meet her ass with an audible impact."
WOMAN DRIVING:
"She rocks her hips backward into him in a slow, deliberate grind, her spine arching deeper with each push, his hands riding her waist but not guiding."
ADD IMPACT REACTION — her body's physical response synced to the motion:
- "her buttocks compress and ripple on contact"
- "her body shifts forward two inches before settling back"
- "the flesh of her thighs shakes from the impact"
- "her fingers tighten in the sheets with each thrust"
VOCABULARY:
- "thrusts" = he moves, "pushes back / rocks back" = she moves
- If shaft isn't visible, don't mention it — describe hip motion and contact only
- Never describe what's inside her body
- Never end with mood summaries or poetry
OUTPUT: Single flowing paragraph, 180-250 words. Start with "sfbehind," and end with a visual detail, not a feeling.
New release (1/15/26):
I think I achieved a decent balance on the quality of T2V, I2V, and audio, so I'm releasing this as a beta. Sometimes things go weird. Lower strength can help with trickier prompts. I really like using ltx-2-ic-detailer-lora with this LoRA.
I'm still working on my workflow, but currently I'm running a video/audio training cycle and then an image training cycle to improve genitals.
Differences from v0.1
Improved audio
T2V - Improved penis (still not perfect, but way better)
I2V - Similar or better results
Tags used during training
A woman is lying on her stomach in prone position a man behind her thrusts his hip forward and back sliding in and out.
The mans penis is visible.
Audio tags
clapping cheeks
moans, moaning, the woman's breathless moaning
heavy breathing
Training Details v0.2
30 dataset videos 576x1024@121f and 1024x576@121f
30 high quality images 1024x1024
Frame Rate: 25fps
Steps Video: 4000 (Video was trained faster than audio)
Steps Images: 3800 (Used to improve penis appearance)
No abliterated model used during training.
Generation details:
Workflows are embedded in all showcase images for this release.
No abliterated model used. (Just don't use the LTX prompt enhancer.)
T2V vids are fp-8-distill
I2V vids are 19b-dev full.
Training update (1/14/26):
I am actively working on this LoRA. It's difficult to balance I2V, T2V, and audio all together. I'm working on my workflow and training methods, but it may end up being split into T2V/audio and I2V/audio versions, which is not ideal at all.
If you look at my latest video post for this model, https://civarchive.com/posts/25846175, you should be able to see the massive difference in audio.
⚠️ Work in Progress (For testing only)
I believe in open development. DO NOT expect the best result from this project.
All images in the gallery are raw, unprocessed outputs directly from generation.
The last 4 images in the gallery are I2V.
Each image includes its attached workflow for full reproducibility.
I know the lora is huge. Rank 16 results were not great. Any tips for lowering the size would be great!
Any feedback is welcome.
Training Details
Trainer: Directly from the LTX team. https://github.com/Lightricks/LTX-2
Steps: 2,250
Dataset: 12 videos
Clip length: ~5 seconds
Frame rate: 25 FPS
Resolution buckets: 1024x576 at 121 frames and 576x1024 at 121 frames
Frame counts must be a multiple of 8 plus 1 (for example, 121 = 8 x 15 + 1)
Gemma3 abliterated used during training.
NO audio training was done during this release.
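The frame-count rule above is easy to get wrong when cutting clips, so here is a minimal Python sketch of it. The function names are my own and are not part of the LTX trainer. It assumes a single frame (8 x 0 + 1) also counts as valid, which I have not checked against the trainer.

```python
def is_valid_frame_count(frames: int) -> bool:
    """Return True when frames is a multiple of 8 plus 1 (1, 9, 17, ..., 121, ...).

    Assumption: a single frame (8*0 + 1) is accepted as well.
    """
    return frames >= 1 and (frames - 1) % 8 == 0


def largest_valid_frame_count(seconds: float, fps: int = 25) -> int:
    """Largest frame count of the form 8*n + 1 that fits inside the duration.

    Hypothetical helper for trimming dataset clips; never returns fewer than 9 frames.
    """
    target = seconds * fps               # frames available in the clip
    n = max(1, int((target - 1) // 8))   # whole groups of 8 that fit after the first frame
    return 8 * n + 1


def clip_seconds(frames: int, fps: int = 25) -> float:
    """Playback duration of a clip with the given frame count."""
    return frames / fps
```

With the settings listed here, a 5 second clip at 25 FPS trims down to 121 frames, which plays for 4.84 seconds. That matches the "~5 seconds" clip length.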
Workflow & Settings
Base workflows: ComfyUI default templates for LTX-2 (T2V & I2V)
Tested on 19b-dev full and 19b-dev-FP8
Sampler: Res2s
Sampling steps: 20
Additional LoRAs:
ltx-2-ic-detailer-lora
Gemma3 abliterated used during generation.
Description
Updated audio across I2V and T2V, better penis rendering in T2V, and no regression in I2V.