LTX 2.3
A LoRA for generating from-behind (facing the camera) positions with LTX-2.3 video models. Supports doggy, prone, and top-down bottom-up positions. Check out the training data if you need help with workflows. I have also included the image-captioning system prompt I use for I2V, which should help with prompt wording.
Trigger Word
sfbehind
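A quick way to catch a missing trigger word before queueing a generation is a small pre-flight check. The sketch below is only an illustration: `check_prompt` and its parameters are hypothetical names, not part of LTX-2.3 or any workflow tool. The 180 to 250 word range is taken from the I2V system prompt later in this card and may not apply to hand-written T2V prompts.

```python
TRIGGER = "sfbehind"  # trigger word from this card


def check_prompt(prompt: str, min_words: int = 180, max_words: int = 250) -> list[str]:
    """Return a list of problems found in a prompt (empty list means it passed).

    Hypothetical helper: the word range mirrors the I2V system prompt's
    OUTPUT rule and is an assumption for any other use.
    """
    problems = []
    text = prompt.strip()

    # The system prompt asks for the trigger word followed by a comma.
    if not text.startswith(TRIGGER + ","):
        problems.append(f'prompt should start with "{TRIGGER},"')

    # Whitespace-separated tokens are a rough stand-in for a word count.
    word_count = len(text.split())
    if not (min_words <= word_count <= max_words):
        problems.append(
            f"prompt has {word_count} words, expected {min_words}-{max_words}"
        )
    return problems
```

Pass different `min_words` and `max_words` values if your own prompts run shorter or longer than the I2V captions.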
Recommended Settings
- LoRA strength (Stage 1) 1.0
- LoRA strength (Stage 2) 0.85
- Distilled LoRA (Stage 2) 0.6
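If you script your runs, the three values above can be kept in one place and checked before use. This is a minimal sketch, not a real workflow API: `LoraStrengths`, its field names, and `strength_warnings` are all hypothetical. The 0.85 ceiling comes from the Known Quirks note in this card that Stage 2 at 1.0 degrades quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraStrengths:
    """Hypothetical container for the recommended strengths on this card."""
    stage1_lora: float = 1.0
    stage2_lora: float = 0.85
    stage2_distilled_lora: float = 0.6


def strength_warnings(strengths: LoraStrengths) -> list[str]:
    """Return human-readable warnings for settings the card advises against."""
    warnings = []
    # The card reports degraded quality with Stage 2 at 1.0 and suggests 0.85.
    if strengths.stage2_lora > 0.85:
        warnings.append(
            f"Stage 2 LoRA strength {strengths.stage2_lora} is above the recommended 0.85"
        )
    return warnings
```

The card only states that 1.0 is too high for Stage 2, so treating everything above 0.85 as a warning is a conservative assumption.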
Prompting Tips
This LoRA responds best to literal, mechanical prompts. Describe body positions and motion like you're directing a scene. Avoid poetic or abstract language.
Do: "He thrusts his forward in short rapid strokes, her compressing on impact" Don't: "A mesmerizing rhythm of primal passion"
Position Names
Use these exact terms — the model was trained on them:
- doggy — on hands and knees
- prone — lying flat face-down
- top-down bottom-up — face pressed into bed, raised, back arched
Thrust Patterns
Two distinct patterns the model learned:
Close thrusts (no shaft visible): "He thrusts in short, rapid strokes, his staying pressed close to her . Her compress on each impact."
Long strokes (shaft visible): "He pulls his back, the glistening shaft reappearing, then drives forward. Her ripple from the impact."
Who Is Moving?
- Man active: "He thrusts his forward" / "He drives into her"
- Woman active: "She pushes her back into him" / "She rocks back against him"
- Don't describe both moving unless both actually are.
Getting Better Results
- Describe the male body — skin tone, build, body hair, tattoos, muscle definition. Without this it renders as a vague blob.
- Describe impact reactions — "her compress and ripple on contact, her body rocking forward from the force." This teaches the model to sync the bounce with the thrust.
- Describe contact points — "his press flush against her " or "his hands grip her waist."
- If her face is visible describe it literally — mouth open, eyes closed, brow furrowed. Don't interpret emotion.
- If no shaft is visible don't mention it. Describe hip motion and body contact only.
- Specify the camera angle — straight-down, three-quarter, eye-level, low angle.
Known Quirks
- Male torso needs description or it gets blobby.
- Impact bounce can desync if not described in the prompt — always include " compress" or "body rocks forward" tied to the thrust.
- Stage 2 LoRA strength at 1.0 degrades quality. Keep at 0.85.
System prompt I use with I2V:
You are a prompt writer for an AI video generation model. You will be given a reference image. Extract the visual details and write a generation prompt that would produce a video with a similar look and feel, but with motion added.
You are NOT captioning the image. You are writing a CINEMATIC DIRECTION that borrows the image's visual DNA — the specific colors, textures, materials, lighting mood, and character details — and adds motion to bring it to life.
Always begin with "sfbehind,"
EXTRACT WITH SPECIFICITY — every noun needs a visual adjective:
- NOT "on a bed" → "on tangled white cotton sheets, one pillow crushed beneath her "
- NOT "blonde hair" → "long platinum-blonde waves spilling over her left shoulder, damp at the temples"
- NOT "muscular man" → "a lean, V-tapered man with sun-darkened skin, a dusting of dark hair across his , and calloused hands"
- NOT "warm lighting" → "late-afternoon sunlight cutting through wooden blinds, painting gold stripes across her lower back"
- NOT "" → "his square behind hers, his thumbs pressing dimples into the flesh above her hip bones"
PULL THESE FROM THE IMAGE:
- Hair: color with modifier, length, state (damp, tangled, pinned up, falling in face)
- Skin: tone + undertone + surface (glistening, goosebumped, flushed pink across shoulders, tan lines visible)
- Body: one or two specific details that sell the physicality (the dip of her lower back, the flex of his forearms, the soft crease where her thigh meets her hip)
- Position: name it (doggy/prone/top-down bottom-up) then add the specific body mechanics — spine angle, where hands grip, how weight distributes
- His hands: exactly where and how — "fingers splayed across her right hip, thumb pressing into the dimple above her tailbone" not just "hands on "
- Setting: materials and textures (velvet headboard, cool tile floor, wrinkled hotel duvet), objects that set the scene (bedside lamp casting a cone of warm light, phone face-down on the nightstand)
- Lighting: what it does to their bodies specifically (highlights the sheen of sweat on her spine, catches the ridge of his knuckles, leaves his face in shadow)
- Camera: describe by what's in frame and what's cropped (her full back and his torso from navel up, tight on where their bodies meet, wide enough to see the headboard and his arms braced against it)
ADD MOTION — pick the thrust pattern that fits the image's body positions:
CLOSE THRUSTS (his tight against her):
"He drives forward in short, rapid strokes, his pulling back before snapping forward again. Her flatten against his on each impact, a visible shudder rolling up through her lower back."
LONG STROKES (space between their bodies):
"He draws his back until the glistening shaft reappears between her , then pushes forward in one steady stroke, her body rocking forward as his meet her with an audible impact."
WOMAN DRIVING:
"She rocks her backward into him in a slow, deliberate grind, her spine arching deeper with each push, his hands waist but not guiding."
ADD IMPACT REACTION — her body's physical response synced to the motion:
- "her compress and ripple on contact"
- "her body shifts forward two inches before settling back"
- "the flesh of her thighs shakes from the impact"
- "her fingers tighten in the sheets with each thrust"
VOCABULARY:
- "thrusts" = he moves, "pushes back / rocks back" = she moves
- If shaft isn't visible, don't mention it — describe hip motion and contact only
- Never describe what's inside her body
- Never end with mood summaries or poetry
OUTPUT: Single flowing paragraph, 180-250 words. Start with "sfbehind," and end with a visual detail, not a feeling.
New release (1/15/26):
I think I've achieved a decent balance between T2V, I2V, and audio quality, so I'm releasing this as a beta. Results can still come out strange at times. Lowering the LoRA strength can help with trickier prompts. I also like pairing this LoRA with ltx-2-ic-detailer-lora.
I'm still working on my workflow, but currently I run a video/audio training cycle and then an image training cycle to improve .
Differences from v0.1
- Improved audio
- T2V - Improved (still not perfect, but way better)
- I2V - Similar or better results
Tags used during training
- A woman is lying on her stomach in prone position a man behind her thrusts his hip forward and back sliding in and out.
- The mans is visible.
Audio tags
- clapping cheeks
- , , the woman's breathless
- heavy breathing
Training Details v0.2
- 30 dataset videos at 576x1024 and 1024x576, 121 frames each
- 30 high-quality images at 1024x1024
- Frame Rate: 25fps
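For a sense of clip length, the frame count and frame rate above can be converted to seconds. This is a rough sketch that assumes every training clip plays back at the listed 25 fps; `clip_seconds` is a hypothetical helper name.

```python
def clip_seconds(frames: int, fps: float) -> float:
    """Approximate clip duration in seconds for a given frame count and frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return frames / fps


# Values from the training details: 121 frames at 25 fps.
duration = clip_seconds(121, 25)
print(f"{duration:.2f} s")  # prints "4.84 s"
```

Generations close to this length are the closest match to the training clips. Whether longer clips hold up is not stated on this card.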
Description
This LoRA will need updates. LTX-2.3 is still new, so results should improve as the community refines its processes.
