LTX 2.3

A LoRA for generating from-behind (facing the camera) positions with LTX-2.3 video models. Supports doggy, prone, and top-down bottom-up positions. Check out the training data if you need help with workflows. I have also included the image-captioning system prompt I use for I2V, which should help with prompt wording.

Trigger Word

sfbehind

Recommended Settings

LoRA strength (Stage 1): 1.0
LoRA strength (Stage 2): 0.85
Distilled LoRA (Stage 2): 0.6
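
For anyone scripting their workflow, the recommended values can be kept in one place and sanity-checked before a run. This is only a sketch: the class and field names are hypothetical and do not correspond to any node or API in a particular workflow tool. The 0.85 ceiling for Stage 2 comes from the Known Quirks note in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    # Hypothetical field names; values are the recommended settings from this card.
    stage1_lora_strength: float = 1.0
    stage2_lora_strength: float = 0.85
    stage2_distilled_lora_strength: float = 0.6


def check_settings(settings: LoraSettings) -> list[str]:
    """Return human-readable warnings for values outside the card's advice."""
    warnings = []
    # The card reports quality degradation when Stage 2 runs at 1.0.
    if settings.stage2_lora_strength > 0.85:
        warnings.append(
            f"Stage 2 LoRA strength {settings.stage2_lora_strength} is above "
            "the recommended 0.85"
        )
    return warnings


if __name__ == "__main__":
    print(check_settings(LoraSettings()))
    print(check_settings(LoraSettings(stage2_lora_strength=1.0)))
```

Lower strengths are not flagged, since the release notes say lowering strength can help with trickier prompts.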

Prompting Tips

This LoRA responds best to literal, mechanical prompts. Describe body positions and motion like you're directing a scene. Avoid poetic or abstract language.

Do: "He thrusts his forward in short rapid strokes, her compressing on impact" Don't: "A mesmerizing rhythm of primal passion"

Position Names

Use these exact terms — the model was trained on them:

doggy — on hands and knees
prone — lying flat face-down
top-down bottom-up — face pressed into bed, raised, back arched

Thrust Patterns

Two distinct patterns the model learned:

Close thrusts (no shaft visible): "He thrusts in short, rapid strokes, his staying pressed close to her . Her compress on each impact."

Long strokes (shaft visible): "He pulls his back, the glistening shaft reappearing, then drives forward. Her ripple from the impact."

Who Is Moving?

Man active: "He thrusts his forward" / "He drives into her"
Woman active: "She pushes her back into him" / "She rocks back against him"
Don't describe both moving unless both actually are.

Getting Better Results

Describe the male body — skin tone, build, body hair, tattoos, muscle definition. Without this it renders as a vague blob.
Describe impact reactions — "her compress and ripple on contact, her body rocking forward from the force." This teaches the model to sync the bounce with the thrust.
Describe contact points — "his press flush against her " or "his hands grip her waist."
If her face is visible describe it literally — mouth open, eyes closed, brow furrowed. Don't interpret emotion.
If no shaft is visible don't mention it. Describe hip motion and body contact only.
Specify the camera angle — straight-down, three-quarter, eye-level, low angle.

Known Quirks

Male torso needs description or it gets blobby.
Impact bounce can desync if not described in the prompt — always include " compress" or "body rocks forward" tied to the thrust.
Stage 2 LoRA strength at 1.0 degrades quality. Keep at 0.85.

System prompt I use with I2V:

You are a prompt writer for an AI video generation model. You will be given a reference image. Extract the visual details and write a generation prompt that would produce a video with a similar look and feel, but with motion added.

You are NOT captioning the image. You are writing a CINEMATIC DIRECTION that borrows the image's visual DNA — the specific colors, textures, materials, lighting mood, and character details — and adds motion to bring it to life.

Always begin with "sfbehind,"

EXTRACT WITH SPECIFICITY — every noun needs a visual adjective:
- NOT "on a bed" → "on tangled white cotton sheets, one pillow crushed beneath her "
- NOT "blonde hair" → "long platinum-blonde waves spilling over her left shoulder, damp at the temples"
- NOT "muscular man" → "a lean, V-tapered man with sun-darkened skin, a dusting of dark hair across his , and calloused hands"
- NOT "warm lighting" → "late-afternoon sunlight cutting through wooden blinds, painting gold stripes across her lower back"
- NOT "" → "his  square behind hers, his thumbs pressing dimples into the flesh above her hip bones"
PULL THESE FROM THE IMAGE:
- Hair: color with modifier, length, state (damp, tangled, pinned up, falling in face)
- Skin: tone + undertone + surface (glistening, goosebumped, flushed pink across shoulders, tan lines visible)
- Body: one or two specific details that sell the physicality (the dip of her lower back, the flex of his forearms, the soft crease where her thigh meets her hip)
- Position: name it (doggy/prone/top-down bottom-up) then add the specific body mechanics — spine angle, where hands grip, how weight distributes
- His hands: exactly where and how — "fingers splayed across her right hip, thumb pressing into the dimple above her tailbone" not just "hands on "
- Setting: materials and textures (velvet headboard, cool tile floor, wrinkled hotel duvet), objects that set the scene (bedside lamp casting a cone of warm light, phone face-down on the nightstand)
- Lighting: what it does to their bodies specifically (highlights the sheen of sweat on her spine, catches the ridge of his knuckles, leaves his face in shadow)
- Camera: describe by what's in frame and what's cropped (her full back and his torso from navel up, tight on where their bodies meet, wide enough to see the headboard and his arms braced against it)
ADD MOTION — pick the thrust pattern that fits the image's body positions:
CLOSE THRUSTS (his  tight against her):
"He drives forward in short, rapid strokes, his   pulling back before snapping forward again. Her  flatten against his  on each impact, a visible shudder rolling up through her lower back."
LONG STROKES (space between their bodies):
"He draws his  back until the glistening shaft reappears between her , then pushes forward in one steady stroke, her body rocking forward as his  meet her  with an audible impact."
WOMAN DRIVING:
"She rocks her  backward into him in a slow, deliberate grind, her spine arching deeper with each push, his hands  waist but not guiding."
ADD IMPACT REACTION — her body's physical response synced to the motion:
- "her  compress and ripple on contact"
- "her body shifts forward two inches before settling back"
- "the flesh of her thighs shakes from the impact"
- "her fingers tighten in the sheets with each thrust"
VOCABULARY:
- "thrusts" = he moves, "pushes back / rocks back" = she moves
- If shaft isn't visible, don't mention it — describe hip motion and contact only
- Never describe what's inside her body
- Never end with mood summaries or poetry
OUTPUT: Single flowing paragraph, 180-250 words. Start with "sfbehind," — end with a visual detail, not a feeling.
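
The output rules at the end of the system prompt (trigger word first, a single paragraph, 180-250 words) are easy to check mechanically before sending a generated prompt to the video model. The following is a minimal sketch. The function name is hypothetical, and counting the trigger word as part of the word total is an assumption, since the system prompt does not say either way.

```python
TRIGGER = "sfbehind,"
MIN_WORDS, MAX_WORDS = 180, 250


def validate_prompt(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    stripped = text.strip()

    # Rule: always begin with the trigger word followed by a comma.
    if not stripped.startswith(TRIGGER):
        problems.append(f'prompt does not start with "{TRIGGER}"')

    # Rule: 180-250 words (assumption: whitespace-separated tokens,
    # trigger word included in the count).
    word_count = len(stripped.split())
    if not MIN_WORDS <= word_count <= MAX_WORDS:
        problems.append(
            f"word count {word_count} is outside {MIN_WORDS}-{MAX_WORDS}"
        )

    # Rule: single flowing paragraph, so no line breaks inside the text.
    if "\n" in stripped:
        problems.append("prompt contains line breaks")

    return problems


if __name__ == "__main__":
    sample = "sfbehind, " + "word " * 199
    print(validate_prompt(sample))
    print(validate_prompt("too short"))
```

The check only covers the mechanical rules. Whether the prompt ends on a visual detail rather than a feeling still needs a human read.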

New release (1/15/26):

I think I've achieved a decent balance between T2V, I2V, and audio quality, so I'm releasing this as a beta. Sometimes things go weird; lowering the LoRA strength can help with trickier prompts. I really like using ltx-2-ic-detailer-lora with this LoRA.

I'm still working on my workflow, but currently I run a video/audio training cycle, then an image training cycle to improve .

Differences from v0.1

Audio - Improved
T2V - Improved (still not perfect, but way better)
I2V - Similar or better results

Tags used during training

A woman is lying on her stomach in prone position a man behind her thrusts his hip forward and back sliding in and out.
The mans is visible.

Audio tags

clapping cheeks
, , the woman's breathless
heavy breathing

Training Details v0.2

30 dataset videos 576x1024@121f and 1024x576@121f
30 high quality images 1024x1024
Frame Rate: 25fps
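
As a quick check on what these numbers imply, the clip length and aspect ratios can be worked out from the frame count, frame rate, and resolutions listed above. This is a sketch only: the helper name is hypothetical, and the total-footage figure assumes every one of the 30 videos is exactly 121 frames, which the list suggests but does not state outright.

```python
from fractions import Fraction

FRAMES_PER_CLIP = 121
FPS = 25
NUM_VIDEOS = 30
RESOLUTIONS = [(576, 1024), (1024, 576)]  # (width, height) as listed above


def clip_seconds(frames: int, fps: int) -> float:
    """Duration of one clip in seconds."""
    return frames / fps


if __name__ == "__main__":
    seconds = clip_seconds(FRAMES_PER_CLIP, FPS)
    print(f"per-clip duration: {seconds:.2f} s")

    # Assumes all videos share the same frame count.
    print(f"total video footage: {NUM_VIDEOS * seconds:.1f} s")

    for width, height in RESOLUTIONS:
        ratio = Fraction(width, height)
        print(f"{width}x{height}: aspect ratio "
              f"{ratio.numerator}:{ratio.denominator}")
```

So each training clip runs a little under five seconds, in both portrait (9:16) and landscape (16:9) orientations.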