CivArchive
    LTXV-2.3 - Audio only - Clapping Cheeks - v0.0.1-alpha
    NSFW

    🛑Work in progress🛑

    (Alpha release) I'm not sure this will be interesting to anyone.

    • WORKFLOW: https://civarchive.com/models/2516563/wan-with-ltxv-23-audio

    • Not designed for oral sex

      • I tried it. Nothing is more confusing or disturbing than hearing "gawk gawk" or gagging in an anal video.

      • Check out my deepthroat LoRA for adding oral audio; it is confirmed to work.

      • If a 1 GB LoRA is too much, I may spend some time creating a lightweight BJ audio LoRA.

    Create sex audio for previously created videos, or use it alongside LoRAs that lack audio. It makes three main additions to the base model: clapping cheeks, improved moaning/heavy breathing, and wetness sounds.

    This is a purely experimental LoRA addressing a common gap in many videos. It uses video-to-audio cross-attention to generate audio, so text prompts aren't critical but can still influence the result.

    Tags used

    - skin slapping against skin 
    - clapping cheeks
    - wet vagina
    - The woman moans
    - The woman is breathing heavy

    Extra Information

    I've tested with dev and distill; the best results are from dev.

    • Best samplers I've found: res_2s, er_sde

    • Audio will sync to visual movement naturally

    LoRA creator info

    Standout info

    • Rank 16 (might be a little too small)

    • --lora_target_preset full for cross-attention

    • --ltx2_mode av

    • Separate audio learning rate (--audio_lr)

    accelerate launch --num_cpu_threads_per_process 8 --mixed_precision bf16 \
      ltx2_train_network.py --sdpa \
      --ltx2_checkpoint /ai/comfyui/models/checkpoints/ltx-2.3-22b-dev.safetensors \
      --dataset_config ~/datasets/sex-audio/ltx_dataset_config.toml \
      --mixed_precision bf16 \
      --optimizer_type adamw8bit \
      --learning_rate 5e-5 \
      --gradient_checkpointing \
      --max_data_loader_n_workers 8 \
      --persistent_data_loader_workers \
      --network_module networks.lora_ltx2 \
      --network_dim 16 --network_alpha 16 \
      --timestep_sampling shifted_logit_normal \
      --discrete_flow_shift 1.0 \
      --max_train_steps 5000 --lr_scheduler constant --audio_lr 2.5e-5 \
      --max_grad_norm 1.0 \
      --save_every_n_steps 250 \
      --seed 42 \
      --logging_dir /ai/datasets/sex-audio/logs \
      --output_dir /ai/comfyui/models/loras/LTX2.3/sex-audio \
      --output_name sex-audio \
      --ltx2_first_frame_conditioning_p 1.0 \
      --caption_dropout_rate 0.1 --lora_target_preset full --ltx2_mode av
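
As a rough sanity check of the flags above, here is a minimal Python sketch that derives a few quantities from them. It assumes the fork follows the usual kohya-style convention of scaling the LoRA update by alpha / rank, and that --save_every_n_steps writes one checkpoint at each multiple of the interval; neither is confirmed by the model card. The variable names are illustrative only.

```python
# Values copied from the training command above.
network_dim = 16          # --network_dim (LoRA rank)
network_alpha = 16        # --network_alpha
max_train_steps = 5000    # --max_train_steps
save_every_n_steps = 250  # --save_every_n_steps
learning_rate = 5e-5      # --learning_rate
audio_lr = 2.5e-5         # --audio_lr

# Assumption: the LoRA update is scaled by alpha / rank (kohya-style convention).
lora_scale = network_alpha / network_dim

# Assumption: one checkpoint is written at every multiple of the save interval.
checkpoints = max_train_steps // save_every_n_steps

# How the audio learning rate compares with the base learning rate.
audio_lr_ratio = audio_lr / learning_rate

print(f"LoRA scale (alpha / rank): {lora_scale}")
print(f"Checkpoints written:       {checkpoints}")
print(f"audio_lr / learning_rate:  {audio_lr_ratio:.2f}")
```

With alpha equal to rank the scale is 1.0, so raising the rank later (the card notes 16 may be too small) without also raising alpha would lower the effective scale under this convention.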
    

    Description

    Super early concept

    Comments (15)

    Jellai · Apr 3, 2026 · 2 reactions
    CivitAI

    Really interesting idea. I look forward to it being developed further.

    iluvlamia · Apr 3, 2026 · 2 reactions
    CivitAI

    Can you share the dataset? How many audio clips did you use?

    daring_l
    Author
    Apr 3, 2026 · 1 reaction

    I added some training details to the model card. Videos were used so that the cross-attention and lip-sync work properly. Adding more audio-only data will most likely be another step in the future.

    - 12 videos with audio were used.

    ____NULL____ · Apr 3, 2026 · 10 reactions
    CivitAI

    I think this lora is the first of its kind. Great work on it! I hope more like this get created! It's very high quality.
    Congrats again on being the first 🥂🥂

    moistclamm121 · Apr 3, 2026 · 2 reactions
    CivitAI

    I'm afraid you cooked

    iluvlamia · Apr 3, 2026 · 4 reactions
    CivitAI

    So is it possible to train a voice-only character LoRA, using only audio input and training only the audio-related blocks?

    Jellai · Apr 3, 2026 · 1 reaction

    Yeah, I've been wondering about training other voice audio, like accents. It would be great if we could only train the audio side. Would save time and make the dataset easier.

    daring_l
    Author
    Apr 3, 2026 · 2 reactions

    I'm using a musubi fork, https://github.com/AkaneTendo25/musubi-tuner; it has an audio-only mode.

    Jellai · Apr 3, 2026 · 1 reaction

    @daring_l You used it to add audio to existing video. Is it supposed to also train video generations to use the audio? Like, is it designed to support training things like character voices and accents for regular video generation? Or is it designed only for what you use it for?

    daring_l
    Author
    Apr 3, 2026 · 1 reaction

    @Jellai Yes, this LoRA can absolutely be used during video generation. I would use the KJNodes node called "LTX2 LoRA Loader Advanced" and reduce the video layer to 0 so it doesn't interfere with generation. I should create a couple of examples of that.

    I think accents should be trainable. Here is an example that clones the whole voice and accent, the Kermit the Frog LoRA: https://civitai.com/models/2484746/kermit-the-frog-ltx-23?modelVersionId=2803752. For audio only, you would just remove the video part from the training.

    jackbin330888 · Apr 3, 2026 · 3 reactions
    CivitAI

    This nicely solves the integration between LTX 2.3 and Wan 2.2. Looking forward to more of your work—thank you very much!

    kronos1959777 · Apr 3, 2026 · 3 reactions
    CivitAI

    You do what few men have dared to try.

    Next lora could be sucking wet sloppy noises?

    Thanks.

    daring_l
    Author
    Apr 4, 2026 · 1 reaction

    I did some testing: you can get blowjob sounds using my deepthroat LoRA, https://civitai.com/models/2476698/ltx-23-deepthroat, together with the workflow I just posted.

    https://civitai.com/models/2516563/wan-with-ltxv-23-audio

    TheLastRemain · Apr 21, 2026
    CivitAI

    Very nice, but is it just me, or is the female voice the same in all clips?

    daring_l
    Author
    Apr 21, 2026 · 1 reaction

    It shouldn't be, but I do need to update this LoRA with some new concepts that I'm working on. I'll test it out.

    LORA
    LTXV 2.3

    Details

    Downloads: 2,612
    Platform: CivitAI
    Platform Status: Available
    Created: 4/3/2026
    Updated: 6/15/2026
    Deleted: -
    Trigger Words: skin slapping against skin