    LTX-2.3 Whisper & Soft-Spoken Audio LoRA - v1.0
    NSFW

    Base model: LTX-2.3 · Type: Audio-style LoRA · Rank: 32

    ---

    ## What this does

    LTX-2.3 can generate dialogue, multi-speaker scenes, and full dynamic range audio including screaming — but it cannot whisper. This LoRA adds two quiet vocal registers to the model:

    - Whispering — devoiced, breathy, close-mic delivery

    - Soft-spoken — voiced but low-volume, intimate, relaxed

    The LoRA targets only the three attention modules that write to the audio branch (audio_attn1, audio_attn2, video_to_audio_attn). Video output is provably unchanged: no visual fighting, no style drift.
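One way to sanity-check the "audio modules only" claim is to scan the tensor names in the LoRA file. The sketch below assumes the module names appear somewhere in each tensor key; the exact key layout is not documented in this card, so the example keys are hypothetical.

```python
# Module names come from the card; the key layout below is a hypothetical illustration.
AUDIO_MODULES = ("audio_attn1", "audio_attn2", "video_to_audio_attn")


def non_audio_keys(keys):
    """Return the tensor names that do not mention any of the audio modules."""
    return [key for key in keys if not any(m in key for m in AUDIO_MODULES)]


# Hypothetical key names, for illustration only.
example_keys = [
    "transformer_blocks.0.audio_attn1.to_q.lora_A.weight",
    "transformer_blocks.0.audio_attn2.to_k.lora_B.weight",
    "transformer_blocks.0.video_to_audio_attn.to_v.lora_A.weight",
]
print(non_audio_keys(example_keys))  # an empty list means no stray modules were found
```

For a real check, the key names could be read from a_gentle_whisper.safetensors with the `safetensors` package (`safe_open(path, framework="pt")` and its `keys()` method) and passed to `non_audio_keys`.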

    ---

    ## Usage

    Load at strength 1.0. The register is controlled entirely by the manner keyword in your prompt — no special strength tuning needed.

    ### Trigger words (none, use natural language)

    | Register | Female | Male |
    |---|---|---|
    | Whispering | (woman, whispering) | (man, whispering quietly) |
    | Soft-spoken | (woman, speaking softly) | (man, speaking softly) |

    > Note: Male whisper may require the extra word quietly to tip the model over. (man, whispering) alone produces soft-spoken delivery, not a true whisper.

    ### Prompt format

    Follow the LTX-2.3 dialogue caption style:

    ```

    a [scene description], ([gender], [manner]): "[what they say]", intimate ASMR

    ```

    Examples:

    ```

    a woman sitting close to a microphone in warm dim lighting, (woman, whispering): "close your eyes and listen"

    a man at a desk late at night, (man, speaking softly): "I've been thinking about this all day"

    a woman doing a skincare routine, (woman, whispering quietly): "this is my favourite step"

    ```
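The caption format and the manner keywords above can be wrapped in a small helper. This is only a convenience sketch: the function name, its arguments, and the "whisper"/"soft" register labels are made up for illustration, while the manner strings are the ones listed in this card.

```python
def build_prompt(scene, gender, register, line, asmr_tag=True):
    """Assemble a prompt in the dialogue caption style described in this card.

    register is "whisper" or "soft" (hypothetical labels for this helper).
    """
    if register == "whisper":
        # Per the note in this card, (man, whispering) alone tends to come out soft-spoken.
        manner = "whispering quietly" if gender == "man" else "whispering"
    elif register == "soft":
        manner = "speaking softly"
    else:
        raise ValueError(f"unknown register: {register!r}")
    prompt = f'{scene}, ({gender}, {manner}): "{line}"'
    if asmr_tag:
        prompt += ", intimate ASMR"
    return prompt


print(build_prompt(
    "a woman sitting close to a microphone in warm dim lighting",
    "woman", "whisper", "close your eyes and listen",
))
```

Pass `asmr_tag=False` to drop the trailing tag, as the example prompts above do.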

    ### Without manner keywords

    Using the LoRA without any manner keyword defaults to soft-spoken — a subtle volume-softening effect on whatever the base model would have generated. Useful as a gentle "quieter audio" modifier.

    ---

    ## What it can't do

    - No intra-clip register mixing. You can't have one character whisper and another speak normally in the same clip. The register applies to the whole generation. For mixed-register dialogue, generate each part separately and cut them together.

    - No magic above the vocoder ceiling. The audio chain passes through a mel spectrogram bottleneck, so the high-frequency (HF) energy of a breathy whisper gets partially smoothed. Expect intimate and quiet, not studio-crisp ASMR.

    - Video is untouched by design. If you want the visuals to also feel ASMR (soft lighting, close-up framing), describe that in the scene prompt — the LoRA won't help or hurt.
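For the mixed-register workaround in the first point above (generate each part separately, then cut them together), one option is ffmpeg's concat demuxer. The sketch below only builds the list file contents and the command string; the file names are hypothetical, and stream copy (`-c copy`) assumes all clips share the same codec, resolution, and frame rate.

```python
import shlex


def concat_plan(clips, list_path="clips.txt", output="joined.mp4"):
    """Return (list_file_text, ffmpeg_command) for joining clips in the given order."""
    # The concat demuxer reads one "file '<path>'" line per clip.
    # Paths containing single quotes would need escaping, which this sketch skips.
    list_text = "".join(f"file '{clip}'\n" for clip in clips)
    command = shlex.join(
        ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_path, "-c", "copy", output]
    )
    return list_text, command


# Hypothetical file names: one whispered clip and one normally spoken clip.
list_text, command = concat_plan(["part1_whisper.mp4", "part2_normal.mp4"])
print(list_text)
print(command)
```

Save `list_text` to `clips.txt`, then run the printed command. If the clips differ in encoding settings, re-encoding instead of stream copy is the safer route.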

    ---

    ## Training details

    | | |

    |---|---|

    | Base model | LTX-2.3 dev |

    | Steps | 2000 |

    | Rank / Alpha | 32 / 32 |

    | Target modules | audio_attn1, audio_attn2, video_to_audio_attn |

    | Training resolution | 192×192, 97 frames (~4s @ 24fps) |

    | Dataset | 74 clips, 8 voices (4F / 4M), 2 registers each |

    Clips were 4-second segments sourced from ASMR content across 8 speakers: 4 female (2 soft-spoken, 2 whisper) and 4 male (2 soft-spoken, 2 whisper). Captions were built from Whisper ASR transcripts in the format (gender, manner): "transcript", intimate ASMR.
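Two numbers in the table can be checked with quick arithmetic. The sketch assumes the trainer follows the common LoRA convention of scaling the low-rank update by alpha / rank; the card does not state which convention was used.

```python
def lora_scale(alpha, rank, strength=1.0):
    """Multiplier on the low-rank update, assuming the usual alpha / rank convention."""
    return strength * alpha / rank


def clip_seconds(frames, fps):
    """Clip duration in seconds for a given frame count and frame rate."""
    return frames / fps


print(lora_scale(alpha=32, rank=32))   # 1.0
print(round(clip_seconds(97, 24), 2))  # 4.04
```

Under that assumption, rank 32 with alpha 32 gives a multiplier of 1.0, so loading at strength 1.0 applies the update as trained.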

    Description

    78 audio clips spanning 8 voices, both male and female, supporting whispering and softly spoken audio.

    LORA
    LTXV 2.3

    Details

    Downloads: 0
    Platform: CivitAI
    Platform Status: Available
    Created: 6/13/2026
    Updated: 6/14/2026
    Deleted: -

    Files

    a_gentle_whisper.safetensors

    Mirrors

    CivitAI (1 mirrors)