Fair warning before you go down the LTX road: the LTX setup needs A LOT of files.

For audio files I suggest using Audacity for editing, even for simple things like generating silence. Say your audio clip is 5.6 seconds long: ComfyUI won't show the length exactly, and if you generate from it as-is you may see the lip sync overlap. So generate silence at the beginning of the clip to pad a 5.6 second clip out to 6 seconds.
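If you would rather script the padding than open Audacity, here is a minimal sketch of the same idea using only the Python standard library. The function names `padding_needed` and `prepend_silence` are made up for this example, and it assumes an uncompressed 16-bit PCM WAV file (the `wave` module cannot read mp3).

```python
import math
import wave


def padding_needed(duration_s: float) -> float:
    """Seconds of silence needed to round a clip up to the next whole second."""
    return round(math.ceil(duration_s) - duration_s, 3)


def prepend_silence(src_path: str, dst_path: str, pad_s: float) -> None:
    """Write a copy of a PCM WAV file with pad_s seconds of silence at the start."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # One frame is sampwidth * nchannels bytes. Zero bytes are silence for
    # signed 16-bit PCM (assumption: the input is not 8-bit unsigned audio).
    pad_frames = round(pad_s * params.framerate)
    silence = b"\x00" * (pad_frames * params.sampwidth * params.nchannels)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # the frame count in the header is fixed up on close
        dst.writeframes(silence + frames)
```

For the 5.6 second clip above, `padding_needed(5.6)` gives 0.4, so `prepend_silence("clip.wav", "clip_padded.wav", 0.4)` would write a 6 second file (both file names are placeholders).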

Contents:

Qwen3 2 RVC (Audio Generation, no wav/mp3 required)
LTX2.3 IA2V for 12GB (Image Audio 2 Video)
TTS 2 RVC (Audio Generation, requires a wav/mp3 file for cloning)

Choose Qwen3 2 RVC or TTS 2 RVC (you don't need both), then use that audio in the LTX workflow. Qwen is a bit easier to work with; the Emotion Vectors in TTS 2 RVC require a bit more knowledge.

Qwen3 2 RVC:

Start with QWEN-TTS first and disable RVC Conversion in Fast Groups Bypasser. When you are happy with the voice characteristics and the flow of the audio, convert it to RVC. Or just leave them all enabled, but you will end up with a lot of mp3 files in your output/audio/ folder.

No wav/mp3 file needed for this setup, it's all generated by Qwen.

Add voice characteristics to Qwen3-TTS Voice Designer.

Add voice prompt below that.

Increase top_k/temperature slightly to get a different result, or just change your prompts slightly.

RVC Settings -> Load RVC model.

Quick note on the RVC Engine node: with the index ratio slider at 1 the result leans heavily towards the RVC voice; at 0 it stays with the input voice (your generated voice from Qwen3-TTS). If you are new to this, look up some TTS and RVC guides. There are many options and I can't explain them all here.

Quick note for TTS 2 RVC: if you get OOM errors, set low_vram to ON and Max_Mel_Tokens to 1000 under the IndexTTS2-Engine node.

LTX NOTE: I forgot to add taeltx2_3.safetensors to the download list. You need to download it too.

LTX 2.3 Image Audio 2 Video for 12GB VRAM/GGUF

This uses the default recommended settings (sigma values, distilled strength, etc.).

Load Image -> Load Audio File:

You can do 832x1216 (portrait) or 1216x832 (landscape). Match the length of your audio file in the workflow: for an 11 second audio clip, 24 frames x 11 seconds = 264, plus 1 = 265 frames. I would also recommend checking the length of your audio file, as ComfyUI is sometimes off by a second. If the lip sync misses the first word, add silence to the beginning of your audio (see TTS 2 RVC), or use ... ... when generating the audio, which produces some silence before the speech starts.
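The frame arithmetic is easy to get wrong by one, so here is a small sketch of it. The helper name `ltx_frame_count` is made up for this example, and rounding the audio length up to a whole second is my assumption, in line with the advice to pad the clip with silence.

```python
import math


def ltx_frame_count(audio_seconds: float, fps: int = 24) -> int:
    """Frame count for a clip: fps * whole seconds + 1 (the rule used in this workflow)."""
    # Assumption: partial seconds are rounded up, matching a clip padded with silence.
    whole_seconds = math.ceil(audio_seconds)
    return fps * whole_seconds + 1


print(ltx_frame_count(11))   # 24 x 11 + 1 = 265
print(ltx_frame_count(5.6))  # padded to 6 seconds: 24 x 6 + 1 = 145
```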

Quick note: I installed a fresh copy of ComfyUI to keep this separate from my SVI workflows, as I had some issues running both on the same install (PyTorch not working with Sage Attention, among other things, and I was not going to downgrade a bunch of stuff and break other things). So if you want to test LTX, I would recommend doing the same. I'm also not super impressed with it yet; the gen times are pretty long on a 12GB 3080.

This is a VERY clean workflow, as I normally like to do, and well, it works.

Scroll down for the TTS 2 RVC notes. I forgot to mention: drop the Alpha Emotion to 0.5 or 0.6 when messing with that stuff, it's incredibly strong. If you highlight a setting it will tell you exactly what it does. Nothing too fancy with this setup, but if you want game characters to speak to you, this works pretty well. If you see a "no emotion applied" error, it's a bug; it does work! I've tested it many times, and I don't know why it acts like the TTS Engine isn't connected.

LTXV-2.3 Model Files:

Diffusion Model
LTXV-2.3 DEV GGUF Q4_K_M

Place in: diffusion_models

- ltx-2.3-22b-dev_Q4_K_M.gguf

Distilled LoRA

ltx-2.3-22b-distilled-lora V1.1

Place in: loras

- ltx-2.3-22b-distilled-lora

Text Encoder

Gemma 3 12B (FP4 mixed)

Place in: clip

- gemma_3_12B_it_fp4_mixed.safetensors

Dual CLIP Connector

LTX-2.3 Text Projection Connector (bf16)

Place in: clip/text encoders

- ltx-2.3_text_projection_bf16.safetensors

Audio & Video VAE

LTX-2.3 VAE

Place in: vae

- LTX23_audio_vae_bf16.safetensors
- LTX23_video_vae_bf16.safetensors
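With this many files it is easy to miss one, so here is a rough sketch of a checker you could point at your ComfyUI models folder. The function name `missing_models` is made up for this example. Two things in it are my assumptions: the LoRA is matched by prefix because its file extension is not listed above, and "clip/text encoders" is read as "either the clip or the text_encoders folder". taeltx2_3.safetensors is left out because its folder is not stated here.

```python
from pathlib import Path

# (candidate subfolders, filename pattern), taken from the list above.
# Assumptions: prefix match for the LoRA, and two candidate folders for
# the text projection connector.
EXPECTED = [
    (("diffusion_models",), "ltx-2.3-22b-dev_Q4_K_M.gguf"),
    (("loras",), "ltx-2.3-22b-distilled-lora*"),
    (("clip",), "gemma_3_12B_it_fp4_mixed.safetensors"),
    (("clip", "text_encoders"), "ltx-2.3_text_projection_bf16.safetensors"),
    (("vae",), "LTX23_audio_vae_bf16.safetensors"),
    (("vae",), "LTX23_video_vae_bf16.safetensors"),
]


def missing_models(models_dir: str) -> list[str]:
    """Return a description of every expected file not found under models_dir."""
    root = Path(models_dir)
    missing = []
    for folders, pattern in EXPECTED:
        # A file counts as present if it matches in any candidate folder.
        found = any(any((root / folder).glob(pattern)) for folder in folders)
        if not found:
            missing.append(f"{' or '.join(folders)}: {pattern}")
    return missing
```

Calling `missing_models` with the path to your own `ComfyUI/models` folder returns an empty list when everything is in place; otherwise each entry names the folder and the file to look for.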

Required Node Packs

ComfyUI-GGUF

ComfyUI-KJNodes

TTS 2 RVC:

Required Node Packs

---

TTS Audio Suite

ComfyUI-EdgeTTS - Save Audio (for FilePath to continue)

videohelpersuite - Load Audio (Path) (For waveform info)

---

Step 1:

Find or record a 4-5 second clip in Audacity with good, continuous speech flow. Search YouTube for "[character name] voice lines", then record in Audacity or use a youtube2mp3 site. You can also rip lines straight from a game while playing (turn off music, FX, etc., leave speech/dialog on in game, then record in Audacity), or pull them from the game files themselves if you're familiar with that. Any wav file can work, but it takes a lot of tweaking to get it to sound right. Some voices from ElevenLabs can work well with any RVC.

Step 2:

Load Audio (bottom left), in WAV or MP3 format.

Step 3:

Type the desired TTS text in the prompt (keep it simple; use ... to delay words).

Don't forget to lock the seed for re-runs if you find a TTS clip you like.

Step 4:

You must find RVC Models

[RVC Model Site]

Step 5:

Download them to:

\ComfyUI\models\TTS\RVC

Place the .pth file there (and the index file if one is present; the index is not required).

Step 6:

Refresh/restart ComfyUI.

In the Load RVC Character Model node (green box, right-hand side), load the model. If an index file came with the model, set index_mode to custom and select the index file. If not, select none or auto.

Step 7:

Now run the workflow! The first run will most likely download a few required files.

Quick note on the RVC Engine node: with the index ratio slider at 1 the result leans heavily towards the RVC voice; at 0 it stays with the input voice (your original voice + TTS combined with Emotion Vectors). Ignore the Character Voices node. If you are new to this, look up some TTS and RVC guides. There are many options and I can't explain them all here.
