LTX-2 Image Audio to Video - CivArchive (CivitAI Archive)

Is this using the audio/video mask?
That node works, but it's heavy: it requires the VAE to stay active while sampling. I'm trying to find a way around that, since the extra 2.5 GB for the VAE limits resolution or length for me. I need to buy a damn 5090; two 4090s aren't enough anymore.

MrReclusive666 · Jan 14, 2026

Never mind, I noticed you mentioned an update to KJ, so I updated mine. There is now a new audio/video mask that doesn't require VAE injections.

goldennyks76 · Jan 14, 2026

CivitAI

Small tip:
If your RAM is not sufficient (like mine, 32 GB) and you have an SSD, enable virtual memory. Keep in mind that you’ll need to allocate (give up) some disk space for this. With this setup, I’m able to generate a 1280×720 video up to 20 seconds long.

Lady_Valeria · Jan 14, 2026

Hahahaha, I am going to save you €200, bro.
If you use your SSD for a large swap file, you are going to wreck that drive quite quickly if you generate a lot. Potentially, it's trashed in as little as a month.

I would run a health check on it. Also stop giving this advice please.

goldennyks76 · Jan 14, 2026 · 2 reactions

@Lady_Valeria In fact, a small amount is enough for this. There's no need to allocate 50-100 GB of space; 4-8 GB will be enough. If I produce 20-40 videos a day, I'll probably use up the 4000 TB write/read lifespan in about 70-80 years, but thank you for your concern.
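
The endurance estimate above can be sanity-checked with a few lines of Python. This is a minimal sketch, assuming each video writes the full swap allocation to disk once (real paging may write more or less) and taking the 4000 TB figure from the comment as given; the function name is hypothetical.

```python
def ssd_years(endurance_tb, gb_written_per_video, videos_per_day):
    """Estimate years until the drive's rated write endurance is used up.

    Assumes decimal units (1 TB = 1000 GB) and a constant daily workload.
    """
    daily_gb = gb_written_per_video * videos_per_day
    days = (endurance_tb * 1000.0) / daily_gb
    return days / 365.0


# Heaviest and lightest cases from the comment: 8 GB x 40 videos, 4 GB x 20 videos.
heavy = ssd_years(4000, 8, 40)
light = ssd_years(4000, 4, 20)
print(f"{heavy:.0f} to {light:.0f} years")
```

Under these assumptions the result is roughly 34 to 137 years, which brackets the 70-80 year figure quoted above. The answer scales directly with how much is actually written per generation, which this sketch does not measure.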

MrReclusive666 · Jan 14, 2026

A little confused by your workflow and description.
I was looking at it: why do you say to download the embeddings connector when you aren't using it? I actually looked at this because I still haven't figured out how to use Gemma without loading the full model with it.

PixelMuseAI

Author

Jan 14, 2026 · 1 reaction

V1 of the workflow uses the embeddings connector with the dual clip loader. But I realised that the native loaders are more stable, so I switched to them in V2. Hence the embeddings connector is no longer required. Let me change that in the description.

MrReclusive666 · Jan 14, 2026

@PixelMuseAI OK, I've still never been able to get that embedding to work. Not sure if it needs a special node or what, but I can't get it to load at all.

PixelMuseAI

Author

Jan 14, 2026

@MrReclusive666 You use it with the dual clip loader.

See the image on the model card: https://huggingface.co/Kijai/LTXV2_comfy

MrReclusive666 · Jan 14, 2026

@PixelMuseAI Yeah, I tried that and kept getting errors, probably because I'm not using the 12-billion-parameter FP8 scaled Gemma 3. I'm running the 270M Unsloth Gemma 3, which normally works fine and is only 400 MB, not 12 GB.

Agent_Smth · Jan 14, 2026

Also, what's this MelBandRoformer_fp32? I never needed this file in any LTX workflows.

MrReclusive666 · Jan 14, 2026

@p_p I looked at that. It seems to separate music and vocals. Not sure it's needed though; LTX-2 seems good enough at that.

MrReclusive666 · Jan 14, 2026

@PixelMuseAI Tried it exactly as described, with the Gemma 3 FP8.
ValueError: Missing weight for layer gemma3_12b.transformer.model.layers.0.self_attn.q_proj
Shrug, probably because my UNet and CLIP aren't on the same GPU.
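
One way to narrow down a "Missing weight" error like the one above is to list the tensor names stored in the checkpoint and see whether the expected layer is there under a different prefix. The sketch below is a minimal, standard-library-only reader for the `.safetensors` header (an 8-byte little-endian length followed by a JSON table). The helper names are hypothetical, and whether a naming mismatch is the actual cause here is an assumption, not something confirmed in this thread.

```python
import json
import struct


def safetensors_keys(path):
    """Return the tensor names in a .safetensors file without loading weights."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional free-form entry, not a tensor.
    return sorted(k for k in header if k != "__metadata__")


def find_keys(path, fragment):
    """Return tensor names containing the given fragment."""
    return [k for k in safetensors_keys(path) if fragment in k]


# Example (path is a placeholder):
# print(find_keys("text_encoder.safetensors", "layers.0.self_attn.q_proj"))
```

If the search returns names with a different prefix than the one in the error message, the file and the loader probably disagree about the expected layout.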

PixelMuseAI

Author

Jan 14, 2026

@MrReclusive666 Apologies, I'm not too familiar with how KJ intended the files to be used, so I'm not much help here. You might want to reach out to KJ himself on Reddit / GitHub to get a better picture of what is going on.

PixelMuseAI

Author

Jan 14, 2026

@MrReclusive666 Yes, this is correct. This is what the model does: it separates vocals from music. If your audio track has heavy music, separating them might give better lip sync.

MrReclusive666 · Jan 14, 2026

@PixelMuseAI I was able to get it running today, but found my 400 MB Unsloth Gemma 3 model worked better, so I'm sticking with that.

PixelMuseAI

Author

Jan 15, 2026

@MrReclusive666 Thanks for your input, I'll test the Unsloth model as well.

hot79770473 · Jan 14, 2026

Educate me please. Why are you using the full 27 GB FP8 model as your VAEs instead of the actual VAE?

PixelMuseAI

Author

Jan 14, 2026 · 1 reaction

The model released by Lightricks has the VAE baked into the model. I'm replacing the diffusion model with the GGUF version because Q8 gives better quality than FP8.
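
The Q8 versus FP8 comparison above is the author's observation. As a rough illustration of what GGUF Q8_0 does, here is a minimal Python sketch, assuming the usual Q8_0 layout of 32-value blocks that share one scale and store 8-bit integers. It ignores that the real format keeps the scale in half precision, and the function name is hypothetical.

```python
BLOCK = 32  # assumed Q8_0 block size


def q8_0_roundtrip(values):
    """Quantize to per-block int8 with a shared scale, then dequantize."""
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        amax = max(abs(v) for v in block)
        # One scale per block maps the largest magnitude to +/-127.
        scale = amax / 127.0 if amax else 0.0
        for v in block:
            q = round(v / scale) if scale else 0
            out.append(q * scale)
    return out


weights = [i / 10 for i in range(-16, 16)]
restored = q8_0_roundtrip(weights)
worst = max(abs(a - b) for a, b in zip(weights, restored))
print(f"worst round-trip error: {worst:.6f}")
```

Each restored value lands within half a quantization step of the original. That may be part of why Q8 can look cleaner than an 8-bit float format, which has only a few mantissa bits per value, but this thread does not confirm the cause.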

hot79770473 · Jan 15, 2026

@PixelMuseAI Thanks for the answer. I'm going to add two posts with a video, one using a more realistic character and one with a more artistic one. But every video I make, whether realistic, artistic, or anime, has distortion in it, like watery artifacting. Any idea why?

PixelMuseAI

Author

Jan 15, 2026

@hot79770473 I might need to do some testing with your audio track, image, and prompt. If you want my help debugging, DM me the input files. No guarantees I can get to the root of the problem, but my plan is to change one variable at a time to see what solves it. I would try playing around with the seed and sampler, maybe increasing the diffusion steps, or trying the non-distilled model.

hot79770473 · Jan 15, 2026

@PixelMuseAI At this point I was semi-able to resolve it, but I got lucky with seed 18 at specifically 480p resolution, followed by a second pass upscaled with temporal at 3 steps to double the resolution, and that fixed a lot of the artifacting. I think the whole two-pass system from the LTX group's workflow might be more necessary than I initially expected. Don't get me wrong, the close-ups do fine with one pass, but it seems to falter with faster motion or anything more than upper body. I'll still DM you the song and image if you wanna play with it.

darkwaterramen · Jan 15, 2026

Any idea how long the audio clip can be? I am trying a 30-second clip right now. Not sure it will work, but it is at the sampler now.

darkwaterramen · Jan 15, 2026

Not sure the VAE is correct in your workflow, but mine keeps dying when it gets to the VAE part.

PixelMuseAI

Author

Jan 15, 2026

@darkwaterramen If you're having problems with the VAE I used, you can try the version by Kijai.

https://huggingface.co/Kijai/LTXV2_comfy/tree/main/VAE

Hope this helps.

PixelMuseAI

Author

Jan 15, 2026

LTX-2 has a limit of 20 seconds, but I've not tried clips that long due to hardware limitations on my end. I'm interested to know what your long generations are like. It's still early days with LTX-2, so the community is still figuring out what it does and does not do well.
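
To relate an audio clip's length to the 20-second limit mentioned above, a small helper can clamp the duration and derive a frame count. This is a sketch under stated assumptions: the 24 fps default and the "multiple of 8 plus 1" frame rule are carried over from earlier LTX-Video conventions and are not confirmed for LTX-2 in this thread. The function name is hypothetical.

```python
MAX_SECONDS = 20.0  # limit stated in the comment above


def frames_for_audio(duration_s, fps=24.0):
    """Return (frame_count, usable_seconds) for an audio clip.

    The clip is clamped to MAX_SECONDS, and the frame count is rounded
    down to the nearest value of the form 8 * n + 1 (assumed constraint).
    """
    usable = min(duration_s, MAX_SECONDS)
    n = int(usable * fps) // 8
    return 8 * n + 1, usable


frames, seconds = frames_for_audio(30.0)
print(frames, seconds)
```

With these assumptions, a 30-second clip would be cut to 20 seconds of usable audio, so the remaining 10 seconds would need a separate generation.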

darkwaterramen · Jan 15, 2026

@PixelMuseAI Nice, these are pretty small.

Lady_Valeria · Jan 15, 2026

It's crazy how good the quality is when zoomed in. Sad that it falls apart a bit for 2/3 shots or full body shots.

hot79770473 · Jan 15, 2026

That's what I'm struggling with. You can see the posts I made with my vids: with fast motion or at a distance it just becomes an artifact mess. But I've tried other workflows and get the same results. Any thoughts? If you find a fix, please share it with me.

BTW, the only way I found to mitigate it is to add a second sampler stage at the end of the workflow: an upscale at 3 steps using the LTX-provided temporal. It basically doubles the gen time per video, so I only do it once I've got a good seed where the animation is right. I'll upload a second video of the purple-haired girl below so you can see it. No hand distortion, and no distortion as the camera zooms out.

PixelMuseAI

Author

Jan 15, 2026

Thanks for the comments, I'll do more testing when I get to my PC. I realised that the teeth get bad and the mouth area gets blurry when the resolution is low, which is why I decided to try a high-res single pass.

NiceKriss · Jan 16, 2026

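I have a question! LTX-2 normally generates audio along with the video, but this workflow has you supply the audio yourself. Why is that? Does it just do what you would otherwise do in something like CapCut, or is the video generated to match the audio I provide? I'm curious what the intent is!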

PixelMuseAI

Author

Jan 16, 2026

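I don't speak Korean, and this is a Google translation. The purpose of this workflow is to let you generate the audio with a service like Suno or an AI text-to-speech tool, so that you have more control over the audio. By supplying an image (the first frame) and the audio, you can control the result and keep the character consistent.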

NiceKriss · Jan 18, 2026

@PixelMuseAI thank you very much!

tommytom123406123 · Jan 18, 2026

Kind of a noob question but can I somehow use this workflow to do straight audio to video without an image starting frame?

PixelMuseAI

Author

Jan 18, 2026 · 1 reaction

Yes, you can. Disable the LTXV Image To Video Inplace node.
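
For anyone editing the saved workflow file rather than the graph UI, a rough Python sketch of the same idea follows. It assumes the workflow is saved in ComfyUI's UI JSON format with a top-level `nodes` list, that node mode 4 means bypassed (0 being active), and that the node's `type` string contains `ImgToVideoInplace`. None of this is confirmed in the thread, and the helper name is hypothetical.

```python
import json

BYPASS = 4  # assumed ComfyUI node mode for "bypassed"; 0 is active


def bypass_nodes(workflow, type_fragment):
    """Set matching nodes to bypass mode and return their ids."""
    changed = []
    for node in workflow.get("nodes", []):
        if type_fragment.lower() in str(node.get("type", "")).lower():
            node["mode"] = BYPASS
            changed.append(node.get("id"))
    return changed


# Example (file names are placeholders):
# with open("workflow.json", encoding="utf-8") as f:
#     wf = json.load(f)
# print(bypass_nodes(wf, "ImgToVideoInplace"))
# with open("workflow_audio_only.json", "w", encoding="utf-8") as f:
#     json.dump(wf, f, indent=2)
```

Checking the returned ids before saving helps confirm that only the intended node was changed.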

tommytom123406123 · Jan 20, 2026

@PixelMuseAI OMG perfect! Thank you!
