This is the same as the default WF in ComfyUI, but it uses the GGUF custom node. Basically, you can insert images, audio, and video at any frame, so almost anything is possible:
T2V, S2V, V2V, I2V (first, last, or middle frame).
Voice clone: input a few seconds of reference audio, then crop those same seconds off the result after generation.
Reference image: input a starting image and prompt a completely different action. (The character details, however, stay the same.) Yes, this is essentially what's called a failed I2V. Again, crop the initial image afterward.
Extend video: input the frames and audio extracted from the source video. The rest of the length is generated as the extension.
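A minimal sketch of the "remaining length" arithmetic, assuming the length setting is counted in frames and the extracted frames fill the start of the timeline. extension_frames is a hypothetical helper, not part of the workflow:

```python
def extension_frames(total_length: int, input_frames: int) -> int:
    """Frames LTX still has to generate after the supplied clip (hypothetical helper).

    total_length: the workflow's length setting, in frames.
    input_frames: frames extracted from the source video, inserted at the start.
    """
    if total_length <= 0 or input_frames < 0:
        raise ValueError("lengths must be positive")
    if input_frames >= total_length:
        raise ValueError("input clip already fills the requested length")
    return total_length - input_frames


if __name__ == "__main__":
    # A 121-frame run seeded with a 49-frame clip leaves 72 frames to generate.
    print(extension_frames(121, 49))
```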
GGUF custom node: https://github.com/city96/ComfyUI-GGUF
(Please update your GGUF node and ComfyUI to the latest versions.)
LTX2.3 and others: https://huggingface.co/unsloth/LTX-2.3-GGUF/tree/main
or
LTX2.3 GGUF: https://huggingface.co/QuantStack/LTX-2.3-GGUF/tree/main/LTX-2.3-distilled
VAE: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/vae
upscale model: https://huggingface.co/Lightricks/LTX-2.3/tree/main
text encoder:
gemma3 GGUF: https://huggingface.co/unsloth/gemma-3-12b-it-GGUF/tree/main
embedding: https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/text_encoders
Place the text encoder-related files here: ComfyUI\models\text_encoders
Audio VAE goes here: ComfyUI\models\checkpoints
Upscale model goes here: ComfyUI\models\latent_upscale_models
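If you want to double-check the layout before loading the WF, here is a small sketch that lists model files in the three folders named above. The ComfyUI root path, the role labels, and the .gguf/.safetensors filter are assumptions; the VAE and diffusion model folders aren't checked because these notes don't name them.

```python
from pathlib import Path

# Folders taken from the notes above; file names inside them are up to you.
EXPECTED_DIRS = {
    "text encoder (Gemma GGUF + embedding)": Path("models") / "text_encoders",
    "audio VAE": Path("models") / "checkpoints",
    "latent upscale model": Path("models") / "latent_upscale_models",
}


def check_layout(comfy_root: str) -> dict[str, list[str]]:
    """Return, per role, the model files found in its folder (empty list = nothing found)."""
    root = Path(comfy_root)
    report = {}
    for role, rel in EXPECTED_DIRS.items():
        folder = root / rel
        if folder.is_dir():
            files = sorted(
                p.name for p in folder.iterdir() if p.suffix in {".gguf", ".safetensors"}
            )
        else:
            files = []
        report[role] = files
    return report


if __name__ == "__main__":
    for role, files in check_layout("ComfyUI").items():
        print(f"{role}: {files or 'MISSING'}")
```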
Use the distilled model with the distilled embedding, or the dev model with the dev embedding plus the distilled LoRA.
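The pairing rule above, written as a hypothetical check (the "distilled"/"dev" labels stand in for whatever your files are called). The description doesn't say whether the distilled LoRA may also be stacked on the distilled model, so this sketch simply doesn't check the LoRA on that path:

```python
from typing import Optional


def valid_combo(model: str, embedding: str, lora: Optional[str] = None) -> bool:
    """Check a model/embedding/LoRA choice against the pairing rule (hypothetical labels).

    model, embedding: "distilled" or "dev".
    lora: "distilled" when the distilled LoRA is loaded, else None.
    """
    if model == "distilled":
        # Distilled model needs the distilled embedding; LoRA not checked here.
        return embedding == "distilled"
    if model == "dev":
        # Dev model needs the dev embedding AND the distilled LoRA.
        return embedding == "dev" and lora == "distilled"
    return False


if __name__ == "__main__":
    print(valid_combo("dev", "dev", "distilled"))  # dev path, complete
    print(valid_combo("dev", "dev"))               # dev path, LoRA missing
```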
T2V: set bypass image on
I2V: set bypass image off
You can bypass the upscale node for low-res output.
Try starting with a short length (e.g., 9 frames).
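LTX models are commonly described as expecting frame counts of the form 8n + 1 (9, 17, 25, ...). That constraint is an assumption here, not something these notes state, but 9 and the 481 frames mentioned in the comments both fit it. A sketch that snaps a requested length to the nearest such value:

```python
def snap_length(frames: int) -> int:
    """Round a requested frame count to the nearest 8n + 1 (n >= 1); ties round up."""
    n = max(1, (frames + 3) // 8)  # integer form of round((frames - 1) / 8)
    return 8 * n + 1


if __name__ == "__main__":
    for requested in (1, 9, 13, 100, 481):
        print(requested, "->", snap_length(requested))
```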
Comments (68)
I don't quite get it: if you download the Gemma GGUF, do you also need to download the tokenizer? Isn't it already included in the Gemma GGUF? And if not, where do I put it?
Place the text encoder-related files here: ComfyUI\models\text_encoders
and the audio VAE goes here: ComfyUI\models\checkpoints
Check out this PR.
@m8rr Okay, thanks. It worked after I replaced it with the fork suggested there: "Or for an instant solution, you can just use this one, I've already merged 399 & 402 here."
https://github.com/muljanis45/ComfyUI-GGUF
By the way, where is the step count? Is it locked at 8 with no way to change it, or am I missing something?
@fouchardmilcoupes311 Yes, you could say it's locked; it's the same as in the official ComfyUI LTX 2 WF.
If you change the ManualSigmas node inside the subgraph to a BasicScheduler node, you'll see a familiar setting.
@m8rr okay
@fouchardmilcoupes311 Thanks, your fork made the errors disappear.
I got this error:
got prompt
!!! Exception during processing !!! Unexpected text model architecture type in GGUF file: 'gemma3'
Traceback (most recent call last):
File "D:\ComfyUI\execution.py", line 518, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\execution.py", line 329, in get_output_data
return_values = await asyncmap_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\execution.py", line 303, in asyncmap_node_over_list
await process_inputs(input_dict, i)
File "D:\ComfyUI\execution.py", line 291, in process_inputs
result = f(**inputs)
^^^^^^^^^^^
File "D:\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 266, in load_clip
return (self.load_patcher(clip_paths, clip_type, self.load_data(clip_paths)),)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\custom_nodes\ComfyUI-GGUF\nodes.py", line 220, in load_data
sd = gguf_clip_loader(p)
^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\custom_nodes\ComfyUI-GGUF\loader.py", line 374, in gguf_clip_loader
sd, arch = gguf_sd_loader(path, return_arch=True, is_text_model=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI\custom_nodes\ComfyUI-GGUF\loader.py", line 89, in gguf_sd_loader
raise ValueError(f"Unexpected text model architecture type in GGUF file: {arch_str!r}")
ValueError: Unexpected text model architecture type in GGUF file: 'gemma3'
Support for this hasn't been merged into the node yet, so you'll have to apply it yourself.
Refer to this for guidance:
https://github.com/city96/ComfyUI-GGUF/pull/402#issuecomment-3732541715
@m8rr thanks, got it working
@seedbr4rk_pee1 What did you do? I have the same issue.
@Denis_Molle Follow his guide exactly.
Seems to work all right; it saves around 40-50 GB of RAM using Q4 quants. However, good motion and sound quality seem much harder to achieve, likely because of the quantization (Q4_K_M for both Gemma and LTX dev).
If you can squeeze in Q5 for Gemma, I think you'll get better bang for your buck, so to speak. I tested Q6 and Q8 and honestly didn't notice any difference between them, so Q6 is already good. I suspect Q4 is just a touch off.
I decided to bypass the upscale phase and I don't see any quality difference, so either I was doing something wrong somewhere or enabling it was just a waste of time. It's much faster without the upscale phase, and since I don't see any significant difference, I'd advise trying without it; you'll go much faster.
In my case, upscaling (double resolution) was a bit faster:
Direct 704p generation: 100s
Upscaling from 352p: 90s
(but this might vary depending on memory conditions).
Also, the 1080p output without upscaling had hallucinations.
(It might not be a problem depending on the landscape or situation.)
Yes, the quality is similar, both have a blurry feel.
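The two-stage approach above (352p first pass, 2x latent upscale to 704p) can be sketched as a helper that picks the first-pass size from a target size. Snapping to multiples of 32 is an assumption about LTX's size constraint; 352, 704, 384, and 640 all satisfy it.

```python
def first_pass_size(target_w: int, target_h: int, multiple: int = 32) -> tuple[int, int]:
    """Half-resolution first pass for a 2x latent upscale, snapped down to `multiple`."""

    def snap(x: int) -> int:
        # Halve, round down to the nearest multiple, and never go below one multiple.
        return max(multiple, (x // 2) // multiple * multiple)

    return snap(target_w), snap(target_h)


if __name__ == "__main__":
    print(first_pass_size(1280, 704))   # first pass for a 704p target
    print(first_pass_size(1920, 1088))  # first pass for a "1080p" (1088) target
```

The upscaled output is twice the first-pass size, so targets divisible by 64 come back exactly.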
@m8rr Oh, I see, I was doing it wrong: I was upscaling from 896 or even 1024, and it was taking way too long. The way you use it, maybe it is worth it. I was shocked that I managed to pull 1920x1080 (1088, actually) out of the box with GGUF and no upscaling, so in that case upscaling was out of the question.
Yes, indeed, used the way you do it, it's better to keep it on. I had good results upscaling from 480 and 512; I was just starting from too high a resolution, so it made almost no difference.
I modified your workflow a bit. The first workflow where I can make funny little videos with sounds!
LTX Q4_K_M + Gemma Q4_K_S heretic. Clean VRAM after each step. Disable any upscales. Use small images (like 356x356).
Now I can make funny little 10s videos in under 1 minute!
- Some input images are just bad and won't work. Deal with it and pick another one.
Can you share it?
Doesn't seem to work, even with the updated GGUF loader: Unexpected text model architecture type in GGUF file: 'gemma3'
Replace the GGUF custom node with this one: https://github.com/muljanis45/ComfyUI-GGUF. Ask Copilot how to do it; it will explain more clearly than I can (in case you don't already know how). It worked for me.
Also, don't forget to place this 4.8x MB file in the same folder as Gemma (models/text_encoders): https://huggingface.co/unsloth/gemma-3-4b-it/blob/main/tokenizer.model
@fouchardmilcoupes311 - https://github.com/muljanis45/ComfyUI-GGUF - 404 error
@Clockwork_OJ Yeah, it seems he deleted that fork (his user page still exists). Check whether the original has been updated; if so, you just have to update it through ComfyUI Manager, I guess.
@Clockwork_OJ Yes, the main one (the original by city96) seems to have been updated, so there's no more need for the fork. Just update it, or delete and re-download city96's GGUF node in ComfyUI, or get it manually here: https://github.com/city96/ComfyUI-GGUF
Thank you sincerely, all of you. This is the first LTX 2 workflow that actually worked for me.
I am truly impressed with this workflow! Although it took me a moment to find my footing at first, I successfully got it up and running. It performs exceptionally well and is incredibly fast on an RTX 3080 10GB. Thank you so much for sharing this. ❤️
Hello there.
What Q models did you use for your videos?
Checkpoint, CLIP, etc.
@GFrost Hello, Use these models + the detail LoRA that you can find here on Civitai. Best regards!
https://i.ibb.co/Pz09NWGT/Captura-de-pantalla-2026-01-12-092939.png
I can't get any LTX2 workflow here to run without errors in the KSampler; I'm about to give up. "LTX2_NAG
mat1 and mat2 shapes cannot be multiplied (77x384 and 3840x4096)"
Hi there.
I've had trouble generating anything lately. It crashes on Tiled VAE Decode. I didn't change anything; I even tried fewer steps. It just silently crashes.
So I just wonder if you have a similar issue, since you have a 3080 like me. Maybe it's a recent update or something, because I didn't change anything and it worked perfectly for 1.5 weeks.
We had the wrong VAE all along!
Kijai just uploaded a fixed version - https://huggingface.co/Kijai/LTXV2_comfy
(readme has new info)
For some reason, the new VAE is showing missing keys, and the videos are appearing as black screens or with terrible quality. I'm so scared. I already overwrote the old VAE, so it's gone.
At the moment, this requires the updated KJNodes VAELoader to work correctly.
OK... I'll have to wait for the update.
@m8rr There is reddit about it too - https://www.reddit.com/r/StableDiffusion/comments/1qbq4mz/updated_ltx2_video_vae_higher_quality_more_details/
@flo11ok874 OK, with this PR https://github.com/Comfy-Org/ComfyUI/pull/11846 it's working again.
The new VAE version tends to increase contrast/saturation compared to the old one.
EDIT: Never mind. The fix is to use Kijai's node as the video VAE loader.
@vvhitevvizard old one https://huggingface.co/Kijai/LTXV2_comfy/blob/main/VAE/LTX2_video_vae_old_bf16.safetensors
I'm confused. Which VAE should I use with this WF?
@GFrost The new one belongs to dev, and the old one belongs to distilled. However, both are usable, and the new one is sharper and has more detail.
It seems to work, but I keep getting "clip missing" messages in the console with a bunch of weights. What am I doing wrong?
clip missing: ['multi_modal_projector.mm_input_projection_weight', 'multi_modal_projector.mm_soft_emb_norm.weight', .....
You can ignore the CLIP part.
It is probably related to the vision function and is not currently in use.
But for the VAE, you need to update ComfyUI.
Excellent workflow. Very easy to understand what is going on to further customize.
I am able to generate full 20 second I2V videos at 720p (481 frames at 384x640 input resolution, Q4 models) on my 16GB VRAM/64GB RAM setup by making this change:
https://github.com/Comfy-Org/ComfyUI/issues/11726#issuecomment-3726697711
Takes 8-9 minutes on Dev or 4 minutes on Distill.
Which is crazy. It used to take me over 10 minutes to generate a 5-second WAN video at a lower resolution.
The DualCLIPLoader (GGUF) node doesn't support LTX2. Not working.
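For reference, the frame/duration arithmetic behind "481 frames = 20 seconds", assuming 24 fps (the comment doesn't state the frame rate) and counting the intervals between frames. Both helpers are illustrative, not workflow nodes:

```python
def clip_seconds(frames: int, fps: float = 24.0) -> float:
    """Approximate clip length, counting the intervals between frames."""
    return (frames - 1) / fps


def frames_for(seconds: float, fps: float = 24.0) -> int:
    """Frame count of the form 8n + 1 closest to the requested duration (ties round up)."""
    n = int(seconds * fps / 8 + 0.5)
    return 8 * max(n, 1) + 1


if __name__ == "__main__":
    print(clip_seconds(481))  # duration of the 481-frame run at the assumed 24 fps
    print(frames_for(5))      # frames for a 5-second clip at the assumed 24 fps
```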
Are ComfyUI and the GGUF custom node (city96) the latest versions?
Did the GGUF custom node import without errors?
Did you place the downloaded Gemma GGUF and embedding files in the ComfyUI\models\text_encoders folder?
In DualCLIPLoader (GGUF), did you select the downloaded Gemma GGUF and embedding files and choose the type as ltxv?
What does the error log say?
@m8rr I fixed, thanks
Why are there two audio inputs?
You can insert multiple audio files. One can be inserted at the beginning, another at any position, and you can add nodes to insert even more audio files simultaneously. The empty spaces without audio inserts will be generated by LTX.
It's similar to image input. You don't need to input audio for the entire video; you can input multiple short audio clips simultaneously.
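To picture which parts LTX fills in, here is a sketch that takes the inserted audio clips as (start, duration) pairs in seconds and returns the uncovered spans. The (start, duration) representation is hypothetical; it just models what the reply above describes.

```python
def uncovered_spans(total: float, clips: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Return (start, end) spans, in seconds, not covered by any inserted audio clip.

    clips: (start, duration) pairs for each audio insert; they may overlap.
    """
    spans = []
    cursor = 0.0  # end of the audio coverage seen so far
    for start, dur in sorted(clips):
        start = min(max(0.0, start), total)  # clamp the clip into the timeline
        end = min(total, start + dur)
        if start > cursor:
            spans.append((cursor, start))  # gap before this clip: generated by LTX
        cursor = max(cursor, end)
    if cursor < total:
        spans.append((cursor, total))  # tail after the last clip
    return spans


if __name__ == "__main__":
    # 10 s video, one clip at the start (2 s) and one at 5 s (1.5 s).
    print(uncovered_spans(10, [(0, 2), (5, 1.5)]))  # [(2.0, 5), (6.5, 10)]
```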
Is there a manual on how to use the WF? I tried to use the first image to make I2V, but it doesn't work; it makes T2V anyway.
Your I2V results have been excellent so far. What seems to be the issue?
@m8rr That's because I use the basic workflow but tweaked it a bit for a dev model. I tried to work with the "expert" version, but had no luck. I wanted to use only one image for input and maybe some audio, but when I turned off some nodes, the results looked like T2I.
I thought I knew something about ComfyUI, but it seems I don't...
Can someone explain how to voice clone with this WF?
This is a basic workflow, so some functions are not automated.
If you exclude the images from the extended video process, it could be considered voice cloning. However, I don’t recommend it.
In voice cloning, a reference voice of about 2s is placed at the beginning of the video. Then, a 7s video is generated, and the first 2s are cut out afterward. This process is inefficient and delivers poor performance. A better approach is to generate only the voice using a voice generation AI, and then apply S2V.
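The "cut the first 2s" step can be done with ffmpeg. This sketch only builds the command (the file names are placeholders, and 2 s matches the reference length in the reply above); run it with subprocess or paste the printed line into a terminal.

```python
import shlex


def trim_command(src: str, dst: str, ref_seconds: float = 2.0) -> list[str]:
    """Build an ffmpeg command that drops the first `ref_seconds` (the reference voice)."""
    return [
        "ffmpeg", "-y",
        "-ss", f"{ref_seconds:g}",         # input seeking: skip the reference segment
        "-i", src,
        "-c:v", "libx264", "-c:a", "aac",  # re-encode so the cut lands on the requested time
        dst,
    ]


if __name__ == "__main__":
    cmd = trim_command("ltx_output.mp4", "ltx_trimmed.mp4")  # placeholder file names
    print(shlex.join(cmd))  # or: subprocess.run(cmd, check=True)
```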
Example of voice cloning.
https://civitai.com/images/118341303
(Download the video and load it as a WF)
Example of extend video.
https://civitai.com/images/118328186
(Unlike the example, it is recommended to input the video into the first image.)
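Unexpected text model architecture type in GGUF file: 'gemma3'
The models are correct and it still doesn't work. Why post this?
If the model is wrong, that's my fault. If the model is right and it still doesn't work, whose fault is that?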
Setting "bypass image" to do T2V doesn't work; it pops up an error saying the required input "image" is missing.
I get weird artifacts (swirly things all over the video), although the audio is perfect, with a lady singing.
Use upscaler version 2.3
I am.
Wow, you're RIGHT. I disabled the 2nd pass and went directly to decode, and the artifacts are gone. But why is the upscaler causing artifacts? I have the new one.
OK, I'm officially an "IDIOT". I was using the 2.0 upscaler, even though I downloaded 2.3.
I get this error: "RuntimeError: mat1 and mat2 shapes cannot be multiplied (93x3840 and 1920x4096)" how do I fix it?
Make sure all parts are version 2.3. Also, update the GGUF custom node (city96) and ComfyUI to the latest version (0.16.3).
Perhaps you're using safetensors instead of the gemma3 GGUF?
There are two ways:
Use a regular DualCLIPLoader node instead of the GGUF
or
Delete the city96 GGUF custom node and use the dynamic-vram branch of rattus128/ComfyUI-GGUF
(git clone -b dynamic-vram https://github.com/rattus128/ComfyUI-GGUF)
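Why mismatched versions produce this error: "mat1 and mat2 shapes cannot be multiplied" is a plain matrix-shape check. In a linear layer, mat1 is typically the incoming activations and mat2 the layer's weights, and the inner sizes must agree. A pure-Python sketch of that rule (not PyTorch's code), using the numbers from the error above:

```python
def matmul_shape(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Result shape of a @ b; raises ValueError when the inner sizes differ."""
    if a[1] != b[0]:
        raise ValueError(
            f"mat1 and mat2 shapes cannot be multiplied ({a[0]}x{a[1]} and {b[0]}x{b[1]})"
        )
    return (a[0], b[1])


if __name__ == "__main__":
    print(matmul_shape((93, 3840), (3840, 4096)))  # inner sizes agree
    try:
        matmul_shape((93, 3840), (1920, 4096))     # the failing case from the report
    except ValueError as err:
        print(err)
```

Here the incoming tensor has 3840 features per row while the weights expect 1920, so some component in the chain doesn't match the others, which fits the "make sure everything is 2.3 / use the GGUF text encoder" answers above.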
One error after another. Useless without a tutorial.