[edit:
13.05.2026: Updated to version 4.4 (see version description).
Small fixes to restore fast generation speeds.
Attention:
If you struggle with node conflicts or get errors while running the workflow, please have a look at my short Trouble Shooting Guide note in the workflow first. Most important is to update all components successfully! ]
Special thanks to:
@ArcleinSK for investigating and solving the FLF issue, for pushing the First/Mid/Last Frame option and, last but not least, for sharing fantastic knowledge.
@boinobin730 for initiating, driving and supporting this project in every way: providing links, running tests, sharing knowledge and inspiring discussions.
@Urabewe for publishing the original, perfectly running 12 GB VRAM LTX-2.3 workflows on which this workflow is mainly based.
Features:
Simple-to-use all-in-one LTX-2 workflow with options for:
Text to Video
Image to Video
First/Last Frame to Video
First/Mid/Last Frame to Video
Video to Video
Text + Audio to Video
Image + Audio to Video
First/Last Frame + Audio to Video
First/Mid/Last Frame + Audio to Video
easy switching between all options,
all steps highly automated: no manual frame or width/height calculations necessary,
easy-to-set inputs via predefined sliders and aspect ratio inputs (no risk of setting wrong frame counts or wrong width/height values),
completely automated resizing and cropping (if necessary) of your input images/videos,
brilliant audio generation (speech/sound) with LTX-2.3.
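The automatic frame-count handling behind the length sliders can be illustrated with a minimal sketch. It assumes the LTX-Video convention that frame counts must take the form 8·n + 1 and a default frame rate of 24 fps. Both are assumptions, not values read from this workflow, and `ltx_frame_count` is a hypothetical helper name.

```python
def ltx_frame_count(seconds: float, fps: int = 24) -> int:
    """Snap a clip length to the nearest frame count of the form 8*n + 1.

    Assumption: the 8*n + 1 rule and the 24 fps default follow the
    LTX-Video convention; they are not read from this workflow.
    """
    if seconds <= 0 or fps <= 0:
        raise ValueError("seconds and fps must be positive")
    raw_frames = seconds * fps
    # Snap (raw_frames - 1) to the nearest multiple of 8, rounding halves up.
    n = max(0, int((raw_frames - 1) / 8 + 0.5))
    return 8 * n + 1


if __name__ == "__main__":
    for secs in (5, 10, 20, 30):
        print(f"{secs:>2} s -> {ltx_frame_count(secs)} frames")
```

Under these assumptions, a 5-second clip maps to 121 frames and a 10-second clip to 241 frames. A slider that only offers valid values removes the need for this calculation entirely.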
LTX-2.3 specifications:
Workflow version v4.3 consistently follows the LTX-2.3 specifications for 16:9/9:16 aspect ratios, including automatic width/height calculations as well as automatic input image/video resizing/cropping.
In addition, you can now simply choose any other aspect ratio according to your needs while still getting the right width/height values calculated and automatic image/video resizing/cropping.
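The aspect-ratio logic described above (width/height derived from a chosen ratio and rounded, then inputs resized and cropped to fit) can be sketched in Python. This is a minimal sketch under stated assumptions, not the workflow's actual subgraph:

- The helper names `snap`, `dims_for_aspect` and `cover_crop` are hypothetical.
- Driving the calculation from a long-side length plus a ratio is an assumption.
- The rounding multiple of 64 mirrors the `round_to_multiply` value mentioned in the version description.

```python
import math


def snap(value: float, multiple: int) -> int:
    """Round value to the nearest multiple (halves round up), never below one multiple."""
    return max(multiple, math.floor(value / multiple + 0.5) * multiple)


def dims_for_aspect(long_side: int, ratio_w: int, ratio_h: int, multiple: int = 64) -> tuple[int, int]:
    """Derive (width, height) from a long-side length and an aspect ratio (assumed inputs)."""
    if ratio_w >= ratio_h:  # landscape or square: width is the long side
        width, height = long_side, long_side * ratio_h / ratio_w
    else:  # portrait: height is the long side
        width, height = long_side * ratio_w / ratio_h, long_side
    return snap(width, multiple), snap(height, multiple)


def cover_crop(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple[tuple[int, int], tuple[int, int, int, int]]:
    """Scale the source so it fully covers the target, then center-crop the overflow.

    Returns the resized size and the crop box (left, top, right, bottom).
    """
    scale = max(dst_w / src_w, dst_h / src_h)
    # Never round below the target size, or the crop box could exceed the image.
    new_w = max(dst_w, round(src_w * scale))
    new_h = max(dst_h, round(src_h * scale))
    left = (new_w - dst_w) // 2
    top = (new_h - dst_h) // 2
    return (new_w, new_h), (left, top, left + dst_w, top + dst_h)


if __name__ == "__main__":
    w, h = dims_for_aspect(1536, 16, 9)           # 16:9 landscape
    print((w, h), dims_for_aspect(1536, 9, 16))  # and its 9:16 portrait twin
    print(cover_crop(1920, 1080, w, h))           # e.g. a 1080p input image
```

Under these assumptions, a 16:9 frame with a 1536 px long side comes out as 1536 x 896 with a multiple of 64, versus 1536 x 864 with a multiple of 32. The center crop keeps the middle of the image and trims equally from both sides.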
Requirements:
GPU with 12 GB VRAM (some users report running it with 8 GB too),
32 GB RAM,
Swap file size: 64 - 128 GB.
Speed and video length:
Runs very fast: a 5-second (1280 x 864) video takes under 10 minutes.
Long, high-quality videos can be generated in one run: 10 - 20 seconds without any issues.
Test run: a 30-second video (1024 x 704) took around 40 minutes without any OOM errors. Longer videos might be possible but have not been tested yet.
Important:
This workflow is intended for advanced ComfyUI users who know how to install and operate the system and are able to resolve basic system errors themselves, such as node conflicts or general system issues.
About this workflow:
This workflow is mainly based on the fantastic LTX-2.3 workflows of @Urabewe.
As far as I know, those were the first workflows running LTX-2 with 12 GB VRAM. All credit goes to the original creator.
My job was only to combine and organise the different workflows in a simple-to-use all-in-one design.
Description
Minor update after testing and several very useful user inputs:
bug fix in the Aspect Ratio subgraph: changed round_to_multiply from 32 to 64,
some small improvements:
bypassing audio for the upscale pass,
adding an audio preview,
increasing preview_rate to 24 for better video previews.
Comments (13)
Hi! First, thank you for all your hard work!
I also hit the NaN/+-Inf error ("[aost#0:1/aac @ 0x5cd2c08dadc0] Error submitting audio frame to the encoder") after a Comfy update.
However, I'm still getting the NaN error with the latest 4.3 version of your workflow (using the out-of-the-box settings). I'm also now getting OOM errors unless I drop down to the Q3 GGUF.
Background and possibly helpful info: I'm on Ubuntu with an RTX 5060 (16 GB VRAM) and 96 GB system RAM.
Before the ComfyUI update, I was able to run the Q8 GGUF versions of the models (both dev and distilled) without any issue and produced videos of up to 10 seconds.
I've updated everything via the Manager in ComfyUI. I still get the errors on Comfy versions 19.5, 19.4 and 19.3. It certainly seems to be caused by Comfy. I can still make videos with LTX-2.3 in LTX Desktop without issue.
@piehound0101723 "I still get the errors on 19.5 19.4 and 19.3 versions of Comfy"?
Sorry, but I'm really not sure what you are talking about. The latest ComfyUI version is 0.19.3. I updated today and everything works as usual; please look here too.
@piehound0101723 Which OS, ComfyUI version and release version are you actually using? Did anything break during the update?
@arkinson I am on Ubuntu 24.04.4.
ComfyUI Manager tells me it is 19.5 -> https://imgur.com/a/TyznBx2
I'm not sure why that differs from the main branch.
But some good news: notice the updated KJNodes? After updating them, I no longer get the NaN/+- error! (I made sure custom_scripts was correct after I took that screenshot.)
However, I still get OOM errors with any model larger than the Q3 GGUF, which is odd because I was able to run the Q8 GGUF before. As I said previously, I have a 5060 with 16 GB VRAM.
@piehound0101723 Imgur does not open your screenshot, and on the Linux side I can't help, sorry.
@arkinson No worries, I understand. I was not blaming your workflow; Comfy has broken something in its updates. I should have mentioned in my previous update that I was getting the NaN/+- error in @Urabewe's workflows as well as yours. But at least the latest Comfy update fixed that. Now I just need to figure out what Comfy broke that is causing the OOMs.
@arkinson Sharing here in case someone else hits the issue. I had to change my Comfy startup command to:
python main.py --reserve-vram 3.0 --lowvram --disable-pinned-memory
Note: you may be able to lower the --reserve-vram value. I posted a little more info in a thread on Reddit's r/comfyui -> https://www.reddit.com/r/comfyui/comments/1svix8a/oom_errors_after_comfy_update_and_how_im_getting/
In version 4.3, generation time has almost doubled: an 8-second video at 1536x864 now takes 562.48 seconds, compared to 300.90 seconds in version 4.2.
4.3's rendering time has increased drastically: 3/3 [04:38 < 00:00, 92.89 seconds/it] compared to 3/3 [01:48 < 00:00, 36.31 seconds/it] for 4.2.
Also, in version 4.3, model initialization for the second pass sometimes does not start or takes a very long time. Refreshing the browser helps.
My PC: AMD 9700X with 32 GB RAM + RTX 4070 Ti 12 GB.
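The timings in this comment can be compared directly. The short calculation below only uses the figures reported above: total run times of 300.90 s (4.2) vs. 562.48 s (4.3), and 36.31 s/it vs. 92.89 s/it for the 3-step pass. `slowdown` is just an illustrative helper name.

```python
def slowdown(old: float, new: float) -> float:
    """Return how many times longer the new measurement takes than the old one."""
    return new / old


if __name__ == "__main__":
    # Figures reported in the comment above (v4.2 vs. v4.3).
    total = slowdown(300.90, 562.48)   # total generation time in seconds
    per_step = slowdown(36.31, 92.89)  # 3-step pass, seconds per iteration
    print(f"total: {total:.2f}x, 3-step pass per iteration: {per_step:.2f}x")
```

With these numbers the total time grew by about 1.87x ("almost doubled"), and the 3-step pass per iteration by about 2.56x.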
@pavelinet87445 Please run a simple test:
Open the main subgraph, go to the "LTX2 Sampling Preview Override" node and set preview_rate = 8 instead of 24.
Let me know if this works for you.
@arkinson
Thanks for the quick response.
No, setting preview_rate to 8 or 24 doesn't affect generation time.
Furthermore, the model can't initialize on the second pass unless the video memory is completely cleared. Here's my log from the first generation; on the second generation, everything freezes.
Generation 1 log:
100%|██████████████████████████████████████ ███████████████████████ ██████████████████████| 8/8 [02:19<00:00, 17.49s/it]
Requested to load AudioVAE
loaded completely; 693.46 MB loaded, full load: True
Unloaded partially: 2255.02 MB freed, 6229.20 MB remains loaded, 65.70 MB buffer reserved, lowvram patches: 404
Requested to load VideoVAE
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
loaded partially; 9806.84 MB usable, 9761.75 MB loaded, 4138.70 MB offloaded, 45.08 MB buffer reserved, lowvram patches: 0
100%|█████████████████████████████████████ ███████████████████████ ██████████████████████| 3/3 [03:01<00:00, 60.57s/it]
Requested to load VideoVAE
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 396.45 seconds
2nd generation log:
got prompt
Requested to load LTXAV
0 models unloaded.
Unloaded partially: 1338.43 MB freed, 8423.32 MB remains loaded, 45.12 MB buffer reserved, lowvram patches: 1415
100%|█████████████████████████████████████ ████████████████████████ ███████████████████████| 8/8 [02:17<00:00, 17.15s/it]
Requested to load AudioVAE
loaded completely; 693.46 MB loaded, full load: True
Unloaded partially: 2246.69 MB freed, 6176.63 MB remains loaded, 65.70 MB buffer reserved, lowvram patches: 1807
Requested to load VideoVAE
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
loaded partially; 9806.84 MB usable, 9761.75 MB loaded, 4138.70 MB offloaded, 45.08 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (1893)
0%| | 0/3 [00:00<?, ?it/s, Model Initializing ... ]<--- Here the model can't initialize; only a full restart of run_nvidia_gpu.bat helps.
I think the problem is the change in the memory-clearing method.
Log:
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
partially loaded; 9806.84 MB usable, 9761.75 MB loaded, 4138.70 MB offloaded, 45.08 MB buffer reserved, lowvram patches: 0
Attempting to release mmap (1893)
However, on version 4.2, even with preview_rate = 24, everything works. Here's the log from version 4.2:
100%|█████████████████████████████████████ ███████████████████████ ██████████████████████| 8/8 [02:16<00:00, 17.07s/it]
Unloaded partially: 1535.73 MB freed, 7188.46 MB remains loaded, 65.70 MB buffer reserved, lowvram patches: 309
Requested to load VideoVAE
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
0 models unloaded.
Unloaded partially: 2905.41 MB freed, 4283.05 MB remains loaded, 180.31 MB buffer reserved, lowvram patches: 587
100%|████████████████ ██████████████████████ ███████████████████████ ███████████████████████| 3/3 [01:50<00:00, 36.82s/it]
Requested to load AudioVAE
loaded completely; 693.46 MB loaded, full load: True
Requested to load VideoVAE
0 models unloaded.
Model VideoVAE prepared for dynamic VRAM loading. 1384MB Staged. 0 patches attached.
Prompt executed in 322.01 seconds
What do I need to do to solve this problem?
RuntimeError: Input type (struct c10::BFloat16) and bias type (struct c10::Half) should be the same
Thanks for the update.
I have no problems with your workflow. I tested both the new and the previous workflow, and the speed is the same (I2V, 10-second length, with LoRA, only about 270 seconds). My setup: ComfyUI Easy Install, RTX 4070 Ti, ComfyUI 0.18.1, ComfyUI_frontend v1.42.10.
In my experience, you should run the workflow after a clean ComfyUI restart; the memory management is very sensitive.
Everything seems fine until it's time to run the audio VAE, and then the video turns into a psychedelic mess of colorful grids. Can anyone help?