EDIT: This is now a 2-stage workflow instead of a 3-stage one. Gen time and quality are both better with 2 stages.
Change the way you approach LTX genning. We all know that prompt adherence with LTX is notoriously bad: the model treats your prompts more like suggestions that it can ignore at will. So I say, stop beating your head against the wall trying to prompt better. Instead, the path to success with this model is to see the outcomes of many seeds quickly and then choose one to bring to completion. This is the main workflow I use to produce my short films, and I hope you all enjoy it too.
See my full tutorial video showing how to use the workflow here:
(Note that this video shows the older 3-Stage version, but usage is more or less the same)
Description
1.6 - Switched from a 3-stage to a 2-stage workflow. Improves quality/consistency without adding extra gen time.
1.5 - Fixed a model loader incompatibility some people were struggling with, replaced the .pth file for sampler previews with a .safetensors file, and replaced several nodes with KJ nodes for easier compatibility.
1.2 - Removed even more unnecessary node suites. Also, the way the stage 1 seeds are decided is now clearer: the calculation lives inside the subgraph instead of being hidden behind it.
1.1 - Updated to remove old node suites that were unnecessary.
Comments (64)
Thanks, helpful. Works great.
You have given me too much power
*Cue Evil Laughter
I was initially a bit sceptical but fired up your workflow and gave it a go. I'm getting great results so far. Not only does the choice of four low-res starting vids really help but the end results seem very smooth and sharp - is that to do with the euler_ancestral_cfg_pp sampler you recommend or, perhaps, just well-tested parameters across the workflow?
At the moment, I'm letting LTX generate the audio instead of injecting audio. I notice that the audio can change drastically across the three steps. Is there a way to pass the original audio from step one into the finished video (or maybe that isn't how the LTX process works). I realised that I had been swayed to choose a particular starting clip by the audio being spot-on, only to find it completely different in the final video.
Anyway, great work and I look forward to any future tweaks to this workflow - maybe "first frame last frame" as possibly hinted at in one of your replies in the comments of your Youtube video?
Hi, thanks for the review. Yes, you can pass stage 1 audio to your final video, however there is a high chance that there will be some degree of desynchronization due to the fact that stage1's motion won't directly match stage3's motion. This is especially evident with lip-syncing. If you wish to try this though, I believe all you would need to do is connect an Audio VAE Decode node from where the chosen Stage 1 latent connects to the Stage 2 "Finish Mode" group, and then connect that decoded audio link to the Stage 3 Video Combine node's audio input, replacing where it would normally come from (the Decode subgraph).
@foxydits Thanks for the detailed reply. In the long run, I suppose using an audio source would solve all problems (as you already do). I did notice one quirk when using your workflow but I think it is more to do with the ancestral sampler than the workflow per se:
In one clip, I have a woman facing the viewer, then turning to walk away. She is wearing a mid-thigh length pencil skirt. Whenever she turned around, the skirt would become what would best be described as cycling shorts. Despite retrying with new seeds, I always got exactly the same unwanted result. I opened the subgraph to add a few negative prompts to the NAG box e.g. "shorts", "cycling shorts" but this also made no difference.
After testing with the non-ancestral Euler sampler, this fixes the problem. If I remember correctly, ancestral samplers inject a bit of random noise into each step. My best guess is that the LTX 2.3 model has a bias towards mid-thigh clothing being a pair of shorts rather than a skirt so it is drifting towards that and away from the specifics of the starting image and/or prompt. If this is the case, there could be other instances where the prompt will be ignored in favour of the model's own biases when using an ancestral sampler.
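The classic Euler and Euler-ancestral update rules from k-diffusion illustrate the intuition in the comment above. This is a minimal sketch on a single scalar value. It assumes the variance-exploding formulation with eta=1, and the function names are hypothetical. ComfyUI uses a different noise split for flow-matching models such as LTX, so treat this as a picture of why ancestral runs inject fresh randomness at every step, not as LTX's exact math.

```python
import math
import random


def ancestral_step(sigma_from, sigma_to, eta=1.0):
    """Split the move from sigma_from to sigma_to into a deterministic part
    (sigma_down) and a freshly injected noise part (sigma_up), k-diffusion style."""
    if not eta:
        return sigma_to, 0.0
    sigma_up = min(sigma_to, eta * math.sqrt(
        sigma_to**2 * (sigma_from**2 - sigma_to**2) / sigma_from**2))
    sigma_down = math.sqrt(sigma_to**2 - sigma_up**2)
    return sigma_down, sigma_up


def euler_step(x, denoised, sigma, sigma_next):
    """Plain Euler: fully deterministic given the model's prediction."""
    d = (x - denoised) / sigma  # derivative estimate; stepping to a lower sigma moves x toward the prediction
    return x + d * (sigma_next - sigma)


def euler_ancestral_step(x, denoised, sigma, sigma_next, rng, eta=1.0):
    """Euler ancestral: step only down to sigma_down, then add new noise of size sigma_up."""
    sigma_down, sigma_up = ancestral_step(sigma, sigma_next, eta)
    d = (x - denoised) / sigma
    x = x + d * (sigma_down - sigma)
    return x + rng.gauss(0.0, 1.0) * sigma_up  # fresh randomness at every step


rng = random.Random(0)
print(euler_step(2.0, 0.5, 1.0, 0.5))
print(euler_ancestral_step(2.0, 0.5, 1.0, 0.5, rng))
```

With eta=0, the ancestral step collapses to plain Euler. That is effectively the switch the commenter made by picking the non-ancestral sampler.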
I thought it worth mentioning in case you've seen this sort of behaviour before. Thanks again for your hard work. With improvements to the LTX ecosystem, such as your workflow, we might finally be able to reliably outperform Wan 2.2.
I forgot to mention that the audio in the final video seemed to drift less from the original low-res video when using the non-ancestral sampler, perhaps for a similar reason.
@bennyboy_77 that's a fascinating discovery! Thanks for sharing. Ultimately I have the sampler as a dropdown selection on each stage because I wanted people to experiment with what works best. For instance, for scenery or establishing shots, I often use res_2s. For extremely simple shots, I sometimes switch to straight euler or euler_ancestral since it's so fast. It's nice to be able to mix and match too-- you could technically have 4 different samplers for stage 1, and set it up so that they all use the same seed, and what you'd have then is a sampler-hunting workflow lol.
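The two hunting modes described above differ only in which value is held fixed. This minimal sketch assumes each stage 1 preview is fully described by a sampler name and a seed. The `PreviewJob` type is hypothetical, and deriving consecutive seeds is just one possible scheme, not necessarily what the workflow's stage 1 subgraph computes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PreviewJob:
    """One stage 1 preview: which sampler to run and with which seed (hypothetical type)."""
    sampler: str
    seed: int


def seed_hunt(base_seed, sampler, count=4):
    """Same sampler, different seeds. Consecutive seeds are an assumption here."""
    return [PreviewJob(sampler, base_seed + i) for i in range(count)]


def sampler_hunt(seed, samplers):
    """Same seed, different samplers, so differences come from the sampler choice."""
    return [PreviewJob(name, seed) for name in samplers]


for job in seed_hunt(1000, "euler_ancestral_cfg_pp"):
    print(job)
for job in sampler_hunt(42, ["euler", "euler_ancestral", "euler_ancestral_cfg_pp", "res_2s"]):
    print(job)
```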
Friend, this is a GREAT workflow. You're 100% right, selecting the right seed quickly makes a HUGE difference on how many good videos LTX can spit out in any given time. You're a rock star.
Whew, crashed and burned badly. Locked up my browser and froze the system solid then took explorer.exe with it also. No idea what is going on here, but hard pass. Not working at all
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 25440MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1745 KB.
[INFO] 0 models unloaded.
[INFO] Model LTXAVTEModel_ prepared for dynamic VRAM loading. 25440MB Staged. 0 patches attached. Force pre-loaded 400 weights: 1745 KB.
[INFO] Found quantization metadata version 1
[INFO] Detected mixed precision quantization
Windows fatal exception: access violation
Stack (most recent call first):
File "E:\ComfyUI\python_embeded\Lib\site-packages\torch\nn\modules\linear.py", line 109 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ops.py", line 452 in __init__
File "E:\ComfyUI\ComfyUI\custom_nodes\ComfyUI-INT8-Fast-Fork\int8_quant.py", line 1636 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\model.py", line 318 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\av_model.py", line 178 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\av_model.py", line 586 in _init_transformer_blocks
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\model.py", line 753 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\model.py", line 1025 in __init__
File "E:\ComfyUI\ComfyUI\comfy\ldm\lightricks\av_model.py", line 427 in __init__
File "E:\ComfyUI\ComfyUI\comfy\model_base.py", line 164 in __init__
File "E:\ComfyUI\ComfyUI\comfy\model_base.py", line 1141 in __init__
File "E:\ComfyUI\ComfyUI\comfy\supported_models.py", line 951 in get_model
File "E:\ComfyUI\ComfyUI\comfy\sd.py", line 2011 in load_diffusion_model_state_dict
File "E:\ComfyUI\ComfyUI\comfy\sd.py", line 2024 in load_diffusion_model
File "E:\ComfyUI\ComfyUI\custom_nodes\ComfyUI-INT8-Fast-Fork\int8_unet_loader.py", line 207 in load_unet
File "E:\ComfyUI\ComfyUI\execution.py", line 298 in process_inputs
File "E:\ComfyUI\ComfyUI\execution.py", line 310 in _async_map_node_over_list
File "E:\ComfyUI\ComfyUI\execution.py", line 336 in get_output_data
File "E:\ComfyUI\ComfyUI\execution.py", line 536 in execute
File "E:\ComfyUI\ComfyUI\execution.py", line 774 in execute_async
File "asyncio\events.py", line 89 in _run
File "asyncio\base_events.py", line 2050 in _run_once
File "asyncio\base_events.py", line 683 in run_forever
File "asyncio\base_events.py", line 712 in run_until_complete
File "asyncio\runners.py", line 118 in run
File "asyncio\runners.py", line 195 in run
File "E:\ComfyUI\ComfyUI\execution.py", line 714 in execute
File "E:\ComfyUI\ComfyUI\main.py", line 327 in prompt_worker
File "threading.py", line 995 in run
File "threading.py", line 1044 in _bootstrap_inner
File "threading.py", line 1015 in _bootstrap
E:\ComfyUI>echo If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest. If you get a c10.dll error you need to install vc redist that you can find: https://aka.ms/vc14/vc_redist.x64.exe
If you see this and ComfyUI did not start try updating your Nvidia Drivers to the latest. If you get a c10.dll error you need to install vc redist that you can find: https://aka.ms/vc14/vc_redist.x64.exe
E:\ComfyUI>pause
From the looks of that error log, that has nothing to do with my workflow lol. That's all model loading errors.
I see that you likely didn't go into the model loader subgraph and switch to regular Unet. It was defaulted to using the INT8 model which you probably don't have. That would fix everything right there in one node link connection, just get rid of the INT8 model loader.
I updated the workflow to remove the INT8 model loader. If you try the new version, it will work just fine.
works fine but stage 3 sometimes just outputs garbage which is odd
wdym? Garbage as in static, or bad motion? If static, check to make sure the sampler it's using for stage3 is the same as stage1/stage2. I usually use euler_ancestral_cfg_pp
@foxydits yeah, everything is default. it just produces a video that looks undercooked and sometimes overcooked. it's fairly rare but odd
@crombobular it might be worth trying the updated version of the workflow, i changed some values to make it play nicer with people's widely varying setups. and just to be sure, you are using the LTX 2.3 distilled 1.1 model, yes? not dev or anything?
@foxydits yes, correct. the distilled model etc. i've used ltx quite a bit, which is why the random cooked video seemed so random. when i do get a broken video, rerolling just stage 3 fixes it. so at this point i'm just gonna put it down to the goblins living inside ltx 🤷♂️
really great workflow by the way! surprised no one had made something like it yet.
unrelated but useful: exposing the bypass bool on the LTXVImgToVideoInplace nodes lets you use this for t2v too, which is very nice
@crombobular I shouldn't be surprised, but comfy has managed to cook up a new one with your glitch. If rerolling stage 3 fixes it then I have absolutely NO idea what could cause an intermittent issue like that. The only thing that comes to mind is if your VRAM or GPU clock speed is overclocked, it can lead to artifacting in videos when pushed to 100% usage on both. You can test this by genning 720p videos for a while, and if the issue subsides then you know it's a GPU overloading problem.
Great! It would be interesting to have a similar workflow that works with the Eros model.
I haven't used that model, is it not just a checkpoint for ltx? I would imagine converting my workflow to another ltx based model would be fairly simple
@foxydits I have tried but it is not just a simple model swap... The eros model does not come (as far as I know) in transformer distilled, so all the renders (each step) are kind of unbaked with the workflow. https://huggingface.co/TenStrip/LTX2.3-10Eros/tree/main
@vertexloves Yeah, if they're underbaked because eros is a dev checkpoint, then you just need to increase the steps in each stage subgraph. You can remove the manual sigmas and replace them with a BasicScheduler node, which will let you set the number of steps. I'd guess stage1 would need around 15-20 and the subsequent stages would need fewer. If there's a distilled LoRA, you could apply it to stages 2 and 3.
@foxydits Thanks I'll try that!
@foxydits The 3rd stage is taking a lot of time to generate. Do you have any idea how to cut the time a bit on that last pass? Also, I want to add that the workflow is a game changer! Thanks for it.
https://huggingface.co/UNIKNOT/LTX-2.3-WORKFLOWS/resolve/main/10eros_seedhunt_v1.json
Here you go. I've changed a lot. And I've only tested vertical image (720x1280) to video without custom audio. The workflow still is a work in progress.
@UNIKNOT That's cool that you got it working though! I have no use for the eros model myself (it's for NSFW right?) but I hope it serves you well!
@UNIKNOT Thanks! I'm going to try it!
Fantastic stuff. My 4070 is struggling with Stage 3 (and sometimes Stage 2), throwing OOM errors. Stage 3 creates some weird artifacts sometimes, too. Other than that, I really like this workflow- particularly the bit about using pre-recorded audio. What a gamechanger.
What is your GPU and which version of the model are you using? I might be able to help with some of that. The workflow wasn't designed with low VRAM in mind, but there's certainly no shortage of easy solutions if you're running out of memory during the bigger steps. 1080p genning is quite large for under 16GB VRAM unless you're using GGUF models in the Q4-Q5-Q6 range.
@foxydits RTX 4070 (12GB VRAM), 22b-distilled-1.1_transformer_only_fp8_scaled version of the model. I should probably GGUF, shouldn't I?
@artificial_infatuation Sorry for the late reply, somehow it got lost in my notifications. Yes, GGUF is a good idea, or a lower frame resolution. There are also some LTX nodes for low VRAM you could check out. I don't have experience with them though, which is why they're not in the workflow. You could look at low VRAM workflows and patch them into mine.
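Some rough arithmetic shows why 12GB cards struggle. This sketch covers the transformer weights alone and assumes about 22 billion parameters, a figure taken from the "22b" in the model filename. The GGUF bits-per-weight values are approximate averages, and the format names are illustrative. Activations at 1080p, the text encoder, and the VAE all come on top of this, and ComfyUI can offload weights to system RAM, so real usage will differ.

```python
# Approximate bits per weight. bf16 and fp8 are exact (fp8 ignores the small scale
# tensors); the GGUF values are rough averages because k-quant files mix block types.
BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "fp8": 8.0,
    "Q6_K": 6.56,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.85,
}


def weight_gib(params_billion, fmt):
    """Size of the weights alone in GiB; ignores activations, text encoder and VAE."""
    total_bytes = params_billion * 1e9 * BITS_PER_WEIGHT[fmt] / 8
    return total_bytes / 2**30


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt:>7}: {weight_gib(22, fmt):5.1f} GiB")
```

At fp8, the weights alone come to roughly 20.5 GiB, and even Q4-class files land around 12 GiB. Both figures suggest that a 12GB card is leaning on offloading.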
@foxydits No problem! I've been using smaller resolutions and shorter clips (which works better anyway).
EDIT: I was having issues with the Video Extension feature, but I realized that we're essentially asking Comfy to remake the video- and add on to the end- in the listed duration. All good now. :)
@artificial_infatuation Best use case is to only extend from the last 17-49 frames of your previous video (on the Video Loader node, use the Skip First Frames and Frame Load Cap to select only those last frames). LTX will have all the context it needs from that window of frames. Then, as I show in my video (briefly), merge the original and the new one in a video editor. The videos have that 17-49 frame overlap, so you just find where it starts and blend the two overlaps together for lighting consistency.
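The frame-window math for that extension trick is simple enough to sketch. This assumes the loader's Skip First Frames and Frame Load Cap count frames the same way as your clip's total frame count, with no frame-rate conversion. The linear crossfade is a hypothetical stand-in for whatever blend your video editor applies, and the function names are illustrative.

```python
def tail_window(total_frames, overlap):
    """Return (skip_first_frames, frame_load_cap) that load only the last `overlap` frames."""
    if not 0 < overlap <= total_frames:
        raise ValueError("overlap must be between 1 and total_frames")
    return total_frames - overlap, overlap


def crossfade_weights(overlap):
    """Weight of the NEW clip on each overlapping frame, ramping linearly from 0 to 1."""
    if overlap == 1:
        return [1.0]
    return [i / (overlap - 1) for i in range(overlap)]


def blend(old_value, new_value, weight):
    """Mix one pixel (or channel) value from the old and new clip."""
    return (1.0 - weight) * old_value + weight * new_value


skip, cap = tail_window(121, 25)
print(skip, cap)
print(crossfade_weights(5))
```

For a 121-frame clip and a 25-frame overlap, you would skip 96 frames and load 25.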
@artificial_infatuation So does that mean you managed to run the workflow without any problems using the 22b-distilled-1.1_transformer_only_fp8_scaled version with 12GB VRAM?
How long does it take to get a video (and what video length and resolution are you using)?
I'm hesitant to try the workflow with my 12GB VRAM. If you could tell me your generation times and how demanding it is, I'm thinking of trying it out.
@fakolonya Yeah, it's working! I changed my dimensions to 1280x720, and it's working. It still takes about a minute for each of the four preview rolls (and that's at 5 sec length), but 2nd stage and 3rd stage are much quicker. Keep messing with it, you'll get there. It's fantastic. :)
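A "5 sec length" setting has to become a concrete frame count. LTX-Video's documentation asks for frame counts of the form 8n + 1, and the 17 to 49 frame windows mentioned in this thread all fit that pattern. This minimal sketch assumes 24 fps and assumes that LTX 2.3 keeps the same rule. Check your workflow's length and fps widgets for the values it actually uses.

```python
def ltx_frame_count(seconds, fps=24):
    """Nearest frame count of the form 8*n + 1 for a duration (assumed LTX rule)."""
    raw = round(seconds * fps)
    n = max(0, round((raw - 1) / 8))
    return 8 * n + 1


def is_valid_length(frames):
    """True if `frames` fits the assumed 8*n + 1 pattern."""
    return frames >= 1 and (frames - 1) % 8 == 0


print(ltx_frame_count(5))
print([f for f in range(17, 50) if is_valid_length(f)])
```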
Hey FoxyDits, quick question.
I'm trying to run your LTX workflow with SageAttention/Triton on a fresh ComfyUI install and I'm getting a Triton compile error (Failed to find Python libs, -lpython3).
Are you running this workflow on the standard ComfyUI Portable build, or are you using a non-portable/system Python installation? If portable, could you tell me roughly what ComfyUI/Python/Triton setup you're using?
Just trying to figure out whether I'm missing a dependency or if my environment differs from yours. Thanks!
I am indeed running fully updated Comfy Portable, Python ver 3.12.10, sageattention2. I believe the sageattn wheel I used was filename: sageattention-2.2.0+cu128torch2.9.0andhigher.post4-cp39-abi3-win_amd64.whl, which works for my 3090. If it's being a pain, for now you could always just disable/bypass/delete the Patch SageAttention node in the Model loaders subgraph.
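Before touching the Patch SageAttention node, you can compare your environment against the versions above with a small report script. This is a minimal sketch, and the package names it checks are assumptions (on Windows, Triton is often installed as `triton-windows`). The header and `libs` checks reflect a commonly reported cause of "Failed to find Python libs" on embedded Python builds: the folders Triton needs to compile against Python are missing. Run the script with the same python.exe that ComfyUI uses (for portable, python_embeded\python.exe).

```python
import importlib.metadata as md
import platform
import sys
import sysconfig
from pathlib import Path


def env_report(packages=("torch", "triton", "triton-windows", "sageattention")):
    """Collect the versions and paths that matter for SageAttention/Triton debugging."""
    report = {"python": platform.python_version(), "executable": sys.executable}
    for name in packages:
        try:
            report[name] = md.version(name)
        except md.PackageNotFoundError:
            report[name] = None  # not installed in THIS interpreter
    # Triton compiles small C helpers at runtime, which needs Python's headers...
    include_dir = Path(sysconfig.get_paths()["include"])
    report["python_h_found"] = (include_dir / "Python.h").is_file()
    if sys.platform == "win32":
        # ...and, on Windows, the import library in a "libs" folder next to python.exe.
        report["libs_dir_found"] = (Path(sys.base_prefix) / "libs").is_dir()
    return report


for key, value in env_report().items():
    print(f"{key}: {value}")
```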
@foxydits Thank you so much
Excellent workflow. Solves a problem I'd identified myself but lacked the knowledge and experience to fix. Thanks.
Thanks man, this is very useful for me. I needed options like this.
This is a cool idea and I've enjoyed learning about it. However, I'd like higher-quality generations, even at the cost of longer gen times. What would be the best way to get higher quality? Is it adjusting the manual sigmas? Which stage would have the biggest effect, and how should I choose the sigmas? BTW, I'm using the Eros model with the distill model. I tried the suggestion of using the BasicScheduler with a step count of 15-20 and no distill model, but the results were really poor.
If you want to use the distilled model, the highest quality you can get is if you just gen straight to 1080p with no extra stages. If you have a really fast setup, that might be okay. However, Seed Hunting will get you a higher quality result faster because you'll be able to more quickly identify the seeds that have 10/10 motion and then build them towards 1080p. If you gen straight to 1080p, you'll be spending a lot of time just waiting for gens to finish and then only to discover that the motion isn't right.
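A simple expected-cost model puts the trade-off above in numbers. It assumes each generation has an independent probability p of having keeper motion, and that a preview which looks good stays good after finishing. Every number in the usage lines is a hypothetical placeholder, not a benchmark, so plug in your own timings.

```python
def expected_minutes_direct(p_good, minutes_per_final):
    """Gen straight to full resolution until a keeper: on average 1/p attempts."""
    return minutes_per_final / p_good


def expected_minutes_seed_hunt(p_good, minutes_per_preview, minutes_to_finish, batch=4):
    """Preview `batch` seeds at a time until one looks good, then finish only that one."""
    p_batch = 1 - (1 - p_good) ** batch  # chance a batch contains at least one keeper
    return batch * minutes_per_preview / p_batch + minutes_to_finish


direct = expected_minutes_direct(0.2, minutes_per_final=8)
hunt = expected_minutes_seed_hunt(0.2, minutes_per_preview=1, minutes_to_finish=4)
print(f"direct: {direct:.1f} min, seed hunt: {hunt:.1f} min")
```

Take p = 0.2, 8 minutes per direct 1080p gen, 1 minute per preview, and 4 minutes to finish. Under those placeholder numbers, the model gives about 40 minutes for direct genning versus about 10.8 minutes for seed hunting.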
Love your stuff already, both content and presentation. Please keep it up.
I can't see the slider to select any of the stages. The custom nodes are installed and I'm not using Nodes 2.0, but I just get an empty green node. If I search for a fresh slider node, I can see the settings and so on in the preview, but when I add it to the workflow, it's empty as well...
Very odd, so you have ComfyUI-mxToolkit installed, it's version 0.9.92, Nodes 2.0 is turned off, and you still don't see the slider? Even if you open a brand new workflow, add a slider node, it comes into the workflow blank? I suppose at that point I'd uninstall mxToolkit and reinstall it. Sorry, people seem to have trouble with that node (usually due to Nodes 2.0), but this is the first time I've heard about it being a problem outside of that known issue.
This tool is a game changer for us LTX/Comfy users. Thank you so much. I cannot wait to see what else you come up with.
Appreciate that. I do have a few more tricks up my sleeve to show everyone!
This looks great! T2V version pls?
Just bypass the Load Image node, and then switch all the LTXVImgToVideoInplace nodes to "Bypass = True" for T2V. Those nodes would be in the Settings, Stage 2, and Stage 3 subgraphs, I believe.
Great workflow,
Working out pretty much out of the box
Just wondering, is there any way to speed up the preview generation? Maybe by reducing the resolution even more?
Thanks!
If you're using my default 'euler_ancestral_cfg_pp' (slower but better), switch all the samplers to 'euler'.
Great workflow, but for some reason, before the stage 2 upscaling happens, the stage 1 gens are generated again in a random order, and only when my selected gen comes up does it do the stage 2 upscaling... Fixed seeds, everything updated, no "no cache" flag at startup, nothing changed between the stage 1 and stage 2 gens... Does anyone know a solution?
I wish I did. You're the first to report it without the "no cache" issue being the cause. If you find a solution, please let me know. I don't have any ideas -- comfyUI just clearly thinks something changed and it needs to run the earlier nodes again.
I also have a similar issue. For now, I just set the previews to full resolution and pick from them instead of running the second stage pass. But I'd like this fixed.
@Bongosan It's a rare issue, of which the --no-cache flag is the most common cause. But another user said it was fixed when he shut down comfy and restarted it. The workflow is wired correctly, it's just something in comfyUI not properly configured or currently bugged out. Maybe the ol' turning it off and on again will work for you too.
@foxydits I did restart it a few times but no luck, and I do not have cache turned off. I just bypass the stages I don't want for now, but it still costs me the initial generation once. Still useful though, so thanks.
Don't use Comfy's blue Run button to trigger the workflow; instead, click the Manual Random Seed (stage 1) button in the green box. Follow the instructions in the workflow for stage 2 & 3.
Smashing workflow, by the way. Thanks @foxydits
I'm having a little trouble: the generation completely changes the reference image and morphs it into a totally new one. For example, I start with an anime demon with red skin and purple hair, and it morphs into an anime demon with blue skin and black hair, standing in a weird pose against a different background. The same prompt in other workflows doesn't do this. Any ideas? All generations with new seeds seem to have this issue.
That's very strange behavior. It shouldn't be able to change character colors like that without specific prompting, of course. Without seeing your settings, input image, and prompt, I can't really come up with a suggestion. If you want, email me your exported WF/input img/prompt to [email protected] and I can take a look into the mystery.
Best workflow for LTX 2.3, and above all, the best approach to LTX 2.3's capabilities. Thanks a lot. Maybe you could add a noise reduction node and a LoRA loader with layer-specific strength control to help with the sound when not using custom audio. Thanks again.
I also have the same problem as the other guy: when I start the second pass, all 4 first-pass previews get generated again before I get the selected output.
Great workflow! It's a huge time saver being able to pick and choose which sample to finish instead of constantly restarting. One thing I can't figure out: all 4 sample gens have crisp audio, but after I choose the one I prefer, 9 out of 10 attempts result in a slight distortion (echo/static) in the voice audio that isn't present in the original sample. Are there any tweaks I can make to the workflow to fix this?
Hmm, I never use Gen audio so I'm not sure (all my audio is prerecorded for quality and consistency). I can take a look and see if there's a fix. I've definitely noticed that too on the rare occasion I've used Gen audio.
