[edit:
13.05.2026: Update version 4.4 (see version description).
Small fixes to restore fast generation.
Attention:
If you struggle with node conflicts or get errors while running the workflow, please have a look at my short Troubleshooting Guide note in the workflow first. Most important is to update all components successfully! ]
Special thanks to:
@ArcleinSK for investigating and solving the FLF issue, for pushing the First-Mid-Last Frame option and, last but not least, for sharing fantastic knowledge.
@boinobin730 for initiating, pushing and supporting this project in all kinds of ways, such as providing links, running tests, sharing knowledge and inspiring discussions.
@Urabewe for publishing the original, perfectly running 12 GB VRAM LTX-2.3 workflows mainly used here in this workflow.
Features:
Simple-to-use all-in-one LTX-2 workflow with options for:
Text to Video
Image to Video
First/Last Frame to Video
First/Mid/Last Frame to Video
Video to Video
Text + Audio to Video
Image + Audio to Video
First/Last Frame + Audio to Video
First/Mid/Last Frame + Audio to Video
easy switching between all options,
all steps highly automated: no manual frame or width/height calculations necessary,
inputs easy to set with predefined sliders and aspect ratio inputs (no risk of setting wrong frame counts or wrong width/height values),
completely automated resizing and cropping (if necessary) of your input images/videos,
brilliant audio generation (speech/sound) with LTX-2.3.
LTX-2.3 specifications:
Workflow version v4.3 consistently follows the LTX-2.3 specifications for 16:9/9:16 aspect ratios, including automatic width/height calculations, as well as automatic input image/video resizing/cropping.
In addition, you can now simply choose any other aspect ratio according to your needs while still getting the right values calculated for width/height and automatic image/video resize/crop.
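The width/height calculation described above can be pictured roughly as follows. This is only a sketch under assumptions: the function name `snap_resolution`, the choice to round to the nearest multiple, and the example inputs are hypothetical and not taken from the nodes used in the workflow.

```python
def snap_resolution(longer_edge: int, aspect_w: int, aspect_h: int,
                    multiple: int = 32) -> tuple[int, int]:
    """Return (width, height) for a longer edge and an aspect ratio,
    with both values snapped to the nearest multiple of `multiple`."""
    def snap(value: float) -> int:
        # Round to the nearest multiple, but never go below one multiple.
        return max(multiple, int(round(value / multiple)) * multiple)

    long_side = snap(longer_edge)
    short_side = snap(longer_edge * min(aspect_w, aspect_h) / max(aspect_w, aspect_h))
    # Landscape if the ratio is at least as wide as tall, otherwise portrait.
    if aspect_w >= aspect_h:
        return long_side, short_side
    return short_side, long_side


print(snap_resolution(1536, 16, 9))   # landscape 16:9
print(snap_resolution(1536, 9, 16))   # portrait 9:16
print(snap_resolution(1024, 1, 1))    # square
```

Whether the real nodes round up, down or to the nearest multiple is not stated here, so treat the rounding rule as a placeholder.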
Requirements:
GPU with 12 GB VRAM (some users reported they got it running with 8 GB too),
32 GB RAM,
Swap file size: 64 - 128 GB.
Speed and video length:
Runs very fast: a 5 second video (1280 x 864) takes < 10 minutes.
Long, high-quality videos can be generated in one run: 10 - 20 seconds without any issues.
Test run: a 30 second video (1024 x 704) took around 40 minutes without any OOM errors. Longer videos might be possible, but have not been tested yet.
Important:
This workflow is intended for advanced ComfyUI users who know how to install and operate the system and are able to resolve basic system errors themselves, such as node conflicts or general system issues.
About this workflow:
This workflow is mainly based on the fantastic LTX-2.3 workflows of @Urabewe.
As far as I know, those were the first workflows running LTX-2 with 12 GB VRAM. All credit goes to the original creator.
My job was only to combine and organise the different workflows in a simple to use all-in-one design.
Description
Completely redesigned and rebuilt workflow. Same functionality as last version, but:
Highly simplified design for better usability and improved switch logic "under the hood".
Workflow now strictly follows the LTX-2.3 specifications for aspect ratios. This should give much better results (fewer artefacts, distortions, etc.).
Added FLF + Audio to Video option.
All options should work properly. Please let me know if you find bugs or if you have ideas to improve the workflow. And as usual: Happy generating 🙂
Comments (138)
@Fit_Wafer9678239 Thank you for buzzing 🙂
Great work again!!! Thank you very much for v3. Is it possible to add an option not to crop and to keep a 1:1 aspect ratio, for example?
I would very much appreciate that too! I cannot use most of my pictures because of the heavy crop.
@grbear750611 Thank you so much.
While creating this version I dived a little deeper into the LTX-2.3 specifications, and what I finally got was: use 16:9/9:16 aspect ratios only and image sizes divisible by 32. So one of the aims was to make sure every generation runs under these conditions only. And after my first tests I am pretty sure we will get better outputs now.
So, the short answer for any other aspect ratios is: no, these limitations are by "design".
@Cosmicv As mentioned above, I strictly "designed" it to get rid of the previous issues with image quality. But according to my first tests it is no problem to use rectangular images/videos somewhere around 16:9 as long as the main part of the image is not too close up.
@arkinson thank you very much for your prompt reply. i didn't have any problems with 1:1 ratio images in the previous versions. seemed spectacular and fast. i will stick to v2 for the 1:1 ratio. thank you very much for your effort again <3
@grbear750611 Yes, you can still use v2 if it worked well for you, cause there is no change in the generation process itself. Just keep in mind to use the latest upscale model.
I had neither the time nor the resources to run serious side-by-side tests between workflow versions 2.0 and 3.0. So it is all quite subjective. My personal "feeling" 🙄 after some more tests is that v3.0 gives better video quality and better prompt following in general. But let's wait and see what experiences others will have.
@arkinson i will try 3 for the 9/16 and 16/9 ratios to see the quality difference. I'm in the skincare business, and so far only WAN appreciated the photos of women using skincare products. a few tries on v2 and v3 that i did, introduced strange "melanomas" on women closeup skin. but other than that other results were pretty fast and very clean
@KelevraQuakenstein Thank you so much for buzzing 😋🙂 You guys are amazing! The new version has only been out for a few minutes and you are all here 😂 Oh my - you should test it first 🙄🙂
How can I keep a square source picture from being cut off at the edges when it is turned into 16:9 format?
Thanks ! Looking forward to testing this new release ! Loved the Ukulele video, btw :-)
And indeed rendering is faster than with previous version... Thanx !
@charlesdelavigere743 Thank you. Let me know your experiences after testing, because my own test resources are very limited. But according to my first quick-and-dirty tests, it seems even prompt following is better now.
Btw. the Ukulele video is really too cool 😂🤣 It would be no wonder if it became the summer hit of this year 🙂
@charlesdelavigere743 Oops, overlapping comments....
There should be no differences in generation time, except when comparing different resolutions, because I did not touch the generation process in general.
3.0 is a very user-friendly workflow compared to previous versions. Thank you for adding the last frame.
@dirtysem Hi - thank you. Let me know if you find any bugs.
@arkinson I would like to have the ability to scale along the long side, with a choice: either keeping the proportions, or in 16:9 format. Sometimes I don't need 16:9 format, but for example a square. Thanks again.
I am running the workflow, but an error message appears:
“TypeError: Compex types (LATENT/IMAGE) need to reference their width/height, e.g. a.width”
May I ask what the problem is?
@wolfcat2 Did you choose only the right options?? Please start simple with T2V first. If this works, go to the next step. If you still get errors, please provide useful information: have you followed my Troubleshooting Guide, which option is in use, etc.
@arkinson I chose "01 Text to Video" and it seems that the error is that I did not input the length and width of the image, but I couldn't find where to input it.
@wolfcat2 OK. What about the slider nodes in your workflow? Can you see them? If not, look here.
@arkinson I followed the link and restored the slider display, but I still get the same error message when running the workflow.
@wolfcat2 Did you solve the node conflict by disabling the mixlab nodes, as mentioned in my link???
@arkinson I got the sliders to show by changing the contents of the ".js" file in the mixlab script; I didn't notice that there was another way to disable the mixlab nodes... So it still doesn't work...
@wolfcat2 Uhh - don't play around in the scripts. Simply disable mixlab via the Manager. If you need it sometime, you can simply enable it again.
@arkinson I disabled mixlab, but the error message is unchanged. It seems to be located at the video frame count node in the subgraph. There's really nothing I can do.
@wolfcat2 Sorry, I can't help you with your system. Maybe the easiest way for you is a fresh Comfyui-Easy-Install installation, just for video generation (see my short guide at my Wan model for help). It will take you around 30 minutes and you are up and running.
@arkinson All of my workflows are in this one ComfyUI, including wan/zimage/qnwen. I'm also lazy, so I will try another ComfyUI installation when I have time. Thank you.
@wolfcat2 As described in my short guide, you can install it in parallel. Believe me, it is the laziest way of all 😉
I've tried to find the extensions from the manager but cant find these 2 missing ones:
INSTALL REQUIRED OrchestratorNodeMuter
INSTALL REQUIRED LayerUtility: AnyRerouter
in subgraph 'New Subgraph'
I think the node names are:
comfyui_custom_switch (1)
comfyui_layerstyle (17)
Do i need some github repo that i need to clone?
EDIT:
https://github.com/tritant/ComfyUI_Custom_Switch
https://github.com/chflame163/ComfyUI_LayerStyle
By using the "git clone" command, those 2 worked. Otherwise the same: I couldn't find the Custom Switch in the ComfyUI Manager, and LayerStyle didn't work.
I'm using the ComfyUI Desktop version
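A manual install like the one described above could look roughly like the following sketch. It only prints the commands and runs nothing. The `custom_nodes` path is an assumption that depends on your installation (the Desktop version may keep it somewhere else), and the helper name `clone_commands` is hypothetical.

```python
from pathlib import PurePosixPath

# Repositories named in the comment above.
REPOS = [
    "https://github.com/tritant/ComfyUI_Custom_Switch",
    "https://github.com/chflame163/ComfyUI_LayerStyle",
]


def clone_commands(custom_nodes_dir: str, repos: list[str]) -> list[str]:
    """Build one `git clone` command per repository, targeting custom_nodes_dir."""
    commands = []
    for url in repos:
        folder = url.rstrip("/").split("/")[-1]   # repo name becomes the folder name
        target = PurePosixPath(custom_nodes_dir) / folder
        commands.append(f"git clone {url} {target}")
    return commands


# Assumed path; adjust it to where your ComfyUI keeps its custom nodes.
for cmd in clone_commands("ComfyUI/custom_nodes", REPOS):
    print(cmd)
```

If an older copy of a node pack is already present, the later comments suggest deleting that folder first and then cloning again. A restart of ComfyUI is normally needed afterwards.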
@hid91herring074395 It seems you have node conflicts in your ComfyUI system. Is everything updated according to my Troubleshooting Guide? No errors during updates??
Generally all nodes I use in my workflows are installable via the Manager. In the Manager search for: "Custom Switch" to get the Orchestrator nodes (or see github).
"LayerUtility: AnyRerouter" is strange, because I did not use any LayerUtility (as far as I remember).
As I have often mentioned here: consider installing a separate ComfyUI for video generation only with Comfyui-Easy-Install (see my Wan workflow for a short guide). It takes around 30 minutes and 2 mouse-clicks and you are up and running.
@arkinson No errors during any updates. GPU drivers are up to date. ComfyUI v0.18.3
Didn't test the separate ComfyUI or to delete all custom nodes.
I guess i missed that troubleshoot note :D
Anyways, it works now so all good. By the way, nice workflow. This pleases the eye that most of the working components are under subgraph.
@hid91herring074395 Sorry, I don't get you. What did you do to solve the issue??
@arkinson He manually installed the custom nodes. I also had to manually install the LayerUtility node, because after installing it with the Manager the missing-node error would not go away.
Even stranger, I had Layer Style installed before, but I got the error anyway. I had to delete the folder and git clone, then it worked.
@ntrtales Yes, this is what I did.
@Gavr728 this worked for me
No option for custom resolution, apart from "resize longer size"? I could really use it for I2V.
@Gavr728 Using I2V/V2V is no problem as long as your input is rectangular and your subject/object is mainly centered. It just gets cropped automatically. For the aspect ratio discussion have a look here.
@arkinson Okay, but if you have a square image that needs to be cropped vertically, but your tool is cropping it horizontally, it can be inconvenient.
@dirtysem But that's getting a bit pedantic 😂🙂 No, seriously - I get you. I have really not tested yet what the crop node will do with a square image 😉 The easiest way to be sure would be to crop the image manually in advance - also for rectangular images where the main part is not centered.
@arkinson I mean that I want to (and have means to) generate at 2k resolution
Actually, I figured out how to generate at 2k. I just connected a node with my custom "longer edge" to Set_m i2v longer edge. But I have an issue: the first few frames are at a lower resolution, so the video has some "upscale" effect as it progresses. This is the first workflow where I noticed it. In others it's not so noticeable, even with the same model and loras. I set the image compression to 0 and detailer lora to 0.5. What am I missing? @arkinson
@Gavr728 You simply can edit the slider nodes (right click -> properties). Set Resize by longer edge = 2048 should work. It will generate 2048 x 1152 automatically. But you should manually edit the values for the LTXVPreprocess too - from 1536 to 2024 (open subgraph, see top centre). Let me know if it works.
@arkinson Yes, that seems better now. thanks
@Gavr728 What GPU/VRAM/RAM do you use? Do you still use the GGUF models? I have no experience with higher-end hardware. LTX-2.3 will work up to 4k. If 2k at least delivers better quality (and I'm pretty sure it should), I would implement the settings for the higher resolutions in the next update.
@arkinson I have 3090 and 16 gb of ram. Using UD Q5_k_s t2v, 6 sec of video in 2560x1408 takes 10 min. Yes quality is better. In wan2gp I was generating in 4k with q4, but it uses some "light" models which don't directly work in comfy
@Gavr728 If you can't afford a graphics card, it's better to add more RAM. I have LTX2.3 (ltx-2.3-22b-dev.safetensors, 45 GB), and it works perfectly with my old 3060 graphics card with 12 GB of VRAM. I have 128 GB of RAM, and it renders perfectly in this 1536P workflow.
@Gavr728 Ah, thank you. RTX 3090 16gb vram is a common league, as I see here from the comments.
One more question: did you compare your resolution 2560x1408 with the standard 16:9 2048 x 1152 resolution? Because there were a lot of image quality issues with my previous workflows, where I "allowed" any aspect ratios.
@arkinson Yes, I tried 1536 and full hd. 2k has some problems with prompt following from what I saw. Both 1536 and full hd generated what I wanted in the prompt on the first try. With 2k it was more complicated, it was generating some "water ripple" effect on the image instead of a coherent scene, but I figured out it was because it was interpreting the prompt "The scene is alive and 'breathing' through micro-movements" a little bit too literally
@Gavr728 Thank you. But I'm a little bit confused now. If I understand it right:
?? 1536 x 864 = 16:9 (highest resolution I set in the workflow for 12 gb vram),
Full HD = 1920 x 1080 = 16:9,
2k = 2048 x 1152 = 16:9
These resolutions should work without any issues, cause they are all 16:9 and divisible by 32.
But in your previous comment you wrote: "2560 x 1408 takes 10 min". My question was more about this deviating aspect ratio and if you saw any quality differences to the standard 16:9 resolutions.
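The two conditions discussed here (exactly 16:9, and both sides divisible by 32) are easy to check for the resolutions mentioned. A minimal sketch; the helper name `check` is hypothetical:

```python
def check(width: int, height: int, multiple: int = 32) -> dict:
    """Report whether a resolution is exactly 16:9 and divisible by `multiple`."""
    return {
        "is_16_9": width * 9 == height * 16,
        "divisible": width % multiple == 0 and height % multiple == 0,
    }


for w, h in [(1536, 864), (1920, 1080), (2048, 1152), (2560, 1408)]:
    print(f"{w} x {h}: {check(w, h)}")
```

Under this check 1536 x 864 and 2048 x 1152 pass both conditions. 1920 x 1080 is 16:9, but 1080 is not a multiple of 32 (1080 = 33 × 32 + 24), and 2560 x 1408 is divisible by 32 but not exactly 16:9.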
@arkinson Yes, I set 2560 as a longer side, expecting that it would be 2560x1440, but I got 2560 x 1408.
@Gavr728 Uhh - slowly: Which option did you use?
You edited the input slider to be able to set 2560 as longer edge - right? And you got out 2560 x 1408??? That's strange, because 2560 is just multiplied by 0.5625 (= 1440), as you can see in the subgraph.
So, please let me know the option you used and I will check if there is something else wrong in the workflow.
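The arithmetic behind this question can be checked in a few lines. The 64-pixel snapping at the end is only a guess at where 1408 could come from; it is not confirmed anywhere in this thread or by the workflow.

```python
longer_edge = 2560
short_edge = longer_edge * 0.5625      # 9 / 16 = 0.5625

print(short_edge)                      # expected short side for 16:9
print(short_edge % 32 == 0)            # is it already a multiple of 32?

# Hypothetical: some node flooring to a multiple of 64 instead of 32.
floored_to_64 = int(short_edge // 64) * 64
print(floored_to_64)
```

So 1440 already satisfies the divisible-by-32 rule, which means the drop to 1408 has to come from some other step in the chain.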
@arkinson I2V
By the way, if you replace the VAE (Tiled) node with LTXV Spatio Temporal Tiled VAE Decode, there will be more stability
2-8-48-8-false-auto-auto
@Cybernix Hi, thank you for the hint. That's interesting. Do you have practical experience with it? Is there a GitHub page or anything else for more information? I just had a quick look, but it is hard to find anything related.
@arkinson
I changed the VAE in your workflow, it's easier to do this
https://drive.google.com/file/d/11c9D1FY1hmnJIioRA3jVp_cyI_J2cS1g/view?usp=drive_link
It is part of ComfyUI-LTXVideo
@Cybernix Thank you. Sorry, if my question was not really understandable.
I had already seen the node, I got your settings, and I had already looked at the LTXVideo GitHub page. Maybe I'm blind, but I found no serious information about the "LTXV Spatio Temporal Tiled VAE Decode" node at all. So my question is: how do you know "there will be more stability"? Or simply asked: did you run your own tests? Or did you find some information somewhere?
@arkinson I often had crashes if I assigned a lot of tasks. This was especially true on VAE. Then I changed the node and the crashes went away, and almost all the tasks completed, so I checked it myself. Just try it
@Cybernix Thank you so much 👍 That's a good point. I sometimes noticed RAM crashes at VAE decode in the past too. I am not really sure, but I would say this was all with older ComfyUI versions. During all my tests for workflow version v3.0 I only got OOM crashes during the second ksampler pass (the last 3 steps) when I set the highest resolution and too long a clip length (>14/15 seconds).
OK, another point could be: my swap file size is set at 64 - 128 GB, which is disproportionately high. Maybe your suggested node + settings need less RAM. This would be an advantage of course. I will try this. Did you happen to notice different RAM/swap file usage?
@arkinson To be honest, I haven't been monitoring the swap. I have 18 GB allocated, and it doesn't seem to be the problem.
And I didn't have time to buy a 32Gib for my RAM before the hysteria set in. So I'm forced to use the 32 :D
I just checked. 27 GB of RAM and 57% swap usage. Fedora Linux.
@Cybernix Ah ok, a few Linux users reported similar issues, if I remember right. Yes, I only see this from my side on a Windows machine 🙄 The best way to do a quick side-by-side test would be a simple workflow: Load Image -> VAE encode -> VAE decode (node 1 or node 2) -> Preview Image. When I find some time I will have a look at it.
required nodes were hard to find
In my workflows I always use common nodes which are installable via the Manager. To get it running, you have to solve the node conflicts in your ComfyUI system first - see my short Troubleshooting Guide for help.
I'm facing a problem, and I'd like to know if it's possible to separate the audio from the video and render them separately. If the audio meets my expectations, I can continue working with the video renderer. However, the audio often doesn't meet my expectations, and constantly rendering the video is time-consuming and inconvenient. Therefore, I'm asking professionals if this solution is possible.
There are lots of audio engines in ComfyUI and even cloud services. So of course you can generate your desired audio first. Then you simply use my Audio input option and you might get whatever you want. But believe me or not - to create professional audio, you need knowledge, capable hardware and a lot of time 😉
Amazing Workflow.
Your V3 actually is so damn good!!
I have NEVER found another workflow that actually successfully does Text to Video. Well done.
Just one thing. with image to video, it doesnt take the aspect ratio of the image, so it crops it, is there any way around this?
Although, now testing it, I kinda like it xD
But just as a possible option
great work! I only have a problem and i don't know why. the 3.0 ignores the last frame for some reason. 2.0 worked perfectly from the beginning. maybe i'm missing something but i don't know what
@AIden_AIzawa Hi - thank you. I assume you mean FLF to Video? This should work out of the box. Ok - you have enabled "Image to Video" + "Last Frame input". Please set "Cut Off End Frames" = 0 for testing (and have a look at my Input Help note).
@arkinson yes the nodes are enabled and i tested also with setting 0 to cut off end frames. i'm testing and sometimes it ignores the first frame and create the video based on the second...
@AIden_AIzawa Mmh, that's strange. I did not run many tests myself with FLF and mostly used similar images or the same image for first and last frame. With very different images I got some problems too (random jumping between first/last frame etc.).
I just checked the template workflow and they use other LTXVAddGuide nodes (2 instead of one) and they use strength of 0.7. Also they use only a one pass image generation. I will see, if I can implement the template nodes - but this will take some time.
As a quick test, please just edit strength_1 and strength_2 in the LTXVAddGuide to 0.7 (open subgraph and go to the bottom left). Let me know your results.
@arkinson thanks! I'll test it
@AIden_AIzawa I tested the template FLF2V workflow yesterday (just changed the models to the ones we use here, so I can run it on my machine). Unfortunately I got similar issues (abrupt jumping from one frame to the other, no lipsyncing, jumping back and forth, etc.). So I would say it is not a node or workflow issue. Maybe the full models will work better with FLF, but I can't test it.
@AIden_AIzawa I just read your first comment again. You wrote v2.0 worked perfectly for you. If I remember right, even with v2.0 I had similar issues. Sometimes I believe it depends on a variety of conditions, like used images, prompt, aspect ratios, seed, etc....
I was having a similar issue. When I did first frame and last frame, the preview looked like a garbled mass of limbs and shapes and I was horrified to see what that might have rendered to, so I stopped it and popped into the processing subgraph. I switched the "frame_idx_2" value to -1 on the "LTXVAddGuideMulti" node and it seems to have fixed it. It looks like it may have both been set to 0 by default, so it was loading the two frames into the same spot and trying to blend them together.
@ArcleinSK Wow - thank you so much for your hint 👍 and I believe you are right 🙂 I just checked the template workflow again - as you said: first frame = frame_idx_1 = 0 and last frame = frame_idx_2 = -1. This makes sense. I did not worry about the values before 🙄
The only thing that surprises me is that I haven’t achieved satisfactory results with the template workflow either (see above).
I will test this in my workflow now. So let's see.
@ArcleinSK I did a first quick test and it seems to work now. Pictures are in the right order and I did not get any artefacts. Just a quick flip between first and last frame image - but I used slightly different images, like different clothes. Please let me know if you get good results now or if you see any other issues.
@arkinson Yeah, I've seen that odd flip happen near the end of the generation process but right now the trimming of the last 24 frames seems to be a reasonable workaround. Out of all the LTX workflows, yours has definitely been one of my favorites. There's one by MrXin that is also a good one - but yours tends to render a bit faster on my 3060; sometimes my generation hangs for literally thousands of seconds on MrXin's workflow but yours just keeps on chugging at the meager pace my hardware allows. Lol.
I'm still a bit of a noob but I've been learning a lot through your workflow, so I definitely appreciate it.
@ArcleinSK Thank you so much for your feedback 😋
I ran my tests yesterday without cutting the last frames and, as mentioned, I got good "quality" right up to the last frame. The strange thing is just the sudden flip between the first and the last frame image somewhere in the final third of the video - instead of a smooth transition. I did not run many tests with FLF at all, but I would say I saw this behaviour in all workflow versions. One option could be to play with strength_1/2 too (I have only used 1.0 so far). I will give this a try if I find some "free" computing time 😅
Workflow speed/stability: As you can see in the subgraph, I mostly used the template workflow and the models which urabewe originally provided for 12 GB VRAM, so there is nothing special, except the more complex option switching and the calculations for width and height. I just did a lot of test runs to keep the selectable resolutions in a range which works well with 12 GB VRAM (mostly 😉).
@ArcleinSK I already ran some quick tests with lower strength (down to 0.4), but it did not change much and the issue with the sudden flipping still persists.
@arkinson I suspect it may be caused by how the node indexes the frames. -1 is like a shortcut/placeholder for the final frame if it doesn't know how many frames you are going to give it and how it counts through the frames at the end.
Imagine 121 frames for example laid out like a film strip. After it renders starting at the first frame (0) and gets to frame 120, I think it says somewhere in the node "ok, we've got all our frames rendered, now we need to put in the last frame, but it's at position -1. So lets run back through our list of photos and look for the -1 frame and move it to the end." So it goes all the way from 120, passes position 0 along the way, finds -1, then moves it all the way back to 121.
And in that travel time it shows the first frame on the way to position -1 when it passes it, shows the last frame when it gets to position -1, then traveling back to 121, it passes the 0 position again and shows the first frame again and gets to the end to try to insert and show the last frame. That's how the "image flip" feels to me at least.
It was a lot more noticeable when I customized your workflow to add a middle frame to the mix to create the video below. Lol. Instead of seeing just the two images, I saw all three images do the weird flip.
https://civitai.com/images/126256044
To add a middle frame, you only need to bump the node to three images and do some math [ round(total_frames/2) ] and use the result for the middle frame index. But you typically want the middle frame to be a lower strength so it doesn't try to diffuse into that image all the way and make it look frozen mid-scene. The strength value is like the denoise on a ksampler. The higher it is, the less "noise" it can play around with to fill in the blanks of the video - I learned all about it in the 3 or 4 hours and 20 something iterations it took to make those 29 seconds. 😅
When I get time to fool around with it again I'm going to try some other index numbers for the last frame. Like using total_frames - 6 or something with a 0.85 or 0.90 strength
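The index math from the comment above could be sketched like this. The dictionary layout, the function name `fmlf_guides` and the default middle strength of 0.5 are illustrative assumptions, not the actual inputs of the LTXVAddGuideMulti node. Only `round(total_frames / 2)`, the indices 0 and -1, and the idea of a lower strength for the middle frame come from the discussion.

```python
def fmlf_guides(total_frames: int, mid_strength: float = 0.5) -> list[dict]:
    """Return guide settings for first, middle and last frame."""
    mid_index = round(total_frames / 2)   # formula from the comment above
    return [
        {"name": "first", "frame_idx": 0, "strength": 1.0},
        {"name": "mid", "frame_idx": mid_index, "strength": mid_strength},
        {"name": "last", "frame_idx": -1, "strength": 1.0},
    ]


for guide in fmlf_guides(121):
    print(guide)
```

Note that Python's `round()` rounds exact halves to the nearest even number, so 121 frames gives a middle index of 60 here. Use integer division or `math.ceil` if you prefer a fixed direction.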
@ArcleinSK I just saw your linked video. Amazing! Apart from it keeps me laughing 😂 it is unbelievable that you got the whole scene in one run.
Oh, oh - this stuff is more complicated than I expected. To be honest, I just copied parts of the workflows without understanding what specific nodes are really doing....
Your investigations are very interesting and gave me the motivation to try to dive in here a little bit deeper.
Frame index "-1": Understanding it as a placeholder value makes sense (I was always wondering about it before). But I did not really get your explanation of how you think it works.
Based on your idea of the placeholder, my guess is that it should work like this:
frame index = 0 inserts the given image to the latent image queue at position 0 (= first image) and
frame index = -1 inserts the given image at the last position (with your example of total frames = 121 -> it means position 120 = last frame). Everything should be fine - so far 🙂
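This reading of the index values can be written down as a tiny sketch. It assumes that the node resolves negative indices the way Python lists do, which is the guess made above and not confirmed behaviour of LTXVAddGuide. `resolve_frame_idx` is a hypothetical helper.

```python
def resolve_frame_idx(frame_idx: int, total_frames: int) -> int:
    """Map a possibly negative frame index to an absolute position."""
    position = frame_idx + total_frames if frame_idx < 0 else frame_idx
    if not 0 <= position < total_frames:
        raise IndexError(f"frame_idx {frame_idx} is outside 0..{total_frames - 1}")
    return position


print(resolve_frame_idx(0, 121))    # first frame
print(resolve_frame_idx(-1, 121))   # last frame
```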
I do not understand the video generation in detail, but simply put it should work this way: the generation process always interpolates across several frames (into the future) to generate smooth movements following the given prompt. So it should start exactly with the first frame and should converge on the last frame at the end - frame by frame. But maybe I am completely wrong here, because it should not give a "hard" flip.
In a first quick "research" I found only some simple descriptions of the LTXVAddGuide (Multi) nodes, here for example, and something interesting here too, which gives a hint about the LTXVCropGuides nodes.
I did not know that it is even possible to insert videos instead of single images (described in the first link above). It seems the LTXVAddGuide (Multi) nodes have many more capabilities than simple FLF or multi frame generation.
I will try to create a highly simplified workflow for faster testing of the basics of this node.
If you get more information in the meantime, please let me know.
thanks guys, now it works changing the value in ltxaddGuideMulti!
@ArcleinSK If you would like to get completely confused, download my test workflow from here: https://civitai.com/articles/23042 (choose LTXVAddGuide.json from the Attachments). It simply makes visible what the LTXVAddGuide nodes do.
Just enter a short video (it gets automatically cropped to 25 frames) and a first and last frame image. The workflow VAE-encodes the video to latent images -> runs the LTXVAddGuide node -> VAE-decodes back to images. Finally you can compare the original images with the processed ones.
The node for i2v works well. But what the LTXVAddGuide node does is completely strange:
It sets the first frame image at position 0 (that's ok), but then it always sets 8 first frame and 8 last frame images at the end 🙄 (I ran tests with all available nodes).
You can even test it with the LTXVCropGuide node. The strangest thing is that it does not matter which values you set for frame_idx and strength.....
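One possible reading of the blocks of 8 images, offered only as an assumption: if the LTX VAE packs 8 video frames into one latent frame (after a single leading frame), then a guide image stored as one extra latent frame would decode back to 8 near-identical images. The factor 8 and the helper names below are assumptions and are not taken from this thread.

```python
TEMPORAL_FACTOR = 8  # assumed number of video frames per latent frame


def latent_frames(pixel_frames: int) -> int:
    """Latent frame count for a clip of 1 + 8*k pixel frames."""
    if (pixel_frames - 1) % TEMPORAL_FACTOR != 0:
        raise ValueError("expected a frame count of the form 8*k + 1")
    return (pixel_frames - 1) // TEMPORAL_FACTOR + 1


def decoded_frames(latents: int) -> int:
    """Pixel frames produced when decoding `latents` latent frames."""
    return (latents - 1) * TEMPORAL_FACTOR + 1


print(latent_frames(25))     # the 25-frame test clip
print(latent_frames(121))    # the 121-frame example from above
print(decoded_frames(latent_frames(25) + 2) - 25)  # extra images from 2 appended guide latents
```

If that assumption holds, two appended guide latents would account for 16 extra decoded images, which matches the "8 first frame and 8 last frame" observation. It would not explain why frame_idx and strength seem to have no effect.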
@ArcleinSK Just found this github documentation wich explains the frame_idx values.
@ArcleinSK Thank you so much for your comments at the article page here.
Just a quick reply for today: I'm pretty sure you hit the nail on the head 🙂 I always wondered why the latent output of the LTXVCropGuides was not connected. Yes, losing the necessary information will generate some nonsense for sure 😅.
I will have a look at your workflow as soon as possible and try to prepare a new update to fix this stuff. Your example video looks very good 👍 smooth as it should be 🙂
Thanks to this amazing community we get better and better 🙂
@ArcleinSK ltx-2.3 combi 3.0 - Arclein edit: Very cool! Thank you 🙂 I will test this now.
What is your opinion - is there greater demand for 3 frames (FMLF) among users? Because switching between FLF and FMLF will need some more logic (at least 2 separate LTXVAddGuideMulti nodes). I myself was mostly too lazy to create even 2 frames (FLF) 🙄
Interpolator + Upscaler: I will test this soon. If this runs really fast, it would be great. I tested the "normal" image upscaler and the Rife framerate multiplier in the beginning, like I used it for my Wan workflows, but this was much too slow with the higher LTX resolutions.
Qwen-Rapid-AIO: Yes, I already heard from boinobin730 about a Qwen workflow to generate the frames. I really have to test it now 🙂 Looks like some sleepless nights ahead 😅
I'll be back as soon as I've got a bit deeper into all this stuff here....
@arkinson I think there are some edge cases where a middle frame can be useful. For my Woman in Red video, with just the first and last frame, the camera kept wanting to swing out around the table and pass behind a person or people that were sitting across from her. Generating a middle frame (with Qwen's powerful edit abilities) at the middle of the table helped me force the camera to the spot I wanted it to be. I ultimately had to tweak the prompt to make the woman "speaking to a fixed point across from her" and the man speaking at the end "the viewer" because even then LTX wasn't cooperating. If you imply a second person in the scene ("man off frame", "man behind the camera") - the model will do everything in its power to put that man in the scene whether you want it to or not.
In short, for most quick scenes, a middle frame may not be necessary but it can help drive camera movements in a specific way if you want it to be at a certain point by the middle of the scene.
@ArcleinSK Thank you for your explanations. I think you are right.
Meanwhile I have the new workflow running. I had some ideas over the last few days - so I did a complete redesign of the "aspect ratio AND width/height divisible by 32" calculations. Fortunately I found a pretty cool node which combines these more "complex" inputs and calculations in just one node. This allowed me to significantly simplify the input and preliminary calculations in the subgraph, and also to add the option for <any aspect ratio> with width/height divisible by 32 too. With these changes, creating the switch logic for both options (FLF and FMLF) was much easier than expected.
I will run a few more tests and hopefully publish it soon. First quick pre-tests ran without errors so far 🙂
@ArcleinSK Btw. I tested your Qwen workflow and started with "all kinds" of Qwen experiments. Amazing stuff. I tested Qwen a couple of months ago but gave up, as it didn't run properly with 12 GB of VRAM. But now it runs like a charm. I really did not know about the pretty things possible with Qwen. Thanks again.
@arkinson For sure. Qwen is really powerful. I used to do a lot of digital photo manipulation in Photoshop years ago, but Qwen's edit image functionality does in seconds what used to take me hours in Photoshop.
@ArcleinSK Oh my - Photoshop 🙄 the good old times 😅
It is really unbelievable how well and easily Qwen can edit images - and I just did some first quick tests.
Do you know any Lora trainer for Qwen which will work with 12 GB VRAM? During my last search some months ago there was nothing available for low-end hardware.
Btw. LTX workflow v4.0 is out now 🙂 hopefully bug-free.
@AIden_AIzawa Thank you so much for buzzing 😋🙂
Hello - Does this have frame interpolation? How can I increase the upscale of the final output?
Thank you so much! Having pretty good generations so far.
@bobbobster Hi - and thank you 🙂 Short answer: No.
While creating the previous LTX workflows I tested both frame interpolation and final upscaling, like I do in my Wan workflows. But with LTX it makes no sense. The slight improvement in image quality is out of all proportion to the significantly longer processing time. LTX-2.3 is made for 2k and 4k generation. Unfortunately, with 12 GB VRAM we have to use much lower resolutions. So, if you have better hardware, first increase resolution and frame rate and use better models. This will increase quality significantly.
@arkinson Great, thank you for the detailed response! I understand, and will take your word for it. I'm just glad to have a working workflow. Thanks again
@bobbobster I forgot - even with low-end hardware, with LTX always try to generate in the highest resolution possible. Happy generating 🙂
V3 how do you edit total frames?
@VT95 Total frames are automatically calculated from the clip length you set via the slider node. And just in case you cannot see or set any value in the sliders, please look here in the latest comments. This is a known issue/node conflict.
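For anyone curious what such an automatic calculation can look like, here is a minimal sketch. It is not the workflow's actual node logic: the function name `total_frames` is made up for illustration, 24 fps is only taken as a default, and the "8n + 1" frame-count rule is an assumption based on the published LTX-Video guidance.

```python
def total_frames(clip_seconds: float, fps: int = 24) -> int:
    """Turn a clip length in seconds into a frame count of the form 8*n + 1.

    Assumption: the model expects frame counts of 8*n + 1 (as documented for
    LTX-Video); the real workflow may compute this differently.
    """
    raw = clip_seconds * fps              # frames needed for the requested length
    n = max(1, int(raw / 8 + 0.5))        # nearest multiple of 8, at least one block
    return 8 * n + 1


if __name__ == "__main__":
    for seconds in (5, 10, 14):
        print(seconds, "s ->", total_frames(seconds), "frames")
```

So with this rule you only pick the clip length and frame rate, and the frame count follows from them.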
Pretty good work! Congrats for your work Sir.
May I ask? Maybe someone else has the same problem as me. When I generate a video, some kind of artifacts are visible in the center of the video at the end of it.
You can either shorten the video length, or another approach is to set a final image (using the same image as the "starting" image often resolves this issue, depending on how and what you prompt)!
@zorojim Hi - thank you. Which option do you use?
With workflow v3.0 and the strict 16:9/9:16 aspect ratios I have not seen many artefacts myself anymore, even with clip lengths up to 14 seconds.
Only FLF to Video still has some more serious problems, that's why I have added the option for automatic cut-off here.
I don't know what I am doing wrong. It seems to work, however all generations are blurry. Any help for a newbie in ComfyUI?
@WigglyDad148 Do the basics: double-check that you selected the right models. Is the distilled Lora active and its weight set to at least 0.5?
@arkinson Started fresh from a brand new ComfyUI instance on Unraid Nvidia, downloaded all missing workflows, all models are loaded in the correct folders, and for the Lora I tried both 0.5 and 0.6.
Also tried negative prompt for blurry etc, all tests are T2V with default settings.
@WigglyDad148 Some of the posted videos have the workflow included. Did you try using one of them (just drop it into your ComfyUI environment) to see whether something is set differently from your setup, which might explain the blurry effect? Are the checkpoints you use the correct ones? I had this kind of issue long ago when I used the wrong checkpoint type with a given workflow.
@WigglyDad148 Thank you. T2V works out of the box, no need for a negative prompt.
Ok. I assume you have cleared all node conflicts on your system and all components are updated (see my Troubleshooting guide).
Just to be sure, we are talking about workflow version v3.0 and you use my workflow without any modifications. That being said your blurry outputs are most likely a model mismatch. You have to check every blue loader node, if you have really selected YOUR local model. Please understand: downloading the right models to the right folder is one part, but you have to select these models in the loader nodes too of course.
@arkinson I ended up re-downloading all the models and selecting each and every one of them in the workflow. And that seems to have fixed the issue.
Thanks again for your work and your help with this.
@WigglyDad148 Yes, a lot of users forget to "select" the models. I'm glad you got it running. Happy generating 🙂
Great workflow!
@sexgod1979 Thank you mate 😋
@arkinson great to see it on v3.0 with all the different workflow types combined. Saves having to have 5 different workflows for one model
@sexgod1979 Yes, that was the intention, cause even with v2.0 it was a lot of stupid work to maintain 5 or more nearly identical subgraphs just for some different inputs.
On the other hand, ComfyUI is very limited in "programming" options - no true runtime logic, no real subroutines to handle more complex workflows. Lots of buggy or very limited switch nodes (I tested nearly everything available).
Subgraphs are a good way to organize workflows visually, but they are not comparable with subroutines, which organize processing too. And subgraphs nested in subgraphs get buggy as hell.... So I had to "redesign" my ideas several times.
And the worst thing: at some point of complexity, with every node you add, error debugging in ComfyUI gets exponentially more painful 🙄
But in the end I learned a lot about ComfyUI itself, and I believe the handling of the workflow is mostly useful and as easy as possible. Even if ComfyUI "newbies" have a lot of problems getting the nodes running and there was a "rebellion" against the new aspect ratio restrictions in the beginning 😂
Btw. I just started to test some of your LTX Loras. First quick results look amazing 🙂
Hi there, thanks! Question, Img + Audio to Video.... How does this work, i.e., any audio I have plus text prompt and initial image goes to video? What if the text prompt conflicts with the audio? Or is the audio the only prompting? Can the video be strongly or reliably controlled with good audio? And is this good enough for lip syncing?
@Delavestra "any audio I have plus text prompt and initial image goes to video?" Short answer: yes.
Just a stupid example: You have 10 seconds of a pretty cool rock song with clearly understandable female vocals and you have a nice image of a woman playing a guitar. Choose a 10 second clip length and create a prompt like this: "The woman in the blue dress sitting in front of the camera plays the guitar and passionately sings a rock song while the light dims and the camera zooms in". With some luck this should generate a lip-synced video of the woman singing the song and playing the guitar.
And of course: Your audio + your image + your prompt (describing what you want to see) should not conflict with each other - unless you want to carry out some completely crazy experiments 🤣
If I have a 5060 ti 16 gb, what should I change when using this? X
@Setraether Run the highest resolutions possible first, to get much better quality (edit the slider nodes, but keep in mind: the aspect ratio has to be 16:9 AND width/height have to be divisible by 32).
Use higher-quality models like Q6 or Q8 and/or try higher frame rates.
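To make the "16:9 AND divisible by 32" constraint concrete, here is a small sketch that lists the sizes which satisfy both conditions exactly. The helper name `exact_sizes` and the upper limit of 2048 are only illustrative assumptions, and the code is not taken from the workflow.

```python
from math import gcd, lcm  # math.lcm needs Python 3.9+


def exact_sizes(ratio_w: int, ratio_h: int, max_width: int, multiple: int = 32):
    """List (width, height) pairs that match the ratio exactly and where
    both values are divisible by `multiple`. Illustrative helper only."""
    g = gcd(ratio_w, ratio_h)
    a, b = ratio_w // g, ratio_h // g          # reduced ratio, e.g. 16:9
    # a*k is divisible by `multiple` only if k is a multiple of multiple/gcd(multiple, a);
    # the same holds for b*k, so k must step by the lcm of both.
    k_step = lcm(multiple // gcd(multiple, a), multiple // gcd(multiple, b))
    sizes = []
    k = k_step
    while a * k <= max_width:
        sizes.append((a * k, b * k))
        k += k_step
    return sizes


if __name__ == "__main__":
    print(exact_sizes(16, 9, 2048))
    # [(512, 288), (1024, 576), (1536, 864), (2048, 1152)]
```

For exact 16:9 the width can only move in steps of 512, which is why there are so few valid values to choose from.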
@arkinson Thank you so much, I appreciate it
@arkinson I've tried this out. For some reason I've always had inconsistent speed results with video gen, but LTX proved quite decent in another workflow. I find the results here are too slow for me. I compared both wfs and noticed two things. The other wf uses:
- Distilled version of the model instead of dev;
- Clip (gemma) is not fp4 but q4_0 GGUF.
In your wf I couldn't switch the clip/gemma to test/compare, but could this possibly be the cause of slow generation? My first image2video took 1500s; clip length 10s; fr24; and I2V/FLF Resize Longer Edge [[P:02 Image to Video]] on 1024 <- this is resolution I suppose.
My 2nd generation I put resolution to the max 1536 and generation took me 2507s.
Lots of questions.. maybe some good insights :)
EDIT: Significant speed boost using distilled instead of dev.
@Setraether Uhh - with higher resolutions I meant at least 2k -> 2048 x 1080 of course.
Sorry, but without a link to your other workflow it is not possible to compare anything.
@arkinson No worries, and maybe I'm not understanding how to change the resolution, as the only slider I see is under a node named: I2V/FLF Resize Longer Edge [[P:02 Image to Video]]
And it has a max of 1536.
@Setraether Btw. the workflow has 7 sliders in total.
But I guess the real problem is that you asked a question for very skilled ComfyUI users at a pretty advanced level, while it seems you are struggling with the ComfyUI basics in practice.
Let me know if I'm wrong, otherwise start as simple as possible with T2V and try to understand step by step what is possible and what is not.
Even though this workflow is made for advanced users, a lot of ComfyUI "beginners" are trying to get it running - and with the right questions you will mostly get some helpful hints here.
Any chance you have this with ID-Lora as well for voice audio reference? Going to try and hook it up myself...
@Delavestra Interesting. I had a quick look at the simplest ID-Lora workflow. But why integrate it here? The whole workflow is just a handful of special ID-Lora nodes. So this is a completely different "architecture".
Just use it as it is. But I guess the main problem would be to get it running with 12 GB VRAM.
@arkinson I was able to get the node hooked up! It's actually quite simple. But not sure if 12gb would/could handle that.
I know that the recommended aspect ratios lead to better generation results, but could you add an option to unlock other ratios too? It can be a little inconvenient sometimes.
Yes, there was a "rebellion" against the restricted aspect ratios after publishing version v3.0 😂 and of course I know other aspect ratios will generally work too. The problem is to arrange this in a "user friendly" "interface".
The simplest way would be to offer sliders for width and height (restricted to values divisible by 32 only). This would also reduce the amount of calculations in the subgraph substantially. But the con is: You always have to "measure" your input images/videos manually and then manually set two input values near your measured values, which are hard to remember each time - not to mention if you want to have a certain aspect ratio too (like 16:9 or 4:3 or whatever you want), cause keep in mind: your manual values have to represent your aspect ratio AND be divisible by 32. You really should try this "simple" task with your pocket calculator for some aspect ratios 😉
So in my opinion, the most comfortable way is as it is: just input the longer edge and the algorithm takes care of everything else in the background.
If you have a good idea to solve this, please let me know.
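For illustration, the "just input the longer edge" idea could be sketched like this. This is only a guess at the general approach, not the node actually used in the workflow: `snap32` and `dims_from_longer_edge` are hypothetical names, and rounding each edge to the nearest multiple of 32 is an assumption that only approximates ratios which have no exact solution at the chosen size.

```python
from math import floor


def snap32(value: float) -> int:
    """Round to the nearest multiple of 32 (never below 32)."""
    return max(32, int(floor(value / 32 + 0.5)) * 32)


def dims_from_longer_edge(longer_edge: int, ratio_w: int, ratio_h: int):
    """Return (width, height) for a requested longer edge and aspect ratio.

    Both values are multiples of 32; the ratio is matched as closely as
    the 32-pixel grid allows (assumed behaviour, for illustration only).
    """
    long_side = snap32(longer_edge)
    short_side = snap32(long_side * min(ratio_w, ratio_h) / max(ratio_w, ratio_h))
    if ratio_w >= ratio_h:
        return long_side, short_side   # landscape or square
    return short_side, long_side       # portrait


if __name__ == "__main__":
    print(dims_from_longer_edge(1536, 16, 9))   # landscape 16:9
    print(dims_from_longer_edge(1024, 9, 16))   # portrait 9:16
    print(dims_from_longer_edge(1024, 4, 3))    # 4:3
```

With one input value plus the ratio, both edges come out divisible by 32 without any manual arithmetic, which is the comfort argument made above.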
Well, the former version worked well for automatically readjusting the resolution. I hope the next version is going to feature the same function. Also 1280px for the longer side worked well and it was balanced for 12GB.
@haymaker No, with the earlier versions, like v2.0, there were a lot of reports of issues with artefacts, bad prompt following, bad image quality, etc. So I definitely will not go back without a solution to the above-mentioned math.
Great workflow. However, can you please add an option to bypass the upscaler and generate in only one pass with the desired resolution. Thanks.
@biodizain Hi, sorry - this is currently not on my priority list, cause I always try to get as much quality out as possible, and as I can see from the comments, that seems to be the common use case.
Implementing a real one-pass option just to reduce generation time by maybe 20-30 %, with probably much lower video quality, would be a lot of work and would completely run against the switch logic currently used. For myself, I just cancel a generation if I see it going in the wrong direction.
So, short answer is: No. Btw. New version v4.0 is out 🙂
This one actually works! 3060 w/ 12GB. I have to keep the length to less than 5-6 sec to avoid OOM. One request: don't produce an audio file(?) when choosing image2video only. I think that could reduce some VRAM usage.
@raylar529 OOM errors: RAM or VRAM??? I guess RAM. Increase your swap file as specified. You should be able to run up to 14 seconds with high resolution (longer edge = 1536) without issues. Btw. use my new version v4.0 now.
Your RAM size also plays a big role. I can run practically all LTX2 workflows on my 3060 12GB VRAM, even the default ones, but I have 128GB RAM and sometimes it gets used up to 75-85%.