My ComfyUI workflows for using LTX 2.3
These are the workflows I use to create my art.
They are optimized with my latest knowledge to improve the results.
"If this workflow leveled up your day, I'd purr-eciate a like! 😻"
Versions & Information👇👇👇👇👇👇👇👇
👉 Please read below and the file descriptions under "About this version" for more information.
⚠️ Do not use the workflows with the "Nodes 2.0 beta" from ComfyUI, or it will break things.
👇👇👇👇👇👇👇👇
What you get from the comfy workflows:
♨️ Easy controls
✅ As few dependencies as possible
🪧 Detailed documentation
⛓️ Highly automatic logic
✨ Optimized results
🔖 Bookmark-Shortcuts with number keys
Types of workflows
OmniForge C-LTX23
🎥 I2V, FLF2V, T2V, V2V + Audio
🤝 Video resolution matching - Fully automatic scaling
⚙️ Chunking (Seconds/Frame based)
🎞️ LTX Latent Upscaler (2x native latent)
🪫 Low VRAM optimizations
✨ Multiple Resolution Upscalers
🔗 VAE options
Full VAE or Tiled VAE
🔊 Multiple Audio options
🫥 Watermark option
📢 Soundmark option
🧮 Color transfer feature
🔖 Bookmark-Shortcuts - with number keys

🩻 Known issues and advice
⚠️ Some workflows may be set to webp av1 encoding (VHS node). If your computer/setup is missing the drivers, use another codec such as H265 or H264!
Install ffmpeg!
Update ComfyUI and custom_nodes!
Update PyTorch to 2.9+cu128 or higher
Make sure to read the notes inside the workflow on where files/models should be placed
Check that the file paths for model/clip/vae match your system (Linux or Windows)
Some Custom-Node-Packs need manual installation (e.g. RTX Node)
The plugin ComfyUI-DD-Translation can break node connections (avoid it)
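The first items in the list above (ffmpeg installed, PyTorch 2.9 or higher) can be checked from the Python environment that runs ComfyUI. This is only a minimal sketch: the helper names are made up for illustration, and it assumes the PyTorch version string has the usual form such as `2.9.0+cu128`.

```python
import re
import shutil

MIN_TORCH = (2, 9)  # minimum version taken from the advice list above


def parse_torch_version(version: str) -> tuple[tuple[int, int], str]:
    """Split a string like '2.9.0+cu128' into ((2, 9), 'cu128')."""
    base, _, local = version.partition("+")
    match = re.match(r"(\d+)\.(\d+)", base)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))), local


def torch_is_new_enough(version: str, minimum: tuple[int, int] = MIN_TORCH) -> bool:
    """Compare only major.minor against the minimum."""
    major_minor, _local = parse_torch_version(version)
    return major_minor >= minimum


def ffmpeg_on_path() -> bool:
    """True if an ffmpeg executable can be found on PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    print("ffmpeg found:", ffmpeg_on_path())
    try:
        import torch  # only present inside the ComfyUI environment
    except ImportError:
        print("PyTorch is not installed in this Python environment")
    else:
        print("torch", torch.__version__, "new enough:", torch_is_new_enough(torch.__version__))
```

Run it with the same Python interpreter that starts ComfyUI, otherwise it reports on a different installation.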
All older versions are available in my GitHub repo.
Learned a lot from RuneXX LTX Workflows! They are awesome! Check them out! 🫶
Special thanks to @DustyDrab for helping with tests and bug-fixing!
YOU are responsible for your outputs, as always! If you make ToS-violating content and I become aware of it, I WILL report it.
Description
Latest Changes / Fixes
Overhaul and more DaSiWa Nodes
LTX Director. Removed V2V for now
Added Last-frame Extraction
Comments (138)
Amazing. I will defo try your workflow. What is your opinion about LTX to compare with WAN for NSFW content. Worth to switch now or better wait?
Isn't it better to use gemma_3_12B_it_heretic-v2_fp8_e4m3fn instead? Or does it not make a difference?
this is odd. can't get it to work, I have selected the models yet the node Settings and Backend is still red.
It might be related to the audio? I want the audio to be generated from text not an audio file.
Oh, would it be possible to add support for gguf? :' I really love the workflow
I've tried to make this v1.0 setup work, but there are some errors that keeps the process from reaching the video generation phase.
Though before even that, there was some issues with the video input node for V2V. From what I can see, even if you don't want to go with the V2V option, the process insists that there has to be at least something provided for the node before it moves on with anything else. And then I had an issue with taeltx2_3, both in terms of that ComfyUI wasn't liking how it was "LTX/taeltx2_3.safetensors" and not "LTX\taeltx2_3.safetensors" and how it took a while to find the node where it was linked to change this.
But after handling those issues, there is one error where I can't really figure out the cause at a glance. No node seems to flare up red, I'm only given the following error message:
got prompt
Failed to validate prompt for output 2196:
* (prompt):
  - Required input is missing: images
* VHS_VideoCombine 2196:
  - Required input is missing: images
Output will be ignored
Failed to validate prompt for output 2182:
Output will be ignored
At the very least I've figured out that it has something to do with the images output from Settings and Backend that is passed on to Video Combine, but I have no idea what within Settings and Backend causes the process to treat it as something other than images.
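For the path separator problem described in this comment ("LTX/taeltx2_3.safetensors" versus "LTX\taeltx2_3.safetensors"), one possible workaround is to rewrite the separators in the workflow JSON before loading it. This is a rough sketch, not part of the workflow: it assumes model references are stored as plain strings ending in `.safetensors` or `.gguf`, and the function name is hypothetical.

```python
import json
import os

MODEL_SUFFIXES = (".safetensors", ".gguf")  # assumption: only these look like model files


def normalize_model_paths(value, sep=os.sep):
    """Recursively rewrite path separators in strings that look like model file names."""
    if isinstance(value, str):
        if value.lower().endswith(MODEL_SUFFIXES):
            return value.replace("\\", sep).replace("/", sep)
        return value
    if isinstance(value, list):
        return [normalize_model_paths(item, sep) for item in value]
    if isinstance(value, dict):
        return {key: normalize_model_paths(item, sep) for key, item in value.items()}
    return value  # numbers, booleans, None stay untouched


if __name__ == "__main__":
    # Tiny stand-in for a workflow file; a real one would be read with json.load().
    workflow = {"nodes": [{"widgets_values": ["LTX/taeltx2_3.safetensors", 4]}]}
    print(json.dumps(normalize_model_paths(workflow), indent=2))
```

Strings that are URLs ending in one of these suffixes would also be rewritten, so check the result before saving it over the original file.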
love it
Love your workflows. This one left me with a question though. Why are you using "Patch Sage Attention KJ" instead of "LTX2 MEM EFF Sage Attention Patch"? If anyone using a RTX 30** card has issues using it, there is a replacement .py that can be found on github. I have a 3080ti 12gb vram card and haven't had issues thus far using the original py file/node. Anyway, glad to see you on ltx now.
Oh boy... I dunno how I feel about LTX. I keep seeing videos from other people that look amazing, even on this very same page, and then I look at mine and suddenly the subject belongs in the next Silent Hill game or something.
The anatomy is all over the place, skin colors keep randomly changing, the camera refuses to sit still for more than two seconds unless I specifically force it to, and FLF2V barely follows the original image half the time. Sometimes it barely even follows the prompt either.
And somehow 2x FPS Latent Switch makes the whole thing speed up like everyone suddenly drank five energy drinks before the scene started. Sage Attention also seems to make the quality way worse on my end, which definitely is not helping. I even followed the sizes divisible by 64 rule for LTX, but no difference.
And the weird part is, I kept things simple. The pose was simple, the image was simple, everything should have been in the right place, and somehow it still came out looking like complete nightmare fuel.
The strangest thing is, I grabbed other people's pictures and copied their prompts just to troubleshoot things and see if I could get the exact same result as them... and I did. It actually looked really clean too, especially since I pushed the resolution even higher than what they were using.
But then the second I switch back to using my own pictures, everything immediately starts going off the rails again. The anatomy gets weird, the motion gets strange, random parts start melting together, and the whole thing starts looking cursed for no reason.
At this point I am starting to think LTX just has some kind of personal grudge against my images specifically. I just don't understand.
I will give LTX one thing though - the audio and lip sync stuff is actually really impressive compared to WAN 2.2 S2V. That part is kinda amazing.
But everything else? Yeah... I am starting to see why Dark is not exactly the biggest fan.
Example of my bad results:
https://imgur.com/a/7jquFaR
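The "sizes divisible by 64" rule mentioned in this comment can be applied with a small helper. The sketch below only shows the arithmetic; the function names are made up, and as the comment itself notes, following the rule did not change the results in this case.

```python
def snap_to_multiple(value: int, multiple: int = 64) -> int:
    """Round value to the nearest multiple, never going below one multiple."""
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def snap_resolution(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Snap both sides of a resolution to the given multiple."""
    return snap_to_multiple(width, multiple), snap_to_multiple(height, multiple)


if __name__ == "__main__":
    print(snap_resolution(1280, 720))
    print(snap_resolution(1920, 1080))
```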
Does the SDK node require you to have SDK integration in your python? i have manually installed the node but it still reads that the node is not imported correctly, which i assume is because you need SDK? prob a dumb question, im not too computer savvy, any help would be appreciated <3
imagine a DaSiWa ltx...
the flf don't work.. i2v work as well
Other than the FLF2V not working, this works well.
I had to change the check point and distilled lora to different ones as for some reason out of the box the ones you recommended/loaded made all the people turn into what i can best describe as greasy sweaty emaciated skeleton freaks
🤷‍♂️
Oh I'm sure it's incredible gotta try right away ... but V2V doesn't work with some RuntimeError
It's saying something about the tensor size...
The expanded size of the tensor (16) must match the existing size (28) at non-singleton dimension 2.
Somehow things got broken inside the workflow and I'm investigating why... I'm on a new version, but I'm a bit busy... I hope I can fix things soon
This is an insane workflow. The amount of work that went into making it modular and with the conditional switches is just top notch and super impressive. I've always loved your releases and i look forward to an LTX 2.3 checkpoint from you hopefully in the future. In the meantime keep rocking the Wan 2.2 releases and these insane workflows!
@darksidewalker i think we would all cry if you made a LTX checkpoint lol
i might be stupid for this question but when i use your wan2.2 workflow and models (which work fantastic btw highly recommended) i downloaded your wan models and use your workflow. with this workflow is there a specific ltx model i should use or use the normal ltx2.3 model? thanks in advance if anyone answers
Thanks for the WF, works pretty great. Not sure if it's my settings or prompt, but the I2V audio voice seems very stilted, stretched out, and overall unnatural. Any advice? Also, I had to manually disable the load audio block inside the subgraph even though I set load audio to False.
Can't seem to dl the RTX nodes because of security or something.
Head over to .\ComfyUI\user\__manager > config.ini
Open the file, then set:
security_level = weak
Then restart ComfyUI.
I hope this helps!
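The same change can be scripted. A minimal sketch, assuming `config.ini` is a standard INI file that already contains a `security_level` key (the section name is not given in the comment, so the code searches all sections); the function name is hypothetical:

```python
import configparser
from pathlib import Path


def set_security_level(config_path: Path, level: str = "weak") -> bool:
    """Set security_level in every section that already defines it.

    Returns True if the file was changed, False if no such key was found.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(config_path, encoding="utf-8")
    changed = False
    for section in parser.sections():
        if parser.has_option(section, "security_level"):
            parser.set(section, "security_level", level)
            changed = True
    if changed:
        with open(config_path, "w", encoding="utf-8") as handle:
            parser.write(handle)
    return changed


if __name__ == "__main__":
    # Path taken from the comment above; adjust it to your installation.
    path = Path("ComfyUI") / "user" / "__manager" / "config.ini"
    print("updated:", set_security_level(path))
```

`configparser` drops comments when it rewrites a file, so keep a backup copy if the file contains notes you want to preserve. Restart ComfyUI afterwards, as described above.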
Alright, now we are talking, Dark! Version 1.3 is an absolute banger this time. I had zero issues with this workflow, and the quality is even better now. The audio still sucks at times unless I feed it my own audio files, but you already know how that goes. lol
You are definitely what makes this community great, and you should keep being awesome!
Thx for the kind words 😊 I really appreciate 🙏
I think I am missing the trick to getting the V2V with audio input to work on this. Was getting tensor size issues and realized it was because the config frame count (seconds x fps) wasn't matching the input video... I adjusted (129 frame video, changed input to 32x4) and now I am getting "dimension size must be non-negative" so I think I somehow overshot it even though it should match now.
I'm still not 100% sure how v2v is working on this... Normally the set seconds should add to the input, expanding the video, not exactly converting one video into another... But there is no real reference to this
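The numbers in the comment above (a 129-frame video matched with 32 x 4) are consistent with a frame count of seconds x fps + 1. That relation is an assumption here, not something confirmed for this workflow, and the helper names and the list of fps options are made up. The sketch only shows which settings would line up with a given input video under that assumption; it does not explain the "dimension size must be non-negative" error.

```python
def frame_count(seconds: int, fps: int) -> int:
    """Assumed relation: one start frame plus seconds * fps further frames."""
    return seconds * fps + 1


def matching_settings(input_frames: int, fps_options=(24, 25, 30, 32, 48, 50)):
    """Return (fps, seconds) pairs whose assumed frame count equals input_frames."""
    pairs = []
    for fps in fps_options:
        seconds, remainder = divmod(input_frames - 1, fps)
        if remainder == 0 and seconds > 0:
            pairs.append((fps, seconds))
    return pairs


if __name__ == "__main__":
    print(matching_settings(129))
```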
love it, but the "capture last frame" doesnt work :(
edit : no it's not.
are you making a model for ltx?
Should I?
Honestly @darksidewalker, I bet you could fix all the nasty little issues LTX-2.3 still has, maybe even the audio stuff too.
And yeah, I really mean that - I genuinely think you could pull it off after seeing your WAN 2.2 comparison with other models. At this point, if anyone could whip that thing into shape, my money would be on you. 💸
@darksidewalker probly since wan dosent seem to be sending out a new model
Thank you for the workflow! LTX seems to be a great alternative to WAN nowadays.
Nah, only on limited use-cases ;)
is it just me or does ltx 2.3 seem more blank with facial emotions cause ltx 2 seems way more expressive. do you have any prompts to force more facial expression ive tryed many and still just lifeless emotion
There is not much reference, since ltx2.3 is really new and there are not even optimal settings and parameters known.
What is "placeholder.gguf"?
Where can I get it?
Is this a genuine question?
@darksidewalker Naturally. I downloaded all the models from the workflow except "placeholder.gguf". The search didn't find it. Accordingly, I can't generate anything; ComfyUI gives an error at the very beginning of generation.
It's a placeholder, there is no such model
@darksidewalker Should I replace it with any LTX 2.3 gguf model?
@77rocknroll278 the "placeholder" is a way of saying "insert gguf model".
If you're going to use a Clip in gguf format, it would be better to use gemma 3 12b heretic.gguf, but if you're going to use it in bf16 "safetensors" format, you shouldn't put anything there
I cant find the node to select the spatial upscaler and keep getting the below error
LatentUpscaleModelLoader
Model in folder 'latent_upscale_models' with filename 'ltx-2.3-spatial-upscaler-x2-1.1.safetensors' not found.
You don't need to find the node, you can set it in the basic settings gui
@darksidewalker TYSM, i found the fix, i had to click properties on the"Load Latent Upscale Model spatial 2x resolution" and under the "Sparcial Resolution Upscaler Model" heading it was not selected, pardon me i have zero technical knowledge , but your wf is amazingly fasttttttttttt...
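When a loader reports that a file was "not found" as in the error above, it can help to check whether the file is missing or just sitting in a different folder. A small sketch, assuming the models live under a single root directory (`ComfyUI/models` is used below as an example path); the function name is hypothetical:

```python
from pathlib import Path


def find_model(models_root: Path, folder: str, filename: str) -> list[Path]:
    """Return every file named `filename` below models_root, expected location first."""
    expected = models_root / folder / filename
    hits = [expected] if expected.is_file() else []
    hits += [p for p in models_root.rglob(filename) if p != expected and p.is_file()]
    return hits


if __name__ == "__main__":
    root = Path("ComfyUI") / "models"  # assumption: adjust to your installation
    name = "ltx-2.3-spatial-upscaler-x2-1.1.safetensors"
    found = find_model(root, "latent_upscale_models", name)
    if not found:
        print("not downloaded yet:", name)
    for path in found:
        print("found:", path)
```

If the file shows up in the expected folder and the error persists, the model is probably just not selected in the workflow settings, which is what the fix in the comment above came down to.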
rtx 3060 12 GB can work
Great work! I'm having fun!
The only downside is that audio seems to cut often after 4s, any idea what could cause this? :(
On a 8s video, the sound cuts at 4s.
On a 20s video, the sound cuts at 10s. Hope this helps.
@Fellaitio I did not notice that, can you tell your settings?
@darksidewalker I'm guessing audio wise, so here it is:
Audio Features
I have Audio MMAudio toggled on, rest is off.
No positive prompt
default negative prompt
BACKEND, pretty much all default but just in case:
mmaudio vae 44k fp16
mmaudio synchformer fp16
clip model applefp16
mode 44k
precision fp16
steps 25
cfg 4.5
Mask_away_clip false
force_offload false
if other info are needed, please follow along:
Using E01# I2V
Using T01# I2V
No rendering features I2V
No rendering features S2V
no Combine videos
resolution R06#
Post-processing features P01# and P05# and P06#
using BoundBiteV10, littledemonv2, rest is default.
did not toggle on trim to audio or other audio affecting features, to my knowledge, everything is default.
@Fellaitio MMAudio needs 24 fps, as described in the notes. That's why it cuts off.
Besides that, this has nothing to do with the LTX workflow here. I assume you refer to the C-AiO
@darksidewalker Oh shit, I thought I was on wan2.2 page (i was using your LTX for the past few days, that might be why! I'm sorry! And yes C-AiO fast fidelity wan 2.2
My fps was set at 24fps
refiner step 2
steps total 4
cfg 1
fps 24
seconds (4, 5, 10, 20 were tested, all stopped around the 50% mark)
@Fellaitio Ah, I see now. You interpolate the MMAudio clip, adding 50% more frames. That could be why.
I did a test and a 24fps video just renders full audio. Try without interpolation. MMAudio will want static 24 fps.
@darksidewalker Genius! that makes total sense. Didn't think of the extra features at the end. Damn. Any ways to implement interpolation with audio in the future versions? :O
@Fellaitio it is, but MMAudio is strict.
@darksidewalker Alright, no pressure then! Thank you so much for your time! <3
Great workflow as always! Any plans for adding ID-LoRA support for audio/voice reference?
I'm not quite sure what you mean, there is Lora support or is this different?
Can it lipsync?
Yes
@darksidewalker based
I have a problem with the maintenance of the face and body. The character has a stylized head on the image as I had generated it, and then when it goes into the video side his face changes and it looks more like the same person... Would it come from Checkpoints? Because in the videos I've seen below, faces even if they're in anime or stylized version stay pretty much compared to mine. Would you know why?
LTX 2.3 is not very stable on details. Try more generations; a higher resolution or better prompts help.
Great workflow, thank you! 😃
But the model that is used in the workflow behaves very strangely for me, is it like a anime/cartoon focused one?
I've used it for realistic images and It seems to exaggerate motions and especially expressions a lot, ranging from uncanny to terrifying. Skin details get incredibly overdone, wrinkles everywhere and most the time the body gets covered in so many moles and blemishes that it looks like they have a hideous skin disease 😆
It's interesting because i've tried a GGUF quant and the eros model and they're absolutely fine. I probably would have given up on LTX if I didn't decide to give them a try after. I don't know if i've missed something or that model is just like that.
I don't know if you're planning on doing an LTX model tune yourself, but I hope you do. LTX2.3 seems like it has a huge amount of potential that just needs unlocking by some model wizard like yourself! 😁
Also another quick question: Does chunking have any negative effects like lowering quality? Or is it something to always use to help with efficiency?
Hard to say. I noticed that the effect you describe happens when a distilled model and a speed LoRA are used together. My first workflow had a bug where both were active.
As for your question: I have not done enough testing, but I did not notice any degradation with chunking. I also did not try really long videos.
The workflow works great but is there a reason there is no negative prompt?
Also, what is prompt enhancer for? What text should be used that is different from the normal prompt?
Prompt enhancer is trying to enhance the prompt. Distilled LTX cannot use negative prompting.
is it posible to add audio to an existing video? for example: to make a video with wan 2.2 of a nsfw scene and after that add a coherent audio for the video with ltx 2.3.
It's already in
@darksidewalker TY
@darksidewalker hey, sorry to bother you again, could you tell me how to create audio for an existing video without affecting the video resolution? I've tried bypassing certain nodes but haven't been able to figure it out; there are many interconnected nodes and I'm just starting out with LTX.
@hectorium no scale switch would do that.
Excellent workflow, I cant even guess how much time it took to make. Thank you.
I am having 2 issues though:
1: Any video I make doesnt save the workflow in the video. I have the metadata option turned on, but whenever I drag a view into ComfyUI it just loads the video and not the workflow.
2: I cant seem to install the RTX video upscaler. I use the Manager to install the missing nodes, but it always fails at installing.
I really love the workflow, it works incredibly well, except for one issue: when I try to generate NSFW content with I2V or T2V, the bodies look blurry and the scenes are hard to understand. I’ve tried different LoRAs and checkpoints, but I can’t figure out what’s wrong. It’s impossible to get results like the ones shown in the videos below… I’m probably just being an idiot, but I’d really appreciate any help.
The basic LTX model is censored
Some may be v2v, made with a model like my DaSiWa models
I'm getting amazing outputs with reasoning_i2v_v3, NSFW_furry_concat_v2, and Penile_Praxis_v4 all set at 0.5 strength. And I don't do furry content but they help a lot with all nsfw work and no annoying 'porno' voices. Also RTX super sampling option is a game changer.
@delta45424155 what do you mean by game changer ? ist good or not ? ><
@Fakcup Yes, it is good.
Just wanted to say thank you for the workflow. After trying so many workflows I almost gave up on ltx2.3 for now. But you made me a believer.
https://huggingface.co/Kijai/LTX2.3_comfy/tree/main/diffusion_models
What would I need to do to use the 1.1 version? It works but is taking 14 minutes lol.
Without distillation it will take a lot of time.
You could also try: https://civitai.red/models/2543443
The LTX is amazing! Unfortunately, even though I've downloaded everything, installed it, and updated everything, I haven't been able to get it to work. The workflow keeps throwing errors: ZeroDivisionError: float division by zero, TypeError: ‘NoneType’ object is not iterable, and many others. I have everything updated, and your other WAN workflows work perfectly for me! Any tips?
Nvidia RTX Nodes won't install. Tried updating and even downloaded a new ComfyUI.
Go to the Issues section on RTX Nodes' GitHub; that should solve your problem.
@xcshwi I tried with no luck but thanks. I don't understand anything there, I tried copying and pasting the command thing in one of the replies to a command window but it doesn't do anything. I don't know enough about this stuff to figure it out sadly.
@Kaymart you could try my installer, if you are not familiar with installing custom_nodes:
https://civitai.red/models/2364056/tool-dasiwa-comfyui-installer
@Kaymart The security level for the Manager has to be set to weak; it's probably on normal now. Also wait for the run .bat to finish compiling the ComfyUI registry data so that the node can be located in the Manager.
Thank you so much for creating this workflow. Seriously, I'm floored by how easy it was to set up and how fast the model works!! Gotta disable SMZnodes though!
What are SMZ nodes? :)
@darksidewalker Something I picked up when I was poring through embedded workflows of an image creation contest, they were using it for their KSampler as I recall. No special need for it, I think I can get rid of it altogether. It conflicts with LTX.
Your workflow for wan working super.
This - LTX does not work. Constantly running out of memory. Although both the official wf and those created by other modders - work.
In addition, I can't find anywhere to set the start of the additional audio track - only the video duration. It's not clear where to put 'tae'.
You can't set the video size either. Let's say 1280x720. Only horizontal videos.
Well...
1# it works, I used it for days
2# How should I know why you OOM this can only be a setting or the model you use
3# There is no extra audio track, I do not know why you assume this
4# It is clear that the vae should be set on the model node, it is described and bookmarked
5# There are tons of options to set the resolution and aspect, also described in the notes
I mean this (about extra sound track) :
"Settings and inputs" - "settings and backend" node. List down. At end - "choose audio to upload". "Choose soundmark to upload". "Choose watermark to upload".
"Basic settings" - "Audio" - "Audio input".
With vae is all clear. But WF ask me for tae...
Hello ! This is great work ! Thank you so much for this !
I just have a simple doubt:
I want to generate video with my own audio file which I will make from elevenlabs. so if it is in different language will it work ?
And
How do I get it to lipsync ? I tried to upload the file but it is just being played in background.
Any suggestions/guidance regarding this is much appreciated ! Thank you all !
Has anyone had good results using the prompt enhancer? So far I've just found it adds useless rubbish to the start ("Okay, here's a prompt generated from your input, keeping blah blah blah..." - then just my original prompt with two words changed) - then cuts off abruptly towards the end mid-sentence.
Yeah the ltx prompt enhancement does not always work, I don't like it, but it is what they gave us
If I want AI to help with the prompt, I load koboldcpp with whatever LLM I want. But you'll have to google how to write a system prompt to help guide your selected LLM.
Mine just fails and crashes when I use it, heh. Glad I'm not missing out on anything
@MedliKnight the prompt enhancer needs a big portion of VRAM, maybe that's why .
I've tried several things, but I can't get a correct lip sync. Sometimes the audio starts, but the mouth doesn't move. Other times, lip sync starts after about 3 seconds, and I lose the initial audio. Is there some setting I'm doing wrong ? Thank you.
Hello Sir, Can you add a middle frame to this? First, middle, and last frame workflow... please
T2V seems terrible, tried a bunch of different settings and lora with various strengths but it seems to always have merged body horror, often having genitals in the middle of the chest and seemingly ignoring or butchering prompt directions
That cannot be a problem with the workflow. It must be the checkpoint. Which one are you using?
@darksidewalker DaSiWa LTX 23 treasure chest
@dalkx9455834 okay noted
For 32 GB of RAM, what is the best combination?
Can't tell
I can't get certain sets of custom nodes to load, update, and even install properly. Tried manually and through the node manager. Uninstalling, reinstalling, updating, different older versions, and of course updating comfy in general. What am I doing wrong??
These:
comfyui-gguf
whiterabbit
comfy_nvidia_rtx_nodes
comfyui-gguf-fantasytalking
comfyui-zlycoris
anyone else having problems with audio over the last couple days. Everything was working perfectly, but now all of my generations are completely silent. I'm assuming an update broke something.
Hi, I had the same problem.
The solution I found was to delete the custom node "ComfyUI-KJNodes" and then download it again, either through the node manager or manually from GitHub
@Xont17 Yeah comfyui update broke kijai nodes. Reinstall of the latest version is required.
Can't get this WF to work, stumbling from one error to another.
I am using exactly all models from the list.
CLIP Text Encode (Prompt)
AttributeError: 'NoneType' object has no attribute 'dtype'
etc etc.
You have 2 options:
1# Install latest comfy and all custom nodes correctly
2# Try my comfyui installer
@darksidewalker thank you for your answer.
My current ComfyUI install works great, and I feel like adding anything 'exotic' like the RTX package, which does not work without using hacks and tricks, lowering security level, makes my installation crumble down like a house of cards.
I really like your WAN workflows, but I never struggled that much to get those custom nodes to install like with this LTX workflow. It feels more and more like a puzzle with missing parts.
Please don't get me wrong, I highly appreciate your work, but after swearing and shouting a whole morning at my PC , I feel like I should skip that one and use another LTX WF.
@Bbird The RTX node is not exotic, it is the most common and fast RTX upscaler right now.
But it is, for sure, entirely up to you.
All other nodes are almost the same as in my WAN workflows, with even fewer dependencies, so that cannot be your problem, unless your Comfy is outdated and you are missing core nodes for LTX23.
Seems to be a pretty good workflow. 2 suggestions: Use res_2s and bong_tangent as sampler/scheduler. Use a negative prompt too (combined with LTX2 NAG from KJ Nodes).
Why that suggestion?
Any tested benefits?
@darksidewalker well, depending on the LTX Model sometimes i got subtitles, or annoying music - but with "cartoon, still image, bad quality, subtitles, text, watermark, overlay effects, music" in the negative prompt i got rid of this, this really can improve the output!
The previous version worked great! However, I'm having an issue with the Prompt Enhancer with this 2.2 version. I get this error:
NotImplementedError: Cannot copy out of meta tensor; no data!
I've uninstalled everything and reinstalled all the custom nodes....
I have no indication where this error comes from. I updated my ComfyUI today and everything worked fine. Are you sure everything is installed correctly? Can you try my ComfyUI installer and test again with that to be sure?
Hello my brother! I am a huge fan of your work. And i tried your workflow - but it doesent matter how low i set the resoluton i keep getting OOM, even if i put the chunking up. I've used various ltx 2.3 workflow they work seamlessy + i have an rtx 5070 ti with 16gbvram, so I personally dont think this should be an issue. Anyways, thanks for your hard work, i am excited to try out your checkpoint, and hope to fix that problem since your workflow is pretty handy!
I use the workflow with 16gb VRAM, so I know it works. Even with 1.2MP or higher, so this shouldn't be a problem
@darksidewalker it seemed that double fps made the issue. Couldnt imagine that it makes such a huge difference. I thought if its gonna get upscaled with rtx super resolution later, its necessary. Reallly good quality. My new go to workflow. keep the work up^^
Having been (successfully) using my own modified workflow for a while, I found switching to this one easy and seamless - it works great right out of the box, thank you! One growing pain I’m having is that, even with prompt enhancement disabled, this workflow has to re-encode the prompt each run, which dramatically slows down generation time if I’m running the same scene with some seed/lora changes only. Any idea what might be causing that, and any idea how to fix? I used to be able to run subsequent generations with the same prompt without re-encoding, but on this workflow it re-encodes every time
Since this is optimized for low VRAM, what settings do we need to change if we wanna use high VRAM, other than swapping the gguf model for the bf16 one?
Never use gguf unless you have to. Don't use tiled VAE, don't use chunking, and use as high a resolution as you can. That's it👍
@darksidewalker got it, thanks.
On V2V Voice over with audio input - It seems like the first ~4 seconds of the input is getting garbled before it plays the remainder of the clip. Are there any specific models/lora/vae needed for that to work?
Running models below:
Model: I tried both the ltx-2.3-22b-dev.Q6_K.gguf and Dasiwaltx23lightspeed_treasurechestv1.safetensor
Clip: gemma3-12b-hereticx-sikaworld, ltx-2.3-text_projection_bf16
Vae: ltx-2.3-22b-dev.audio.vae, ltx-2.3-22b-dev.video.vae,
I tried updating comfyui to the latest nightly and confirmed all nodes are up to date. I also tried replacing the audio from the video with an empty latent (from an audioless ltx video). I tried with both an ltx and wan video input with the same result.
Only active lora is: ltx-2.3-22b-distilled-lora-1.1.safetensors
One other gotcha I found: If I try to set the video longer than the audio input I get a runtime error (RuntimeError: zeros: Dimension size must be non-negative.).
WF looks great, I've been looking for a good V2V for adding audio, so hopefully its just something I'm doing wrong. Thank you.
I'm not sure what's going on here.
I've tried this workflow but it just doesn't work, first it asks for a VAE (LTXTaeltx2_3.safetensors) but there is no slot for it, had to do it through the error tab.
Then I get an error saying TypeError: 'NoneType' object is not iterable, even though I had set all audio switches to off as I don't want audio, ok, so I set the video one to on, now it wants an mp4 file, I put in something random, now it runs past this... And it's asking me to upload a video for my first to last frame generation (???), I put something random in there to see if it works, now it wants a middle frame image (???), ok, same image again, so it starts processing and it fails due to lack of ram on a 5080.
I have no idea what's going on here.
Managed to create a video manually bypassing the video node and setting the model audio option, but the result was deformed and the character was talking about some random nonsense moving around when the prompt clearly stated she should stay still like a statue.
Nothing works. Prompt enhancer? Simple copy-paste of the initial prompt. Video output? So blurry that subjects seem to vaporize.
seems like an ultimate workflow but I cannot make it work. I wish there was an ultimate workflow tutorial video too :) especially for low vram settings
No frame interpolation ? No Negative prompt ?
You cannot interpolate a video with audio as a post-process. NAG and negative prompts are not implemented yet.
V2V distorts audio and the added portion ignores prompts
This looks promising:
https://github.com/kijai/ComfyUI-PromptRelay
After the first I2V generation with model audio input, I started getting an error about a missing file 'demoness.flac', even though I had not changed the mode to I2V with audio input.
DaSiWa, I feel like most of what you put out is gold. Love using what you create. The workflow works well for me after updating everything, which you regularly tell people to do. The model, I see the potential and know this is new territory. I wish I could get my outputs to look as clean as the examples too, but that isn't happening for me. Anyways, appreciate the work you do! Bound Bite is fantastic btw!