LTX IMAGE to TEXT to VIDEO with STG workflow

LTX IMAGE to TEXT to VIDEO with STG workflow - v1.0

NSFW

Workflow: input image (or your own prompt) -> captioning into a text prompt -> the prompt is used for LTX TEXT to VIDEO (this is a Text-to-Video workflow; see my other workflow for Image to Video)
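The data flow above can be sketched in Python as a minimal illustration. This is not ComfyUI code. `caption_image` and `generate_video` are hypothetical stand-ins for the Florence2 captioning step and the LTX Text to Video sampler. They are passed in as callables so the sketch runs on its own. The rule that your own prompt takes priority over the image caption is an assumption made for the example.

```python
from typing import Callable, Optional, TypeVar

V = TypeVar("V")  # whatever the (hypothetical) video generator returns


def build_prompt(
    image: Optional[bytes],
    own_prompt: Optional[str],
    caption_image: Callable[[bytes], str],
) -> str:
    """Pick the text prompt for Text to Video (hypothetical helper).

    Assumption: a non-empty own prompt wins; otherwise the image is captioned.
    """
    if own_prompt and own_prompt.strip():
        return own_prompt.strip()
    if image is None:
        raise ValueError("need either an input image or your own prompt")
    return caption_image(image)


def image_to_text_to_video(
    image: Optional[bytes],
    own_prompt: Optional[str],
    caption_image: Callable[[bytes], str],
    generate_video: Callable[[str], V],
) -> V:
    """Image -> Prompt -> Video: the image only ever reaches the model as text."""
    prompt = build_prompt(image, own_prompt, caption_image)
    return generate_video(prompt)


# Fake stand-ins so the sketch runs without any models.
def fake_caption(img: bytes) -> str:
    return f"a photo ({len(img)} bytes)"


def fake_t2v(prompt: str) -> str:
    return f"video<{prompt}>"


if __name__ == "__main__":
    print(image_to_text_to_video(b"\x89PNG", None, fake_caption, fake_t2v))
```

The point of the split is that the video model never sees the picture itself, only the caption, which is why this is a Text to Video workflow rather than Image to Video.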

V5.0: Support for LTX 0.9.5 GGUF Models and Wavespeed/Teacache

LTX 0.9.5 GGUF Model and VAE: https://huggingface.co/calcuis/ltxv-gguf/tree/main

(VAE file: vae_ltxv0.9.5_fp8_e4m3fn.safetensors)

(CLIP Text Encoder): https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main

The workflow supports Florence captioning and the LTX Prompt Enhancer, and works with all models (0.9 / 0.9.1 / 0.9.5).

(see notes in workflow for more details)

V4.0: Support for GGUF Models

The GGUF model, VAE and text encoder can be downloaded here:

(Model&VAE): https://huggingface.co/calcuis/ltxv-gguf/tree/main

(CLIP Text Encoder): https://huggingface.co/city96/t5-v1_1-xxl-encoder-gguf/tree/main

(includes a GGUF version and a GGUF + Tiled VAE version for low VRAM)

V3.1: Support for model 0.9.1

V3.0: GUI cleanup, reduced number of custom nodes, and an option to use your own prompt.

V2.0: Introducing STG (Spatiotemporal Skip Guidance for Enhanced Video Diffusion Sampling).

GUI includes two new nodes in blue:

STG settings, showing CFG, Scale and Rescale, plus a switch to choose which of two model layers is skipped: 8 or 14 (default). Choose "true" for layer 14 or "false" for layer 8.

I copied a note into the workflow with further info and usable values/limits. Feel free to experiment. In my testing, I kept the values within STG settings at their defaults and just used the switch.
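The layer switch is easy to get backwards, so here is a minimal Python sketch of how the STG settings node's fields relate. `STGSettings` and `skipped_layer` are hypothetical names, not the real node's API. The numbers in the usage example are placeholders and not the workflow's default CFG, Scale or Rescale values; check the note inside the workflow for those.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class STGSettings:
    """Hypothetical mirror of the STG settings node (not its real API)."""

    cfg: float
    scale: float
    rescale: float
    # The switch: True -> skip layer 14 (the default), False -> skip layer 8.
    use_layer_14: bool = True

    @property
    def skipped_layer(self) -> int:
        """Index of the model layer that STG skips."""
        return 14 if self.use_layer_14 else 8


if __name__ == "__main__":
    # Placeholder numbers only, NOT the workflow's defaults.
    default_switch = STGSettings(cfg=1.0, scale=1.0, rescale=1.0)
    flipped = STGSettings(cfg=1.0, scale=1.0, rescale=1.0, use_layer_14=False)
    print(default_switch.skipped_layer, flipped.skipped_layer)
```

Keeping the switch as a boolean with the layer derived from it mirrors the "just used the switch" approach above: the three numeric values stay fixed while only the skipped layer changes.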

The node "Modify LTX Model" changes the model for the rest of the session. If you switch to another workflow, make sure to hit "Free model and node cache" in ComfyUI to avoid interference.

V1.0: ComfyUI Workflow: LTX IMAGE-to-TEXT-to-VIDEO Using Florence2 Caption

This workflow turns the input image into a prompt (Florence2 for captioning) and uses the LTX Text to Video model for video generation (Image -> Prompt -> Video).


Comments (9)

Samsura, Dec 2, 2024


Thank you, but I am stuck. IT question: I get a lot of undefined nodes in ComfyUI: "When loading the graph, the following node types were not found:"
DownloadAndLoadFlorence2Model

Florence2Run

Float

JWInteger

ttN seed

KepStringLiteral

The Manager doesn't help. Any thoughts? What should I do when nodes are undefined?

tremolo28 (Author), Dec 2, 2024

Usually it helps to run "Update All", restart, then "Install missing nodes", both in ComfyUI Manager.

Samsura, Dec 2, 2024

@tremolo28 Everything is up to date, but the list of missing custom nodes is still empty. Anyway, I won't bother you with this, thanks.

loneillustrator, Dec 6, 2024

@Samsura same man

tremolo28 (Author), Dec 6, 2024

@loneillustrator If "Update All" and "Install missing nodes" did not help, maybe check that you are on the right channel (I use Manager > Channel: default). Other than that, I can't really help with ComfyUI-related issues. I am kind of a ComfyUI noob myself ;)

GitarooMan, Dec 4, 2024


How do you get it to pan in so slowly? I put "slow pan" and "fast pan" in the negative prompt and it still goes psycho on a very simple prompt.

tremolo28 (Author), Dec 4, 2024

I just drag and drop a picture into the workflow; the rest is done by the model/setup. Maybe try different seeds.

gorathan274, Jan 9, 2025

There seem to be several things that affect movement:
1. seed: a lower seed seems to give slower movement, or less movement
2. max and min shift
3. cfg
4. frame_rate in conditioning (I wouldn't prefer that)

rocky533, Jan 30, 2025

LTX does not understand the word "pan", nor "scroll", nor "follow", nor many other keywords. The camera follows the subject of the prompt (whatever is described in the most detail). Camera movement is nearly impossible to control beyond pointing at the subject. How you crop your image can influence it, but not reliably.


by tremolo28
