Update, Florence version: many people encounter dependency errors when using the Joy Caption plugin, so this version uses Florence as a replacement, which makes those issues easier for beginners to avoid. It is an alternative to the previous version, not an upgrade of it.
Using an LLM to write prompts for CogVideoX-v1.5-5B I2V helps people who don't know how to write prompts, or don't want to, get better results out of CogVideoX v1.5. You can also add your own guiding prompts, or turn the LLM feature off and write the prompt entirely yourself.
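The flow is: a vision model captions the input image, then a local LLM expands that caption into a detailed video prompt. As a rough sketch of the LLM step with llama-cpp-python (the system prompt and caption here are illustrative, not the exact ones baked into the workflow):

```python
from llama_cpp import Llama

# Illustrative paths and values; any instruct-tuned GGUF works for this step.
llm = Llama(
    model_path="models/LLavacheckpoints/Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf",
    n_ctx=2048,
    n_gpu_layers=-1,  # offload all layers to the GPU if they fit
)

caption = "a woman in a red coat standing on a snowy street"  # from Florence
out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": (
            "Expand the image caption into one detailed video prompt. "
            "Describe the subject, the motion, and the camera movement.")},
        {"role": "user", "content": caption},
    ],
    max_tokens=180,  # keep it short; overly long prompts hurt CogVideoX
)
print(out["choices"][0]["message"]["content"])
```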
Although v1.5 supports arbitrary resolutions, output quality still varies between them, so test several resolutions to find the one that works best.
For low VRAM users, please keep the following features enabled.
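The toggles themselves are node settings in the workflow; for reference, the equivalent memory savers when running CogVideoX directly through diffusers look roughly like this (a sketch, assuming the official THUDM/CogVideoX1.5-5B-I2V weights):

```python
import torch
from diffusers import CogVideoXImageToVideoPipeline

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX1.5-5B-I2V", torch_dtype=torch.bfloat16
)
# The two big memory savers: stream weights through the GPU one
# module at a time, and decode the video latents in tiles.
pipe.enable_sequential_cpu_offload()
pipe.vae.enable_tiling()
```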


The GGUF version doesn't perform that well based on my tests, but v1.5 is still being updated, so we can expect better results in future versions.
If this happens, it's because the LLM prompt is too long. You can change the random seed and regenerate, or lower the values below to reduce the token count.
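In llama-cpp-python terms (continuing the sketch above), both knobs are just sampler arguments; the numbers here are examples:

```python
# A lower token cap keeps the generated prompt short; a different
# seed produces a different rewrite of the same caption.
out = llm.create_chat_completion(
    messages=messages,  # same messages as in the sketch above
    max_tokens=120,
    seed=1234,
)
```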


Comments (19)
Hello, could you maybe provide some rendering time info, like what GPU you use and how long it takes for one video? Thank you!
I've never tried video gen, mainly because I need little details like that too... it would be nice. Thanks anyway for contributing!
It should run fine with 12GB of VRAM or more. My GPU is a 4090, and it takes about 3 minutes at 1024×640 resolution with num_frame 49. The time varies with the generation resolution; for example, 1216×832 takes about 5 minutes. It will be longer with num_frame 81.
An RTX 3060 12GB handles this too. With flash-attn, and not using the GGUF version: 60 frames at 50 steps with a final output of 768x1024 takes about 20 minutes from an image, using bf16 model precision with fp8_e4m3fn quantization, fused SDPA, compile enabled (the first run was not much longer), and VAE tiling enabled. Epic workflow, thanks!
Hi, can someone help me? ComfyUI asks me for Image Switch // RvTools, and I have been able to install all the missing nodes except this one.
If you understand the workflow, this node isn't necessary: connect the image output on the right of 'Resize Image' to the 'start_image' input on the left of 'CogVideo ImageEncode' and you can skip installing it.
Go to the custom_nodes directory and run "git clone https://github.com/Rvage0815/ComfyUI-RvTools.git", then restart ComfyUI and refresh.
Ok, thanks for your help :-)
Joy_caption
Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory G:/comfyui/models/clip\siglip-so400m-patch14-384.
I recommend using the Florence version. There's no difference in results between the Florence version and the Joy Caption version, but Florence avoids many dependency errors.
Joy Caption: https://github.com/StartHua/Comfyui_CXH_joy_caption
I get:
LLMLoader
Failed to load model from file: Q:\ComfyUI\models\LLavacheckpoints\Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf
That's even though I have the file there 🤔
I have manually downloaded Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf and placed it into Q:\ComfyUI\models\LLavacheckpoints
Searching for the file online, though, I find at least three different versions with the same name: 5.4GB, 5.6GB and 5.7GB 🤔
Actually this is the full error:
llama_model_loader: loaded meta data with 29 key-value pairs and 292 tensors from Q:\ComfyUI\models\LLavacheckpoints\Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Meta Llama 3.1 8B Instruct
llama_model_loader: - kv 3: general.finetune str = Instruct
llama_model_loader: - kv 4: general.basename str = Meta-Llama-3.1
llama_model_loader: - kv 5: general.size_label str = 8B
llama_model_loader: - kv 6: general.license str = llama3.1
llama_model_loader: - kv 7: general.tags arr[str,6] = ["facebook", "meta", "pytorch", "llam...
llama_model_loader: - kv 8: general.languages arr[str,8] = ["en", "de", "fr", "it", "pt", "hi", ...
llama_model_loader: - kv 9: llama.block_count u32 = 32
llama_model_loader: - kv 10: llama.context_length u32 = 131072
llama_model_loader: - kv 11: llama.embedding_length u32 = 4096
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 500000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: general.file_type u32 = 16
llama_model_loader: - kv 18: llama.vocab_size u32 = 128256
llama_model_loader: - kv 19: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 20: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 21: tokenizer.ggml.pre str = llama-bpe
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,128256] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,128256] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.merges arr[str,280147] = ["Ġ Ġ", "Ġ ĠĠĠ", "ĠĠ ĠĠ", "...
llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 128000
llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 128009
llama_model_loader: - kv 27: tokenizer.chat_template str = {% set loop_messages = messages %}{% ...
llama_model_loader: - kv 28: general.quantization_version u32 = 2
llama_model_loader: - type f32: 66 tensors
llama_model_loader: - type q5_K: 225 tensors
llama_model_loader: - type q6_K: 1 tensors
llm_load_vocab: special tokens definition check successful ( 256/128256 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 128256
llm_load_print_meta: n_merges = 280147
llm_load_print_meta: n_ctx_train = 131072
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 500000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 131072
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 8B
llm_load_print_meta: model ftype = Q5_K - Small
llm_load_print_meta: model params = 8.03 B
llm_load_print_meta: model size = 5.21 GiB (5.57 BPW)
llm_load_print_meta: general.name = Meta Llama 3.1 8B Instruct
llm_load_print_meta: BOS token = 128000 '<|begin_of_text|>'
llm_load_print_meta: EOS token = 128009 '<|eot_id|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_print_meta: EOT token = 128009 '<|eot_id|>'
llm_load_tensors: ggml ctx size = 0.30 MiB
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 292, got 291
llama_load_model_from_file: failed to load model
!!! Exception during processing !!! Failed to load model from file: Q:\ComfyUI\models\LLavacheckpoints\Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf
Traceback (most recent call last):
File "Q:\ComfyUI\execution.py", line 323, in execute
output_data, output_ui, has_subgraph = get_output_data(obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "Q:\ComfyUI\execution.py", line 198, in get_output_data
return_values = _map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb)
File "Q:\ComfyUI\execution.py", line 169, in _map_node_over_list
process_inputs(input_dict, i)
File "Q:\ComfyUI\execution.py", line 158, in process_inputs
results.append(getattr(obj, func)(**inputs))
File "Q:\ComfyUI\custom_nodes\ComfyUI_VLM_nodes\nodes\suggest.py", line 292, in load_llm_checkpoint
llm = Llama(model_path = ckpt_path, chat_format="chatml", offload_kqv=True, f16_kv=True, use_mlock=False, embedding=False, n_batch=1024, last_n_tokens_size=1024, verbose=True, seed=42, n_ctx = max_ctx, n_gpu_layers=gpu_layers, n_threads=n_threads,)
File "Q:\ComfyUI\venv\lib\site-packages\llama_cpp\llama.py", line 338, in init
self._model = _LlamaModel(
File "Q:\ComfyUI\venv\lib\site-packages\llama_cpp\_internals.py", line 57, in init
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: Q:\ComfyUI\models\LLavacheckpoints\Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf
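The "wrong number of tensors; expected 292, got 291" error usually means either an outdated llama-cpp-python (older builds don't know about the extra rope-frequency tensor in newer Llama 3.1 quants) or a truncated/mismatched download, which would fit the three different file sizes mentioned above. Besides updating, you can compare the file's SHA-256 against the checksum shown on the page you downloaded it from (Hugging Face lists it in the file view). A minimal check script:

```python
import hashlib

def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    # Stream in 1 MiB chunks so a 5 GB model never has to fit in RAM.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

print(sha256_of(r"Q:\ComfyUI\models\LLavacheckpoints\Meta-Llama-3.1-8B-Instruct-Q5_K_S.gguf"))
```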
@kallamamran Try updating ComfyUI.
FYI, there is currently an issue if you try to install llama-cpp-python from the Manager: it tries to install version 0.3.5, but that one only exists for Metal on macOS. So you need to download 0.3.4 from https://github.com/abetlen/llama-cpp-python/releases/tag/v0.3.4-cu124 and install it with a command similar to this:
C:\ComfyUI\ComfyUI_windows_portable\python_embeded\python.exe -m pip install C:\ComfyUI\llama_cpp_python-0.3.4-cp312-cp312-win_amd64.whl
Hi, I noticed this problem. However, when I installed the WHL file you mention, I got another error: my python.exe in the folder D:\ComfyUI_windows_portable\python_embeded was no longer valid; it was 0 bytes and Python no longer worked, so I had to reinstall it. The ComfyUI I am running uses Python 3.11, so installing 3.12 files causes a problem, I think, or am I overlooking something?
When I try to install the WHL file, I get the error: ERROR: llama_cpp_python-0.3.4-cp312-cp312-win_amd64.whl is not a supported wheel on this platform.
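The cp312 in the wheel name means it was built for CPython 3.12, so it will refuse to install on the 3.11 interpreter that some portable builds ship. Before picking a wheel from the release page, check which Python your python_embeded folder actually contains, e.g.:

```python
# Save as check_py.py and run it with the embedded interpreter:
#   python_embeded\python.exe check_py.py
import sys
print(sys.version)  # e.g. "3.11.9 ..." means you need a cp311 wheel
```

Then grab the matching cp3XX build from the same release.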
Very excited to try the workflow!
Both versions seem to need a different version of the LLMLoader/Sampler, and Image Switch fails to download. Do I need to install those manually?
https://gyazo.com/a732daf9f85c6d7fbf948dfd20fdf070
Did you ever solve this? I'm having the same issue.
It's the same problem for me.