Bernini-R Text-to-Video Cinematic Generation Workflow

Watch the full video first if you want to understand how this Bernini-R text-to-video workflow works in practice. The video shows how a simple text idea can be expanded into a more detailed cinematic prompt, then converted into a full video through the Bernini-R high-noise and low-noise generation route.

This ComfyUI workflow is designed for Bernini-R text-to-video generation. Its main purpose is to generate a complete video directly from text, without requiring a source video, source image, reference video, or reference image. Compared with image-to-video or video-to-video workflows, this graph is a cleaner pure T2V route. The user only needs to describe the scene, subject, action, camera movement, atmosphere, and visual style, then the workflow handles prompt expansion, conditioning, sampling, decoding, and final video export.

The workflow is built around the Bernini-R dual-model structure. It uses Bernini_HIGH_fp8_e4m3fn_scaled.safetensors and Bernini_LOW_fp8_e4m3fn_scaled.safetensors as the high-noise and low-noise model branches. It also uses UMT5 XXL fp8 text encoding, Wan 2.1 VAE, BerniniConditioning, KSamplerAdvanced, VAEDecode, CreateVideo, SaveVideo, and PathchSageAttentionKJ. The model route is further enhanced with LightX2V LoRA and UnifiedReward-Flex LoRA support for both generation stages, helping the workflow improve speed, visual coherence, and final output quality.

The prompt creation section is one of the key parts of the workflow. BerniniPromptEnhancer is set to the t2v task type. The user can enter a simple idea, such as a dramatic MotoGP racing scene, and the prompt enhancer will build a Bernini-specific system prompt. RHLLMChatNode then rewrites the idea into a detailed cinematic video prompt. The LLM output is cleaned through StringReplace nodes, removing the JSON wrapper before the final prompt is sent into CLIPTextEncode. This means the workflow can turn a short concept into a more complete generation instruction automatically.
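
The cleanup step can be pictured outside ComfyUI as a small Python function. This is only a sketch: the exact wrapper returned by RHLLMChatNode is not shown here, so the fenced-JSON shape and the `prompt` key are assumptions, and `clean_llm_output` is a hypothetical helper rather than a node in the graph.

```python
import json
import re


def clean_llm_output(raw: str, key: str = "prompt") -> str:
    """Return the bare prompt text from an LLM reply.

    Assumed reply shapes (hypothetical, not taken from the workflow):
      1. a Markdown-fenced JSON object such as {"prompt": "..."}
      2. a bare JSON object with the same key
      3. plain text, returned unchanged apart from trimming
    """
    text = raw.strip()

    # Drop a surrounding triple-backtick fence (optionally tagged "json").
    fence = re.match(r"^`{3}(?:json)?\s*(.*?)\s*`{3}$", text, flags=re.DOTALL)
    if fence:
        text = fence.group(1).strip()

    # Try to read the remainder as JSON and pull out the prompt field.
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return text  # not JSON: treat the text as the prompt itself
    if isinstance(data, dict) and isinstance(data.get(key), str):
        return data[key].strip()
    return text
```

In the graph itself the same effect comes from chained StringReplace nodes; the function form simply makes it easier to test a reply before it reaches CLIPTextEncode.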

The generation section uses BerniniConditioning in T2V mode. Since no source or reference media is connected, the conditioning node creates the video latent directly from text. The workflow is configured for a 1280×720 horizontal video output with 129 frames, making it suitable for cinematic previews, action scenes, product-style shots, landscape scenes, racing videos, fantasy shots, and general text-driven video concepts.
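
To see what the 1280×720 / 129-frame setting means for the latent, the arithmetic can be sketched as below. It assumes the commonly cited Wan 2.1 VAE compression (8× spatial, 4× temporal, 16 latent channels). The frame rate is not stated above, so `fps` is a placeholder argument, and both functions are illustrative helpers, not part of the workflow.

```python
def latent_shape(width: int, height: int, frames: int,
                 spatial: int = 8, temporal: int = 4, channels: int = 16):
    """Latent tensor shape as (channels, latent_frames, latent_h, latent_w).

    The default factors are assumed Wan 2.1 VAE values; check them
    against the VAE that is actually loaded.
    """
    if width % spatial or height % spatial:
        raise ValueError("width and height should be multiples of the spatial factor")
    if (frames - 1) % temporal:
        raise ValueError("frame count should have the form temporal * n + 1")
    latent_frames = (frames - 1) // temporal + 1
    return channels, latent_frames, height // spatial, width // spatial


def duration_seconds(frames: int, fps: float) -> float:
    """Clip length for a given frame rate (fps is a placeholder value)."""
    return frames / fps


print(latent_shape(1280, 720, 129))  # (16, 33, 90, 160)
```

Under these assumptions, 129 fits the 4n + 1 pattern (n = 32), so it maps to a whole number of latent frames.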

The sampling path uses two KSamplerAdvanced stages. The first stage handles the high-noise construction phase, where the main composition, motion, scene structure, and camera direction are created. The second stage handles the low-noise refinement phase, improving detail, stability, and final visual polish. After sampling, the latent is decoded through Wan 2.1 VAE, assembled into a video through CreateVideo, and exported through SaveVideo.
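
The handoff between the two samplers can be sketched as plain data. The keys `add_noise`, `steps`, `start_at_step`, `end_at_step`, and `return_with_leftover_noise` mirror KSamplerAdvanced inputs, and the pattern shown is a common way to chain two such nodes, assumed here rather than read from this graph. The total step count and the switch point are placeholders, and `split_stages` and `model_file` are hypothetical names used only for illustration.

```python
def split_stages(total_steps: int, switch_step: int):
    """Build settings for a high-noise pass followed by a low-noise pass.

    total_steps and switch_step are illustrative; use the values from the graph.
    """
    if not 0 < switch_step < total_steps:
        raise ValueError("switch_step must fall strictly between 0 and total_steps")

    high = {
        "model_file": "Bernini_HIGH_fp8_e4m3fn_scaled.safetensors",  # label only
        "add_noise": "enable",                    # stage 1 starts from fresh noise
        "steps": total_steps,
        "start_at_step": 0,
        "end_at_step": switch_step,
        "return_with_leftover_noise": "enable",   # pass a still-noisy latent onward
    }
    low = {
        "model_file": "Bernini_LOW_fp8_e4m3fn_scaled.safetensors",   # label only
        "add_noise": "disable",                   # continue from stage 1's latent
        "steps": total_steps,
        "start_at_step": switch_step,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable",  # finish denoising completely
    }
    return high, low
```

Both stages share the same `steps` value so that they follow one noise schedule; only the start and end positions differ.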

Compared with ordinary text-to-video workflows, this Bernini-R T2V setup is more structured. Rather than relying on a raw prompt alone, it combines prompt enhancement, LLM rewriting, Bernini task conditioning, dual-model sampling, SageAttention optimization, acceleration LoRA, reward-aligned LoRA, and final video output into one production-oriented pipeline.

Main features:

Bernini-R text-to-video workflow
Pure text input, no source or reference media required
Bernini HIGH / LOW fp8 dual-model route
UMT5 XXL fp8 text encoder
Wan 2.1 VAE decoding
BerniniPromptEnhancer T2V prompt creation
RHLLMChatNode automatic prompt rewriting
JSON cleanup chain for LLM output
BerniniConditioning T2V control
PathchSageAttentionKJ optimization
LightX2V high / low noise LoRA support
UnifiedReward-Flex high / low noise LoRA support
KSamplerAdvanced two-stage generation
1280×720 / 129-frame video setup
CreateVideo and SaveVideo final output

Suggested workflow:

Start with a clear text concept. Define the subject, action, environment, camera angle, camera movement, lighting, mood, and final video style. A single vague sentence is not enough if you need a controlled result. Let BerniniPromptEnhancer and RHLLMChatNode expand the idea into a more complete Bernini video prompt, then check the cleaned prompt before rendering. If the first result lacks motion, describe the action and camera movement more explicitly. If the scene is too chaotic, reduce the number of subjects and simplify the environment. If the visual quality is weak, strengthen the lighting, lens, texture, and cinematic composition language. Begin with the default 1280×720 / 129-frame setup, and adjust the prompt and seed once the basic direction is stable.
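
One way to keep the starting concept complete is to fill in each element named above before handing it to the enhancer. The sketch below is a hypothetical helper (`VideoConcept` is not part of the workflow) that joins the fields into one concept string and reports which ones are still empty. The example values are made up for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class VideoConcept:
    subject: str = ""
    action: str = ""
    environment: str = ""
    camera_angle: str = ""
    camera_movement: str = ""
    lighting: str = ""
    mood: str = ""
    style: str = ""

    def missing(self) -> list[str]:
        """Names of the elements that have not been described yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_text(self) -> str:
        """Join the filled-in elements into a single concept string."""
        parts = [getattr(self, f.name).strip() for f in fields(self)]
        return ", ".join(p for p in parts if p)


concept = VideoConcept(
    subject="a MotoGP rider",
    action="leaning hard into a wet corner",
    camera_movement="low tracking shot following the bike",
    lighting="overcast light with spray catching the floodlights",
)
print(concept.to_text())
print("still missing:", concept.missing())
```

An empty `missing()` list is not a quality guarantee, but it is a quick check that motion and camera direction were at least written down before rendering.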

⚙️ RunningHub Workflow

Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2062533464225837057?inviteCode=rh-v1111

If the results meet your expectations, you can later deploy it locally for customization.

🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy 4090 performance with 48 GB of memory!

📺 Bilibili Updates (Mainland China & Asia-Pacific)

If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1yLEc6dEJc/

☕ Support Me on Ko-fi

If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk

💼 Business Contact

For collaboration or inquiries, please contact aiksk95 on WeChat.

I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

Files

berniniRTextToVideo_v10.zip
