Watch the full video first if you want to understand how this subtitle and watermark removal workflow works in practice. The video shows how an input video can be cleaned through a Bernini-R video-to-video editing route, while preserving the original people, motion, scene structure, camera framing, timing, and audio as much as possible.
This ComfyUI workflow is designed for subtitle, caption, and watermark removal from video. Its main purpose is to remove overlaid text from a source video and reconstruct the background behind the removed area so the final result looks clean and natural. This is not a full video restyling workflow. It is a targeted video repair workflow focused on cleaning visible overlay artifacts while keeping the source video identity and motion unchanged.
The workflow is built around Bernini_HIGH_fp8_e4m3fn_scaled.safetensors and Bernini_LOW_fp8_e4m3fn_scaled.safetensors as the main Bernini-R video editing models. The text encoding route uses umt5_xxl_fp8_e4m3fn_scaled.safetensors with the WAN text encoder type. The VAE route uses wan_2.1_vae_Comfy-Org.safetensors. The graph also includes SageAttention patch nodes, UnifiedReward-Flex LoRA modules, LightX2V LoRA modules, BerniniConditioning, KSamplerAdvanced, VAEDecode, CreateVideo, and SaveVideo.
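For a local install, a quick pre-flight check can confirm that the base model files are in place before you load the graph. This is a minimal sketch that assumes the default ComfyUI folder layout (models/diffusion_models, models/text_encoders, models/vae). Some installs use models/unet and models/clip instead, so adjust the folder names if needed. The LoRA files are left out because their exact filenames are not listed here.

```python
from pathlib import Path

# Assumption: default ComfyUI folder layout. Some installs use "unet" instead of
# "diffusion_models" and "clip" instead of "text_encoders"; edit the keys if so.
REQUIRED_MODELS = {
    "diffusion_models": [
        "Bernini_HIGH_fp8_e4m3fn_scaled.safetensors",
        "Bernini_LOW_fp8_e4m3fn_scaled.safetensors",
    ],
    "text_encoders": ["umt5_xxl_fp8_e4m3fn_scaled.safetensors"],
    "vae": ["wan_2.1_vae_Comfy-Org.safetensors"],
}


def missing_models(comfy_root: str) -> list[str]:
    """Return 'folder/filename' entries for required model files that are not present."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, filenames in REQUIRED_MODELS.items():
        for name in filenames:
            if not (models_dir / folder / name).is_file():
                missing.append(f"{folder}/{name}")
    return missing


if __name__ == "__main__":
    import sys

    root = sys.argv[1] if len(sys.argv) > 1 else "ComfyUI"
    gaps = missing_models(root)
    if gaps:
        print("Missing model files:")
        for item in gaps:
            print("  models/" + item)
    else:
        print("All required base model files found.")
```

Run it as `python check_models.py /path/to/ComfyUI`; the script name is only an example.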
The input section starts with a source video. GetVideoComponents separates the video into image frames, audio, frame rate, and bit depth. The image frames are scaled through image_scale_pixel_v2 before entering the Bernini conditioning route. This allows the workflow to process the source video as a controlled video-to-video editing task while keeping the original audio available for final output.
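Before the frames reach the conditioning route, it helps to know the resolution and frame count the model will actually work with. The sketch below assumes that image_scale_pixel_v2 resizes to a target pixel budget while keeping the aspect ratio. The rounding to multiples of 16 is also an assumption, and the real node may round differently. The frame-count rule reflects the WAN 2.1 VAE, which compresses time by 4 and space by 8, so WAN-family models expect 4k + 1 frames. This sketch assumes Bernini-R follows the same convention.

```python
import math


def fit_to_pixel_budget(width: int, height: int, megapixels: float, multiple: int = 16) -> tuple[int, int]:
    """Scale (width, height) to roughly `megapixels` total pixels, keeping aspect ratio.

    Each side is rounded down to a multiple of `multiple` (16 is an assumed value).
    """
    scale = math.sqrt(megapixels * 1_000_000 / (width * height))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


def wan_frame_count(num_frames: int) -> int:
    """Largest frame count <= num_frames of the form 4k + 1 (WAN 2.1 VAE temporal stride 4)."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return (num_frames - 1) // 4 * 4 + 1


def latent_shape(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """(latent_frames, latent_height, latent_width) for the WAN 2.1 VAE: 4x in time, 8x in space."""
    frames = wan_frame_count(num_frames)
    return (frames - 1) // 4 + 1, height // 8, width // 8


if __name__ == "__main__":
    print(fit_to_pixel_budget(1920, 1080, 0.5))  # (928, 528)
    print(wan_frame_count(100))                  # 97
    print(latent_shape(832, 480, 81))            # (21, 60, 104)
```

For example, a 1920x1080 source at a 0.5-megapixel budget comes out as 928x528, and a 100-frame clip would be cut to 97 frames if the 4k + 1 rule applies.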
The prompt is built specifically for subtitle and caption removal. The positive prompt tells the model to remove all overlaid subtitles and captions, cleanly reconstruct the background behind the removed text, and match surrounding colors, textures, lighting, shadows, motion blur, camera movement, and temporal consistency. It also explicitly tells the model to preserve original people, faces, identity, clothing, body motion, camera framing, scene content, background, lighting, timing, and audio.
The negative prompt suppresses subtitles, captions, overlaid text, burned-in text, Chinese characters, English letters, lyrics, lower-third text, text remnants, ghost text, smeared letters, watermark text, flicker, jitter, blurry inpainted patches, warped backgrounds, changed faces, changed identity, changed clothing, altered motion, scene cuts, bad video, and low quality.
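Keeping the prompt pair as data makes it easier to edit and reuse. The sketch below rebuilds both prompts from the term lists described above. The sentences are a paraphrase, not the workflow's exact prompt text, so copy the originals from the graph if you need them verbatim.

```python
def build_prompts() -> dict[str, str]:
    """Assemble positive and negative prompt strings from term lists (paraphrased wording)."""
    match_terms = [
        "colors", "textures", "lighting", "shadows",
        "motion blur", "camera movement", "temporal consistency",
    ]
    preserve_terms = [
        "people", "faces", "identity", "clothing", "body motion",
        "camera framing", "scene content", "background", "lighting",
        "timing", "audio",
    ]
    negative_terms = [
        "subtitles", "captions", "overlaid text", "burned-in text",
        "Chinese characters", "English letters", "lyrics", "lower-third text",
        "text remnants", "ghost text", "smeared letters", "watermark text",
        "flicker", "jitter", "blurry inpainted patches", "warped backgrounds",
        "changed faces", "changed identity", "changed clothing",
        "altered motion", "scene cuts", "bad video", "low quality",
    ]
    positive = (
        "Remove all overlaid subtitles and captions. "
        f"Cleanly reconstruct the background behind the removed text, matching the surrounding {', '.join(match_terms)}. "
        f"Preserve the original {', '.join(preserve_terms)}."
    )
    return {"positive": positive, "negative": ", ".join(negative_terms)}


if __name__ == "__main__":
    prompts = build_prompts()
    print("POSITIVE:", prompts["positive"])
    print("NEGATIVE:", prompts["negative"])
```

The two strings go into the positive and negative text-encode inputs that feed BerniniConditioning.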
The generation route uses BerniniConditioning to combine the source video, prompt conditioning, VAE, dimensions, length, and reference capacity into a latent editing task. The workflow then runs two KSamplerAdvanced passes, one for the high-noise stage and one for the low-noise stage. The intent is for the high-noise pass to carry out the main text removal and background reconstruction, and for the low-noise pass to refine detail in the cleaned frames and improve temporal stability.
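A two-stage split like this is normally expressed through KSamplerAdvanced's step-range settings. The sketch below produces widget values for both passes using KSamplerAdvanced's own input names (add_noise, noise_seed, steps, start_at_step, end_at_step, return_with_leftover_noise). It assumes that the Bernini_HIGH model drives the first pass and Bernini_LOW the second, which is inferred from the file names. The step numbers are illustrative rather than the workflow's actual values. cfg, sampler_name and scheduler are omitted, so keep those as they are set in the graph.

```python
def split_sampler_settings(total_steps: int, switch_step: int, seed: int = 0) -> tuple[dict, dict]:
    """Return KSamplerAdvanced widget values for a high-noise pass followed by a low-noise pass.

    Both passes share one step schedule: the first stops at switch_step and hands its
    partially denoised latent (with leftover noise) to the second, which finishes it.
    """
    if not 0 < switch_step < total_steps:
        raise ValueError("switch_step must lie strictly between 0 and total_steps")
    shared = {"noise_seed": seed, "steps": total_steps}
    high_pass = {
        **shared,
        "add_noise": "enable",                   # start from fresh noise
        "start_at_step": 0,
        "end_at_step": switch_step,
        "return_with_leftover_noise": "enable",  # keep remaining noise for the next pass
    }
    low_pass = {
        **shared,
        "add_noise": "disable",                  # continue from the high-noise latent
        "start_at_step": switch_step,
        "end_at_step": total_steps,
        "return_with_leftover_noise": "disable", # fully denoise before VAEDecode
    }
    return high_pass, low_pass


if __name__ == "__main__":
    high, low = split_sampler_settings(total_steps=8, switch_step=4)
    print("high-noise pass:", high)
    print("low-noise pass:", low)
```

Moving switch_step earlier gives the low-noise pass more of the schedule. Moving it later leaves more of the removal work to the high-noise pass.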
After sampling, VAEDecode converts the edited latent frames back into images. CreateVideo combines the cleaned frames with the original audio and frame rate. SaveVideo exports the final cleaned video.
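CreateVideo reattaches the original audio and frame rate to the cleaned frames, so the output stays in sync only if the frame count matches the source clip. If frames are dropped along the way, for example to satisfy a model's frame-length constraint, the audio will outlast the picture. The sketch below is a small sanity check. It assumes you can read the frame count, fps, audio sample count and sample rate from your own pipeline, and the helper names are hypothetical.

```python
def av_duration_gap(num_frames: int, fps: float, audio_samples: int, sample_rate: int) -> float:
    """Seconds by which the audio outlasts the video frames (negative if the frames run longer)."""
    if fps <= 0 or sample_rate <= 0:
        raise ValueError("fps and sample_rate must be positive")
    video_seconds = num_frames / fps
    audio_seconds = audio_samples / sample_rate
    return audio_seconds - video_seconds


def frames_needed(audio_samples: int, sample_rate: int, fps: float) -> int:
    """Frame count that would cover the whole audio track at the given fps (rounded)."""
    return round(audio_samples / sample_rate * fps)


if __name__ == "__main__":
    # Example: a 100-frame, 24 fps source with 48 kHz audio, cleaned down to 97 frames.
    gap = av_duration_gap(num_frames=97, fps=24, audio_samples=200_000, sample_rate=48_000)
    print(f"audio outlasts video by {gap:.3f} s")
    print("frames needed:", frames_needed(200_000, 48_000, 24))
```

A gap of a few frames is usually harmless, but a larger one is worth fixing by trimming the audio or adjusting the processed length.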
Main features:
Subtitle and watermark removal video workflow
Bernini-R video-to-video editing route
Removes overlaid subtitles and captions
Reconstructs background behind removed text
Preserves people, faces, identity, clothing, and motion
Preserves camera framing, scene content, timing, and audio
Bernini_HIGH_fp8_e4m3fn_scaled.safetensors support
Bernini_LOW_fp8_e4m3fn_scaled.safetensors support
WAN VAE decoding
UMT5 WAN text encoder
BerniniConditioning video editing control
High-noise and low-noise sampling stages
KSamplerAdvanced repair route
UnifiedReward-Flex LoRA support
LightX2V LoRA support
SageAttention patch support
GetVideoComponents audio and FPS preservation
CreateVideo final assembly
SaveVideo final export
Suggested workflow:
Upload the source video first. This workflow works best when the subtitle, caption, or watermark is overlaid on top of the video rather than being a natural sign or object inside the scene. Use the default prompt when you want to remove subtitles while preserving as much of the original video as possible. If the cleaned area still shows ghost text, strengthen the wording about clean background reconstruction and no visible remnants. If the model changes faces, clothing, or motion too much, simplify the edit instruction and emphasize preservation. For best results, use videos with stable framing, little heavy motion behind the subtitles, and clear background textures around the text.
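The two prompt adjustments described above can be kept as small reusable helpers. In this sketch, the clause wording and function names are hypothetical examples, not text from the workflow, so tune them to your own footage.

```python
# Hypothetical wording; adjust to your own footage.
RECONSTRUCTION_CLAUSE = (
    "Fill the removed text area with clean, continuous background that matches its surroundings, "
    "with no visible letters, outlines, or remnants."
)


def strengthen_reconstruction(prompt: str) -> str:
    """Append stronger background-reconstruction wording for results that leave ghost text."""
    return f"{prompt.rstrip()} {RECONSTRUCTION_CLAUSE}"


def preservation_first(preserve: list[str]) -> str:
    """Return a short, preservation-focused instruction for results that alter faces or motion."""
    return f"Remove the overlaid text only. Keep the original {', '.join(preserve)} unchanged."


if __name__ == "__main__":
    print(strengthen_reconstruction("Remove all overlaid subtitles and captions."))
    print(preservation_first(["faces", "clothing", "motion"]))
```

The second helper deliberately replaces the longer instruction instead of appending to it, following the advice to simplify the edit when preservation suffers.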
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2067822291773378561?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points for every daily login, and enjoy 4090 performance with 48 GB of super power!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1xw7F6XE7K/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
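I will keep updating model resources on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to make creation and learning easier.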
