The BFS (Best Face Swap) LoRA series was developed for Qwen Image Edit 2509, specialized in high-fidelity face and head replacement tasks with natural tone blending and consistent lighting.
Each version builds upon the previous one:
🧠 Focus Faces: precise face swaps, keeping the original head shape and hair while transferring facial identity and expression.
🧩 Focus Head: stronger head swaps, replacing the full head (including hair and pose orientation).
The two versions complement each other: one focuses on face swapping, the other on head swapping.
Share your creations that do not involve public figures or individuals who have not given consent. By sharing, you will earn Buzz, and your posts directly help me improve future versions by identifying and correcting potential issues.
Important Note: If you are going to use Qwen Image Edit 2511, update ComfyUI before anything else. Without the update, you may get completely distorted or ugly images.
Workflows:
Head/Face Swap Workflow - Qwen-Image-Edit-2509 | Civitai
My Custom Lightning LoRA:
Custom Lightning - Qwen Image Edit - 2511 | Qwen LoRA | Civitai
Alissonerdx/CustomLightning · Hugging Face
Test V3 here:
BFS Best Face Swap - a Hugging Face Space by Alissonerdx
Face Swap Video Tests (V1):
Face Swap - Qwen Image Edit 2509 (English)
Another important thing is to update ComfyUI. Many people are having terrible results because they haven't updated ComfyUI. The 2511 model has an architecture with a few more layers, and that's why ComfyUI needs to be updated.
About Flux 2:
I've done my best so far, but the results aren't as good as with Qwen. The base Flux 2 model can already handle head swapping, though with some difficulty. The goal of this LoRA was to improve on that a bit, but I haven't achieved very good results. It might be a configuration issue, so here's this beta version for you to test.
Try with CFG: 8.0
PERSONAL NOTES:
The swap quality will always depend heavily on the quality of your input images. Larger, clean images with little noise or compression artifacts generally produce the best results. Keep in mind that the model always follows the quality of the body image, since it becomes the final rendered frame—so even if the face source is high-quality, a low-resolution or noisy body image will limit the outcome.
Most of the images I generate are created without the LightX2V Lightning LoRA, since I noticed that enabling it tends to make the skin appear more plastic-like and reddish, and finding the right balance requires extra tuning that I didn't focus on. If anyone has discovered good configurations, feel free to share them in the comments.
In short, using LightX2V makes the model less versatile because it operates with a fixed CFG value of 1.0. So before assuming it “didn’t work,” I recommend first testing the workflow I published without LightX2V to compare the results.
If you’re getting results with too much contrast, overly strong colors, or plastic-like textures while using LightX2V’s lightning models, try reducing the number of inference steps. For example, if you’re using the Qwen Image Edit 2509 Lightning (8 steps) model, try running it with 4 steps instead. The excessive contrast often comes from running too many steps while CFG remains fixed at 1.0.
If you encounter similar issues without using the Lightning LoRA, try lowering the steps as well (e.g., from 20 down to around 16 or fewer) and reduce CFG to values like 1.2 or 1.5, which can help produce smoother, more natural results.
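The troubleshooting advice above can be sketched as a small helper. This is only an illustration: the function name `soften_settings`, the halving rule for Lightning, and the 20% step reduction are assumptions derived from the examples in the text (8 to 4 steps, 20 to 16 steps), not settings taken from any workflow.

```python
def soften_settings(steps: int, cfg: float, uses_lightning: bool) -> dict:
    """Suggest gentler steps/CFG when results look over-contrasted or plastic.

    Hypothetical helper: the exact reduction rules are assumptions based on
    the examples in the notes above.
    """
    if uses_lightning:
        # Lightning runs with CFG fixed at 1.0, so only the step count can drop
        # (e.g. 8 -> 4). Never go below one step.
        return {"steps": max(1, steps // 2), "cfg": 1.0}
    # Without Lightning: trim steps by roughly 20% (20 -> 16) and cap CFG at 1.5.
    return {"steps": max(1, round(steps * 0.8)), "cfg": min(cfg, 1.5)}


if __name__ == "__main__":
    print(soften_settings(8, 1.0, uses_lightning=True))
    print(soften_settings(20, 2.5, uses_lightning=False))
```

Treat the returned values as a starting point for a comparison run, not as fixed recommendations.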
Another important detail: in images where the body is positioned farther from the camera, the face region becomes smaller, which can reduce swap accuracy and overall quality. This happens because the model has less pixel information to work with in that small facial area. To handle these cases, you can use my older workflow, which automatically crops the face region from the body image and performs an inpainting-like process to improve results in distant or small-face compositions.
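To illustrate the crop-then-swap idea for small faces, here is a minimal sketch of how a padded crop region could be computed around a detected face. The older workflow's actual implementation is not described here, so the function name `padded_face_crop`, the `pad_ratio` default, and the box format are all assumptions; the face box itself would come from whatever detector you use.

```python
def padded_face_crop(face_box, image_size, pad_ratio=0.5):
    """Expand a face box and clamp it to the image bounds.

    face_box:   (left, top, right, bottom) in pixels, from any face detector.
    image_size: (width, height) of the body image.
    pad_ratio:  extra margin per side, as a fraction of the face box size
                (assumed value, tune to taste).
    """
    left, top, right, bottom = face_box
    width, height = image_size
    pad_x = int((right - left) * pad_ratio)
    pad_y = int((bottom - top) * pad_ratio)
    return (
        max(0, left - pad_x),
        max(0, top - pad_y),
        min(width, right + pad_x),
        min(height, bottom + pad_y),
    )


if __name__ == "__main__":
    # A 100x100 face in a 1024x1024 image grows to a 200x200 crop.
    print(padded_face_crop((100, 100, 200, 200), (1024, 1024)))
```

The cropped region can then be upscaled, swapped at a larger effective face size, and pasted back into the original frame.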
Finally, if you notice loss of similarity between faces or poses—especially when the reference and target images differ significantly in aesthetics or angles—try increasing the strength of your head swap LoRA slightly (for instance, to 1.2 or 1.3) to restore consistency.
⚙️ BFS — “Focus Faces”
Trained on 240 image triplets (face, body, and result),
with a LoRA rank of 16 → later increased to 32,
and gradient accumulation = 2, running for 5500 steps on an NVIDIA L40S GPU.
This version produces stable and detailed face swaps, preserving expression, lighting, and gaze direction while maintaining the body’s natural look.
🔧 Model Notes
You don't need my workflow to use this LoRA. If you are having problems with it, use your own: it is just the basic Qwen Image Edit workflow plus the LoRA, with the inputs in the right order (image 1 = face, image 2 = body).
Quantization: not guaranteed to work below FP8 (avoid GGUF Q4).
Face mask: optional — remove if MediaPipe or Planar Overlay cause issues.
Pose conditioning: use MediaPipe Face Mesh or DWPose if you need more alignment control.
Lightning LoRA: may produce plastic-like skin, especially when mixed with other Qwen-based LoRAs.
⚙️ Recommended Settings
Samplers:
er_sde + beta57 / kl_optimal / ddim_uniform (best results)
ddim + ddim_uniform (sometimes most realistic)
res_2s + beta57
Don't get attached to one setting; if one doesn't work well, switch to another.
Precision:
🧠 Best: fp16
⚙️ Recommended: gguf q8 or fp8
⚠️ Below fp8: noticeable degradation
Inference Tips:
With Qwen Image Edit 2509 Lightning LoRA → use 4 / 8 steps for fast generation.
Without it → use 12–20 steps, CFG 1.0–2.5 for realism.
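Since no single setting works for every image, one option is to enumerate the recommended combinations and compare them side by side. The sketch below only builds a list of candidate settings; it does not call ComfyUI. It assumes that "er_sde + beta57 / kl_optimal / ddim_uniform" means er_sde paired with each of those three schedulers, and the intermediate step and CFG values (16 steps, CFG 1.5) are sample points chosen inside the stated ranges.

```python
# Sampler/scheduler pairs from the recommended settings (assumed reading).
SAMPLER_SCHEDULER_PAIRS = [
    ("er_sde", "beta57"),
    ("er_sde", "kl_optimal"),
    ("er_sde", "ddim_uniform"),
    ("ddim", "ddim_uniform"),
    ("res_2s", "beta57"),
]


def build_sweep(uses_lightning: bool) -> list:
    """Return candidate settings to try, one dict per run."""
    if uses_lightning:
        # Lightning LoRA: 4 or 8 steps, CFG fixed at 1.0.
        step_options, cfg_options = [4, 8], [1.0]
    else:
        # No Lightning: sample points inside 12-20 steps and CFG 1.0-2.5.
        step_options, cfg_options = [12, 16, 20], [1.0, 1.5, 2.5]
    return [
        {"sampler": sampler, "scheduler": scheduler, "steps": steps, "cfg": cfg}
        for sampler, scheduler in SAMPLER_SCHEDULER_PAIRS
        for steps in step_options
        for cfg in cfg_options
    ]


if __name__ == "__main__":
    for candidate in build_sweep(uses_lightning=True):
        print(candidate)
```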
🧬 BFS — “Focus Head”
The “Focus Head” version was trained as a continuation of Focus Faces, extending the dataset and shifting focus toward full head swaps.
It was trained on an NVIDIA RTX 6000 PRO, rank 32, for 12,000 steps, using 628 image sets (face, body, target, and sometimes pose maps generated via MediaPipe).
🔹 Training Phases
Standard Face Swap – same as Focus Faces, focusing on facial identity.
Pose-Conditioned Face Swap – added pose maps to align gaze and head angle.
Full Head Swap – replaced the entire head (including hair) for stronger identity control.
After ~2000 steps, the focus moved toward head swap refinement.
At ~4000 steps, the dataset was narrowed to perfect skin-tone matches, and by the end of training,
the dataset evolved from 628 → 138 → 76 high-quality samples for final fine-tuning.
⚠️ Note:
While Focus Head can still perform standard face swaps, it's more naturally inclined toward full head swaps due to its data balance.
This was intentional in part, but also a side-effect of dataset distribution and mixed conditioning.
⚠️ Important Notice
Do not share results involving real people, celebrities, or public figures.
Civitai’s moderation may disable posts that violate likeness or consent rules.
This model is intended only for artistic and fictional characters, educational use, and AI experimentation.
I take no responsibility for any misuse of this model. Please use it responsibly and respect all likeness rights.
Description
🎬 BFS — Video Head Swap (Bernini-R / Wan 2.2)
A separate video branch of the BFS series: a LoRA trained on top of Bernini-R (the ByteDance Bernini renderer, built on Wan2.2-T2V-A14B). It performs head/face swaps on video clips while keeping the body, motion, clothing, and scene of the source video.
How it works (why it's not a "normal" LoRA)
Bernini is a reference-conditioned (in-context) model. There are no new layers, but the training is not a standard character LoRA — for every step the pipeline:
- VAE-encodes the guide video and the head reference image as clean latents (same Wan latent space as the target),
- concatenates them into the same self-attention sequence as the noisy target, each tagged with its source_id RoPE phase (target = 0, guide = 1, reference = 2),
- and computes the flow-matching loss only on the target tokens.
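The three steps above can be illustrated with a toy example. This is not the training script: real training works on VAE latents and injects the source_id through a RoPE phase, whereas this sketch uses plain lists of floats and only shows the two ideas that matter, tagging every token with its source_id and restricting the loss to target tokens. The function names are hypothetical, and a plain mean squared error stands in for the flow-matching objective.

```python
TARGET, GUIDE, REFERENCE = 0, 1, 2  # source_id values named in the text


def build_sequence(target_tokens, guide_tokens, reference_tokens):
    """Concatenate the three streams into one sequence with per-token source ids."""
    tokens, source_ids = [], []
    streams = (
        (target_tokens, TARGET),
        (guide_tokens, GUIDE),
        (reference_tokens, REFERENCE),
    )
    for stream, source_id in streams:
        tokens.extend(stream)
        source_ids.extend([source_id] * len(stream))
    return tokens, source_ids


def target_only_loss(predicted, expected, source_ids):
    """Mean squared error over target tokens only.

    Guide and reference tokens are clean context: they are attended to,
    but they contribute nothing to the loss.
    """
    total, count = 0.0, 0
    for pred, exp, source_id in zip(predicted, expected, source_ids):
        if source_id != TARGET:
            continue
        total += sum((p - e) ** 2 for p, e in zip(pred, exp))
        count += len(pred)
    return total / count if count else 0.0


if __name__ == "__main__":
    tokens, ids = build_sequence([[0.0, 0.0]], [[1.0, 1.0]], [[2.0, 2.0]])
    prediction = [[1.0, 1.0], [9.0, 9.0], [9.0, 9.0]]
    # Only the first (target) token is scored; errors on the others are ignored.
    print(ids, target_only_loss(prediction, tokens, ids))
```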
Because a normal trainer (e.g. the standard AI-Toolkit Wan path) just noises a single clip, it can't teach this. I wrote a custom training script for it — open source here:
⚙️ Training details
- Base: Bernini-R (Wan2.2-T2V-A14B), dual-expert MoE (high-noise + low-noise) → the LoRA is a high/low pair, each applied to its respective expert.
- Config: rank 64 / alpha 64, lr 1e-4, 73 frames, 640 px long edge, single-expert GPU offload (only the routed expert stays on GPU so high frame counts fit one card).
- Dataset: ~538 head-swap triplets (target video, guide video, head-reference image). Captions are intentionally minimal — most are just the trigger head_swap:, so identity comes from the reference + in-context mechanism, not the text.
- Steps: best results around 2,500–3,000 (ArcFace identity similarity plateaus ~0.57; FM loss is flat, ArcFace is the real signal).
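For readers unfamiliar with the identity metric mentioned above, here is a minimal sketch, assuming it is computed as the cosine similarity between a face embedding of the reference image and embeddings of the generated frames, which is the usual way ArcFace embeddings are compared. The embedding extraction itself is out of scope, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for zero vectors; treat as no similarity
    return dot / (norm_a * norm_b)


def mean_identity_similarity(reference_embedding, frame_embeddings):
    """Average similarity of each generated frame to the reference identity."""
    scores = [cosine_similarity(reference_embedding, f) for f in frame_embeddings]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    reference = [1.0, 0.0]
    frames = [[1.0, 0.0], [0.0, 1.0]]
    print(mean_identity_similarity(reference, frames))
```

Tracking a score like this across checkpoints is one way to pick a stopping point when the training loss itself stays flat.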
🔧 Usage (ComfyUI)
- Base must be Bernini-R (high + low), not vanilla Wan2.2 — the LoRA is a delta on Bernini's in-context fine-tune.
- Apply ..._high_noise to the high-noise model and ..._low_noise to the low-noise model.
- rank 64 / alpha 64 → scaling 1.0; start at strength 0.7–1.0.
- Use the Head-Swap Bernini Conditioning node (in my BFS node pack): guide_video → source_id 1 (kept), head_image → source_id 2 (identity).
- Trigger: head_swap: (alone is enough; an optional FACE/ACTION description can follow).
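Two of the points above lend themselves to a short sketch: the scaling arithmetic (alpha divided by rank, multiplied by the strength you set) and routing each file of the high/low pair to the matching expert. The helper names are hypothetical, and the routing simply looks for the `high_noise` / `low_noise` substrings mentioned above; it is not part of any node pack.

```python
def lora_scale(alpha: float, rank: int, strength: float = 1.0) -> float:
    """Effective multiplier applied to the LoRA delta: strength * alpha / rank."""
    return strength * alpha / rank


def route_lora(filename: str) -> str:
    """Decide which expert a LoRA file belongs to, based on its name."""
    name = filename.lower()
    if "high_noise" in name:
        return "high"
    if "low_noise" in name:
        return "low"
    raise ValueError(f"Cannot tell which expert this file targets: {filename}")


if __name__ == "__main__":
    # rank 64 / alpha 64 gives a base scaling of 1.0, so the strength you set
    # in the loader is the effective multiplier.
    print(lora_scale(alpha=64, rank=64))
    print(lora_scale(alpha=64, rank=64, strength=0.7))
```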
🧠 Personal notes
- Crop the reference to head/shoulders only — a full-body reference makes the body bleed into the result.
- Output follows the guide video (it becomes the rendered frame), so a clean, well-sized guide gives the best result.
- Stay near the trained resolution/frames (≈640 / 73f) — pushing far beyond is where temporal consistency starts to break.
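To stay near the trained size, the guide video can be resized and trimmed before inference. The sketch below is an assumption-laden helper: the 640 px long edge and the 73-frame cap come from the training details, while snapping dimensions to a multiple of 16 and frame counts to the form 4n + 1 reflect common constraints of Wan-family models and are not stated in this description.

```python
def fit_to_training_size(width, height, long_edge=640, multiple=16):
    """Scale so the long edge is about `long_edge`, snapped to `multiple`."""
    longest = max(width, height)

    def snap(value):
        # Scale first, then round to the nearest allowed multiple.
        scaled = value * long_edge / longest
        return max(multiple, round(scaled / multiple) * multiple)

    return snap(width), snap(height)


def snap_frame_count(frames, max_frames=73):
    """Clamp to the trained length and round down to the form 4n + 1."""
    frames = min(frames, max_frames)
    return max(1, ((frames - 1) // 4) * 4 + 1)


if __name__ == "__main__":
    print(fit_to_training_size(1024, 1024))
    print(snap_frame_count(81))
```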