
    The BFS (Best Face Swap) LoRA series was developed for Qwen Image Edit 2509 and specializes in high-fidelity face and head replacement, with natural tone blending and consistent lighting.

    Each version builds upon the previous one:

    • 🧠 Focus Faces: precise face swaps, keeping the original head shape and hair while transferring facial identity and expression.

    • 🧩 Focus Head: stronger head swaps, replacing the full head (including hair and pose orientation).

    • The two versions complement each other: one focuses on face swapping, the other on head swapping.

    Feel free to share your creations, as long as they do not involve public figures or people who have not given consent. Sharing earns you Buzz, and your posts directly help me improve future versions by surfacing issues to fix.

    Important Note: If you are going to use Qwen Image Edit 2511, update your ComfyUI before anything else; without the update you may get completely distorted or ugly images.

    If this model was helpful to you in any way, please consider helping me continue creating more models for the price of a coffee.

    Workflows:
    Head/Face Swap Workflow - Qwen-Image-Edit-2509 | Civitai

    My Custom Lightning LoRA:

    Custom Lightning - Qwen Image Edit - 2511 | Qwen LoRA | Civitai

    Alissonerdx/CustomLightning · Hugging Face

    Test V3 here:

    BFS Best Face Swap - a Hugging Face Space by Alissonerdx

    Face Swap Video Tests (V1):
    Face Swap - Qwen Image Edit 2509 (English)

    Another important thing is to update ComfyUI. Many people are getting terrible results simply because they haven't updated. The 2511 model adds a few extra layers to the architecture, which is why ComfyUI needs to be updated to support it.

    About Flux 2:

    I've done my best so far, but the results aren't as good as with Qwen. The base Flux 2 model can already handle head swapping, though with some difficulty. The goal of this LoRA was to improve on that, but I haven't achieved very good results yet. It might be a configuration issue, so here is a beta version for you to test.

    Try with CFG: 8.0

    PERSONAL NOTES:

    The swap quality will always depend heavily on the quality of your input images. Larger, clean images with little noise or compression artifacts generally produce the best results. Keep in mind that the model always follows the quality of the body image, since it becomes the final rendered frame—so even if the face source is high-quality, a low-resolution or noisy body image will limit the outcome.
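A simple pre-flight check can encode this advice before you run a swap. This is an illustrative helper, not part of the official workflow; the 768-pixel threshold is an assumption, not a tested recommendation.

```python
def check_inputs(face_size, body_size, min_side=768):
    """face_size / body_size are (width, height) tuples. Returns warnings
    for inputs likely to limit swap quality."""
    warnings = []
    if min(face_size) < min_side:
        warnings.append("face image is small; identity detail may suffer")
    if min(body_size) < min_side:
        # The body image becomes the final rendered frame,
        # so its quality is the ceiling for the output.
        warnings.append("body image is small; it caps the output quality")
    return warnings
```

The asymmetry is deliberate: a weak face image degrades identity, but a weak body image caps everything, since it is the frame that gets rendered.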

    Most of the images I generate are created without using the LightX2V Lightning LoRA, since I noticed that enabling it tends to make the skin appear more plastic-like and reddish, and finding the right balance requires extra tuning that I didn’t focus on. If anyone has discovered good configurations, feel free to share them in the comments of this template.

    In short, using LightX2V makes the model less versatile because it operates with a fixed CFG value of 1.0. So before assuming it “didn’t work,” I recommend first testing the workflow I published without LightX2V to compare the results.

    If you’re getting results with too much contrast, overly strong colors, or plastic-like textures while using LightX2V’s lightning models, try reducing the number of inference steps. For example, if you’re using the Qwen Image Edit 2509 Lightning (8 steps) model, try running it with 4 steps instead. The excessive contrast often comes from running too many steps while CFG remains fixed at 1.0.

    If you encounter similar issues without using the Lightning LoRA, try lowering the steps as well—e.g., from 20 down to around 16 or fewer—and reduce CFG to values like 1.2 or 1.5, which can help produce smoother, more natural results.
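The tuning advice above can be summarized as a small chooser. This is a hypothetical helper whose numbers mirror the text; treat them as starting points, not fixed values.

```python
def suggest_settings(use_lightning, too_much_contrast=False):
    """Return suggested (steps, cfg) per the tuning notes above."""
    if use_lightning:
        steps = 8
        if too_much_contrast:
            steps = 4  # halve steps when results look over-contrasted/plastic
        return {"steps": steps, "cfg": 1.0}  # Lightning fixes CFG at 1.0
    steps, cfg = 20, 2.5
    if too_much_contrast:
        steps, cfg = 16, 1.5  # lower both for smoother, more natural results
    return {"steps": steps, "cfg": cfg}
```

Because the Lightning LoRA pins CFG to 1.0, step count is the only dial left there, which is why over-contrast is fixed by dropping steps rather than CFG.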

    Another important detail: in images where the body is positioned farther from the camera, the face region becomes smaller, which can reduce swap accuracy and overall quality. This happens because the model has less pixel information to work with in that small facial area. To handle these cases, you can use my older workflow, which automatically crops the face region from the body image and performs an inpainting-like process to improve results in distant or small-face compositions.
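The crop-and-inpaint idea can be sketched as a padded square crop around the detected face, which you would upscale, swap, and paste back. This is a minimal sketch of the geometry only; the bounding-box input and padding factor are assumptions, and the real workflow does this inside the graph.

```python
def square_face_crop(bbox, img_w, img_h, pad=0.5):
    """bbox = (x0, y0, x1, y1) of the detected face.
    Returns a padded square crop box clamped inside the image, so a small
    distant face can be cropped, upscaled, and swapped with more pixels."""
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    side = max(x1 - x0, y1 - y0) * (1 + pad)
    side = min(side, img_w, img_h)          # never larger than the image
    half = side / 2
    cx = min(max(cx, half), img_w - half)   # shift center so the box fits
    cy = min(max(cy, half), img_h - half)
    return (int(cx - half), int(cy - half), int(cx + half), int(cy + half))
```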

    Finally, if you notice loss of similarity between faces or poses—especially when the reference and target images differ significantly in aesthetics or angles—try increasing the strength of your head swap LoRA slightly (for instance, to 1.2 or 1.3) to restore consistency.


    ⚙️ BFS — “Focus Faces”

    Trained on 240 image triplets (face, body, and result),
    with a LoRA rank of 16 → later increased to 32,
    and gradient accumulation = 2, running for 5500 steps on an NVIDIA L40S GPU.

    This version produces stable and detailed face swaps, preserving expression, lighting, and gaze direction while maintaining the body’s natural look.


    🔧 Model Notes

    • You don't need my workflow for this LoRA to work. If you're having problems with it, use your own: it's just the standard Qwen Image Edit workflow plus the LoRA, with the inputs in the right order (face as image 1, body as image 2).

    • Quantization: not guaranteed to work below FP8 (avoid GGUF Q4).

    • Face mask: optional — remove if MediaPipe or Planar Overlay cause issues.

    • Pose conditioning: use MediaPipe Face Mesh or DWPose if you need more alignment control.

    • Lightning LoRA: may produce plastic-like skin, especially when mixed with other Qwen-based LoRAs.


    Samplers:

    • er_sde + beta57 / kl_optimal / ddim_uniform (best results)

    • ddim + ddim_uniform (sometimes most realistic)

    • res_2s + beta57

    Don't get attached to one setting; if results are poor with one combination, switch to another.

    Precision:

    • 🧠 Best: FP16

    • ⚙️ Recommended: GGUF Q8 or FP8

    • ⚠️ Below FP8: noticeable degradation

    Inference Tips:

    • With Qwen Image Edit 2509 Lightning LoRA → use 4 / 8 steps for fast generation.

    • Without it → use 12–20 steps, CFG 1.0–2.5 for realism.


    🧬 BFS — “Focus Head”

    The “Focus Head” version was trained as a continuation of Focus Face, extending the dataset and shifting focus toward full head swaps.

    It was trained on an NVIDIA RTX 6000 PRO, rank 32, for 12,000 steps, using 628 image sets (face, body, target, and sometimes pose maps generated via MediaPipe).

    🔹 Training Phases

    1. Standard Face Swap – same as Focus Face, focusing on facial identity.

    2. Pose-Conditioned Face Swap – added pose maps to align gaze and head angle.

    3. Full Head Swap – replaced the entire head (including hair) for stronger identity control.

    After ~2000 steps, the focus moved toward head swap refinement.
    At ~4000 steps, the dataset was narrowed to perfect skin-tone matches, and by the end of training,
    the dataset evolved from 628 → 138 → 76 high-quality samples for final fine-tuning.

    ⚠️ Note:
    While Focus Head can still perform standard face swaps, it’s more naturally inclined toward full head swaps due to its data balance.
    This was intentional in part, but also a side-effect of dataset distribution and mixed conditioning.


    ⚠️ Important Notice

    Do not share results involving real people, celebrities, or public figures.
    Civitai’s moderation may disable posts that violate likeness or consent rules.
    This model is intended only for artistic and fictional characters, educational use, and AI experimentation.

    I take no responsibility for any misuse of this model. Please use it responsibly and respect all likeness rights.

    Description

    After the first release, I completely redesigned the conditioning logic and identity isolation strategy. The main focus of this iteration was eliminating identity leakage and improving robustness under motion.

    Unlike V1, this version does not rely on propagating the guide video’s facial micro-movements. The guide face is now fully masked during generation to prevent identity contamination.

    This required retraining with a significantly larger dataset and new masking strategies.

    Dataset Specifications

    • Data: 800+ high-quality head swap video pairs

    • Resolution: Trained at 768 (previously 512)

    • Recommended Inference Resolution: 768

    • Aspect Ratio: Primarily Landscape (horizontal videos still perform best)

    • Framing: Optimized for Close-ups

    The higher training resolution improves hair stability, identity retention, and structural consistency.

    Conditioning Methods (New in V2)

    This version supports multiple ways to inject identity using Frame 0:

    • Direct Photo Conditioning:
      Works well, but the model must internally reconcile pose, lighting, and depth differences. In some cases, this makes the model “fight” to integrate the identity properly.

    • First-Frame Head Swap (Still Extremely Strong):
      Applying a proper head swap to Frame 0 continues to produce incredible results.
      Because the structure is already correct (pose, lighting, occlusion), the model tends to suffer less and propagate identity more cleanly compared to static photo conditioning.

    • Automatic Magazine-Style Overlay:
      The new face is automatically positioned over the guide face based on mask alignment.

    • Manual Overlay:
      Advanced users may manually composite the new face over Frame 0 before inference.
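The overlay idea from the last two options reduces to a masked composite over Frame 0. A minimal NumPy sketch, assuming uint8 frames and a soft mask in [0, 1]; real workflows do this inside the graph rather than by hand.

```python
import numpy as np

def overlay_face(frame0, new_face, mask):
    """frame0, new_face: (H, W, 3) uint8 arrays; mask: (H, W) floats in [0, 1].
    Returns frame0 with new_face blended in over the masked region."""
    m = mask[..., None].astype(np.float32)          # broadcast over channels
    out = frame0.astype(np.float32) * (1 - m) + new_face.astype(np.float32) * m
    return out.round().astype(np.uint8)
```

A soft (feathered) mask edge blends the seam; a hard 0/1 mask gives the "magazine cutout" look described above.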

    🔴 Critical: Mask Quality Is Everything

    In this version, mask quality is the single most important factor.

    Everything depends on the mask.

    • Absolutely no detail from the original guide face can leak

    • No small holes or transparency artifacts

    • Full coverage of skin, eyebrows, facial hair, and hairline when needed

    If any portion of the old identity remains visible, the model may reintroduce it during generation.

    Mask precision directly determines:

    • Identity stability

    • Leakage prevention

    • Deformation resistance

    • Overall realism

    Take your time refining the mask. Increasing LoRA strength will not fix a bad mask.
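One of the failure modes above, small holes in the mask, can be caught programmatically: a hole is an uncovered pixel not reachable from the image border. This is an illustrative checker (pure NumPy plus a BFS flood fill), not part of the released workflow.

```python
from collections import deque

import numpy as np

def mask_has_holes(mask):
    """mask: 2-D boolean array, True = covered by the mask.
    Returns True if any uncovered pixel is fully enclosed by the mask
    (a hole that could leak the original identity through)."""
    h, w = mask.shape
    outside = np.zeros((h, w), bool)
    # Seed the flood fill with every uncovered border pixel.
    q = deque((y, x) for y in range(h) for x in range(w)
              if (y in (0, h - 1) or x in (0, w - 1)) and not mask[y, x])
    for y, x in q:
        outside[y, x] = True
    while q:  # flood-fill the uncovered region connected to the border
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny, nx] and not outside[ny, nx]:
                outside[ny, nx] = True
                q.append((ny, nx))
    # Holes = uncovered pixels that the border flood fill never reached.
    return bool(np.any(~mask & ~outside))
```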

    Mask Types

    You can alternate between:

    • Square Masks (Recommended in most cases):

      • Usually produce better results

      • Provide more spatial context

      • More stable identity

      • May generate slightly larger heads due to padding

    • Tight / Adjusted Masks:

      • More natural proportions

      • Higher deformation risk if head shapes differ

      • Sensitive to long-hair mismatches

    In most scenarios, square masks tend to produce stronger and more stable results.
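A square mask of the recommended kind can be generated by expanding a tight head bounding box into a padded square. A sketch under the assumption that you already have a bounding box; the padding factor is a guess to tune per image, and, as noted, it is also why square masks can yield slightly larger heads.

```python
import numpy as np

def square_mask_from_bbox(shape, bbox, pad=0.15):
    """shape = (H, W) of the frame; bbox = (x0, y0, x1, y1) tight head box.
    Returns a boolean square mask with `pad` margin on each side,
    clipped to the frame."""
    h, w = shape
    x0, y0, x1, y1 = bbox
    side = int(max(x1 - x0, y1 - y0) * (1 + 2 * pad))
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    xs = max(0, cx - side // 2)
    ys = max(0, cy - side // 2)
    mask = np.zeros((h, w), bool)
    mask[ys:min(h, ys + side), xs:min(w, xs + side)] = True
    return mask
```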

    Inference & Tuning Tips

    • Resolution: 768 is ideal.

    • First Pass vs Second Pass:
      You can run a single pass at 768 (recommended), or perform downscale + second pass.

      ⚠️ Note: In some cases, the second pass may slightly alter identity from the first pass.

    • Trigger:

      head swap
      
    • Prompting:
      Prompts still have minimal impact. The model was trained with a single trigger and does not rely heavily on scene descriptions.
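Since 768 is the recommended inference resolution, a resize helper can compute target dimensions with the short side at 768. The snapping to a multiple of 16 is an assumption (a common latent-space constraint), not something stated by the author.

```python
def target_size(width, height, short_side=768, multiple=16):
    """Scale so the short side lands near `short_side`,
    snapping both dimensions to a multiple of `multiple`."""
    scale = short_side / min(width, height)
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```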

    Advanced Technique

    You can experiment with combining this LoRA with the native LTX-2 inpainting workflow.

    Remember that V1 is still very good at transferring expressions and movement, because in that version the face remains visible in the guide video. You can try combining V1 and V2; run your own tests.

    Differences From V1

    V1:

    • 300 pairs

    • 512 training resolution

    • Guide facial motion preserved

    • Higher identity leakage risk

    V2:

    • 800+ pairs

    • 768 training resolution

    • Guide face fully masked

    • Stronger identity isolation

    • Significantly reduced leakage

    Future & Support

    I am continuing experimentation with further conditioning improvements and even stronger identity-only workflows.

    Maintaining R&D and renting Blackwell GPUs is extremely expensive. If you find this LoRA useful, please consider supporting development.

    👉 Donate here: https://buymeacoffee.com/nrdx

    Workflow link:
    https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video