Qwen 360 Diffusion
General
Qwen 360 Diffusion is a rank 128 LoRA built on top of a 20B parameter MMDiT (Multimodal Diffusion Transformer) model, designed to generate 360 degree equirectangular projection images from text descriptions.
The model was trained from the Qwen Image model on an extremely diverse dataset composed of tens of thousands of equirectangular images, depicting landscapes, interiors, humans, animals, and objects. All images were resized to 2048x1024 before training.
The model was also trained with a diverse dataset of normal photos for regularization, making the model a realism finetune when prompted correctly.
Based on extensive testing, the model's capabilities vastly exceed those of all other currently available T2I 360 image generation models. Given the right prompt, the model should be capable of producing almost anything you want.
The model is designed to be capable of producing equirectangular images that can be used for non-VR purposes such as general imagery, photography, artwork, architecture, portraiture, and many other concepts.
Training Details
The training dataset consists of 32k unique 360 degree equirectangular images. Each image was randomly rotated horizontally 3 times for data augmentation (original + 3 rotations), providing a total of 128k training images. All 32k original 360 images were manually checked by humans for seams, polar artifacts, incorrect distortions, and other problems before their inclusion in the dataset.
For regularization, 64k images were randomly selected from the pexels-568k-internvl2 dataset and added to the training set.
Training timeline: 3 months and 23 days
Training was first performed using nf4 quantization for 32 epochs (8 epochs counting the original + augmentations as a single epoch):
qwen-360-diffusion-int4-bf16-v1.safetensors was trained for 28 epochs (1,344,000 steps)
qwen-360-diffusion-int4-bf16-v1-b.safetensors was trained for 32 epochs (1,536,000 steps)
Training then continued at int8 quantization for another 16 epochs (4 epochs counting the original + augmentations as a single epoch):
qwen-360-diffusion-int8-bf16-v1.safetensors was trained for a total of 48 epochs (2,304,000 steps)
Training then continued on Qwen/Qwen-Image-2512 at int8 quantization for another 14 epochs (3.5 epochs counting the original + augmentations as a single epoch):
The total number of equirectangular images was increased to 35k, and regularization images were removed from the dataset.
qwen-360-diffusion-2512-int8-bf16-v2.safetensors was trained for 62 epochs (2,794,000 steps)
Usage
To activate panoramic generation, include one of the following trigger phrases, or some variation of them, in your prompt:
"equirectangular", "360 image", "360 panorama", "equirectangular image", "equirectangular 360 image", or "360 degree panorama with equirectangular projection"
Note that even using a 360 viewer on your 2D device screen can create a feeling like you are actually inside the scene, known as a sense of 'presence' in psychology.
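Usage with the diffusers library might look like the following configuration sketch (untested here; the repo and file names come from this model card, while the pipeline class and call arguments follow the standard diffusers LoRA workflow and may need adjusting for your diffusers version):

```python
import torch
from diffusers import DiffusionPipeline

# Load the Qwen-Image base model, then attach the 360 LoRA.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")
pipe.load_lora_weights(
    "ProGamerGov/qwen-360-diffusion",
    weight_name="qwen-360-diffusion-int8-bf16-v1.safetensors",
)

# A trigger phrase plus a 2:1 resolution activates proper panoramas.
image = pipe(
    prompt="equirectangular 360 image, photograph of a misty pine forest at dawn",
    width=2048,
    height=1024,
    num_inference_steps=50,
).images[0]
image.save("pano.png")
```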
Recommended Settings
Aspect ratio: For best results use the 2:1 resolution of 2048×1024. Using 1024×512, 1536×768, and other 2:1 ratios for text-to-image generation may cause the model to struggle with generating proper horizons.
Prompt tips: Include desired medium or style, such as photograph, oil painting, illustration, or digital art.
360-specific considerations: Remember that 360 images wrap around with no borders—the left edge connects to the right edge, while the top and bottom edges merge into a single point at the poles of the sphere.
Human subject considerations: For full body shots, specify the head/face and footwear (e.g., "wearing boots") or lack thereof to avoid incomplete or incorrectly distorted outputs.
Equirectangular distortion: Outputs show increasing horizontal stretching as you move vertically away from the center. These distortions are not visible when viewed in a 360 viewer.
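These geometric properties can be verified numerically. The following is a minimal sketch (the function names are illustrative, not from any particular library) mapping pixels to longitude/latitude and computing the horizontal stretch factor at a given row:

```python
import math

def pixel_to_lonlat(x, y, width, height):
    """Map an equirectangular pixel to (longitude, latitude) in radians.

    x in [0, width) spans longitude -pi..pi, so x = 0 and x = width land
    on the same meridian (the left and right edges connect); y in
    [0, height) spans latitude +pi/2 (top pole) down to -pi/2 (bottom pole).
    """
    lon = (x / width) * 2.0 * math.pi - math.pi
    lat = math.pi / 2.0 - (y / height) * math.pi
    return lon, lat

def horizontal_stretch(y, height):
    """Relative horizontal stretching at row y: 1 at the equator,
    growing as 1 / cos(latitude) toward the poles."""
    _, lat = pixel_to_lonlat(0, y, 1, height)
    return 1.0 / max(math.cos(lat), 1e-9)
```

At the vertical center of a 2048×1024 image the stretch factor is 1; halfway toward a pole (latitude 45°) it is about 1.41, which is why content near the top and bottom looks smeared in the flat image yet correct in a viewer.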
Once generated, you can upscale your panoramas for use as photographs, artwork, skyboxes, virtual environments, VR experiences, VR therapy, or 3D scene backgrounds—or as part of a text-to-image-to-video-to-3D-world pipeline. Note that the model is also designed to produce equirectangular images for non-VR usage as well.
Notes
FP8 inference
For maximum visual fidelity, it's strongly recommended to use the GGUF Q8 or int8 quantized versions of the Qwen Image transformer models rather than FP8 quantization.
If you are using transformer models with fp8_e4m3fn or fp8_e5m2 precision, or low precision models trained with "accuracy-fixing" methods (e.g., ostris/ai-toolkit), they may cause patch or grid artifacts when used with the int8-trained LoRA model. Some have found this issue to be caused by directly downcasting to fp8 from fp16, without proper scaling and calibration. To avoid this, use the lower-precision int4-trained versions of the LoRA: qwen-360-diffusion-int4-bf16-v1.safetensors or qwen-360-diffusion-int4-bf16-v1-b.safetensors.
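The "proper scaling" point can be illustrated with a toy symmetric int8 quantizer (purely illustrative; real quantizers typically calibrate per-channel or per-block scales, and fp8 formats behave differently from int8):

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization with an explicit scale.

    Dividing by the scale maps the full weight range onto [-127, 127],
    so the round-trip error is bounded by about half the scale; a naive
    cast with no scale would clip or lose far more precision.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1000).astype(np.float32)
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()  # bounded by ~scale/2
```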
Low-Precision Artifact Mitigation
If artifacts still appear when using the int4-trained LoRA on an fp8_e4m3fn or fp8_e5m2 transformer quant, they can often be reduced by adjusting the LoRA weight and/or refining both the positive and negative prompts.
Additional Tools
HTML 360 Viewer
To make the viewing and sharing of 360 images & video easier, I built a web browser based HTML 360 viewer that runs locally on your device. It works on desktop and mobile browsers, and has optional VR headset support.
You can try it out here on Github Pages: https://progamergov.github.io/html-360-viewer/
Github code: https://github.com/ProGamerGov/html-360-viewer
You can append '?url=' followed by a link to your image in order to automatically load it into the 360 viewer, making sharing your 360 creations extremely easy.
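For example, building a share link in Python (percent-encoding the image URL is a precaution so query characters survive; ':' and '/' are left intact so plain URLs pass through unchanged):

```python
from urllib.parse import quote

VIEWER = "https://progamergov.github.io/html-360-viewer/"

def share_link(image_url: str) -> str:
    """Build a viewer link that auto-loads the given 360 image."""
    return VIEWER + "?url=" + quote(image_url, safe=":/")

link = share_link("https://example.com/my-panorama.jpg")
```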
Recommended ComfyUI Nodes
If you are a user of ComfyUI, then these sets of nodes can be useful for working with 360 images & videos.
ComfyUI_preview360panorama
For viewing 360s inside of ComfyUI (may be slower than my web browser viewer).
Link: https://github.com/ProGamerGov/ComfyUI_preview360panorama
ComfyUI_pytorch360convert
For editing 360s, seam fixing, view rotation, and masking potential artifacts.
Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert
ComfyUI_pytorch360convert_video
For generating sweep videos that rotate around the scene.
Link: https://github.com/ProGamerGov/ComfyUI_pytorch360convert_video
Alternatively you can use a simple python script to generate 360 sweeps: https://huggingface.co/ProGamerGov/qwen-360-diffusion/blob/main/create_360_sweep_frames.py
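The idea behind such a sweep can be sketched in a few lines (a simplified illustration, not the linked script: it assumes the sweep is a full yaw rotation in equal steps, which for an equirectangular image is just a circular shift of the pixel columns, so no resampling is needed):

```python
import numpy as np

def yaw_sweep_frames(pano, n_frames=120):
    """Yield frames of a full 360-degree yaw sweep.

    Rotating the camera's yaw is equivalent to circularly shifting the
    columns of the equirectangular image, so each frame is one np.roll.
    """
    width = pano.shape[1]
    for i in range(n_frames):
        shift = int(round(i * width / n_frames))
        yield np.roll(pano, -shift, axis=1)

frames = list(yaw_sweep_frames(np.zeros((1024, 2048, 3), np.uint8), n_frames=4))
```

Each frame could then be reprojected to a perspective view before encoding to video, depending on the look you want.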
For those using diffusers and other libraries, you can make use of the pytorch360convert library when working with 360 media.
Limitations
A large portion of the training data has the viewer oriented at 90 degrees to the direction of gravity, so rotating outputs may be required to achieve different vertical angles.
Contributors
Citation Information
BibTeX
@software{Egan_Qwen_360_Diffusion_2025,
author = {Egan, Ben and {XWAVE} and {Jimmy Carter}},
license = {MIT},
month = dec,
title = {{Qwen 360 Diffusion}},
url = {https://huggingface.co/ProGamerGov/qwen-360-diffusion},
year = {2025}
}
APA
Egan, B., XWAVE, & Jimmy Carter. (2025). Qwen 360 Diffusion [Computer software]. https://huggingface.co/ProGamerGov/qwen-360-diffusion
Please refer to the CITATION.cff for more information on how to cite this model.
This model can also be found on HuggingFace: https://huggingface.co/ProGamerGov/qwen-360-diffusion
Comments (25)
wow this is trippy af, i put some of your images in wan 2.2 i2v and it looks crazy. Some distortion but dang. what about 180, would that need a separate lora?
@markdalias I'm glad to hear that you like the model! My 360 custom nodes for ComfyUI have a 'Crop 360 to 180 Equirectangular' node that you can use if you want to crop 360s into 180 degree images: https://github.com/ProGamerGov/ComfyUI_pytorch360convert
@progamergov got it so maybe qwen 360 -> crop 180 -> wan i2v. Just want to conserve as many pixels for video rendering as possible
any way to animate 360 nsfw?
@davidelsenor69 put your image into wan, and then use my web viewer to view the resulting video: https://progamergov.github.io/html-360-viewer/
There is a LoRA for LTX-2 that handles 360 video from image.. you can run the result from it through a video to side by side 3d in comfyui.. works hit and miss...
There’s still a visible line, so it’s not seamless yet. How can I fix this?
@MihawkJr Use the seam removal workflow I embedded in this image (note that you may have to play around with the seam mask settings depending on your image): https://civitai.com/images/113736462, using my custom node set: https://github.com/ProGamerGov/ComfyUI_pytorch360convert
@MihawkJr Alternatively, you can drag and drop this image and use its workflow for seam fixing: https://civitai.com/images/113736462
U can use this as a base just change it for qwen...
https://civitai.com/models/1730751?modelVersionId=1958775
"First of all, thank you for your hard work. Your work is amazing! However, I’ve noticed a small issue: when generating characters, they tend to be extremely large and positioned very close to the camera, which causes severe distortion. Would it be possible to adjust the training data so that characters or objects appear at a more appropriate distance and position? Thank you!"
@8888 Thank you! We have been continuously adding to our dataset to fill in gaps identified during the training and testing of the model. So future models should be better at rendering humans for example. In the meantime, I would suggest using "giantess", "macrophilia", "closeup", and similar terminology in your negative prompts if you are experiencing issues with the size and closeness of subjects.
on v1 I still get the grid image problem even when running the model at bf16 on comfyui... I know you recommend the int-b version in these cases, and it does work, but why is v1 not working? Is the v1 lora not compatible with comfyui?
@diogod The int8 version of the LoRA should work without any grid issues on the bf16 version of the model. The issue is with the models themselves, not ComfyUI (which works with all versions). Some Qwen-Image model quants (like the FP8 quant made by the ComfyUI team) were incorrectly quantized, resulting in less precision, and thus training (finetune, LoRA, etc.) with those incorrect models optimizes outputs toward matching their precision errors. Correctly quantized models have fewer and slightly different precision errors that training optimizes outputs toward. The grid problem arises when you mismatch models trained toward different precision error sets (you get even more errors).
Hi. Is it normal for the first time to take 6 minutes to get the image with an Nvidia GeForce RTX 4090 and 24GB of VRAM?
@traviesomorbo69826 That would depend on your workflow. But in my experience, a 4090 should take closer to 2 minutes 24 seconds when using the GGUF base model.
@progamergov Hi. This is your workflow, the one "Rain-slicked autumn highway..." with the Qwen_Image-Q8_0.gguf. I haven't tried it a second time; the first time lasts 6 minutes.
@traviesomorbo69826 Is it the workflow with seam repair? Or without it?
@progamergov Hi. Excuse me. I dont know which is the workflow with seam repair
@traviesomorbo69826 I shared a seam fixing workflow in the second Rain-slicked autumn highway image I posted here: https://civitai.com/images/113736462, using my custom 360 nodes from here: https://github.com/ProGamerGov/ComfyUI_pytorch360convert. If you were using that workflow to in-paint the seam, then the workflow could take closer to 5-6 minutes to run as Qwen-Image is a large model. I also haven't done anything to optimize the speed of my workflows, as I was trying to aim for maximum quality.