Required Files (v3)
To use the workflows, the following models must be installed in ComfyUI.
Place them in the directories specified below.
<your_comfyui_dir>/models/checkpoints:
pixart_sigma-FP16.safetensors (1.2GB)
photon_refiner-FP16.safetensors (2.1GB)
<your_comfyui_dir>/models/clip:
t5_xxl_encoder-FP8.safetensors (4.9GB)
<your_comfyui_dir>/models/vae:
pixart_vae.safetensors (0.1GB)
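For reference, a minimal shell sketch of placing the downloaded files, assuming a default ComfyUI directory layout (the ~/Downloads source paths are placeholders for wherever you saved the files):
# Move the downloaded models into the expected ComfyUI folders.
cd <your_comfyui_dir>
mv ~/Downloads/pixart_sigma-FP16.safetensors models/checkpoints/
mv ~/Downloads/photon_refiner-FP16.safetensors models/checkpoints/
mv ~/Downloads/t5_xxl_encoder-FP8.safetensors models/clip/
mv ~/Downloads/pixart_vae.safetensors models/vae/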
Required Nodes (v3)
IMPORTANT: Ensure that your ComfyUI is updated to the latest version.
Additionally, the workflows require the following custom nodes to be installed:
ComfyUI_ExtraModels: provides support for PixArt-Sigma.
ComfyUI-Crystools: used for some simple string operations.
-- under construction --
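In the meantime, here is a minimal install sketch for both nodes; the ComfyUI-Crystools repository URL is an assumption, so verify it before cloning:
cd <your_comfyui_dir>/custom_nodes
git clone https://github.com/city96/ComfyUI_ExtraModels
git clone https://github.com/crystian/ComfyUI-Crystools
cd ComfyUI_ExtraModels
pip install -r requirements.txt
cd ../ComfyUI-Crystools
pip install -r requirements.txt
# Restart ComfyUI afterwards so the new nodes are loaded.
Both nodes can usually also be installed through ComfyUI Manager.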
Version History
v3
- Models in FP16 & FP8 for min VRAM usage (~6GB).
- Unified PixArt+refiner prompts (no longer two separate prompts).
- 1536 x 1024 native resolution.
- Option to select portrait/landscape image orientation.
- Option to choose between fast/quality samplers.
- Option for the refiner to ignore the prompt.
- Variations achieved by changing the refiner seed.
- Six preconfigured workflows.
- CivitAI prompt extractor compatibility.
v2
- Simplified steps configuration.
- Added nodes to easily modify CFG and SEED parameters.
- Aesthetic node color/position changes for better usability.
- Updated install instructions with corrections and clarity.
- Improved final image organization into date-based folders.
v1
- The initial release providing core functionality.
Comments
Wonderful. How do you randomize the seed for each generation? Mine is stuck on 2.
In the "Prompt & Parameters" group there's a "Seed" box. In it, change "fixed" to "randomize".
Is the T5 thing similar to ELLA? Are they both language-type things that modify embeddings?
It's a bit late, but the T5 encoder has recently become more popular because it's used in several models like FLUX and SD3. The T5 encoder is part of an older series of Large Language Models (LLMs) developed by Google, built as an encoder-decoder transformer: the encoder processes the input text, and the decoder generates the output text.
You can think of the T5 encoder as a "CLIP on steroids." It excels at establishing relationships between words, which helps the image generation model better understand the concepts users write in their prompts. Unlike ELLA with SDXL, models like PixArt and FLUX are natively trained with it, making them more responsive and accurate.
This is a gift! Thank you for being so detailed. I am up and running thanks to your help.
I'm glad you found it helpful! I'm working on optimizing the workflow to take full advantage of PixArt-Sigma's speed. My goal is to get image generation down to a few seconds per image.
Waiting to see how this will work with ControlNet.
Yes, me too! FLUX now has a ControlNet, which was supposedly impossible to train, and PixArt-Sigma is definitely missing something like that.
Does this workflow require a supercomputer or something? I have an RTX 3080 Ti GPU. Loading just the T5 model takes an eternity, and then the same eternity again for preparation before generation. And it only works for one generation: on the second generation an error message pops up, so I must restart the UI to generate another image.
This workflow runs like a charm on my RTX 4070.
@PeterQ30 Is that T5 using VRAM or RAM? Because when the T5 was loading, some sort of big-ass paging file got created on my SSD on top of my completely filled 16 GB of RAM. Does this thing require at least 32 GB of free RAM or something?
Same. Loading the T5 model took about 1 hour and 20 minutes for me (RTX 2070 Super). In comparison, loading the T5 for Stable Diffusion 3 takes about 5 minutes.
Same issue here on a Ryzen 5900X, 32GB RAM, and an RTX 4070. The problem is caused by the text encoder; I haven't figured out the reason yet. As a workaround I used the T5 text encoder from SD3 and it worked.
UPDATE: Problem fixed: the T5 loader has to be set to CPU and fp32!
Same here: long T5 write time to disk before generation. I have a 3080 and 32GB of RAM, and the fix indicated here is not useful since the workflow is already fixed on CPU and fp32.
I understand the frustration with the long T5 processing times. You're right, the T5 encoder is a large 20GB file, and it was from a time when T5 as a text encoder was still quite new, and I didn't have much experience with it. However, I'm close to releasing version 3, which, among other improvements, addresses this issue. It uses the latest updates in ComfyUI and model quantization, resulting in significantly faster performance, especially for setups with limited RAM and VRAM. Thanks for your patience!
For some reason, Huggingface renamed these two files, which need to be renamed back to: text_encoder_config.json and text_encoder_model.safetensors.index.json
Yes, I know! That's super frustrating. I'm currently working on version 3 of the Abominable Workflow, which will address this by uploading my own files to Huggingface. I'm also updating everything to the latest ComfyUI and quantization features, which will make it run significantly faster, especially for setups with limited RAM and VRAM. Thanks for reporting it! I hope the next version will be much smoother for you.
Wow, looks impressive! It's almost as if you generated the pictures first and then wrote prompts to match them :) Can you show an example of two people with different properties? Like, one is sad while the other is happy, or something like that.
It's like FLUX but 20 times smaller! However, when concepts are very similar, they can blend together a bit. I'm currently in the process of uploading version 3, which incorporates some of the new features implemented for FLUX into PixArt-Sigma. I'll see if I can generate something with different emotions, and if it turns out well, I'll add it to the gallery when I update the version 👍
Thanks for your workflow! It would be awesome to optimize the result with an upscaler (Ultimate Upscaler or a double KSampler).
Thank you, I wanted to try out Sigma and this was the perfect workflow; it works perfectly.
I must say Sigma itself is less impressive than I expected; it still struggles a lot with elaborate prompts.
Question: your current workflow uses the text encoder and the VAE from the T5 model (the hardest one to run). Am I wrong, or if I were to use the 1024 version of the model, wouldn't it be better to use the VAE and text encoder from the 1024 version instead?
It works pretty well! Thanks. I just fine-tuned a PixArt-Sigma model and this allowed me to test it properly. Pretty good results... I just have to train it further now :)
Amazing workflow. Thanks!
I would change this in the manual, from:
cd <your-comfyui-directory>
cd custom_nodes
git clone https://github.com/city96/ComfyUI_ExtraModels
pip install -r requirements.txt
to:
cd <your-comfyui-directory>
cd custom_nodes
git clone https://github.com/city96/ComfyUI_ExtraModels
cd ComfyUI_ExtraModels
pip install -r requirements.txt
For anyone having only 12GB VRAM and 32GB RAM, I suggest downloading the following file (from SD3): t5xxl_fp16.safetensors
Link (registration required):
https://huggingface.co/stabilityai/stable-diffusion-3-medium/tree/main/text_encoders
Changes in T5 loader:
t5v11_name: t5xxl_fp16.safetensors
t5v11_ver: xxl
path_type: folder
device: auto
dtype: FP16
Great work! Is it possible to add a tiled KSampler and FaceDetailer for more detail?
Can't get the workflow to work with PixArt-Sigma-XL-2-1024-MS.pth (I get an error about an invalid "<" key), but it does work with other models, for example Pixart 900M-base.safetensors.