Full checkpoint with an improved TE; do not load an additional CLIP/TE.
SD3.5 Large with FLAN improved TE
The full BF16 model runs at an impressive speed even on my 8GB card. It was built with the triple-CLIP setup, using the 42GB Google FLAN-T5 XXL 12B-parameter model (converted to BF16), CLIP-G, and an improved CLIP-L.
The full FP16 model runs at half the speed of the BF16 version on my card, but may have better accuracy.
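To make the FP16/BF16 trade-off concrete, here is a minimal pure-Python sketch of the two formats' limits. The field widths are the standard IEEE-style definitions (FP16: 5 exponent / 10 mantissa bits; BF16: 8 exponent / 7 mantissa bits), nothing specific to this model:

```python
# Illustrative comparison of FP16 vs BF16 numeric limits (pure Python).
# FP16: 1 sign, 5 exponent, 10 mantissa bits.
# BF16: 1 sign, 8 exponent, 7 mantissa bits (same exponent range as FP32).

def max_finite(exp_bits: int, mant_bits: int) -> float:
    """Largest finite value for an IEEE-style format with these field widths."""
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = bias  # the top exponent code is reserved for inf/NaN
    return (2 - 2 ** -mant_bits) * 2.0 ** max_exp

def epsilon(mant_bits: int) -> float:
    """Spacing between 1.0 and the next representable value."""
    return 2.0 ** -mant_bits

print("FP16 max:", max_finite(5, 10))   # 65504.0
print("BF16 max:", max_finite(8, 7))    # ~3.39e38
print("FP16 eps:", epsilon(10))         # ~0.000977
print("BF16 eps:", epsilon(7))          # 0.0078125
```

In short: FP16 has finer precision near 1.0, while BF16 can represent values about 33 orders of magnitude larger before overflowing, which is why it tends to be more robust for model weights and activations.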
Do not apply negatives above the 0.2 timestep. If you do not understand this line, load any image generated with this model as a workflow and inspect the conditioning nodes. (The same instructions as base SD 3.5.)
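For illustration, the 0.2 rule amounts to using the real negative conditioning only during the first 20% of denoising and a zeroed-out conditioning afterwards (in ComfyUI SD3.5 workflows this is done with the ConditioningSetTimestepRange and ConditioningZeroOut nodes). A minimal sketch of the selection logic — the function below is hypothetical, not ComfyUI API:

```python
# Hypothetical sketch: route the negative conditioning by normalized timestep.
# In real ComfyUI workflows, ConditioningSetTimestepRange assigns the true
# negative to 0.0-0.2 and a zeroed negative to 0.2-1.0; none of this is API.

def pick_negative(neg_cond, zero_cond, t_norm: float, cutoff: float = 0.2):
    """Return which negative conditioning applies at normalized time t_norm (0..1)."""
    return neg_cond if t_norm < cutoff else zero_cond

# At 10% of sampling the real negative applies; at 50% the zeroed one does.
print(pick_negative("neg", "zero", 0.1))  # neg
print(pick_negative("neg", "zero", 0.5))  # zero
```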
The FP16 Hybrid model and the Large FP8 model use the standard T5xxl. I consider the full BF16/FP16 models to surpass them in every way, but I am leaving them up for now.
If you have an 8GB card, I suggest the Medium model with FLAN; it is still several times faster than the BF16 FLAN model on my RTX 3050 (1.5 seconds per iteration vs. 5-6 seconds per iteration for the 26GB model).
Works in ComfyUI without any modification: just load the checkpoint and go.
Per the Apache 2.0 license, FLAN is attributed to Google.
My seconds per iteration on an old RTX 3050 8GB:

- SD 3.5 Large (triple-CLIP FP8), 13.5GB: 6-8 seconds per iteration
- 22GB Hybrid: 6-8 seconds per iteration
- 26GB (BF16 full): 5-6 seconds per iteration (BF16 seems faster; it trades a few bits of mantissa precision for a much wider exponent range, and I think that is worth it)
- 26GB (FP16 full): 8-16 seconds per iteration (FP16 seems to have erratic iteration times compared to BF16)
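As a quick back-of-envelope, total render time is roughly seconds-per-iteration times step count. The snippet below uses the midpoints of the ranges quoted above (the midpoints and the 30-step count are my assumptions, not measurements):

```python
# Back-of-envelope render time from the seconds-per-iteration figures above.

def render_minutes(sec_per_it: float, steps: int) -> float:
    """Total render time in minutes for a given per-iteration cost."""
    return sec_per_it * steps / 60.0

# Midpoints of the quoted ranges (assumed values, not benchmarks):
for name, spi in [("FP8 13.5GB", 7.0), ("BF16 26GB", 5.5), ("FP16 26GB", 12.0)]:
    print(f"{name}: {render_minutes(spi, 30):.2f} min for 30 steps")
```

So a 30-step BF16 render at 5.5 s/it lands around 2.75 minutes on this card.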
Comments (9)
Please upload the FLAN TE as a standalone safetensors file (not baked in).
Just getting OOM with 32GB RAM and 24GB VRAM.
My 3060 waves in silence...
Does anyone have a working workflow for this that can be used in swarmui? I've tried loading this with comfy 3.5 workflows loaded, and I keep getting the same thing: no backend can load this model...
SD3.5 is like SDXL in generating text on pictures but here there are sometimes problems with letters. You need to create a spaghetti prompt full of "masterpiece, realistic, 8k quality" bullshit to achieve some result. And of course, don't forget about negative prompts. There are still problems with hands. Faces sometimes look like they were created with SD1.5 and need detailer. Is it worth keeping the 25GB model on disk? Hell no. SDXL makes the same quality pictures and takes less disk space. If you want to generate text on pictures use Flux.
Still a bit wonky on hands and feet but I was able to get some interesting images with it. For such a large model it didn't seem to take any longer to load. Rendering on my RTX3090 took around 45 seconds or so for a 1024x1024 image. See the nude swimming pic below.
Can I give a prompt longer than 77 tokens?
Keep getting random animal ears on people. Once I got a horse head on a woman.
It's pretty crappy.