ThinkDiffusionXL (TDXL)
ThinkDiffusionXL is the result of our goal to build a go-to model capable of amazing photorealism that's also versatile enough to generate high-quality images across a variety of styles and subjects without needing to be a prompting genius.
You can find it preloaded on ThinkDiffusion.
Read more about the model here.
Please leave a review if you're happy with it; this will encourage us to create more models and keep improving this one.
The work
Data source: TDXL is trained on over 10,000 diverse images that span photorealism, digital art, anime, and more. The smallest resolution in our dataset is 1365x2048, but many images go up to resolutions as high as 4622x6753. In total, our dataset takes up 42GB.
Training: With 1.8 million steps, we've put in the work. For comparison, Juggernaut is at 600k steps and RealVisXL is at 348k steps.
Hand-captioned images: Each image is carefully captioned by hand, enhancing the model's ability to generate accurate and high-quality results from minimal prompts.
NSFW capabilities: The model includes over 1,000 tastefully curated NSFW images.
Our thoughts
Detail and quality: Most XL models in the Realistic category suffer from poor detail, especially in the background and even in basic features like eyes, teeth, and skin. We believe TDXL outperforms in these areas due to its large, high-quality dataset. For comparison, Juggernaut has about half the image material, and RealVisXL has only 1,700 images. Ultimately, TDXL simply possesses much more "knowledge".
Less bias: We made sure to use an equal number of images for each style, gender, etc. Other models we tested over the past few months had some kind of bias: toward portrait shots, certain genders, certain ethnicities, and so on. For instance, Juggernaut has a bias in the close-up area, and cinematic light is quite dominant in that model; RealVisXL also has a bias towards portrait shots. TDXL, on the other hand, gives you what you want: landscape, midshot, full body, close-up, portrait, side view, back view, action shots, cinematic...whatever you want, without always being pushed in a certain direction by a bias.
Versatile base: Because of its large, balanced, high-quality dataset, TDXL is versatile enough to serve as a base model for future trainings. You can create new finetunes in entirely different directions, add LoRAs to fill in missing concepts, or do additional training with more balanced, high-quality data.
Comments
You hand captioned 10,000 images?
I think that's BS myself. Unless they outsourced it, I don't think whoever wrote that realises just how long it would take to hand-caption 10,000 images. It would be a full-time job for 2 or 3 people over 3 or 4 weeks.
Yes. This model has been in the works for quite some time. We knew early on that it was of the utmost importance to focus on having the best dataset we could possibly attain. So we felt it was a worthwhile effort to invest in.
so good!
i love it.
Thank you for your hard work. The result is evidence of real quality work.
Thank you for your hard work, really appreciated
my new go to. dope guys 👊🏾🔥
Looks great. Do you have any recommendations for CFG, Samplers VAE etc? Do you recommend using the refiner for this model? Do you have any recommended comfy workflows?
I think everything just works with the defaults, for example:
CFG: 5-10 (I use 7 by default)
Sampler: DPM++ 2M Karras
VAE: Normal Stability VAE
Refiner: Not needed
With this excellent model, you can achieve great results without using a LoRA or refiner.
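As a sketch, the recommended settings above map directly onto an Automatic1111 txt2img API payload (the endpoint and field names below are from the A1111 web UI API; the prompt itself is just a made-up example):

```python
import json

# Settings recommended above, expressed as an A1111 /sdapi/v1/txt2img payload.
payload = {
    "prompt": "portrait photo of a woman, natural light",  # example prompt
    "negative_prompt": "",
    "steps": 30,                        # 25-35 works well
    "cfg_scale": 7,                     # recommended range is 5-10
    "sampler_name": "DPM++ 2M Karras",  # recommended sampler
    "width": 1024,                      # TDXL is an SDXL (1024x1024) model
    "height": 1024,
}
print(json.dumps(payload, indent=2))
# Send with e.g. requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
```

No refiner stage is included, since the model reportedly doesn't need one.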
Would you still recommend using <lora:sd_xl_offset_example-lora_1.0:0.3> ?
@kubilayan I use only this model, with sdxl_vae.safetensors (from stabilityai/sdxl-vae on huggingface.co) as the VAE, for all of my generated images.
Any recommended settings?
From the creator on Reddit:
CFG: 5-10 (I use 7 by default)
Sampler: DPM++ 2M Karras
VAE: Normal Stability VAE
Refiner: Not needed
Looks like this is my new fav model!!! Thanks for all the effort guys!
Anyone able to make this work with Animatediff?
No, because it is an SDXL model. AnimateDiff doesn't work with SDXL yet, as far as I know.
use hotshotXL for SDXL models
@Cseti @black_jack_5223 I really appreciate your guys helpful comments! I'm a complete newbie to this so I'm sorry for sounding so dumb!
@mpr9348378 Don't ever apologize for not knowing something. At some point, everyone has been through this and asked for help.
Why are TDXL results less detailed compared to RealisticStockPhoto_v10 or other fine-tuned models? What's the way to get highly detailed results? Adding "highly detailed" to the prompt is not very effective...
Friend, what models can you suggest to me for photorealism?
@studioffan408 canon dslr camera
You have to prompt TDXL a bit differently than other SDXL models. Go look at the dev's example pictures and the prompts they used; that's a good place to start anyway.
@ComradeMittens I looked in the devs example. What exactly do you mean by differently? Can you elaborate please?
@shubh The devs made this model using 10k hand-captioned images; a lot of other SDXL models either use fewer images or use machine captioning. In the case of machine captioning, a tag approach to prompting tends to work better. But this isn't one of those models, so if you want more detail out of your images, try using short descriptive sentences, and not too many, alongside very few descriptive tags like "short black hair" for the little details (compare the TDXL example prompts to the RealVisXL ones to see this clearly). I also find that having just one quality tag like "best quality" in the positive prompt helps, but not as much as in other models. The last thing to keep in mind is that this model is trying to create real, imperfect-looking people, not supermodels, so unless you specify that, you won't get the same results as in other realistic models, and if you do, you'll be removing the detail in the skin especially. I hope this helps you!
@ComradeMittens That's super helpful. Thanks!!
You can try the embedding I trained for the sdxl model, which is particularly effective for models with excellent realistic performance.
https://www.seaart.ai/models/detail/9f0698666f0013d90b89cbc5d23f038a
can you claim it please?
It doesn't matter, because generated images will carry the model signatures.
@simartem07 Wow, interesting! So you're saying images are marked and detectable after generation? Even containing the names of the artists used and referenced? That may be quite useful in a way.
@simartem07 The SeaArt pics don't carry that info.
@eglor66 The point is, this is an open-source world; all models are variants of each other, and nobody knows which rights may have been violated in the training process of each model, trained as they are on real-world photography and hand-made digital art, where all the models are already variants of only a few authentic, original, unique models. If you give me any image created with any AI model (including the hash), I can convert and embed any data you want it to carry. Generate any image with Midjourney and I can change the embedded data to say DALL-E, or remove all embeddings and put in only EXIF info. This is not something easily preventable in today's conditions. Unfortunately...
Truly amazing. Needless to say I love playing around with this model and never stopped relying on it! Thank you for your hard work!
Mine works until the last frame. When it finishes, the colors are burnt. I've tried using an "anti burn" plugin, but it didn't fix it. I don't know how to fix this.
Also, this is a 1024x1024 model, not a 512x512 one.
what is your cfg value and are you using any loras? also use the SDXL vae if you are not already, you need it for inpainting/img2img with this model anyway.
Anyone has issue with inpainting and ADetailer? The masked areas become slightly desaturated for me. 😔
same thing here, it's very irritating, it's the only problem I have so far
I found this, but I don't know:
If you are not already, use the SDXL vae instead of the model's vae (most SDXL models do not have baked vae's for inpainting/img2img)
@ComradeMittens Damn~ Thanks a lot! This solves it!
@ivanbonefacic You can get the sdxl vae here https://huggingface.co/stabilityai/sdxl-vae/tree/main
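For A1111 users, a sketch of fetching that VAE from the command line (the URL follows Hugging Face's resolve/main download pattern for the repo linked above; the destination folder assumes a default A1111 install, so adjust it for your setup):

```shell
# Download the standalone SDXL VAE into A1111's VAE folder
wget -O models/VAE/sdxl_vae.safetensors \
  https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
# Then select sdxl_vae.safetensors under Settings > VAE (or the quicksettings dropdown)
```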
Same for me. And I'm using the SDXL VAE.
Nice work, ThinkDiffusion. One of the top-notch realistic SDXL models at the moment. Really appreciate your efforts.
I am Eric. I run a Gen AI startup leveraging SDXL LoRAs and want to suggest a collaborative research opportunity: making a "Midjourney for human portraits".
Let's jump on a quick call/chat and I'll cover the details and the opportunities we could pursue together.
Credits, reputation, and compensation are assured.
Find me on the contact below:
- Discord : eric_sdxl
- Email: [email protected]
"A tensor with all NaNs was produced in Unet" is what I get in img2img, no one else has this issue?
I was here to post something else, assuming this is resolved by now. The answer is to check your VAE settings.
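For what it's worth, a common cause of the NaN-in-VAE error with SDXL checkpoints is running the VAE in half precision; the A1111 web UI has a launch flag for exactly this. A sketch of the webui-user.bat change, assuming you're on A1111 on Windows:

```shell
:: webui-user.bat: keep the VAE in fp32 to avoid
:: "A tensor with all NaNs was produced in Unet/VAE" errors
set COMMANDLINE_ARGS=--no-half-vae
```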
Not working with TensorRT? I get a
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument mat1
Hi, I'm not sure, but I think I had the same problem with another checkpoint. Try changing or deactivating the VAE, if you use one. Maybe that helps.
I'm dying to try it out, but it won't load.
It keeps loading the previous model.
Having the same issue. Did you find a fix?
Same was happening for me once with another model and I ended up re-downloading the model as a fix.
That's a weird thing that happens sometimes in A1111. It never happened to me in ComfyUI.
Question about the vocab.json! I noticed in the tokenization vocab that many of the words are duplicated, with the end-of-word string attached the second time, while some words are not duplicated and either have the string or don't, with no rhyme or reason. (For anyone that doesn't know: in subword tokenization methods such as Byte Pair Encoding (BPE), used in NLP, </w> indicates the end of a word when a word is split into subword units. This kind of tokenization is often used in tasks like machine translation or language modeling.) I don't know if that is an error in logging the tokens or if the tokens are actually affected by the </w>. I plan on testing the prompts both ways, but I'd like to know if it was just an error in transcribing the file or if it's actually in the model.
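The duplication described above is expected in CLIP-style BPE vocabularies: `word` and `word</w>` are genuinely distinct tokens, where the bare form only ever matches a word-internal prefix and the `</w>` form matches at a word boundary, and not every string needs both forms. A minimal sketch using a made-up vocab excerpt (the entries and ids are invented for illustration):

```python
# Hypothetical excerpt of a CLIP-style BPE vocab.json (token -> id).
# "dog" and "dog</w>" are DIFFERENT tokens: the bare form appears only
# inside longer words (e.g. "dogma"), the </w> form ends a word.
vocab = {
    "dog": 100,      # subword prefix
    "dog</w>": 101,  # whole word "dog"
    "ma</w>": 102,   # suffix completing e.g. "dogma"
    "cat</w>": 103,  # only the end-of-word form exists for "cat"
}

def duplicated_forms(vocab):
    """Return base strings that occur both with and without the </w> marker."""
    bases = {t for t in vocab if not t.endswith("</w>")}
    enders = {t[:-len("</w>")] for t in vocab if t.endswith("</w>")}
    return sorted(bases & enders)

print(duplicated_forms(vocab))  # ['dog']
```

So the `</w>` in the file is real model data, not a transcription artifact; which base strings appear in both forms simply depends on which merges the BPE training produced.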
Does this model require the refiner?
You don't need to use the refiner with this model
It takes a HUGE amount of time when using ControlNet. Anyone have the same issue?
I have only 4GB of RAM, and it takes a lot of time to generate one image
@alfaranko69 Use the Fooocus UI, it's much faster.
@Suzanne Fooocus is better than Automatic1111? I use Automatic1111, and generating images on all XL models takes a lot of time. I have a 3060 Ti + 16GB RAM + i5 11th gen.
@dmytro40uah yes, you're right
I've got an RTX 3070 with 8GB VRAM, and images that used to take more than 3 minutes on A1111 now take only 30 seconds with Fooocus.
I love it and don't need a turbo model to do that... 😊
Try putting --medvram in the command line arguments of your webui-user.bat file. I have a GTX 1080 with 8GB and my renders take less than a minute or so on XL models.
Or use tiled VAE encode/decode and a tiled KSampler.
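Concretely, the --medvram suggestion goes into the launch arguments of A1111's webui-user.bat. A sketch, assuming a default Windows install (--medvram is a real A1111 flag; --xformers is optional and assumes xformers is installed):

```shell
:: webui-user.bat (AUTOMATIC1111 launch script on Windows)
:: --medvram trades some speed for lower VRAM use; helps 6-8GB cards run SDXL
set COMMANDLINE_ARGS=--medvram --xformers
```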
@Suzanne thanks!
Are there "recommended" settings for the model? i.e. CFG scale, Sampling steps, etc. for certain types of image creation?
Here were the original recommended settings when the model was released
CFG: 5-10 (I use 7 by default)
Sampler: DPM++ 2M Karras
Sampling steps: 25 to 35
VAE: Normal Stability VAE
Refiner: Not needed
@AI_Art_Lover Thank you so much for this. I appreciate you.
Any upcoming versions in the future?
probably the best model for pagan/viking photorealistic characters right now :D
Hey, you can use this offline on your phone, but you need a phone with at least 12GB of RAM. You can use fp8 in ComfyUI. I have a guide on installing ComfyUI on Termux on Android:
https://github.com/KintCark/COMFYUI-ANDROID-TERMUX
This was a great model, I hope you make it for Flux!!!