This model is an Unstable Diffusion community project and wouldn't have been possible to train without their generous help and resources. A heartfelt thank you to them and everyone who's contributed their data, time, and experience!
You can run unlimited generations with this model on mage.space
Please check the 'About this version' tab to the right for recommended positive and negative prompts and some training details.
Some advice ↓
This was trained at 768x768 with aspect ratio bucketing, so resolutions below 768x768 will likely give you bad results, and I can't support those with this model. Try 768x768, 768x960, and 960x768 as your standard resolutions.
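If you're generating with the diffusers library instead of a UI, a minimal sketch of those settings might look like the code below. The checkpoint path, step count, and prompt are placeholders I've assumed for illustration, not settings from this page.

```python
# Rough sketch using the diffusers library; the checkpoint path is a placeholder.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_single_file(
    "model.safetensors",  # placeholder path to this checkpoint
    torch_dtype=torch.float16,
).to("cuda")

# Stick to the recommended buckets; anything below 768x768 wasn't trained.
image = pipe(
    "your prompt here",
    width=768,   # also try 768x960 or 960x768
    height=768,
    num_inference_steps=30,
).images[0]
image.save("sample.png")
```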
If you're having trouble keeping the face in shot, try mentioning their gaze ('looking at viewer', 'looking afar', 'looking to the side', etc.), facial features like their eyes, hair style, or expression, or putting 'head out of frame' in the negative prompt.
This model won't do well with most slang terms, as they likely weren't in the tagging. Be careful recycling old prompts: explore the effect of each tag, or at least get rid of weighting, and consider starting with a sample image to get a feel for it.
There were no watermarks in the data set, so 'watermark', 'artist's name', etc. are meaningless. You don't need to beg it for 'high detail', 'photorealism', '8k', 'uhd', etc. It'll do that right out of the box. Save yourself some token space and prompt for what you want to see.
If you use this model as part of a mix or host it on a generation service, please mention and link back to this page (especially if you're making money off of it).
Description
The UNet and CLIP in v1.1 were flagged as broken by the Toolkit extension. This version doesn't have that error, and every sample image has been recreated. There's very little difference, if any, but this is the cleaner model. Sorry about that.
Recommended (please note, these have changed since version 1.0)
Positive: masterpiece realistic, best high quality
Negative: (worst simple background, jpeg artifacts, bad anatomy, anime, digital illustration, 3d rendering, text, overexposure:1.1)
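For anyone scripting with diffusers, here's a rough sketch of passing those prompts, reusing the `pipe` from the earlier example; it's illustrative only. Note the '( ... :1.1)' weighting syntax comes from WebUI-style frontends, so plain diffusers treats it as literal text unless you add a prompt-weighting helper such as compel; the sketch below simply drops the weighting.

```python
# Illustrative only: passing the recommended prompts through a diffusers pipeline.
# Assumes `pipe` is the StableDiffusionPipeline loaded in the earlier sketch.
positive = "masterpiece realistic, best high quality, your subject here"
negative = ("worst simple background, jpeg artifacts, bad anatomy, anime, "
            "digital illustration, 3d rendering, text, overexposure")

image = pipe(
    positive,
    negative_prompt=negative,
    width=768,
    height=960,
    num_inference_steps=30,
).images[0]
image.save("sample.png")
```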
~8k images were captioned using a combination of styles. The core images used what I call 'consolidated booru tagging', i.e. 'blonde hair, very short hair, undercut' becomes 'very short blonde undercut hair' or some variation of that. Another portion was manually captioned with sentences followed by tags for the details a sentence couldn't capture; these were meant to be tight but descriptive. The final portion used automatic booru tagging with a rather large exclusion list. And finally, an intrepid member of the internet passed on hand-tagged images of dongs, which were included. A deep thank you for that.
It was trained for ~50 epochs with the EveryDream 2 fine-tuner at a base resolution of 768 (so lower resolutions aren't recommended when prompting) and clip skip 1. The majority of settings were left at default.