Ultra Maiden is my final SD model. The SD architecture is becoming hopelessly obsolete, so I quit working on this architecture after posting my last model.
But then I unquit and decided to do one last thing - merge all my previous models into one, using advanced merge methods to distill all the good stuff from all of them, and filter out the bad. This turned out to be a more substantial undertaking than I imagined, but after about 6,000 merge steps and a quarter million generated test images I finally called it done. Ultra Maiden 1.0 was born!
Is it perfect? No. But it is the best SD model I ever created nonetheless. A worthy final tribute to Stable Diffusion!
So What Is Ultra Maiden All About?
This model is a bit of an odd duck. It can make very high quality photorealistic images (for SD at least!), it is especially good at sexy maidens (with or without clothes), and it is able to do a little fantasy and scifi and other cool stuff, if you nudge it the right way. It's also great at environments, especially epic nature scenes.
But the odd thing about it is that it doesn't respond like other SD models. Partly because it's its own thing and wasn't incestuously merged with everything else out there, as other models typically have been. And partly because I have taken a very different approach to tuning. The model has not been tuned with prompt adherence as a priority (which SD sucks at anyway!); instead it has been tuned for high quality results from very short prompts, and for maximum creativity (different seeds giving very different results). It can get confused by long prompts and will not always obey detailed instructions well. I think a lot of other models are stronger in this regard. But in return it can create good images from just a couple of words, which many other models can't.
In other words, you can get good technical quality and striking compositions with very little effort, but your ability to exactly control the output is more limited. That's the tradeoff.
For this reason I recommend that you don't do your usual SD prompting thing (especially not throwing in your old standard negative prompt!), and instead try one of the following approaches to get the most out of this quite unique model.
How To Use It, Method 1
The example images were made with a very simple prompting strategy that is the most bang-for-the-buck approach I have found with this model: you simply write two words in each of the positive and negative prompts, with the first word being the same in both.
This approach may sound weird, but the first word, which is canceled out by appearing in both prompts, sets up a subtle context, and then the other two words set up a polarity from that context. It works very well.
Try different words until you find a combo that tends to produce interesting results, then generate a bunch of seeds and wait for a really good crit. Then you lock the seed and generate again with a bunch of variation seeds (strength 0.01 to 0.05 is usually good). When you crit again, you lock the variation seed too, and do a final polish of minor details by fine-tuning the variation seed strength. And you're done. You can get all sorts of images this way, you get high technical quality, and this prompting method is excellent at pushing the model out into the rarely explored dark corners of its capability, avoiding generic and samey results.
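If it helps to see the structure spelled out, here is the two-word prompt pair as a tiny sketch. The function name and layout are my own illustration, not part of any tool; the example words are just placeholders to try.

```python
# Sketch of the two-word prompting method: a shared first word sets
# the context, and the differing second words set up the polarity.
def method1_prompts(context, pos_word, neg_word):
    """Return (positive, negative) prompts sharing the first word."""
    positive = f"{context} {pos_word}"
    negative = f"{context} {neg_word}"
    return positive, negative

# Example combo (placeholder words, experiment with your own):
print(method1_prompts("portrait", "ethereal", "mundane"))
# → ('portrait ethereal', 'portrait mundane')
```

From here the workflow is manual: sweep seeds with this pair, lock the winner, then sweep variation seeds at low strength as described above.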
How To Use It, Method 2
You can of course also use more complex prompting to specify what you want in more detail. This model isn't entirely stupid, it can just be a little more unwilling to comply than other models that were tuned with prompt adherence as a higher priority.
An excellent way to write more detailed prompts is to use the following prompt format:
First you describe what you want, as succinctly as possible. Then you follow with a bunch of words that describe mood and style, but that preferably don't imply anything about the actual concrete image content, especially not conflicting with what you just wrote in the first part. And in the negative you put the exact same thing, except flipping the order, with the fluff words first, and the content specification after.
I posted an extra example image below to show that this method can allow you to give the model pretty specific instructions. Let's go through the prompt step by step to explain exactly why it works; this will teach you some fundamental principles that must be kept in mind.
The positive prompt is:
cat-photo print t-shirt aesthetic authentic interesting
And the negative is:
aesthetic authentic interesting cat-photo print t-shirt
This means the concrete specification part is:
cat-photo print t-shirt
This is a well structured prompt because the model understands what a "t-shirt" is, and it also understands what a "print t-shirt" is, as well as what a "photo print" is. All these things are word connections the model recognizes from training.
However, the connection from "cat" to "photo" is too weak, which means you might get cat ears instead, or just a broken image, signalling that the model didn't understand what was asked of it. The easy fix is to tie these words closer together with a hyphen ("cat-photo"). During training the model has seen plenty of examples of words being tied together by hyphens, and it has generalized this understanding, so you can use it anywhere it helps connect two words. Just don't use it when the model already understands the connection; then it does more harm than good!
The takeaway is that the model must understand all connections from word to word. That's what makes a well-structured prompt that produces good and (mostly) flawless pictures. You don't need to build grammatically correct sentences; as long as every word connects to the next in a readable way, you're golden.
You can of course also make several comma separated sections as usual, but it's actually even better if you can replace the commas with a logically connecting word that adds more meaning. Commas are just the easy way out when all else fails!
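The mirrored structure of this prompt format can be sketched as a tiny helper. The function name is my own invention, not from any library; it just mechanically builds the positive (content first, fluff after) and the negative (same words, order flipped), exactly as in the cat-photo example above.

```python
# Sketch of the mirrored prompt format: content + fluff in the
# positive, the same two parts flipped in the negative.
def method2_prompts(content, fluff):
    """Return (positive, negative) prompts with mirrored word order."""
    positive = f"{content} {fluff}"
    negative = f"{fluff} {content}"
    return positive, negative

pos, neg = method2_prompts("cat-photo print t-shirt",
                           "aesthetic authentic interesting")
print(pos)  # cat-photo print t-shirt aesthetic authentic interesting
print(neg)  # aesthetic authentic interesting cat-photo print t-shirt
```

Keep the content part succinct and make sure the fluff words don't contradict it, as explained above.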
EDIT: I just realized I never explained why I put the same stuff in the positive and negative, just in different order. This is generally useful knowledge, so I'll write up a more thorough article about it instead of dumping it here.
Recommended Settings
To get the best results you also need to use the right settings. Perhaps a bit unintuitively, you get more boring images the more steps you use! This is because more steps means the sampler takes greater care to follow the prompt exactly, which doesn't work in your favor if you use the extremely simplistic prompts recommended in the first prompting method above. So you need enough steps for the sampler to finish the image, but no more! These are my standard settings, used for all example images:
Sampler: DPM++ 2M
Scheduler: Align Your Steps (very important to use this one for low step counts!)
Steps: 12
CFG scale: 8 (a slightly high scale gives better results when both prompts have similar things in them)
Size: 832 x 576 px (the model is actually happiest at 768 x 512, but that is just too little IMO)
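For convenience, here are those settings collected as a plain Python dict. The key names are my own and just need to be mapped onto whatever UI or script you use; the values are the ones listed above. SD wants dimensions divisible by 64, so a quick sanity check is included.

```python
# Recommended settings from above, as a reusable config fragment.
# Key names are illustrative; map them to your tool's fields.
ULTRA_MAIDEN_SETTINGS = {
    "sampler": "DPM++ 2M",
    "scheduler": "Align Your Steps",  # important at low step counts
    "steps": 12,
    "cfg_scale": 8,
    "width": 832,
    "height": 576,
}

# Sanity check: SD works on the latent grid in 64 px blocks.
assert ULTRA_MAIDEN_SETTINGS["width"] % 64 == 0
assert ULTRA_MAIDEN_SETTINGS["height"] % 64 == 0
```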
Final Words
Have fun! This is just a cool toy after all. Newer tech is so much more advanced and capable it's not even funny, but good ole SD has a lot of charm. It's dumb as a rock and doesn't understand your prompts very well, but it has a vague intuitive understanding of all kinds of things that newer models don't have. You can give it names of places, and it has a good feel for them. You can give it the name of a girl, and it has a feel for how she might look too. You can use any unusual word, and SD has a feel for it. SD is all about the feel. Use this to your advantage, and you can get some interesting results!
PS
There are more example images in the article linked below, which explains another unique and very powerful prompting method that can be highly effective for all models, not just this one. I suggest reading it! ;-)