CivArchive
    Realistic Universal Base Build - 1.0.0
    NSFW

    Realistic Universal Base Build

    Or RUBB as I like to call it.

    If you wish to use this model with diffusers, I also have it over on Hugging Face.

    Usage

    • 512x512, 512x768, 768x512, or 1024x512/1536x512 for landscapes; best at 512x512 for general usage

    • DPM++ 2M Karras

    • 32 steps - 13-16 steps gives a reasonably converged image and is useful when hunting for seeds; 16-24 steps largely converges the image and adds a lot of finer detail; 24-32 steps continues to change/add finer detail. By 128 steps the image usually stops changing entirely.

    • Hi-Res Fix with R-ESRGAN 4x+, denoising 0.5, and half as many steps as the main pass

    • CFG 7.5 - 3-12 is the most useful range; 1-3 is interesting but lower contrast; above 12 the output gets very contrasty and looks pretty cooked by 15-20.
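    For diffusers users, the settings above can be wired up roughly like this. This is a minimal sketch, not an official recipe: it assumes a local copy of the checkpoint file named as on this page, a reasonably recent diffusers release, and a CUDA GPU. Adjust the path, device, and dtype for your setup.

```python
# Recommended settings from the Usage section above.
WIDTH, HEIGHT = 512, 512
STEPS = 32                 # 13-16 for seed hunting, 24-32 for full detail
CFG = 7.5                  # 3-12 is the most useful range
HIRES_DENOISE = 0.5
HIRES_STEPS = STEPS // 2   # half as many steps as the main pass


def build_pipeline(checkpoint_path: str = "realisticUniversal_100.safetensors"):
    """Load the checkpoint with DPM++ 2M Karras (requires torch + diffusers)."""
    import torch
    from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

    pipe = StableDiffusionPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    )
    # "DPM++ 2M Karras" in A1111 maps to DPMSolverMultistepScheduler
    # with Karras sigmas enabled.
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )
    return pipe.to("cuda")
```

    Usage would then look like `pipe = build_pipeline()` followed by `pipe(prompt, width=WIDTH, height=HEIGHT, num_inference_steps=STEPS, guidance_scale=CFG).images[0]`.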

    About

    With many model makers mostly focusing on SDXL, SD 3, or Flux (among other newer models), I feel SD 1.5 still has a lot of life left in it, particularly for photorealistic output, and especially on lower-end hardware. One issue is that there seems to be a lot of in-breeding, with merges of merges of merges. I'd like to reduce that and come up with a solid base merge that has a lot of unique training added to it.

    So I scoured Civitai, Hugging Face, and other model sources on the internet for realistic base models that had permissive licensing and additional training layered on top, then, through some very careful analysis, distillation, and merging, arrived at this model. It's not perfect by any means, but I feel it has a really nice mix of unique elements with a minimum of cross-merging going on, and it should serve as a nice starting point for either merging in specific needs or using with ControlNets, LoRAs, etc.

    NSFW Note

    This model was not created with the intent of generating NSFW content; however, due to the nature of some of the base models mixed in, it can generate pretty good NSFW content. If you don't want that, put NSFW, nude, etc. in the negative prompt to avoid it.

    Licensing Disclosure

    Below is a per-release listing of the mixed-in models whose licenses require creator credit. Many more base models went into the final output than are listed here; however, the recipe used to get to the release was quite complex, with a lot of distillation required to reduce cross-merging of some of the models, so I'm not going to list all of them and how they were mixed in. I could easily fill a volume on just that, and at the end of the day, what matters is whether the final model works as a good starting point for people, not how it got there.

    Version 1.0.0

    Version 2.0.0

    Description

    Initial 1.0.0 Version

    FAQ

    Comments (11)

    greenmonkey3 · Oct 10, 2024
    CivitAI

    The idea behind this model is great. I also think SD 1.5 still has a place, at least for now. Ignoring the lower system requirements, I find SD 1.5 is still the best at generating environments.

    I've had this idea rolling around in my mind, but there's still so much I don't know that I'm not even sure if it's worth looking into. Maybe you could help.

    I'm not sure of the specifics, but Pony is SDXL with some advanced training technique focused on characters, or something like that. So advanced that, I guess, the LoRAs made for it no longer worked with regular SDXL 1.0 checkpoints, so it needed a name of its own.

    You seem knowledgeable about things in some areas but not in others. About that advanced SDXL training process that gave us Pony: do you know of any training options like that, but for SD 1.5 or even SD 2.1? Instead of focusing on better character understanding, I'd like an SD 1.5 focus on enhanced environments, as in a better grasp of objects combined with better compositional control (through prompting) to create an environment. Here "environment" basically means the whole image. I'm thinking of a training application for SD 1.5 equivalent to what Pony is for SDXL.

    Something that still bugs me about AI-generated landscapes, or any image really, is that it appears the model (SD 1.5, SDXL, etc.) doesn't grasp that the landscape it generates only exists within the image. In other words, if I took a picture of a beautiful vista and we looked at the photograph, you'd know the landscape really goes beyond the image borders; objects split between picture and not-picture still retain a cohesion of form, in a manner of speaking. For AI, when it comes to the edge of the image, it gets quite confused about where the image ends. It generates a likeness of a landscape to fit within those borders. I don't know whether SD 1.5 actually considers the words in prompts as objects in the way we understand objects, or whether its understanding is more like appearances than objects. But for landscape generation to improve, one way at least, the model can't simply recreate a likeness based on thousands of dataset images that fit the prompt without having problems at the edges.

    It's like taking 1000 different puzzle boxes, emptying them all out into one pile, and proceeding to create an image: the inside of the image will be much easier to fit together and make look cohesive, while the edges will be much harder, if you want the illusion that the image continues past its borders; that's if you can even make the border look like the inside.

    When the AI generates an image, what it generates should be a cropped version of what it "sees".


    I could do that manually, but it becomes tricky because, for example, every model has a limit to the size it can generate; after that, things get wonky. So if I want a landscape with that "existing outside the borders" look, I would simply generate a large image and crop it. Well, anyone who has used AI image generation knows that cropping an AI image to get a better composition just doesn't work. The latent image size is not simply a canvas; it's a container, a closed system. Like water in a glass jar clouded by dirt particulates: eventually the dirt settles to the bottom and the water is clear. How the dirt settles depends on the container and what the sediment is composed of; a cylindrical jar will have more sediment in the center and around the circumference. I'm losing the analogy here, but hopefully this next part makes sense: say the sediment on the bottom was important for some reason, but you needed it to settle evenly and not collect in the center or at the edges. Collecting and using only the flat parts, while possible (in this hypothetical analogy), is still unusable, because the high spots negate the low spots, and removing the high spots is effectively just as unwanted.

    That was my attempt at drawing similes to show why, for AI-generated landscapes to be better, the composition needs to improve, which most people already know. But to get the composition right (when drawing), you have to envision the image existing outside of its borders.

    I'm not going to go into why it's important for the landscape to exist beyond the borders in order to make the composition work. What I'm trying to convey has to do with how I'd like to see SD 1.5 enhanced: training focused on better landscapes. SD 1.5 already has a head start on good landscapes. As far as advanced training options go, I have no clue what the options are, so I'm not even sure if my line of thinking could be trained into an AI model....

    You may understand what I'm getting at and still think it's nonsense because it's not possible... OK, if so, that's OK. But I'd still like to explore those "options" in advanced training. Perhaps there are other approaches I'm not aware of, in terms of training a better model. I'm already familiar with the workarounds to improve landscapes on existing models, so there's no need to suggest those.

    If you don't know, maybe you could refer me to someone, but don't feel the need to look into it.

    I did take a good chunk of time to type this out, but that's an indication of me trying to talk about something I just don't have the words for, not of me being over-exuberant or something like that. I randomly found your model here and randomly had the thought that you may have some insight and might be nice enough to help. My contacting you is spontaneous, a shot in the dark. I'm unsure of where to look, so....

    adrianbacon
    Author
    Oct 10, 2024 · 1 reaction

    Uh... model training is model training. There's no "advanced training". How good a model is at generating something is completely dependent on the quality of the sample images that go into training it. As far as Pony goes, well, it's simply had enough additional training done with it that it's diverged enough from the base model it started on that it can now basically be its own model.

    The only difference between SD 1.5 and SDXL is that they changed how the underlying engine worked, then made a new model using that with different training images. They could have easily changed the underlying architecture, then trained a new model with the new inner workings on exactly the same training data they used to get to 1.5, then done additional training on top of that to get to what they now call SDXL, but they didn't. Every time, they seem to start anew with all-new data.

    BTW, there's nothing stopping you, besides money, compute, and actual sample images, from taking the 1.5 base model and training a bunch of landscape images on top of it. If you look at how they got to 1.5, it still has all the smaller 256x256 training data in it from the earlier models; they just added a bunch of 512x512 material to get to 1.5. If you wanted to make a "1.6" with more resolution, you could take the 1.5 checkpoint and do the same thing they did to get to 1.5, but with bigger, landscape-focused training images. Anybody who wants to can do so.

    contrarian · Oct 11, 2024

    I'll just insert some of my insights here, don't know if it helps.

    If you train a neural network on ONE image, it will simply memorize that image and reproduce it for you. It will learn NOTHING. A hundred images, same thing. A thousand images? Still not enough. But SD 1.5 was trained on BILLIONS of images harvested from the internet without quality checks or moderation, and 99% of it was utter garbage. The "brain" of SD 1.5 only has so many "neurons", so there was not a chance in hell it could memorize all of that in perfect detail. So, in order to satisfy the reward function, it was forced to actually learn to understand the images. It had to find patterns and associations and generalizations. It had to learn that this was this and that was that. And it had to build a virtual 3D representation in its internal "mental model" of the world. It was, in short, forced to develop VISUAL INTELLIGENCE to score well on the reward function.

    In other words, the only way a neural network develops intelligence, is if you completely swamp it with more raw data than its poor neurons can memorize. And that is a fundamental truth that also applies to humans. The reason most people have poor intelligence, is because our schools give the kids a tiny amount of data and then test them on whether they can reproduce it or not. The "reward function" of doing well at school is therefore completely at odds with how intelligence is developed.

    When I was a kid I must have instinctively felt that this approach is wrong, because I never did what the teachers told me. Instead I read books. Lots and lots of books. Advanced books, well beyond my level of understanding. And as a result I became highly intelligent, scoring 160 IQ (on professional tests that take days to do, not some online BS test!).

    Now, there aren't any more neurons in my brain than in anyone else's. My intellectual potential at birth was the same as that of any average person. I didn't use any advanced studying techniques either; the ONLY difference is that I overloaded my brain's neural network with more raw data than it could possibly memorize, so it had to develop understanding to make some sense of it. My brain was too small, so it had to compress the data to be able to handle it.

    Intelligence is fundamentally a lossy data compression technique.

    That insight has now been proven. We have equations for how intelligence scales with the amount of raw data you train on. Schools no longer have any excuse for ruining kids' brains.

    But that's beside the point here! The point is that SD 1.5 is in many ways more intelligent than the supposedly more advanced tech bases that came after, simply because it was trained on so much random garbage. All later tech bases have bigger brains that were trained on less data, with more purposefully curated data sets, both for quality and censorship reasons. They perform better on technical tasks: "apply X to Y and put it on top of Z". But SD 1.5 has a chaotic intelligence that excels at being creative and artistic (and more than a little sexy!).

    It doesn't just know how to make sexy girls, it knows how to make sexy landscapes as well. The unfiltered training gave it a subtle understanding of what human beings find attractive. It had to understand human psychology to satisfy its reward function.

    One thing I think very few people here understand is that a finetuned SD 1.5 model is still 99% the original model; all the finetuning we've done on top of it just nudges it a bit in some direction and introduces some desired biases. For instance, the original SD 1.5 saw a billion ads, so it figured that humans really like to see random letters scrawled on top of images! And so it faithfully does that, to make us happy!

    But we actually hate ads, so our finetuning has introduced a bias against this behavior so that it rarely happens anymore. Our finetuning has nudged the model behavior closer to what we actually want to see. That's the only reason modern models make better pictures that are more aligned with our actual desires. The "bad behavior" is still in there, we haven't gotten rid of it, and some prompt and seed combinations will make it reappear.

    When it comes to training SD 1.5 to make better landscapes, you need to do a finetune using at least a thousand good images. Ten thousand is better. This is, as I explained, still not enough to teach a neural network from scratch, but it is enough to finetune an existing base model and bias it towards making better landscapes. Part of it is that you're introducing some more visual data on the topic, but the biggest part is actually that you're unlocking what is already in there, making it easier to "pull out".

    My own model Contra Maiden was biased towards making sexy girls this way. It can do anything really, depending on how you prompt it, but when in doubt it tends to "play it safe" and make a high quality photorealistic sexy girl as the default choice, which makes it super easy to prompt for those! You can make a specialized Landscape model the same way. I hope you do, I'd like to see one!

    It's just a ton of work... God help us all.

    adrianbacon
    Author
    Oct 11, 2024

    @contrarian Oh hey! I'm actually looking at possibly including Contra Base in a future revision of RUBB. Nice to see that there don't appear to be any license restrictions. I'm currently running a stack of my test prompts against it to see what it makes relative to the same prompts and seeds on other base models. From there I'll figure out a strength to mix it in at. I do have a little bit of the model you based Contra Base on in my model, so I'll need to deal with that, but otherwise, pending any major gotchas, it should hopefully add a nice chunk of additional training data.

    contrarian · Oct 11, 2024

    @adrianbacon Cool! My opinion on "licensing" is that data wants to be free. Everything out there contains something "stolen" anyway, so it's pretty ridiculous to claim ownership and attempt to control what people do with it. In fact, the only reason models are as good as they are, is because of all this "theft". So if someone finds a use for my model I'm just happy! Feel free to merge it if it helps whatever you're trying to achieve!

    However, I'm not so sure "more stuff" is what your model needs at this point. If you read the article I posted you'll see that I think it already is well balanced and has all it needs in it, it's just not fine tuned enough for image quality. Whatever method you have for "finding the right balance" for your merges seems to be working great for finding a great approximate answer, but you may need to change approach to reach that last "sweet spot", where fine detail suddenly snaps into focus.

    Since I've been struggling with this problem for literally months of full time work (!), I think I have a few insights to offer that may be helpful. I'll write up another article on how to fine tune a model.

    adrianbacon
    Author
    Oct 11, 2024

    @contrarian My goal is to have a nice general-purpose model with a good mix of unique additional training data in it. To be clear, it is a merge of other base models, so it will inherit a fair amount of their characteristics. As I find other models with unique training data in them, I evaluate them for inclusion based on what they output for a given set of prompts. Some models just don't make the cut because they're too similar to models I already include in the merge. Or I'll subtract the model they're based on from them, in an attempt to extract just the unique stuff, and then dial some percentage of that into the final merged model. I'd love to tone down the bias towards girls, but in all honesty, there aren't a lot of base models that don't include a heavy mix of that.
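    The "subtract the base model, then dial in a percentage of the difference" step described above is essentially what merge tools call an add-difference merge. A minimal sketch, with plain Python floats standing in for checkpoint tensors (with real models each value would be a tensor in the state dict); the function name and toy numbers are illustrative, not the author's actual tooling:

```python
def add_difference(target, finetune, base, alpha):
    """Return target + alpha * (finetune - base), key by key.

    (finetune - base) isolates what the finetune learned on top of its
    base; alpha controls how much of that unique training is dialed
    into the target merge.
    """
    merged = {}
    for key in target:
        delta = finetune[key] - base[key]
        merged[key] = target[key] + alpha * delta
    return merged


# Toy example: the finetune moved one weight from 1.0 to 1.8 relative
# to its base; mixing half of that delta into a target weight of 1.2
# gives a merged weight of approximately 1.6.
base     = {"w": 1.0}
finetune = {"w": 1.8}
target   = {"w": 1.2}
merged = add_difference(target, finetune, base, alpha=0.5)
```

    With real checkpoints the same loop would run over matching keys of three `state_dict`s, skipping keys (like text-encoder buffers) that don't appear in all three.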

    contrarian · Oct 11, 2024

    @adrianbacon True. One of the reasons I made Contra Base is because I felt a less girl-focused model was needed. I wanted a model that can do both young and old people of both genders, and of any ethnicity, while defaulting towards Caucasians more than Asians, (with Africans lurking right under the surface). It was also trained on lots of ugly people with very unique appearances, in an attempt to break the "sameface" curse. The average of a thousand ugly people is still beautiful, so the model still makes good-looking folks, just with a bit more variety than what we're used to seeing in most models.

    Mixing in a bit of Contra Base might help your model some in this regard, but I honestly don't know any other model that prioritized this to recommend. Most share the same biases, and make very similar faces.

    But of course, once I made Contra Base I wanted it to be more girl focused anyway, so I made Contra Maiden, which lost a lot of the unique character in favor of being sexier, and that really only stands out from the crowd for its smaller default breast size... I'm just as guilty as the next guy when it comes to just wanting to make girls!

    adrianbacon
    Author
    Oct 11, 2024

    @contrarian Just out of curiosity, how many images, steps, and epochs did you use for fine-tuning?

    contrarian · Oct 11, 2024

    @adrianbacon I'm a disorganized dude with a poor memory who doesn't keep track of what he's doing, so I sadly don't have any solid numbers to give you. I can't even tell you what I ate yesterday!

    But I can tell you Contra Base was made from two data sets, both with approximately 10k images. One was a very diverse assortment of people of all colors and sizes, almost all fully dressed, and one was a total hodgepodge of absolutely everything. There was also a third big data set containing nudity and porn, but it was mixed in at a subliminal level so it wouldn't be noticed, it was just meant to improve the model's understanding of anatomy and poses.

    I hoped to achieve a bias-free and flexible model this way, and it was at least a partial success. I would say the model is biased towards human faces though. It almost always nails those, while all other things are of a slightly lower overall quality level. This isn't entirely a bad thing, since humans care a lot more about faces, but it's still a shortcoming worth mentioning.

    adrianbacon
    Author
    Oct 11, 2024 · 1 reaction

    @contrarian No worries, it's all useful

    Checkpoint
    SD 1.5

    Details

    Downloads
    338
    Platform
    CivitAI
    Platform Status
    Available
    Created
    10/9/2024
    Updated
    4/30/2026
    Deleted
    -

    Files

    realisticUniversal_100.safetensors

    Mirrors


    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.