Trained from SDXL 1.0 Base using both photographic images and 3d rendered images.
The method I used is:
All 3d rendered images were given the tag “3d render”. They mostly feature my own despicable creations, which I have shown on my DeviantART page https://www.deviantart.com/squealnsquirm
The trick to getting them to show photographically is to simply not use the “3d render” tag in the prompt. This actually does (mostly) work. I'm not sure if the method is well known, so I wanted to share it.
Additionally, you need to use a variety of different concepts in the 3d render set or SDXL will too strongly associate that concept with the style. The 3d rendered images were produced using Daz Studio and character models purchased on their site as well as other assets.
Photographic images were (mostly) given the tag “photo”. I discovered too late that you need to have the tag on every instance or the network will mostly ignore it. Do not use “3d render” in the negative prompt; it produces shockingly bad results for some reason.
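A minimal sketch of the tagging rule described above, where every caption carries exactly one style tag. The function name, the comma-separated caption format and putting the tag first are all assumptions for illustration, not the actual training code:

```python
def tag_caption(caption: str, is_render: bool) -> str:
    """Make sure a caption carries its style tag ("3d render" or "photo").

    Assumption: captions are comma-separated tag lists and the style tag
    goes first. The real dataset format may differ.
    """
    style = "3d render" if is_render else "photo"
    tags = [t.strip() for t in caption.split(",") if t.strip()]
    if style not in tags:
        # Tag every instance; a tag that only appears on some images
        # is reported above to be mostly ignored by the network.
        tags.insert(0, style)
    return ", ".join(tags)


print(tag_caption("woman standing at a bus stop", is_render=True))
print(tag_caption("photo, portrait, looking at viewer", is_render=False))
```

At generation time the idea is then to leave “3d render” out of the prompt entirely rather than putting it in the negative prompt.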
Unfortunately, the results are not as good as I had hoped. Hands especially look terrible and I have no idea how to fix them. I kept trying (and failing) and eventually gave up, which is why I'm only publishing this now when the best result (or least terrible) was finished back in February.
Hoping SD3 is easier to train and I can finally get this done to my satisfaction.
Have fun playing around with this, maybe merge with other checkpoints, but consider it to just be a preview.
Try the following:
* start with a material/color, e.g. “shiny metal”, “transparent”, “red” etc
{material} armbinders, right hand over left, hands curled up into a fist (works best with everything written out; 3d rendered images were flipped during training, with “right hand over left” changed to “left hand over right”; alternative hand poses are “hands open fingers curled” and “hands open fingers straight”)
{material} reverse prayer armbinders (mostly good results)
{material} mobile pillory (mostly good results)
{material} upright pillory (mostly good results)
{material} bent over pillory, {material} head board (terrible results)
{material} bodybinder (mostly good results)
{material} ballgag (mostly good results, but has tendency to ballgag everyone in a scene)
{material} ponygirl belt with {material} (tail:2) (“tail” needs emphasis to work)
“ponygirl” is unrelated to Pony Diffusion
example:
“naked girl in metal armbinders standing at a bus stop”
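The flipping mentioned for the armbinder prompts means the caption has to change together with the image, otherwise “right hand over left” would describe a mirrored picture incorrectly. A rough sketch of the caption side, assuming a plain whole-word left/right swap is enough (the function name is made up for illustration, and the image flip itself is not shown):

```python
import re


def flip_caption(caption: str) -> str:
    """Swap the words "left" and "right" for a horizontally mirrored image.

    Assumption: every standalone "left"/"right" in the caption refers to a
    side of the image, so a whole-word swap is safe.
    """
    swap = {"left": "right", "right": "left"}
    # A single regex pass swaps both words at once, so the result of one
    # replacement is never replaced again.
    return re.sub(r"\b(left|right)\b", lambda m: swap[m.group(1)], caption)


print(flip_caption("metal armbinders, right hand over left, hands curled up into a fist"))
```

Doing the swap in one pass matters: two sequential `str.replace` calls would turn everything into the same word.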
12 direction statements were used in training (I render 12 images using a rotating camera rig). The second direction statement is less significant than the first; I figured it might be easier for the network to understand.
from the front
from the front and from the left
from the left and from the front
from the left
from the left and from behind
from behind and from the left
from behind
from behind and from the right
from the right and from behind
from the right
from the right and from the front
from the front and from the right
you can also try “from above”, “from below” and “from a distance”
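The 12 direction statements follow a regular pattern, so they can be generated from the render index. This is only a sketch: it assumes the 12 renders are evenly spaced and ordered front, left, behind, right, which is a guess based on the order of the list above, and the function name is invented:

```python
# Phrases for the four main directions, in the assumed rotation order.
PHRASES = ["from the front", "from the left", "from behind", "from the right"]


def direction_statement(index: int) -> str:
    """Return the direction statement for render `index` (0..11)."""
    quadrant, offset = divmod(index % 12, 3)
    current = PHRASES[quadrant]
    following = PHRASES[(quadrant + 1) % 4]
    if offset == 0:
        return current  # camera exactly on a main direction
    if offset == 1:
        return f"{current} and {following}"  # closer to the current direction
    return f"{following} and {current}"  # closer to the next direction


for i in range(12):
    print(i, direction_statement(i))
```

The more significant direction always comes first, matching the note that the second statement carries less weight.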
miscellaneous terms (used with photographic images as well as 3d renders)
looking at viewer
looking away
cute face (any girl I thought had a cute face)
sexy body (big boobs, curvy butt)
slender body
large breasts/boobs
small breasts/boobs and/or flat chest
oily skin (results in extreme oiliness, like just covered in oil)
wet skin
wet hair
An attempt was made to teach the network about male anatomy, but it was a failure. You can try “erect penis” if you enjoy screaming in terror.
Very open to suggestions on how to achieve a better result or stuff people might want in a future version
Comments (13)
If you want your end result to be more photorealistic, try also merging some photorealism checkpoints in, like Juggernaut or RealVis or something. There are a lot out there, but a few like Juggernaut are actual fine-tunes like yours, rather than just merges. Since your checkpoint is heavily themed, I think you can get away with adding a few merges to enhance the photorealism aspect. Might have to experiment a bit.
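For anyone unsure what merging means in practice, the simplest form is a weighted average of matching weights from two checkpoints. A minimal sketch, assuming both checkpoints share the same keys and shapes; the function name and the `alpha` parameter are illustrative, and real merge tools offer more elaborate modes:

```python
def merge_weights(base: dict, other: dict, alpha: float) -> dict:
    """Linear interpolation: (1 - alpha) * base + alpha * other, per key.

    Assumption: both dicts have identical keys. The values only need to
    support * and +, so plain floats work here and tensors should too.
    """
    if base.keys() != other.keys():
        raise ValueError("checkpoints do not have matching keys")
    return {k: (1.0 - alpha) * base[k] + alpha * other[k] for k in base}


# Toy example with floats standing in for weight tensors.
themed = {"layer.weight": 1.0, "layer.bias": 0.0}
photoreal = {"layer.weight": 3.0, "layer.bias": 2.0}
print(merge_weights(themed, photoreal, alpha=0.25))
```

A small `alpha` keeps most of the themed checkpoint and only nudges it towards the photorealistic one, which is why some experimenting is needed.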
I'm on an 8GB GPU, and I'm a bit nervous about how SD3 will perform on my card. I'm hoping that Stability AI hasn't completely left the average consumer user out to dry.
Or train a lora instead so people can use it with whatever checkpoint they choose.
@MysticDaedra
I'm pretty sure you will still be able to use SD3 with only 8GB of VRAM. Supposedly the resource requirements will actually be lower than SDXL: the part of the pipeline that performs the generative function is 2.6B params for SDXL and only 2B params for SD3.
Supposedly it is possible to train a LORA for SDXL with only 8GB of VRAM, so I'd expect it should be fine for SD3 as well.
As for merging, frankly, I just wanted to learn how to train a full network myself, there is hardly any good information online explaining how best to do it, so it seemed like an interesting challenge.
The model can do way more than just the fetish concepts depicted. It was trained on 12k high quality pornographic images and 1.5k of my own 3d rendered images. I also had a small set of 500 of my favourite images with hand written prompts, which was every bit as maddeningly tedious as it sounds.
@lokithorodin
I trained it using diffusers and my own code. I experimented with having only certain tensor groups set to requires_grad, and the results were super random (sometimes good, sometimes terrible, with no discernible pattern to learn from). I did consider doing it as a LORA, but really wanted to do a full network training. It would have been a smarter way of doing things though.
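Training only certain tensor groups usually comes down to toggling `requires_grad` by parameter name. The sketch below is not the code used for this model: the helper name and the choice of prefixes are assumptions. It works on any object with a `named_parameters()` method, which includes `torch.nn.Module`:

```python
def set_trainable(model, trainable_prefixes) -> int:
    """Enable gradients only for parameters whose name starts with a prefix.

    Returns the number of parameter tensors left trainable. `model` only
    needs a named_parameters() method yielding (name, param) pairs.
    """
    prefixes = tuple(trainable_prefixes)
    trainable = 0
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefixes)
        trainable += int(param.requires_grad)
    return trainable


# Stand-ins so the sketch runs without PyTorch installed.
class FakeParam:
    requires_grad = True


class FakeModel:
    def __init__(self, names):
        self.params = {n: FakeParam() for n in names}

    def named_parameters(self):
        return self.params.items()


model = FakeModel(["down_blocks.0.w", "mid_block.w", "up_blocks.0.w"])
print(set_trainable(model, ["mid_block", "up_blocks"]))
```

With a real UNet, the optimizer would then be built from only the parameters where `requires_grad` is true.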
@SquealnSquirm might want to add some non-fetish concept examples then, ones you think this model does well.
I tried it on some prompts; many poses work quite well, but I would agree that the last bit of realism would be nice.
Anyway thanks for your effort, hope you keep going with Lora or V2 checkpoint!
WOW, this works exceptionally well! Your approach with 3d renderings added to the mix really worked out when prompting for photo. Also, often the concepts get a bit wonky on other Loras, but training it all in one checkpoint really gives consistently good results. I hope you plan to add new concepts in the future! I mean, you add your own 3D renders, you can make all sorts of hot stuff ;) So in the end SD3 was good for something. Found you on the SD3 ban thread and clicked your profile there. Would have totally missed this gem otherwise
I seriously tried so hard to train SD3, I don't think it's unbreakable, I managed to get it to occasionally draw nudes that aren't absolutely horrifying, but it required 448,000 epochs and it's utter garbage compared to what I could do with SDXL in 1/10 the time. Maybe someone will figure out how to break it, but I'm more excited about CivitAI (and others) potentially making an open community model. Maybe 3 months from now we'll have the model we deserve without the ridiculous license nonsense and SAI can quietly cease to exist along with all the silly censored and closed source clowns out there.
For now, I'm preparing a larger dataset and expanding the concepts involved, might make another attempt at training SDXL or just retraining what I got. I'm also producing a dataset from cherry picked 'good results' to see if that helps with training.
@SquealnSquirm did you train SDXL base or a finetuned model? Maybe Aura Flow will be the better choice in the future! The v0.1 is already very promising =) People say it may be censored, but I think it might be just undertrained at this stage
@WhatTheGuy I trained from the SDXL base.
I just took a look at Aura Flow https://huggingface.co/fal/AuraFlow; it's absolutely frekken enormous. 14GB for just the transformer (in float16) makes it a 7B parameter model (excluding the vae and text encoders). Unfortunately, even just fine tuning it on consumer hardware would be virtually impossible.
It's also using a variant of the T5 text encoder (exclusively), and the T5 TE (from Google) has not been trained to understand any NSFW terms. The tokenizer doesn't even have tokens for NSFW terms, it just splits them into pieces down to the letters in the word.
It's a shame, but it's pretty much guaranteed to be censored and completely untrainable.
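The size estimate above is simple arithmetic: file size divided by bytes per parameter. A quick sketch, assuming 1GB means 10^9 bytes and counting weights only (activations, gradients and optimizer state add a lot more during training); the function names are made up:

```python
# Bytes used to store one parameter in each format.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def params_from_size(size_bytes: float, dtype: str = "float16") -> float:
    """Estimate the parameter count from the size of the weights on disk."""
    return size_bytes / BYTES_PER_PARAM[dtype]


def weight_bytes(num_params: float, dtype: str = "float16") -> float:
    """Estimate the memory needed just to hold the weights."""
    return num_params * BYTES_PER_PARAM[dtype]


print(params_from_size(14e9) / 1e9)  # 7.0 billion parameters
print(weight_bytes(2.6e9) / 1e9)  # GB for a 2.6B model in float16
print(weight_bytes(2.0e9) / 1e9)  # GB for a 2B model in float16
```

The same numbers show why the 2.6B and 2B models mentioned earlier in the thread are much friendlier to an 8GB card than a 7B one.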
@SquealnSquirm Sounds bad... but also sounds like the exact opposite of what they wanted it to be... hmm ... let's wait until there is a v1.0 and see what changes until then
Have you tried going through and I2I your creations to use as training data? Could even use the tag 'ai gen' and negate it from prompts similar to '3d model'.
actually preparing a dataset like that now to see how it goes
Why is this a checkpoint and not a LORA?
Your idea here is just off. What you want to do is take each individual pose and generate around 100 images, then train a LORA for each of them.
Your model falls apart almost instantly as the prompt grows. It's a cool concept but execution needs work.
Such a pity that you made it as a checkpoint and not a lora. You also transferred some of the render quality into your model, which is good for the devices but not so good for the people. Should have excluded the people by tagging them accordingly.


















