So... I wanted to try a bit of an experiment here.
I grabbed a fairly low-quality bunch of images of a Poser character (man, if there's something SD will kill off soon enough, it's that). I then trained two versions of a LoRA from them.
Auto version was me taking the pics, running them through BIRME to get 768x768, having BLIP caption them, and calling it done.
Manual version was the same as above, but I hand-edited the captions to add a little more intelligence to them, cuz BLIP is pretty stupid a lot of the time.
Interestingly enough, the Realistic Vision models seem to get you closest to the original Poser-type aesthetic. Which isn't awesome, cuz Poser was never all that good at making good-looking pictures.
So the question is... which one is better?
Description
This is the brainless BLIP version. What got into the captions is what BLIP put there on its own (apart from a prepend of thiefezri).
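
If you want to repeat the prepend step yourself, here's a rough sketch of how it could be done. This isn't necessarily how it was done for this LoRA. It assumes the auto-generated captions sit in .txt files in one folder and that the trigger word gets joined on with a comma; the function names are made up for illustration.

```python
from pathlib import Path


def prepend_trigger(caption: str, trigger: str = "thiefezri") -> str:
    """Return the caption with the trigger word in front (hypothetical helper).

    Assumption: trigger and caption are joined with ", ".
    Captions that already start with the trigger are left alone.
    """
    caption = caption.strip()
    if not caption:
        return trigger
    if caption == trigger or caption.startswith(trigger + ","):
        return caption
    return f"{trigger}, {caption}"


def tag_caption_files(folder: str, trigger: str = "thiefezri") -> int:
    """Prepend the trigger to every .txt caption in folder; return how many changed.

    Assumption: one caption file per image, all in the same folder.
    """
    changed = 0
    for path in sorted(Path(folder).glob("*.txt")):
        old = path.read_text(encoding="utf-8").strip()
        new = prepend_trigger(old, trigger)
        if new != old:
            path.write_text(new + "\n", encoding="utf-8")
            changed += 1
    return changed
```

Calling tag_caption_files on the dataset folder a second time should change nothing, since captions that already carry the trigger are skipped.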