Intro
I made it just for fun, as an experiment in building a model good for augmenting professional photographs. I use a Nikon camera with a bunch of vintage lenses. I expect to build an SD model that is able to produce moody, cinematic pictures with nice smooth bokeh and an "analog style". Please note that I don't plan to train this model on any hardcore nsfw. Don't expect / request it from "cinero" models ;) My preference is art, beauty and emotions.
Some tips on Prompting
A few examples:
"[grayscale : [dimmed colors : vibrant color splashes : 16] : 8]" - I call it the "temporal trick". What it does is make your prompt depend on the current step. With this prompt, SD will use "grayscale" on steps 1..7, "dimmed colors" on steps 8..15, and "vibrant color splashes" on further steps. I believe there is no strict limit on the nesting level. What can you do with it? You can effectively reduce the number of tokens SD processes on each step (reduce the length of the active prompt). On the first steps there is no sense in specifying fine details; you only need to specify the scenery roughly. On the later steps there is no sense in spending tokens on describing the composition and lighting (I suspect). So, in theory, with this trick and a big number of steps, you can keep your active prompt short and build a very rich prompt at the same time.
PS: the prompt above forces SD to draw the scene with very little color and super vibrant segments (I showed grayscale images where the subject has a few vibrant hair curls or clothing parts). You can probably reverse this effect by making the whole picture colored with some parts made grayscale.

"[Audrey Hepburn : Milla Jovovich : 16]" - you can have fun with a smooth transition from one face to another using the XYZ plot script in Automatic1111. Also, this particular temporal trick with face / body helps my model render the most realistic and correct anatomy. I suspect you can also implement dynamic LoRA weighting with this trick. If a LoRA doesn't have a trigger word, you can just put the LoRA token like [ <lora: ...:0.42> : <lora: ...:0.99> : 16], or you can use multiple levels of nested "trigger words" from different LoRAs.
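The temporal trick above boils down to simple step-based prompt switching. Here is a rough Python sketch of that scheduling logic (my own illustrative names and data shapes, not Automatic1111's actual implementation, which parses the nested bracket syntax itself):

```python
# Illustrative sketch of step-based prompt scheduling, mirroring the
# semantics of "[grayscale : [dimmed colors : vibrant color splashes : 16] : 8]".
# Names are hypothetical; Automatic1111's real code parses the bracket syntax.

def active_prompt(step, stages):
    """Return the prompt text active at a given 1-indexed sampling step.

    `stages` is a list of (text, switch_at) pairs: each text stays active
    until its switch step; the last stage uses None (open-ended).
    """
    for text, switch_at in stages:
        if switch_at is None or step < switch_at:
            return text
    return stages[-1][0]

stages = [
    ("grayscale", 8),                   # active on steps 1..7
    ("dimmed colors", 16),              # active on steps 8..15
    ("vibrant color splashes", None),   # active on steps 16..end
]

print(active_prompt(1, stages))    # grayscale
print(active_prompt(10, stages))   # dimmed colors
print(active_prompt(20, stages))   # vibrant color splashes
```

Nesting in the real syntax just means a stage's text can itself be another schedule; the resolution per step works the same way.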
"shot on %Brand Name% %Lens Mark Name% vintage lens" - if you find vintage lens names which SD has in its memory, then you have a chance to improve the "analog style" of your picture. I used to use "Carl Zeiss Sonar", "Nokton", "Helios 44-2", but I cannot confirm that each particular lens model gives a unique effect. If you have your own list of confirmed lens models, then please consider sharing it with the community in the comments to this model [%PICTURE OF LEELOO saying HELP%]
In the near future I plan to build a training dataset with many images shot on beautiful vintage lenses, to bring an old-school photography soul into this model. I will use some unique trigger word for that, or the phrase "vintage lens" (not sure yet).
Use "perfect anatomy", "anatomically correct body", "anatomically correct hand", "perfect hands", "anatomically correct fingers", "perfect limbs anatomy" and similar anatomical phrases to increase the chance of getting correct anatomy.
Use the words "smooth bokeh", "swirly bokeh", "depth of field", and "smooth background" to increase the separation between the main subject and the scenery.
Use "turbulent fog", "mist" and "haze" with "mystical lighting" to get a nice atmospheric picture with super noticeable depth of scene. Also use the "early morning" and "blue hour" phrases if you want cold morning vibes.
Use "scary face expression", "surprised expression", "inviting expression", "lustful face", etc. to increase the chance of getting noticeable emotions on the face and visible "body language". It works, but it is not yet very noticeable.
Priorities of this model
Cinematic photo-realistic pictures of female characters (sfw, softcore nsfw)
Natural body, skin texture, [to be improved] environment (dirt, dust, stuff on floor, retro furniture and devices)
Realistic optical / photo effects (smooth swirly bokeh, analog film grain, aberrations [in progress]) of vintage lenses (Carl Zeiss Sonar, Jupiter 37a, Helios 44-2)
[To be improved] Urbex, abandoned, decaying interiors, depressive vibes, dimmed colors, fog, mist, vapor
How it was created
It is based on a few merges of Analog Madness, URPM, Cyber Realistic, epiCRealism, ICBINP and Cine Diffusion with coefficients in the 0.18..0.35 range.
It was trained with two datasets of carefully selected art photos with similar features (cinematic mood, atmospheric, charming anatomy, soft core / ero, retro interiors, morning outdoors, etc.). Total number of images in the datasets: 600-700.
Trained as a LoRA with 20 steps per image using Kohya_SS, then merged with a coefficient of ~0.3 into the merge of the mentioned checkpoints. Better used together with my LoRA of the same name to amplify the effect.
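For illustration, that merge step can be sketched in numpy: a LoRA stores a low-rank delta (an "up" and a "down" factor) per weight, and merging adds that delta, scaled by the coefficient, into the base checkpoint weight. This is a simplified sketch under those assumptions, not the actual Kohya / merge script (which also handles per-layer alpha scaling and the real checkpoint format):

```python
# Simplified sketch of merging a LoRA delta into a checkpoint weight
# with a coefficient of ~0.3. Hypothetical helper, not the real tooling.
import numpy as np

def merge_lora_weight(W, lora_down, lora_up, coeff=0.3):
    """W' = W + coeff * (up @ down): the LoRA's low-rank update,
    scaled by the merge coefficient, added onto the base weight."""
    return W + coeff * (lora_up @ lora_down)

rng = np.random.default_rng(0)
W = rng.standard_normal((320, 320))            # base attention weight
lora_down = rng.standard_normal((4, 320)) * 0.01   # rank-4 factors
lora_up = rng.standard_normal((320, 4)) * 0.01

W_merged = merge_lora_weight(W, lora_down, lora_up, coeff=0.3)
print(W_merged.shape)   # (320, 320)
```

Because the update is rank-4, the merge only nudges the weight along a few directions, which is why a small coefficient like 0.3 blends the trained style in without overpowering the base merge.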
Further improvements
By priority:
[done] Fix / Improve hand and fingers generation
[in progress] Improve gloom, bokeh, chromatic aberrations, spherical aberrations, light leaks and old analog film features
Fix / Improve feet and toes generation
[in progress] Add more urbex, abandoned, vandalized interiors and lost / forgotten outdoor scenery (please suggest good datasets ;)
Fine tuning / improvements of eyes and anatomy
Feedback appreciated...
Description
CinEro v1.2 FP32 RC1
Trained w/ a dataset containing 131 training images (three different pro model photosets with studio light and sharp focus) and ~4800 random photos of women (sfw / softcore nsfw).
Dataset priority: anatomical correctness of limbs.
Epochs / steps: 15 / ~50800
This version:
gives a lower chance of deformed / mutated / fused / redundant limbs (see positive / negative prompts in my images)
gives a better chance of correct hands / fingers (much better with my prompts; still not ideal)
gives a weaker cinematic / artistic effect (because of the low percentage of art photos in the training data)
gives relatively good guidance and prompt following (see the sampler settings and CFG in my samples)
gives more contrast and darker images with higher CFG (a side effect, as with EpicNoiseDiffusion)
gives a lower chance of artifacts like over-exposed (white) small splashes on contrast edges
PS: I will postpone uploading the FP16 2GB version. It showed lower performance. It is ready, but first tests showed that it doesn't follow prompts with the same precision as FP32 and gives more boring scenery (I am not satisfied with the FP16 performance). Please let me know in the comments if you are OK with the lower quality of FP16.
Comments
I have the same sentiment. I'm tired of those Asian anime 2.5D models, all the same. As a photographer I would love to refine this model of yours (the only one that has that "something" for me) to create a "fork", but I'm only good at photography; I still use an old Pentax.
Where can I find a well-done tutorial on how to refine a model like you do? Any info in the right direction from anyone will be greatly appreciated, mind you that I'm a total noob at this ¯\_(ツ)_/¯
1. I am not a big fan of "vintage" camera bodies (I prefer modern mirrorless cams with better performance), but my daughter likes the old Canon 7D. I am a BIG FAN OF VINTAGE LENSES (Carl Zeiss and other German lenses, Schneider Super Cinelux, some USSR lenses). I love manual focus more than AF.
2. As for training, I haven't found any good tutorial yet. I googled a lot of guides with "Kohya SS training guide" and used different videos and posts to aggregate useful knowledge. Try going the same way. The Dreambooth extension inside Automatic1111 doesn't work for me (it eats too much VRAM and crashes). Kohya SS takes less memory with its own embedded Dreambooth (I don't know why).
If you have an NVidia GPU with 12 GB of VRAM or more, I can give you the following recommendations:
1. For SD1.5 full Dreambooth training you can set Max Resolution to at least 1024,1024 (or less if you want faster training; a ~4800-image dataset will take about 360 hours of training on an RTX 3060 12GB).
2. You can use any aspect ratios and sizes of images in the dataset, but try to minimize the number of different aspects. Ideally use two aspects: portrait with height / width = 1.25 (for example) and landscape with width / height = 1.25. This will minimize the number of image buckets and significantly reduce VRAM consumption.
3. Use fewer training images, caption them with Kohya's automatic WD14 captioning (see the Utils tab), and then carefully edit each caption, removing incorrect tags and adding your own better descriptions of the picture. Main rule here: what you see first in the picture goes closer to the start of the text.
4. You can generate regularization images with Dreambooth inside Automatic1111 (I didn't find the same feature in Kohya), but these images will be mutated, fused, and ugly. I tried to generate reg images with SDXL, but Dreambooth crashes with SDXL before any image is generated. So I followed a recommendation to take random real-life images as regularization. I also captioned these images with Kohya's automatic captioning (manually editing ~4800 images would be hell), but I feel that Kohya doesn't use the captions of reg images.
5. For NVidia 3xxx and 4xxx series GPUs you can use Mixed precision: BF16 and the Adafactor optimizer. It will consume ~11.5 GB of VRAM.
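The aspect-bucket idea from tip 2 can be sketched like this (assumed behavior for illustration, not Kohya's actual bucketing code, which also snaps to concrete resolutions): each image is assigned to the nearest of a small set of aspect-ratio buckets, so fewer distinct aspects in the dataset means fewer buckets and less VRAM pressure.

```python
# Hypothetical sketch of aspect-ratio bucketing: snap each image to the
# closest bucket aspect (w/h). With only two or three buckets, batches
# stay uniform in shape, which is what saves VRAM during training.

def assign_bucket(width, height, buckets=(0.8, 1.0, 1.25)):
    """Return the bucket aspect (width / height) closest to the image's."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b - aspect))

print(assign_bucket(1000, 800))    # 1.25  (landscape)
print(assign_bucket(800, 1000))    # 0.8   (portrait)
print(assign_bucket(1024, 1024))   # 1.0   (square)
```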
Good luck, BRO!
@homoludens thank you very much!!! Being an old photographer, I have many beautiful Pentax lenses, like the SMC Pentax 77mm f/1.8 Limited; to me Pentax made some legendary lenses, but this doesn't stop me from appreciating modern DSLR cameras! Wow, your daughter is a cool girl and a fine connoisseur, which means you did something the right way. (◠‿◕) I will copy your answer, because it is the best guide I could find until now! I guess you have an RTX 3060 12GB, which is the exact graphics card I have to experiment with... then who knows? Things are moving really fast!
Thanks again for your help, and may the demons of ugliness and boring photos lose their path on the way to your home!!!
@ulicoconut483 Vintage lenses have a soul. Modern lenses give too perfect a picture (no old-school flaws in the bokeh). As for the camera, in most easy conditions it doesn't make any difference to the picture style (IMHO). In low light, modern cams are better.
Yes, I use an RTX 3060 12GB (relatively cheap). The best choice would be an RTX 3090 24GB, good for everything.
The best thing you can do is manually select your own photographs, caption them with love, and train your own style. If something unexpected gets broken, use the Super Merger extension for Automatic1111 to fix precisely what you don't like, with the "MBW" checkbox checked and the sliders for the INxx and OUTxx blocks tuned.
https://huggingface.co/WarriorMama777/OrangeMixs/discussions/66
The top sliders relate to small features like film grain and point-size details. The middle sliders are for fingers, limbs and micro anatomy. The bottom sliders are for composition, large-area colors and probably lighting.
The left position of a slider keeps Model A; the right position moves toward Model B.
If these recommendations work for you as well, you would do me a good favor by compiling these shards of my mind into a good tutorial and putting a link to this model somewhere. I would like to get feedback from the community (whether I am going in the right direction or not).
@homoludens is the Canon 7D 'old'?!
Will you release the lora you mention in the description?
Yes... I only need to make sure I have a stabilized model with features me and the community like.
Then I need to select a base model to calculate the difference with.
What do you think? Should I use vanilla SD1.5 as a base or take some realistic model instead?
You're talking about extracting the Lora, right? I'd try with whatever model you used as the base for training, that way you're only extracting whatever you put in. That's where I'd start at least.
@kaali111 for me it will be easier to get the LoRA as a difference, like "Cinero" minus "some vanilla ckpt", instead of training a LoRA from scratch.
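The difference-extraction approach can be sketched in numpy: take the per-weight delta between the tuned and base checkpoints, then keep a low-rank SVD approximation of that delta as the LoRA's up / down factors. This is a simplified illustration of the idea; real extract-LoRA tools process each layer of the actual checkpoints and handle the LoRA file format:

```python
# Sketch of extracting a LoRA as a checkpoint difference, per weight:
# delta = W_tuned - W_base, then a rank-r truncated SVD of delta gives
# the LoRA factors. Hypothetical helper, not a real extraction tool.
import numpy as np

def extract_lora(W_tuned, W_base, rank=8):
    delta = W_tuned - W_base
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    lora_up = U[:, :rank] * S[:rank]   # shape (out_dim, rank)
    lora_down = Vt[:rank, :]           # shape (rank, in_dim)
    return lora_up, lora_down

rng = np.random.default_rng(1)
W_base = rng.standard_normal((64, 64))
# a genuinely low-rank fine-tuning update is recovered almost exactly
true_delta = rng.standard_normal((64, 2)) @ rng.standard_normal((2, 64))
up, down = extract_lora(W_base + true_delta, W_base, rank=2)
print(np.allclose(up @ down, true_delta, atol=1e-6))   # True
```

Note the choice of base matters: subtracting the actual training base isolates only what the fine-tune added, while subtracting vanilla SD1.5 would also bake the merged checkpoints' style into the LoRA.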
I don't plan to train this model on any hardcore nsfw. Don't expect / request it from "cinero" models ;) My preference is art, beauty and emotions.
Much respect! I'm not against hardcore models but there are SO many of them, this is a refreshing difference. It is always great to have options!
Improving hardcore is boring now. Anybody can take my model and mix in "a bit of melody into hardcore drum'n'bass".