CuteCaption | UltraInsta Z-Image Turbo - The Ultimate Character Creation Solution

This LoRA contains the full weights of a 5k-step run trained with a high-noising > balanced-noising strategy. It was meant to be block-balanced and then merged with a LoRA trained the other way around (balanced noising > high noising). I'm pre-releasing it while the final versions of the resulting merges are close to being finished.
*Please note: this LoRA performs well and creates very beautiful shots. My initial plan was to fine-tune the blocks and release a more finely tuned version, but since that process has taken over a month, I've decided to release the full weights for the 5k-highnoise>balanced version. We are training at Rank 200, FP32, with a slow learning rate.

*This LoRA was trained before the stochastic-rounding issues with AdamW8bit were fully understood. For the sake of testing, we will train the next versions with ProdigyPlus, but it's possible that training such a large dataset on AdamW8bit resulted in minor anatomy issues under certain conditions.

[In my testing, training with different noising emphasizes different body and face types. By training two LoRAs on the same dataset with separate strategies, I can block-inject 4-8 blocks from the balanced strategy into the high-noising strategy to get the best of both worlds (body + face). The testing has taken a considerable amount of time, though, and I want to put out something the community can use while I continue fine-tuning.]
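The block-injection idea can be sketched roughly as follows. This is only an illustration, not the actual merge tooling used for this LoRA: the key layout (`blocks.<index>.…`), the function name `inject_blocks`, and the placeholder string values standing in for tensors are all assumptions.

```python
import re

# Hypothetical key layout "blocks.<index>.<rest>"; real LoRA files may name keys differently.
BLOCK_RE = re.compile(r"blocks\.(\d+)\.")

def inject_blocks(base, donor, block_ids):
    """Return a copy of `base` where the chosen blocks are taken from `donor`."""
    chosen = set(block_ids)
    merged = dict(base)  # shallow copy, so `base` itself is left untouched
    for key, value in donor.items():
        match = BLOCK_RE.search(key)
        if match and int(match.group(1)) in chosen:
            merged[key] = value
    return merged

# Placeholder strings stand in for weight tensors.
high_noise = {"blocks.0.attn.lora_A": "hn0", "blocks.5.attn.lora_A": "hn5"}
balanced = {"blocks.0.attn.lora_A": "bal0", "blocks.5.attn.lora_A": "bal5"}

merged = inject_blocks(high_noise, balanced, block_ids=[5])
print(merged)
```

Here the high-noising LoRA is the base and only block 5 comes from the balanced LoRA; in practice the choice of blocks is what takes the testing time.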

This LoRA is designed to allow the user to create high quality, unique, and extremely beautiful faces and bodies. It is not perfect 100% of the time, but when it does work it excels. If you do not get the face you want right away, do a batch of 10-20.

https://comfy.icu/extension/RamonGuthrie__ComfyUI-RBG-SmartSeedVariance

You can get great results without this, but you should be able to use this node with great success if your hope is to engineer a unique, original and stunning face that has never existed before, without having to adjust the prompt as much. This model will never output a 1:1 copy of any of the subjects in the dataset, but it has achieved a convergence on a natural, aesthetic balance of the dataset contents, resulting in this vibe being locked into the LoRA.

It also works great with this:

https://github.com/shootthesound/comfyUI-Realtime-Lora



*If you have a result that you like from another workflow or vanilla ZiT, or want to give some aesthetic flavor to an existing photo, try turning on the LoRA at a very low weight and putting INST4GR4M in the prompt near where you describe the subject or the subject's face. This modulates your existing face.

For full generations, you will more than likely need to reduce the weight to .70 or lower, but it will also work as a style and body/face slider. Most of the anatomy issues I've encountered are a result of tweaking or modifying the block weights using the Realtime LoRA node, or of having the weight too high. You should be able to achieve great results with the typical number of steps for ZiT (8-12).

**Nude shots: The dataset for this LoRA had an extremely limited number of nude shots, so it is most likely not going to be a true NSFW solution. That said, a dedicated portion of the images comes strictly from OnlyFans, so it does have some NSFW capability, but I have not tested it sufficiently to promise that it will perform well for nude shots or things like nipples.

You may start by creating a base image that is somewhat close to what you want, and then add the LoRA starting at very small weights to introduce some aesthetic and style. The closer you get to around .50, the more it will change the image. Finally, weights of around .70 and above will result in the entire image coming straight from the LoRA, and at that point it acts more like a style LoRA than a slider.
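As a rough guide, the weight ranges above can be written down as a small helper. This is a sketch only: the function name and regime labels are made up for illustration, and the cutoffs at .50 and .70 are approximate values taken from the description, not hard limits.

```python
def weight_regime(weight):
    """Map a LoRA weight to the rough behavior described in the text."""
    if weight < 0:
        raise ValueError("weight must be non-negative")
    if weight >= 0.70:
        return "style"   # image comes mostly from the LoRA
    if weight >= 0.50:
        return "strong"  # noticeably changes the base image
    return "subtle"      # adds some aesthetic and style to the base image

# Sweep upward from a very small weight to see where the behavior shifts.
for w in (0.10, 0.30, 0.50, 0.70):
    print(w, weight_regime(w))
```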

The dataset for this model has been hyper-curated for well over a year. It has gone through many iterations, prunes and training runs.

**An updated version using an actual block-balanced approach will be released soon. These are the full weights, which I was not planning on releasing. After tweaking the blocks for some time without being able to decide on the best version, I have decided to just release the full weights.

This version is something I am releasing to give the community something they can use, while I work to balance the blocks of the main LoRA.

*I am also going to release the 7k step version, which seems to give a slightly different body aesthetic.

This strategy started with high noising for 1 epoch, so the model saw every image once with this approach. After the first epoch, I switched the training back to balanced and let it run for another several thousand steps.

This is 5,000 steps in total: 2,000 steps with high noise, and the remainder balanced.
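The split can be expressed as a simple schedule. This is a minimal sketch, assuming the 5k-step run described on this page with the first 2,000 steps (one epoch) on high noise; the names are illustrative and not taken from any trainer's config.

```python
HIGH_NOISE_STEPS = 2_000  # first epoch, high-noising strategy
TOTAL_STEPS = 5_000       # full length of the released run

def noising_strategy(step):
    """Return which noising strategy a given 0-indexed step used."""
    if not 0 <= step < TOTAL_STEPS:
        raise ValueError("step outside the training run")
    return "high_noise" if step < HIGH_NOISE_STEPS else "balanced"

balanced_steps = TOTAL_STEPS - HIGH_NOISE_STEPS
print(balanced_steps)          # number of balanced steps after the switch
print(noising_strategy(1999))  # last high-noise step
print(noising_strategy(2000))  # first balanced step
```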

This LoRA serves to help create new characters, faces and bodies.

You can use the trigger INST4GR4M as an activator or as a modulator. Putting it at the beginning of the prompt works, but it is not required; you can use it anywhere in the prompt to conjure the style, or place it strategically next to your descriptions of bodies or poses.

In addition to this trigger word, you also have access to "OnlyFans Style", which you can use as a trigger or modulator.

Finally, you also have the option of using "Instagram Style Selfie" -- the LoRA was trained on mostly Instagram images, and the most timeless, classic selfies were tagged with Instagram Style Selfie.
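To illustrate the difference between using a trigger as an activator (at the start of the prompt) and as a modulator (next to the part you want to affect), here is a small sketch. The helper names are hypothetical and the example prompt is made up; only the three trigger strings come from this page.

```python
TRIGGERS = ("INST4GR4M", "OnlyFans Style", "Instagram Style Selfie")

def activate(prompt, trigger="INST4GR4M"):
    """Put the trigger at the very start of the prompt."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    return f"{trigger}, {prompt}"

def modulate(prompt, anchor, trigger="INST4GR4M"):
    """Place the trigger directly before the first occurrence of `anchor`."""
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger: {trigger}")
    if anchor not in prompt:
        raise ValueError("anchor phrase not found in prompt")
    return prompt.replace(anchor, f"{trigger} {anchor}", 1)

base = "photo at the beach, woman with freckles smiling"
print(activate(base))
print(modulate(base, "woman with freckles"))
```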

You can experiment with using very, very short prompts, with just the triggers and modulators.

You can also utilize very long, extensive prompts.

The dataset itself was captioned with a diverse captioning strategy. About 15-20% of the captions are very long and highly descriptive. About 50% of the captions have a moderate length, and the remaining 10-25% are very, very short. In my experience, this makes the model more robust and lets the user choose how much detail they want to provide in their prompt.

Notes: With higher weights, or manipulating the blocks, we do start to see some anatomy issues.

*Usage Note: If you merge this LoRA with something else, please provide credits on the model release page. We have spent hundreds of dollars on training and countless thousands of hours refining the dataset and improving our training strategy, and the model is completely free.

https://discord.gg/2gQ5eBtRCj - CuteCaption Discord -- post your results, ask questions, tips, etc -- this is also our small community for CuteCaption, which is gearing up for a Summer release: a state-of-the-art captioning and dataset tooling solution for LoRA engineers.
