This is a text-improvement LoRA that also happens to act as an aesthetics-enhancement LoRA (unintended, but still welcome). Beta version.
How to use:
There are three main trigger words: "speech bubble", "text", and "snapchat".
While the first two are self-explanatory, "snapchat" adds a Snapchat-like semi-transparent bar with text on it.
The basic pattern is: "words words words", trigger word.
You can also use [n] in some cases to create a speech bubble or a string of text (Unstable!).
Just look at the prompts in the example images :3
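If you script your prompts, the pattern above can be sketched roughly like this. It's only an illustration: `build_prompt` and `extra_tags` are made-up names, and where the quoted text should sit relative to the rest of your tags is an assumption, so check the example images for what actually works.

```python
# Trigger words listed on this page.
TRIGGERS = ("speech bubble", "text", "snapchat")


def build_prompt(words: str, trigger: str, extra_tags: tuple[str, ...] = ()) -> str:
    """Return a prompt shaped like: "words words words", trigger word[, extra tags].

    Hypothetical helper, not part of any tool; extra_tags placement is an assumption.
    """
    if trigger not in TRIGGERS:
        raise ValueError(f"unknown trigger word: {trigger!r}")
    # Drop stray double quotes so they don't close the quoted span early.
    cleaned = words.replace('"', "").strip()
    parts = [f'"{cleaned}"', trigger, *extra_tags]
    return ", ".join(parts)


print(build_prompt("see you tomorrow", "snapchat"))
print(build_prompt("good morning", "speech bubble", ("1girl", "smile")))
```

The first call gives `"see you tomorrow", snapchat`, i.e. the quoted words followed by the trigger word.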
Train info:
15 epochs, 2925 steps
Dataset: ~250 imgs, manually captioned text, WD3 Large for tags
Batch size 2, gradient accumulation 4, keep tags 5, shuffle the rest, no dropout, text encoder (TE) was NOT trained.
Resolutions = [768, 1024, 1280]
Trained on an RTX 5060 Ti 16 GB for ~16 hrs.
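For anyone who wants to sanity-check the numbers, here is the arithmetic they imply. The only inputs are the figures listed above; whether a "step" means one micro-batch or one optimizer update depends on the trainer, which isn't stated here, so both readings are shown as assumptions.

```python
# Figures taken from the train info above.
epochs = 15
total_steps = 2925
batch_size = 2
grad_accum = 4

steps_per_epoch = total_steps // epochs    # 2925 / 15 = 195
effective_batch = batch_size * grad_accum  # 2 * 4 = 8

# Assumption A: a step is one micro-batch of `batch_size` images.
samples_per_epoch_micro = steps_per_epoch * batch_size
# Assumption B: a step is one optimizer update over the effective batch.
samples_per_epoch_optim = steps_per_epoch * effective_batch

print(steps_per_epoch, effective_batch)
print(samples_per_epoch_micro, samples_per_epoch_optim)
```

So that's 195 steps per epoch with an effective batch of 8, and roughly 390 or 1560 samples seen per epoch depending on which reading applies.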
I want to continue working on this LoRA; however, captioning the text by hand is a huge pain in the ass.
Description
First release. Sometimes the text can be worse than without the LoRA :P
Comments (8)
This actually works quite well! It's not perfect, as there are still a few spelling errors and a word or two missing, but it's a lot better than what the preview can output.
Looking forward to a more refined version later on, keeping my eye on this one!
Thanks for your review! Could you upload some of your generations from this LoRA? That would be much appreciated :)
is this just meant for speech or can it help with things like text on paper or text on clothing?
For now, the LoRA has only been trained on speech. However, you can test whether it does what you need.
Thanks for your hard work. This is a dream lora for me. Love it!
Was it trained on photo images or..?
Nah, no photos. The dataset is a really chaotic mix of manually gathered images from r34 and Danbooru. All of them contain text in some form (mostly speech bubbles, snapchat ones and plain text across the image).
This is genuinely so goated, oh my god


