CivArchive
    Emma Stone 1.5/2.1 Embeddings - SD 2.1 - 2 Vectors

    While the built-in embedding for Emma Stone is not terrible, I was curious whether I could improve on it.

    I used 443 sample images, all cropped and tagged manually, mostly chosen from the top 1,000 posts in her subreddit.

    Description

    Trained on a 3080Ti with 443 sample images, 2 Vectors, Batch Size 2, 70,000 steps, with a custom learning rate schedule:

    5e-4:70, 1e-3:210, 2e-3:350, 3e-3:560, 4e-3:770, 5e-3:1050, 6e-3:1400, 7e-3:2030, 8e-3:3640, 7e-3:4130, 6e-3:4620, 5e-3:5320, 4e-3:6160, 3e-3:7350, 2e-3:9240, 1e-3:11200, 9e-4:11620, 8e-4:12180, 7e-4:12880, 6e-4:13650, 5e-4:14700, 4e-4:16030, 3e-4:17920, 2e-4:21210, 1e-4:24640, 9e-5:25550, 8e-5:26670, 7e-5:27930, 6e-5:29540, 5e-5:31570, 4e-5:34300, 3e-5:38150, 2e-5:43960, 1e-5:48580, 9e-6:49630, 8e-6:50680, 7e-6:51940, 6e-6:53200, 5e-6:54740, 4e-6:56490, 3e-6:58590, 2e-6:61460, 1e-6:63700, 9e-7:64190, 8e-7:64750, 7e-7:65380, 6e-7:66150, 5e-7:66920, 4e-7:67970, 3e-7:69230, 2e-7

    The schedule represents a short warm-up, followed by a slightly tweaked exponential decay that focuses on refining details over many steps.
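    For anyone who wants to reuse or tweak this schedule: the string above follows the A1111 textual inversion syntax, where each `rate:step` pair means "use this rate up until that step", and a trailing rate with no step runs to the end of training. Here is a minimal sketch of how such a schedule can be interpreted (the function names are illustrative, not A1111's internals):

```python
def parse_schedule(schedule: str):
    """Parse 'lr:step, lr:step, ..., lr' into (lr, until_step) pairs.

    A pair without a step (until_step=None) applies for the rest of training.
    """
    pairs = []
    for part in schedule.split(","):
        part = part.strip()
        if ":" in part:
            lr, step = part.split(":")
            pairs.append((float(lr), int(step)))
        else:
            pairs.append((float(part), None))
    return pairs

def lr_at(pairs, step: int) -> float:
    """Return the learning rate in effect at a given training step."""
    for lr, until in pairs:
        if until is None or step <= until:
            return lr
    return pairs[-1][0]  # past the last listed step: keep the final rate

# A shortened version of the schedule from the description above:
sched = parse_schedule("5e-4:70, 1e-3:210, 2e-3:350, 3e-3:560, 2e-7")
print(lr_at(sched, 50))   # -> 0.0005 (warm-up phase)
print(lr_at(sched, 1000)) # -> 2e-07 (final decayed rate)
```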

    FAQ

    Comments (5)

    Balthazar99 · Apr 26, 2023
    CivitAI

    I don't know what's going on with the training, but it's odd to train for that many steps and not have it come out more accurate than this. That's just my take, based on how I know Emma looks.

    My first instinct is that you have an awful lot of training images for only 2 vectors. One of the common threads of advice about TIs I hear (and there isn't much consensus on TI knowledge) is that having more images in your training dataset allows you to increase your number of vectors, and a higher number of vectors gives the TI more flexibility to accurately capture someone's likeness.

    I'm wondering if, with the intense amount of training you did, it tried to cram all the information in that large dataset into just a small number of vectors, which then have to work very hard to convey everything that was gathered during training.

    I don't know, I'm not an expert. All I can say is that something looks off... and your parameters are very unconventional. I do appreciate your work figuring out the curve and such, though; I wish I'd taken higher-level mathematics courses in school so I could work that out myself.

    Balthazar99 · Apr 26, 2023

    The other thing I can think of... have you tried generating images with the TI on models other than the standard SD model? I've heard many times that the best approach is to train on the base SD models, but then, when you actually want to generate images after training is done, switch to a different model, such as one of the popular merges like Deliberate, Liberty, or ChilloutMix.

    fudefrak
    Author
    Apr 26, 2023

    Yes, you want more images if you're going to use more vectors, but you do not necessarily need more vectors if a low number of vectors gets an accurate enough image. I'm personally not seeing what you think is inaccurate about these images. I think they're pretty spot on.

    The problem with more vectors is that while you can get more precise, you can also lose the ability to place the subject in different scenarios, as the embedding starts memorizing details from the locations, outfits, photo styles, etc. My early attempts with a high number of vectors had these kinds of issues.

    I tried with 1 vector and it just wasn't realistic enough to my eye (although SD 1.5 did better with 1 vector, imo). 2 vectors looks pretty spot on to my eyes, and I'm a massive fan of Emma. There are little things that could be improved, and not every render looks perfect, but overall it looks like a good representation of her likeness to me. I could certainly try a third vector to see if there's any noticeable improvement, but I stopped here because I thought it looked good as it was.

    razzz · Apr 27, 2023
    CivitAI

    Don't call your Textual Inversion file "Emma Stone". First, because it creates a strong bias with the model you pair it with: the result comes not from the TI token alone but from the TI plus the model's own token. Second, because it will always trigger the TI even when you don't want it to. I know it's easy to rename, but most people won't, and their "emma stone" token will be screwed on all their models.

    fudefrak
    Author
    Apr 28, 2023 · 1 reaction

    The point of an embedding is to completely replace how the model interprets the words you type, so it SHOULD be used every time you type those words. If you want a different functionality, feel free to rename it yourself. The only time I'd name an embedding to something other than the natural language I'd use to describe that thing, is if I have multiple embeddings of the same subject.

    The actual name of the embedding doesn't change how training works. With [filewords], every photo has been captioned manually by me, and every caption contains the name Emma Stone. During training, the text "emma stone" is replaced with the embedding, and the rest of the caption is used to build the prompt around it. That additional context around "Emma Stone" helps it find the ideal vectors to replace those words with, so it doesn't start grabbing details from her outfits, locations, etc., but focuses on her, because the other words already describe those details in the image.

    So whether you call it Emma Stone or EmmaStoneEmbed or whatever, you'd have to caption all of the images to match it in order for the TI process to isolate the correct words, and I don't know about you, but it's just easier to use the real name when captioning my images, especially if I plan to reuse the same dataset for a Dreambooth training, for example. So it's best to train on the natural words, and then, if you want to rename the embedding file, you can do so yourself.
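    The caption substitution described above can be sketched roughly like this. This is a hypothetical illustration of the idea, not A1111's actual implementation; the function name and placeholder token are made up:

```python
import re

def build_prompt(caption: str, subject: str, placeholder: str = "<embedding>") -> str:
    """Replace the subject phrase in a caption with the embedding token,
    case-insensitively, so the remaining words describe only the context
    (outfit, location, style) and the embedding is left to capture the subject."""
    return re.sub(re.escape(subject), placeholder, caption, flags=re.IGNORECASE)

print(build_prompt("Emma Stone wearing a red dress on a beach", "emma stone"))
# -> "<embedding> wearing a red dress on a beach"
```

This is why the filename and the caption text need to match: whatever phrase the captions contain is the phrase the trainer isolates and replaces.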

    An embedding will always be based on one specific model, and will work best with that model, but any model trained from the same base will usually at least be able to make use of the embedding, and the result should still bear some similarity. How similar it looks depends on which weights the embedding activates in the neural network, and how much those weights have or have not been changed during finetuning.

    I have actually tried to train embeddings on some of the models I've downloaded from here, but at least in A1111, the training seems to produce completely random results, so A1111 appears to be tied to the base models when training embeddings. You can, however, train a hypernetwork on any model.

    TextualInversion
    SD 2.1 768

    Details

    Downloads
    565
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    4/26/2023
    Updated
    5/14/2026
    Deleted
    5/23/2025
    Trigger Words:
    Emma Stone

    Files

    Available On (1 platform)

    Same model published on other platforms. May have additional downloads or version variants.