CivArchive
    Emiru - HiDream - v1.0 Full

    This is a LoRA of the twitch streamer Emiru for HiDream-I1-Full.

     

    Trigger word: “Emiru”

    Suggested LoRA weight: 1.0 - 1.7

    The model is trained at 512, 768 and 1024 resolutions.

    The training images also contained many cosplay images. Most of the characters weren’t tagged by me; the few I did tag include: Jinx, Ahri, Kiriko, Yor Forger, Albedo, Raphtalia, Tifa and Makima. But due to the low number of images per character, you shouldn’t expect any good recreations.

    I mostly used the word “casual” in the tagging when an image was not a cosplay. Also, since many images are screencaps from streams or similar, there are quite a few low-quality images in the training data. They were tagged with tags along the lines of: “Smartphone quality”, “Bad image quality”, “Blurry”, “Low quality image”, etc. Putting some of these in the negative prompt helps should the generated images seem low quality or blurry. However, those images have a negative effect on the LoRA overall, so if I were to retrain it, I would remove them.

     

    I feel the LoRA is just okay. It is quite flexible thanks to HiDream, but in my opinion it needs some wrangling when you generate images too far from the training images. By wrangling I mean a longer, elaborate prompt, so it doesn’t make anything up. For example, the training data had no paintings, drawings or similar. Still, when I generated the watercolour images, I had to put “emir” and “Arabic” in the negatives, as for some reason it generated monarchs (even male ones) with a very Arabic theme (when I only wanted a neutral image in this case).

    Images were exclusively generated in ComfyUI.

     

    Training

     

    As always, I will add a little bit about the training.

    HiDream seemed quite interesting due to its strong text encoder (Llama) and the full variant being available for training. My hope was that it would also be a strong model for multi-concept trainings and larger finetunes.

    For that reason, I was eager to do my usual Belle Delphine training as a first test run. However, due to the size of that dataset, I always prefer to run that training at home. And there lies the first problem right now: very few trainers currently support HiDream; the one I normally use (OneTrainer) is not among them yet, and the other one I use (Ostris’ ai-toolkit) requires at least 48 GB of VRAM, which I do not have at home.

     

    After checking the available frameworks, I decided to use diffusion-pipe, as it supports block swapping. I then spent a few hours getting it to run on my RTX 5090 (it requires PyTorch 2.7 and CUDA 12.8, which in turn meant compiling libraries such as flash-attn locally). After that, I started a test run on the Belle dataset for ~24 hours. It produced some results, but I wasn’t happy with them, and I realized I probably needed to change the parameters, as the ones I tried didn’t really work out.

    This Emiru LoRA is a result of that: I quickly grabbed a much smaller dataset (345 images), hoping to test some parameters with a quicker training run (it is also a simpler dataset). I then captioned the images using InternVL3-14B, followed by a manual pass adding some information to the captions.

    Previous experiments I did had an effective batch size of 4 and mostly used the settings recommended by diffusion-pipe. However, as I said, I wasn’t happy with the results, so I decided to switch the optimizer to Prodigy and use a batch size of 1. If this training run hadn’t worked out either, I would probably have dropped HiDream, as for my needs other models (at least for LoRAs) are better.

    So, I ended up with the following main settings for this test-run:

    • micro_batch_size_per_gpu = 1

    • gradient_accumulation_steps = 1

    • warmup_steps = 0

    • blocks_to_swap = 6

    • llama3_4bit = true

    • transformer_dtype = 'float8'

    • flux_shift = true

    • rank = 32

    • type = 'Prodigy'

    • lr = 1
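
    Assembled into a config file, the settings above would look roughly like the following sketch. The section layout follows diffusion-pipe’s example TOML configs, and the paths are placeholders; key placement may differ slightly between versions, so treat this as an illustration rather than my exact file:

```toml
# Sketch of a diffusion-pipe training config using the listed settings.
# Paths are placeholders.
output_dir = '/path/to/output'
dataset = 'dataset.toml'
micro_batch_size_per_gpu = 1
gradient_accumulation_steps = 1
warmup_steps = 0
blocks_to_swap = 6

[model]
type = 'hidream'
llama3_4bit = true
transformer_dtype = 'float8'
flux_shift = true

[adapter]
type = 'lora'
rank = 32

[optimizer]
type = 'Prodigy'
lr = 1
```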

     

    I let this run for about 18 hours. This corresponds to about 26,500 steps, or about 25 epochs (3 resolutions -> 3x 345 = 1,035 steps per epoch at batch size 1). I then tested it; it was okay, but I felt likeness and details could still be better. So I continued from the saved state for roughly 20 more hours, up to around epoch 56. (Side note: you can create a file called save_quit in the output folder and diffusion-pipe will stop and save a checkpoint, so you don’t have to wait for the next save_every_... interval.)

    I retested and compared different epochs (16, 32, 40, 48, 56) and felt epoch 56 had the best likeness, but it was overfitted on the training data. For me the overfitting was too strong, so I decided to create an average of 8 different LoRA checkpoints (i.e. different steps / epochs). The chosen checkpoints were mostly evenly distributed, with a higher density towards the end. I retested that merge and decided that the resemblance was still good, while being less overfitted.
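
    The post doesn’t spell out how the checkpoint average was computed, but averaging LoRAs from the same run is simply an element-wise mean over matching keys. A minimal sketch with NumPy (with real files you would load each checkpoint via safetensors instead of the toy dicts used here):

```python
import numpy as np

def average_loras(state_dicts):
    """Average several LoRA state dicts key-by-key.

    Checkpoints from the same training run share identical keys and
    tensor shapes, so a plain element-wise mean is well-defined.
    """
    keys = state_dicts[0].keys()
    n = len(state_dicts)
    return {k: sum(sd[k] for sd in state_dicts) / n for k in keys}

# Toy example with two fake "checkpoints" of a single LoRA weight;
# the key name is illustrative, not taken from the actual file:
ckpt_a = {"lora_down.weight": np.array([[1.0, 2.0]])}
ckpt_b = {"lora_down.weight": np.array([[3.0, 6.0]])}
merged = average_loras([ckpt_a, ckpt_b])
print(merged["lora_down.weight"])  # [[2. 4.]]
```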

    The rank 32 LoRAs were about 580 MB, which I felt was too large, so I adjusted the safetensors file to be compatible with Kohya’s sd-scripts and resized it (max rank 32, with an sv_fro of 93). This reduced the final size to about 180 MB.
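
    The resize step maps onto sd-scripts’ resize_lora.py, which supports dynamic rank reduction via singular values. A sketch of the invocation; the file names are placeholders, and I’m assuming the stated sv_fro of 93 corresponds to keeping 93% of the Frobenius norm, i.e. --dynamic_param 0.93:

```shell
# Dynamically resize the LoRA: keep 93% of each layer's Frobenius
# norm, capped at rank 32. File names are placeholders.
python networks/resize_lora.py \
  --model merged_lora.safetensors \
  --save_to DI_Emiru_HiDream.safetensors \
  --new_rank 32 \
  --dynamic_method sv_fro \
  --dynamic_param 0.93 \
  --save_precision fp16 \
  --verbose
```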

     

    This results in a total training time of about 39 hours on an RTX 5090.

    I feel this is quite long, although I could probably have stopped around 22 hours in. Another thing to consider is that I used Prodigy, which will certainly produce results, but a more carefully configured optimizer could be much more efficient.

    Overall, I think training these LoRA models is quite expensive (in both time and required VRAM). I will probably wait for OneTrainer support, or for ai-toolkit with the new automagic optimizer, before continuing any training runs.

    Should you have any additional questions, feel free to ask. However, I log in to Civitai extremely rarely, so it might take quite a while until I answer (unless you ask shortly after I published this LoRA).

     

    Disclaimer

    I want to highlight again that this model is non-commercial, and you should only post images on CivitAI which follow the Content Rules.

    Users are solely responsible for the content they generate using this LoRA. It is the user’s responsibility to ensure that their usage of this model adheres to all applicable local, state, national and international laws. I do not endorse any user-generated content and expressly disclaim any and all liability in connection with user generations.

    Description

    LORA
    HiDream

    Details

    Downloads
    76
    Platform
    CivitAI
    Platform Status
    Deleted
    Created
    5/1/2025
    Updated
    7/7/2025
    Deleted
    5/23/2025
    Trigger Words:
    Emiru

    Files

    DI_Emiru_HiDream.safetensors

    Mirrors

    CivitAI (1 mirror)