(This model is based on Qwen-Image-Edit 2509.)
A LoRA trained on a large set of image sequences where a person X is turned into another person or thing Y. It works for human-to-human, human-to-anthro, human-to-animal, human-to-object, or whatever.
Accepts a start image, an end image, or both (and style images, though their effectiveness is limited).
The style and poses in the generated image are independent of the condition image(s); the model is trained to extract only the idea of the condition. So if, for example, you have an image of someone sitting but prompt them to stand in the sequence, it will work fine.
The prompts it was trained with roughly align with:
Use the end image as the target reference and synthesise the preceding transformation sequence guided by: "transformation sequence man into (image1) woman"
or
"transformation sequence (image1) man into (image2) woman, illustration"
Framing tags used are either "full body framing" or "portrait framing".
Style is usually one of: illustration, painting, realistic, grayscale, etc.
Note: the better you describe the image in the prompt, the better the sequence it will create. For example, if the woman in the end image is wearing a dress, specify that, or the model may pretend otherwise for the other steps.
This took a *lot* of synthetic data generation.
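As a rough illustration of the prompting advice above, here is a minimal sketch of assembling a prompt from a subject description, framing tag, and style tag. The template and the tags come from this description; the helper function itself (`build_prompt` and its parameters) is my own invention, not part of any shipped tooling.

```python
# Hypothetical helper: assemble a transformation-sequence prompt from the
# pieces described above (template, extra details, framing tag, style tag).

def build_prompt(source: str, target: str,
                 framing: str = "full body framing",
                 style: str = "illustration",
                 details: str = "") -> str:
    """Combine the subject descriptions, optional details, framing tag,
    and style tag into one comma-separated prompt string."""
    parts = [f"transformation sequence (image1) {source} into (image2) {target}"]
    if details:
        parts.append(details)  # e.g. "the woman is wearing a red dress"
    parts.append(framing)
    parts.append(style)
    return ", ".join(parts)

print(build_prompt("man", "woman",
                   details="the woman is wearing a red dress",
                   framing="portrait framing", style="grayscale"))
```

The point is simply that richer `details` text keeps the intermediate steps consistent with the end image.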
I have no idea why it always adds speech bubbles; they aren't in the vast majority of the training data.
Comments
Can it work with real people??
Could this work with just Qwen's normal text-to-image? Also, could you share the training dataset? Love using this already and want to see how the captions are used in the training dataset.
a) As it is? Potentially; I haven't tried it. It would probably be relatively simple to retrain with a hundred or two captioned images so it could, if you wanted to. The only reason I didn't is that training my full dataset into base Qwen-Image would cost too much.
b) Unfortunately, I cannot share it, in the interest of not getting crucified by the people whose work is in it.
Use Edit to remove the speech bubbles from the training data.
I literally did; 90% of them had the speech bubbles removed in the training data, regenerated the latents and all. I think it's just undertrained.
Side note: Flux Kontext is much better at that; it doesn't change the art the way Qwen Edit does.
Great work! It works fine for me, but I can't figure out how to set a start image. I hope you keep working on it and bring future versions.
Instead of using
Use the end image as the target reference and synthesise the preceding transformation sequence guided by: "transformation sequence man into (image1) woman"
change it, for example, to
Use the start image as the target reference and synthesise the preceding transformation sequence guided by: "transformation sequence man into woman"
I should have written that in the description.
@randomusernameblablabla And how do you get a transformation with more than just 4 steps? Just make the image size wider?
Pretty much. Also using "portrait" vs. "full body" for the framing, and I sometimes found that actually specifying the number of steps worked.
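A rough sketch of the "make the image wider" idea: pick a canvas width proportional to the number of steps you want. Everything here is an assumption, not from the model author: the per-panel width of 384 px is a guess, and the rounding to a multiple of 16 is a common requirement for diffusion-model latents, not something stated in this description.

```python
# Assumed per-panel width in pixels; tune to taste.
PANEL_WIDTH = 384

def canvas_width(steps: int) -> int:
    """Estimate a canvas width for an N-step sequence, rounded up to a
    multiple of 16 (a common latent-size constraint; assumption here)."""
    raw = steps * PANEL_WIDTH
    return ((raw + 15) // 16) * 16

for n in (4, 6, 8):
    print(n, canvas_width(n))  # 4 -> 1536, 6 -> 2304, 8 -> 3072
```

Combined with explicitly prompting the step count, a wider canvas gives the model room to lay out more panels.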
How many images/steps did you train this with? I'm looking to train something similar but all my results are garbage...
10,000 images; 2-3 weeks on 8 GPUs, I believe.