This is a merge of the base model flux1-dev (fp16) and the NSFW MASTER FLUX LoRA: https://civarchive.com/models/667086?modelVersionId=746602
To merge, I used a script from https://github.com/kohya-ss/sd-scripts/, with --ratios 0.8 in the merging command.
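For reference, this is roughly what such a merge does under the hood: each LoRA pair is multiplied out and added to the matching base weight, scaled by alpha/rank and the 0.8 ratio. A minimal sketch only, not the actual kohya script; the file names, key layout (kohya-style lora_up/lora_down) and the key mapping back to the base checkpoint are assumptions:
"""
# Rough sketch of LoRA-into-checkpoint merging at ratio 0.8.
# Not the kohya script itself; paths and key naming are assumptions.
import torch
from safetensors.torch import load_file, save_file

RATIO = 0.8

base = load_file("flux1-dev.safetensors")          # base model weights
lora = load_file("NSFW_master_FLUX.safetensors")   # LoRA weights (kohya-style keys)

for key in list(lora.keys()):
    if not key.endswith(".lora_down.weight"):
        continue
    prefix = key[: -len(".lora_down.weight")]
    down = lora[key].float()                        # (rank, in)
    up = lora[prefix + ".lora_up.weight"].float()   # (out, rank)
    alpha = lora.get(prefix + ".alpha", torch.tensor(down.shape[0])).item()
    scale = alpha / down.shape[0]                   # alpha / rank

    # Map the LoRA module name back to the base weight key (illustrative;
    # the real mapping depends on how the LoRA was trained and saved).
    base_key = prefix.replace("lora_unet_", "").replace("_", ".") + ".weight"
    if base_key in base:
        delta = (up @ down) * scale * RATIO         # W' = W + ratio * scale * (B @ A)
        base[base_key] = (base[base_key].float() + delta).to(base[base_key].dtype)

save_file(base, "flux1-dev_nsfw_merged.safetensors")
"""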
Description
Merged the base model flux1-dev.safetensors (fp16) with the NSFW MASTER FLUX LoRA: https://civitai.com/models/667086?modelVersionId=746602
Comments (41)
Any chance for a Q8 gguf version to save on VRAM?
i created an fp8 version with weights 0.8 and 0.65 https://civitai.com/models/796670/nsfw-master-flux-fp8-lora-merged-with-flux1-dev-fp16-saved-as-fp8?modelVersionId=895886 they are 11GB
@tedbiv fp8 is way less precise than a Q8 gguf
@psspsspsspssspss script i used only saves fp32, fp16, bf16, fp8
@psspsspsspssspss let me look into it...
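On the fp8-vs-Q8 point above, here is a rough, self-contained sketch of how the round-trip errors compare on synthetic weight-like values. It assumes a PyTorch build with float8 dtypes, and "Q8_0-style" here just means blocks of 32 values with one scale each, following the GGUF convention; it is an illustration, not how GGUF tooling measures anything:
"""
# Rough error comparison: fp8 e4m3 cast vs Q8_0-style blockwise int8.
import torch

torch.manual_seed(0)
w = torch.randn(4096, 256) * 0.02   # weight-like values

# fp8 e4m3 round trip (needs a PyTorch version with float8 dtypes)
w_fp8 = w.to(torch.float8_e4m3fn).float()

# Q8_0-style: blocks of 32 values, one scale per block, int8 quants
blocks = w.reshape(-1, 32)
scale = blocks.abs().amax(dim=1, keepdim=True) / 127.0
q = torch.clamp((blocks / scale).round(), -127, 127)
w_q8 = (q * scale).reshape_as(w)

print("fp8  RMS error:", (w - w_fp8).pow(2).mean().sqrt().item())
print("Q8_0 RMS error:", (w - w_q8).pow(2).mean().sqrt().item())
"""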
Please release a UNet-only version. We already have the other files and not that much space on our hard disks.
2 images for 22 GB OMG
Kudos for the attempt. Hopefully there will be more traction for your efforts in the future. While it is laudable, and somewhat usable with the SD 1.x keyword-prompting approach, Flux is built around natural-language prompting, and by prompting this model that way you get the same monstrosities of days past. Honestly, I would rather generate with SD, SDXL, or Pony, because I get far superior results compared to trying to trick this model into believing it's Flux while operating like a standard SD or SDXL model. The potential to create an uncensored Flux model is certainly there, though; if anything, you've proven that much.
my poor aching computer... :)
amd 5600g, 32GB dram, rtx 3060 12GB vram. the large model still needs ae, clip_l and t5xxl. took 'a long time' to initially load. 1st image 768x1344 at 20 steps took 3:45, 2nd image took 2:35. nice realistic nipples for a flux model, pretty female faces. i'll post some images... so far so good
update - my gpu is running about 10c cooler while running this model?
Are you using a1111 maybe? I had crazy load times with it and once I switched to FORGEui it got a lot better. Almost 1/4 of the time to load models. I have a 4060ti with 16gb of vram and was having almost 2 minute load times.
@Triplebenthusiast no, i'm using forge. it was the initial model switch/load took a couple of minutes. after the first image, renders were faster. i really like this model and the images it makes.
the problem is the model size scared everyone off... they need to download it and try it.
update running today, 768x1344 20 steps takes 1:39 min.
must admit... one of the better nsfw flux models i've tried so far. definitely the largest... :)
thank you so much for your effort, but us mere mortals cannot load that into vram
when it's running it takes 24GB dram and 11.3 GB vram, in between images it uses 31GB dram 6GB vram
Say I intend to start training LoRAs myself soon and wanted to confirm: is this a good one to use as a base model?
Can you explain how you merged? Which script did you use, merge_lora.py ?
i tried merge_models.py... had to change the logs to prints. merging flux-dev-fp8 and the nsfw-master lora, it seemed to run for about 10 minutes into saving the file, then errored out with a memory error. i haven't tried merge_lora.py yet. my cmd line was 'python merge_models.py --models flux_dev-fp8.safetensors NSFW_master-lora.safetensors --output nsfw-fp8.safetensors --unet_only'
Ah ok. I just found out that in the sd3 branch of kohya there is a flux_merge_models.py now; maybe that works better. https://github.com/kohya-ss/sd-scripts/tree/sd3/networks
@n0valis thanks, i might give that a try also.
@n0valis i should also try the merge_lora.py.
it looks like i don't have enough computer to run them ...
'cpu' / 'cpu': Uses >50GB of RAM, but works on any machine.
'cuda' / 'cpu': Uses 24GB of VRAM, but requires 30GB of RAM.
'cpu' / 'cuda': Uses 4GB of VRAM, but requires 50GB of RAM, faster than 'cpu' / 'cpu' or 'cuda' / 'cpu'.
'cuda' / 'cuda': Uses 30GB of VRAM, but requires 30GB of RAM, faster than 'cpu' / 'cpu' or 'cuda' / 'cpu'.
i have rtx 3060 w 12GB of vram and only 32GB of dram. must be time to upgrade :(
@tedbiv The Kohya merge script still works really well for merging Flux LoRAs, and especially for merging them into and out of model checkpoints. And those operations would barely draw any resources for anyone who could run this model in the first place. The main headache in this case is having to deal with the weirdly dissimilar and incompatible weight formulations between LoRAs trained via ai-toolkit/diffusers and kohya sd-scripts LoRAs. However, there is a nifty enough conversion script between the two formats in the same folder. Sometimes the conversion drops some of the training (mainly text-encoder-oriented?), but that actually seems to improve certain LoRAs, potentially neutralizing some of the context-warping side effects. In any case, it's really not clear to me how and why merging full UNet checkpoints would be at all superior for Flux compared to strategically mixing training in and out at the LoRA level. When a LoRA can amend the transformer attention and feed-forward layers and just about every other component of the model, it might as well be a checkpoint.
@A_C_T_soonr ahhh... that would explain why the merge script works on some loras and bails with 'no blocks to modify' on others. i was able to merge flux-dev-fp16 with another lora and save it as fp8 and it's only ~11GB. i'll check out the conversion script. i really like the nsfw content of this model. if i could recreate it as an fp8 more people would use it...? thanks for the help :)
@A_C_T_soonr yay! that allowed the merge script to run... now i'll see if i created something useful :)
@A_C_T_soonr that worked... thx. testing the image now.
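If you hit the 'no blocks to modify' case mentioned above, it can help to peek at the LoRA's keys first to see which naming convention it uses before deciding whether to run the conversion script. A quick sketch; the file name is a placeholder and the prefixes shown are only typical patterns, not an exhaustive list:
"""
# Quick check of which LoRA key convention a file uses.
from safetensors import safe_open

path = "some_flux_lora.safetensors"  # placeholder path
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors")
for k in keys[:10]:
    print(" ", k)

# Typical patterns (illustrative, not exhaustive):
#   kohya sd-scripts:       lora_unet_...  with .lora_down.weight / .lora_up.weight / .alpha
#   ai-toolkit / diffusers: transformer.... with .lora_A.weight / .lora_B.weight
if any(k.startswith("lora_unet_") for k in keys):
    print("looks like kohya sd-scripts naming")
elif any(".lora_A." in k for k in keys):
    print("looks like diffusers / ai-toolkit (PEFT) naming")
else:
    print("unrecognized naming; check the keys above")
"""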
Is this an all-in-one model or just the diffusion model without text encoders?
it needs vae and text encoders
for such fp16 models you need at least an RTX 4090 with 24 GB VRAM ... that's obvious, and my RTX 4090 @ 500 W TGP @ 2950 MHz core and 24000 MHz VRAM has it ... it can barely handle FLUX.1-dev-fp16 ... but this quality is at 2048x1536 ;)
i run it on rtx 3060 w/12GB vram. takes about 1:15 min for 896x1152 image
Amazing that this is where we are at already after not even 2 full months since initial release. Horny, um, finds a way I guess.
Please consider naming this something else, as it has the exact same name as Shopon_Skp's NSFW Master model. NSFW Defozo Edition or something
Bro, you merged my LoRA model even though I didn’t give permission to merge it! And you didn’t just merge the LoRA, but also used my name as well. Please do one thing: either delete the model or change the name
Hating is wild
on the shoulders of giants
do you want me to rename mine also? i recreated this content in fp8 format. https://civitai.com/models/796670/nsfw-master-flux-fp8-lora-merged-with-flux1-dev-fp16-saved-as-fp8?modelVersionId=895886
thanks for the model, but dang, my poor trusty 3090 is suffering and struggling to allocate memory for it
You can convert it to GGUF and quantize it yourself I believe
I would like to use this safetensors file in a diffusers pipeline instead of "black-forest-labs/FLUX.1-dev".
Is there any method for connecting it to the pipe ... ??
"""
pipe = FluxPipeline.from_pretrained(
"black-forest-labs/FLUX.1-dev",
vae=vae,
text_encoder=text_encoder,
tokenizer=tokenizer,
text_encoder_2=text_encoder_2,
tokenizer_2=tokenizer_2,
torch_dtype=dtype,
scheduler=scheduler,
cache_dir=os.environ["HF_HOME"]
)
"""
Would be great to get an fp8 version of that.



