🚀 Z-Image AIO Collection
⚡ Base & Turbo • All-in-One • Bilingual Text • Qwen3-4B
⚠️ IMPORTANT: Requires ComfyUI v0.11.0+
✨ What is Z-Image AIO?
Z-Image AIO is an All-in-One repackage of Alibaba Tongyi Lab's 6B-parameter image generation models.
Everything integrated:
✅ VAE already built-in
✅ Qwen3-4B Text Encoder integrated
✅ Just download and generate!
🎯 Available Versions
🔥 Z-Image-Turbo-AIO (8 Steps • CFG 1.0)
Ultra-fast generation for production & daily use
⚫ NVFP4-AIO (7.8 GB) 🆕
🎯 ONLY for NVIDIA Blackwell GPUs (RTX 50xx)!
⚡ Maximum speed optimized
💾 Smallest file size
🚀 FP4 precision - blazing fast
Perfect for: RTX 5070, 5080, 5090 owners who want maximum speed
🟡 FP8-AIO (10 GB) ⭐ RECOMMENDED
✅ Best balance of size & quality
✅ Works on 8GB VRAM
✅ Fast downloads
✅ Ideal for most users
Perfect for: Daily use, testing, RTX 3060/4060/4070
🔵 FP16-AIO (20 GB)
💾 Same file size as BF16
🔄 ComfyUI auto-casts to BF16 for compute
⚠️ Does NOT enable FP16 compute mode
📦 Alternative download option
Note: Z-Image does not support FP16 compute - activation values exceed FP16's max range, causing NaN/black images. Weights are cast to BF16 during inference regardless of file format.
Perfect for: Alternative to BF16 download (identical inference behavior)
🌟 BF16-AIO (20 GB) ⭐ RECOMMENDED FOR FULL PRECISION
✅ BFloat16 full precision
✅ Absolute best quality
✅ Professional projects
✅ Also works on 8GB VRAM
Perfect for: Professional work, maximum quality
🎨 Z-Image-Base-AIO (28-50 Steps • CFG 3-5)
Full creative control for pros & LoRA training
🟡 FP8-AIO (10 GB)
✅ Efficient for daily use
✅ Full CFG control
✅ Negative prompts supported
✅ 8GB VRAM compatible
Perfect for: Daily work with full control
🔵 FP16-AIO (20 GB)
💾 Same file size as BF16
🔄 ComfyUI auto-casts to BF16 for compute
⚠️ Does NOT enable FP16 compute mode
📦 Alternative download option
Note: See technical explanation in FAQ below.
Perfect for: Alternative to BF16 download (identical inference behavior)
🌟 BF16-AIO (20 GB) ⭐ RECOMMENDED FOR FULL PRECISION
✅ Maximum quality
✅ Ideal for LoRA training
✅ Professional projects
✅ Highest precision
Perfect for: LoRA training, professional work
🆚 Turbo vs Base - When to Use?
⚡ Use TURBO when:
⚡ Speed is priority → 8 steps = 3-10 seconds
📸 Production workflows → Consistent high quality
💾 Quick iterations → Rapid prototyping
🎯 Simple prompts → Less complex scenes
🎨 Use BASE when:
🎨 Creative exploration → Higher diversity
🔧 LoRA/ControlNet dev → Undistilled foundation
📝 Complex prompting → Full CFG control
🚫 Negative prompts needed → Remove unwanted elements
⚙️ Recommended Settings
⚡ Turbo Settings (incl. NVFP4)
📊 Steps: 8
🎚️ CFG: 1.0 (don't change!)
🎲 Sampler: res_multistep OR euler_ancestral
📈 Scheduler: simple OR beta
📐 Resolution: 1920×1088 (recommended)
🚫 Negative Prompt: ❌ Not used!
🎨 Base Settings
📊 Steps: 28-50
🎚️ CFG: 3.0-5.0 (start with 4.0)
🎲 Sampler: euler ⭐ OR dpmpp_2m
📈 Scheduler: normal ⭐ OR karras
📐 Resolution: 512×512 to 2048×2048
🚫 Negative Prompt: ✅ Fully supported!
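If you drive ComfyUI through its API or scripts rather than the GUI, the two settings blocks above translate into parameters like the sketch below. This is a minimal, hedged sketch: the key names follow ComfyUI's KSampler node inputs, but in a real workflow the resolution lives on the Empty Latent Image node and the negative prompt on a CLIP Text Encode node.

```python
# Sketch: the recommended Turbo and Base settings as KSampler-style
# parameter sets. Key names mirror ComfyUI's KSampler inputs; width/height
# and the negative prompt belong to other nodes in an actual workflow.
TURBO_SETTINGS = {
    "steps": 8,
    "cfg": 1.0,                        # distilled model: leave at 1.0
    "sampler_name": "res_multistep",   # or "euler_ancestral"
    "scheduler": "simple",             # or "beta"
    "width": 1920, "height": 1088,     # recommended resolution
    "negative_prompt": "",             # Turbo ignores negative prompts
}

BASE_SETTINGS = {
    "steps": 28,                       # 28-50
    "cfg": 4.0,                        # 3.0-5.0; start at 4.0
    "sampler_name": "euler",           # or "dpmpp_2m"
    "scheduler": "normal",             # or "karras"
    "width": 1024, "height": 1024,     # anything from 512x512 to 2048x2048
    "negative_prompt": "blurry, low quality",  # fully supported on Base
}
```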
📊 Quick Overview
Turbo Versions
⚫ NVFP4 │ 7.8 GB │ RTX 50xx only │ Max Speed 🆕
🟡 FP8 │ 10 GB │ 8GB VRAM │ Recommended ⭐
🔵 FP16 │ 20 GB │ → BF16 compute │ See FAQ ⚠️
🌟 BF16 │ 20 GB │ 8GB VRAM │ Max Quality ⭐
Base Versions
🟡 FP8 │ 10 GB │ 8GB VRAM │ Efficient
🔵 FP16 │ 20 GB │ → BF16 compute │ See FAQ ⚠️
🌟 BF16 │ 20 GB │ 8GB VRAM │ LoRA Training ⭐
💡 Prompting Guide
✅ Good Example:
Professional food photography of artisan breakfast plate.
Golden poached eggs on sourdough toast, crispy bacon, fresh
avocado slices. Morning sunlight creating warm glow. Shallow
depth of field, magazine-quality presentation.
❌ Bad Example:
breakfast, eggs, bacon, toast, food, morning, plate
📝 Tips
DO:
✅ Use natural language
✅ Be detailed (100-300 words)
✅ Describe lighting & mood
✅ Specify camera angle
✅ English OR Chinese (or both!)
DON'T:
❌ Tag-style prompts (tag1, tag2, tag3)
❌ Very short prompts (under 50 words)
❌ Negative prompts with Turbo
🌐 Bilingual Text Rendering
English:
Neon sign reading "OPEN 24/7" in bright blue letters
above entrance. Modern sans-serif font, glowing effect.
中文:
Traditional tea house entrance with sign reading
"古韵茶坊" in elegant gold Chinese calligraphy.
Both:
Modern cafe with bilingual sign. "Morning Brew" in
white script above, "晨曦咖啡" in Chinese below.
📥 Installation
Step 1: Download
Choose your version based on:
GPU: RTX 50xx → NVFP4 possible
VRAM: 8GB → FP8 recommended
Purpose: LoRA Training → Base BF16
Step 2: Place File
ComfyUI/models/checkpoints/
└── Z-Image-Turbo-FP8-AIO.safetensors
Step 3: Load & Generate
Open ComfyUI (v0.11.0+!)
Use "Load Checkpoint" node
Select your AIO version
Generate!
No separate VAE or Text Encoder needed!
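If you want to double-check that the VAE and text encoder really are bundled before loading, you can list the tensor names inside the file. A minimal sketch using the safetensors library; the expected key grouping is an assumption about the AIO layout, so just print the keys to see how your file is organized.

```python
# Sketch: inspect an AIO checkpoint to confirm the diffusion model, VAE,
# and text encoder weights all live in one file. The prefix grouping is
# an assumption; the printout shows the actual layout of your file.
from safetensors import safe_open

path = "ComfyUI/models/checkpoints/Z-Image-Turbo-FP8-AIO.safetensors"

with safe_open(path, framework="pt", device="cpu") as f:
    prefixes = sorted({key.split(".")[0] for key in f.keys()})

print(prefixes)  # expect distinct groups for model, VAE, and text encoder
```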
🙏 Credits
Original Model
👨‍💻 Developer: Tongyi Lab (Alibaba Group)
🏗️ Architecture: Single-Stream DiT (6B parameters)
📜 License: Apache 2.0
Links
🔗 Z-Image Base: https://huggingface.co/Tongyi-MAI/Z-Image
🔗 Z-Image Turbo: https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
🧠 Text Encoder: https://huggingface.co/Qwen/Qwen3-4B
📈 Version History
v2.2 - FP16 Clarification
📝 Updated FP16 descriptions for technical accuracy
⚠️ Clarified: FP16 weights ≠ FP16 compute
🔄 FP16 files are cast to BF16 during inference
v2.1 - NVFP4 Release 🆕
➕ Z-Image-Turbo-NVFP4-AIO (7.8GB)
⚡ Optimized for NVIDIA Blackwell (RTX 50xx)
🚀 Maximum speed generation
v2.0 - Base AIO Release
➕ Z-Image-Base-BF16-AIO
➕ Z-Image-Base-FP16-AIO
➕ Z-Image-Base-FP8-AIO
🔄 ComfyUI v0.11.0+ support
📝 Qwen3-4B Text Encoder
v1.1 - FP16 Added
➕ Z-Image-Turbo-FP16-AIO
🔧 Wider GPU compatibility
v1.0 - Initial Release
✅ Z-Image-Turbo-FP8-AIO
✅ Z-Image-Turbo-BF16-AIO
✅ Integrated VAE + Text Encoder
❓ FAQ
Q: Which version should I choose?
RTX 50xx + Speed → NVFP4 🆕
Most users → Turbo FP8 ⭐
Full precision → BF16 ⭐
LoRA Training → Base BF16
Q: Turbo or Base?
Fast & simple → Turbo ⚡
Full control → Base 🎨
Q: Will NVFP4 work on my RTX 4090?
❌ No! NVFP4 is only for RTX 50xx (Blackwell architecture).
Use FP8 instead for RTX 40xx and older.
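If you are not sure which architecture your card is, you can check its CUDA compute capability from PyTorch. Blackwell consumer GPUs (RTX 50xx) report a major version of 10 or higher, while Ada (RTX 40xx) reports 8.9; treating "major >= 10" as the Blackwell cutoff is an assumption based on NVIDIA's published compute capabilities.

```python
# Sketch: choose NVFP4 vs FP8 from the CUDA compute capability.
# RTX 50xx (Blackwell) reports major >= 10; RTX 40xx (Ada) reports 8.9.
import torch

major, minor = torch.cuda.get_device_capability(0)
if major >= 10:
    print("Blackwell detected: the NVFP4-AIO build is an option.")
else:
    print(f"Compute capability {major}.{minor}: use FP8-AIO (or BF16) instead.")
```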
Q: Do I need separate VAE/Text Encoder?
❌ No! Everything is already integrated.
Just Load Checkpoint and go!
Q: Works on 8GB VRAM?
✅ Yes! All versions work on 8GB VRAM.
(NVFP4 requires RTX 50xx regardless of VRAM)
⚠️ Q: What about FP16 for older GPUs (RTX 2000/3000)?
Important technical clarification:
Z-Image does NOT support FP16 compute type. Here's why:
📊 Technical reason:
- FP16 max value: ~65,504
- BF16 max value: ~3.39e+38 (same as FP32)
- Z-Image's activation values exceed FP16's range
- Result: Overflow → NaN → Black images
What actually happens:
ComfyUI automatically casts weights to BF16 for computation
You can see this in logs: "model weight dtype X, manual cast: torch.bfloat16"
"Weight dtype" (file format) ≠ "Compute dtype" (actual calculation)
For RTX 20xx users (no native BF16):
BF16 is emulated via FP32, which is slower but works
There is no way to run Z-Image in true FP16 compute
FP8 with CPU offload may be a better option for limited VRAM
TL;DR: FP16 and BF16 files behave identically during inference. Choose based on download preference, not GPU compatibility.
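The overflow described above is easy to reproduce in PyTorch. A minimal sketch, assuming only that some intermediate activation exceeds FP16's ceiling (which is what the black-image reports for Z-Image boil down to):

```python
# Sketch: why FP16 compute fails. Values above FP16's max (~65,504)
# overflow to inf, which turns into NaN (black images) downstream.
import torch

print(torch.finfo(torch.float16).max)   # 65504.0
print(torch.finfo(torch.bfloat16).max)  # ~3.39e+38, same exponent range as FP32

activation = torch.tensor([1.0e5])      # hypothetical oversized activation
print(activation.to(torch.float16))     # tensor([inf]) -> NaN after further math
print(activation.to(torch.bfloat16))    # finite, only slightly rounded
```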
🚀 Get Started Now!
Download → Load Checkpoint → Generate!
Recommended versions:
🟡 FP8 for most users (best size/quality balance)
🌟 BF16 for maximum quality
⚫ NVFP4 for RTX 50xx speed
All versions work on 8GB VRAM
Happy generating! 🎨
💬 Comments (38)
Fucking fast!
My wife always says that too 😂
z_Image Base thingy
Fantastic model so far! I was able to test it, but from my testing, the generation time for a single picture is just awfully slow. My 12 GB GPU needs approximately 3:20 per generation; that's even worse than Qwen Edit or Qwen Image on my GPU. So I won't go further into it for now and will keep an eye on whether this changes in the future. For now, it's back to Turbo and Qwen for me. Thanks, @SeeSeeLP! Good job as always.
Qwen Image / Qwen Image Edit / Z-Turbo are 1-minute gens or below.
Some generations with AIO FP16 below:
Update: FP8 Base has almost the same gen times on my GPU, 3:18-3:20.
Thanks a lot for testing and for the detailed feedback! That lines up pretty well with the results I’ve seen from my own testing and from what others have reported too. The Base versions are noticeably slower, especially compared to Turbo or Qwen-based models, so your generation times definitely don’t sound unusual.
Totally understandable to stick with Turbo or Qwen for now if speed is the priority. I’ll keep an eye on this and see if there’s room for optimization in future updates. Thanks again for taking the time to test it, and I’m glad you’re enjoying the models overall!
@SeeSeeLP I could drop it to around 2:40, but the detail drops significantly as well; I tested it with a Q4_K_M quant too.
So I'll be a long-term user of your Z-Turbo FP16 AIO, congrats 😂👍
edit: Typos
@cynic2010 Yeah, I’m totally fine with that 😂
I actually check the galleries of the different checkpoints almost every day and keep seeing your awesome images — keep it up! 👌
@SeeSeeLP Oh, maybe I'll check out the BF16 branch, I still have to test it 😂 There's also space to fill, right? 🌞😂
Getting weirdly poor results on an AMD GPU with FP8.
Thanks for your feedback! Regarding the issue, it could be related to the workflow or certain settings. Overall, I’m also not fully satisfied with the FP8 version of Z-Image-Base-AIO myself. It’s possible that this variant could be improved by using a different approach instead of the same method used to convert it from BF16 to FP8 in the original release.
I’ll definitely take another look at this and see what can be improved.
If you save the checkpoint using ComfyUI's built-in node, you save the model in plain FP8, which is not quantized and is mathematically worse than GGUF Q4, but with the size of Q8.
Very messy post. "Z-Image Turbo Base" is an oxymoron. The previous checkpoint was released broken, and the provided instructions on how to edit ComfyUI code were not accurate. The workflow constantly comes with a long negative prompt while CFG = 1 and says "do not change" (whereas in the Base model you do use negative prompts again, since it's not distilled).
I honestly don’t really understand this comment, so let me clarify a few things.
First of all, the title clearly states Z-Image-Turbo/Base-AIO. There are three Base versions (bf16, fp16, fp8) and the same three again as Turbo versions — so six variants in total. I don’t see anything chaotic about that.
Second, I have never released a “broken” checkpoint. That claim is simply incorrect, and I strongly reject it. Calling a release broken without evidence is unfair.
Third, I never provided or suggested editing ComfyUI code to load Z-Image checkpoints. The only thing I mentioned was updating ComfyUI (v0.11+) to properly support the Base versions. That’s it.
And lastly, the claim about the workflow constantly adding a long negative prompt while CFG is set to 1 is just another unsubstantiated statement. This is not expected behavior and hasn’t been an issue for other users.
If there’s an actual reproducible issue, I’m always open to constructive feedback — but these accusations don’t reflect how the release actually works.
@SeeSeeLP don't let it bother you. 😉🌞
@iGor777999 Thanks, buddy, I don't know your language, so don't be offended if I wrote some nonsense :D
But you're right. ❤️
@cynic2010 YOU'RE RIGHT
@cynic2010 EVERYTHING YOU WROTE IS CORRECT
@SeeSeeLP The broken checkpoint was for Flux. It was released broken for ComfyUI. And the code-editing instructions said to replace 2 lines where actually the whole paragraph needed replacing. Same for the negative prompt: anyone can open the workflow and see it. Just make a choice: remove all that "blurry, out of focus, overexposed, underexposed...." etc., or remove the misleading note "DO NOT change CFG 1". The file name "zImageTurboBaseAIO_zImageBaseAIOBF16.safetensors" IS confusing. OK, OK, you're right about everything, I'm wrong, no problem. It's just my comment, my opinion.
@mishash Let me clear this up properly, because a few things are getting mixed together here.
First: you’re referring to a different model — Flux.2-4B-Distilled-AIO, not the Z-Image checkpoint this thread is about. So yes, it looks like this comment was originally posted under the wrong checkpoint.
Second: the Flux AIO models were never broken checkpoints. The issue was a ComfyUI limitation, which I clearly documented in detail at the time. ComfyUI simply did not support loading text encoders from Flux2 AIO checkpoints until that part of the code was implemented. That is not a model issue, and calling the checkpoint “broken” is incorrect.
Third: the code change you’re referring to was explained in the context of ComfyUI’s missing implementation. The reason it involved replacing more than two lines is because the original method was literally a TODO. This was also confirmed in the GitHub issue I linked. Again: ComfyUI issue, not a broken release.
Regarding CFG: this is a Distilled model. For distilled versions, CFG = 1 is expected and correct behavior. That note does not apply to the Base version (which I have not released as an AIO yet). This distinction is clearly stated, but it seems that part was missed.
As for the filename: that name is generated by Civitai, not by me. I did not manually name it that way, and it was never part of the discussion in your original comment.
You’re absolutely entitled to your opinion — no problem there. But several of the points raised here are based on misunderstandings, mixing up different models, or skipping important context.
Constructive feedback is always welcome. Claims about “broken checkpoints” are not.
Dear SeeSeeLP,
I said your publications are a mess—and I stand by that. What “wrong section”? This is my opinion, and it has nothing to do with sections.
I commented after wasting a couple of hours on the Flux model that is supposedly “not broken.” Yes, it’s not broken—it just doesn’t work. Then I opened this post and found the same kind of mess.
You say, “Regarding CFG: this is a distilled model.”
But the workflow you post is an image: a cat, with the title “z-image-base-aio-bf16.”
What distilled model? What are you even referring to?
You also say, “That note does not apply to the Base version.”
Open the image you posted and look at it yourself—it clearly contradicts that statement.
You’re not arguing opinions here; you’re arguing facts. So yes, we’re both entitled to our opinions:
me, because I’m pointing out what’s actually there;
you, because you’re trying to reinterpret those facts to defend your position.
You should consider working for CNN—they pay for that skill.
Best regards back to you, @mishash 👋
First of all, I hope you had a good day. I've honestly been checking all day to see if you replied — as you can see, I was thinking about you the whole time 😂👍
And I think I’m slowly starting to understand where your confusion comes from.
I believe you’re mixing apples and oranges here and don’t fully see how the platform works or how the models are structured. No worries though — that’s what I’m here for 🙂
Let me try to lay this out clearly (not meant in a bad way).
🧩 How I work with AIO models
As you can see across my uploads, I really like creating AIO versions (All-in-One).
For every model I release, I also create matching workflows, so users don’t have to build their own and everything works out of the box.
I always generate the model card images using my own models and my own workflows to make sure both actually work together — for example this image:
👉 https://civitai.com/images/118282429
On top of that, I read all comments and private messages carefully to decide whether a v2 or v3 makes sense and to fix real issues if they appear.
📦 Some of my released models
Chroma-Anime-AIO FP8
https://civitai.com/models/2022057/chroma-anime-aio
Qwen-Anime-AIO FP8 (based on Qwen Image Edit)
https://civitai.com/models/2122738?modelVersionId=2288507
Z-Image-Turbo-Anime-AIO
FP16: https://civitai.com/models/2259646?modelVersionId=2550879
FP8: https://civitai.com/models/2259646?modelVersionId=2544019
BF16: https://civitai.com/models/2259646?modelVersionId=2543657
Z-Image-Turbo-AIO
FP8: https://civitai.com/models/2173571?modelVersionId=2448013
FP16: https://civitai.com/models/2173571?modelVersionId=2550362
BF16: https://civitai.com/models/2173571?modelVersionId=2447693
Flux.2-klein-AIO
https://civitai.com/models/2327389?modelVersionId=2618128
Z-Image-Base-AIO
FP8: https://civitai.com/models/2173571?modelVersionId=2637423
FP16: https://civitai.com/models/2173571?modelVersionId=2638374
BF16: https://civitai.com/models/2173571?modelVersionId=2638695
🔧 Matching workflows (important)
Chroma-Anime-AIO Workflow
https://civitai.com/models/2027641/chroma-anime-aio-simple-workflow
Qwen-Anime Official Workflow
https://civitai.com/models/2135240?modelVersionId=2540517
Z-Image Turbo / Base Workflow
https://civitai.com/models/2174008?modelVersionId=2638927
Flux.2-klein Workflows
https://civitai.com/models/2327746?modelVersionId=2618497
❗ About the “cat image” and your confusion
You were using the wrong workflow.
The image with the cat is part of the Z-Image workflows (ZIB = Z-Image-Base, ZIT = Z-Image-Turbo).
You mentioned Flux2-klein-4B being broken — and we already established that it wasn’t.
That Flux model is a distilled version (even if the name doesn’t explicitly say so — I didn’t invent that 😊).
That’s why it runs with:
CFG = 1
very few steps (≈4)
There also exists a Flux2-klein-Base-4B, where CFG needs to be higher (≈4) and steps around 50 — but I did not release that one.
Correct downloads for Flux2-klein-4B:
Model: https://civitai.com/models/2327389?modelVersionId=2618128
Workflow: https://civitai.com/models/2327746?modelVersionId=2618497
🎯 Z-Image versions explained (this is key)
There are four Z-Image variants planned, two of which are released so far:
🚀 Z-Image-Turbo (distilled)
CFG = 1
Steps ≈ 9
Optimized for speed
Workflows (all marked ZIT, meaning Turbo):
ZIT-AIO-v1.0
ZIT-AIO-v2.0
ZIT-AIO-Control
ZIT-AIO-Variance
ZIT-AIO-SeedVR2
ZIT-AIO-DepthV3
Anime version:
👉 https://civitai.com/models/2174008?modelVersionId=2544130
🧱 Z-Image-Base (non-distilled)
New workflow:
👉 https://civitai.com/models/2174008?modelVersionId=2638927
Recommended settings (clearly stated there):
CFG: 3.0–5.0 (default 4.0)
Steps: 28–50
Sampler: Euler / DPM++ 2M
Negative prompts: fully supported
There is no CFG=1 note for the Base workflow.
🧠 Final clarification
Turbo = distilled (low CFG, few steps)
Base = non-distilled (higher CFG, more steps)
Flux2-klein without “Base” in the name = distilled
Z-Image-Turbo = distilled
Z-Image-Base = not distilled
I honestly think you just used the wrong workflow and got frustrated because of that.
If you notice anything else or have real questions — feel free to reach out again.
Have fun generating, and enjoy the models ✨
The image with the cat (z-image-base-aio-bf16) actually still shows the information from the Z-Image Turbo model.
"📦 Model Info - Z-Image-Turbo-AIO"
it's also written there in bold ^^
On this point I have to agree with you: the information shown in that image (which isn't the ZIB-AIO workflow) is still for the Turbo version, since I hadn't quite finished the workflow for the Base version at that point.
But as I said, this isn't a workflow, just an image, which is why you should use this one:
No point arguing. ComfyUI itself is a mess even the devil couldn't untangle; their text encoder is written ass-backwards, hence the constant patches and new glitches....
@mishash What is that supposed to mean? Are you a dumbass?
Where do you get off acting so high and mighty? And why do so many people in the comments shamelessly demand things without an ounce of guilt? Do you even know how to offer feedback with a decent attitude? Are you actually human? Or are you a robot or something else? The nauseating stench of your arrogance is practically leaking through the screen. I bet nobody likes you in real life—that’s why you’re seeking validation for your pathetic existence here....
What is the point of this?
The main point is convenience 🙂
An AIO (All-in-One) checkpoint bundles everything into a single file, similar to how SDXL, Illustrious, or Pony work. You don’t have to manage separate components or extra setup — it just loads and runs.
Having multiple variants also helps different setups:
BF16 → original quality
FP16 → better compatibility with older GPUs
FP8 → much lower VRAM usage
Overall, AIO versions are easier to use, more portable, faster to set up, and reduce configuration errors. If you just want to load a checkpoint and generate without extra hassle, that’s the advantage.
Thank you all so much! ❤️
I just noticed that we’ve passed 10K+ downloads on the AIO checkpoints, and I really wanted to take a moment to say thank you. I go through the galleries every day and honestly love what I see — from very simple images to extremely complex ones, from SFW to NSFW, everything in between. They’re all great.
Seeing your prompts, ideas, and results constantly inspires me as well. It often gives me new ideas for images, workflows, and even potential training data for future checkpoints. That kind of creative exchange is honestly amazing.
You’re all awesome — seriously. Thanks a lot for the support, the feedback, and the creativity. This community is 🔥✨
"FP16-AIO (~20GB) - Wide GPU compatibility (RTX 2000/3000 series)"
This is actually a placebo and is misleading.
Z-Image does not support FP16, only BF16. This is a model setting, hardcoded in ComfyUI, and it can't be changed. It doesn't matter what weight type you have; all weights will be converted to BF16 when loading.
Some users asked me about your model, so I'm just copying my answers here.
Don't trust what the AI says.
"They are important for GPUs without proper BF16 support (RTX 20xx, some older setups).
They offer better compatibility with different Torch / CUDA stacks.
They provide predictable loading behavior and lower friction for users."
Those claims are 100% nonsense.
@reakaakasky I’d really appreciate it if you could share one or two sources related to that claim, or point me to the relevant code sections in ComfyUI where this behavior is defined. I’ve tried to look into it myself but couldn’t find anything that clearly states FP16 is always cast to BF16.
Feel free to send it via PM as well if that’s easier 👍
Context: some people have older GPUs that only support FP16, and if they force ComfyUI to run the model in FP16 they get black images. They don't want to run the model in FP32 because it is extremely slow. They found your FP16 model but still got black images, so they asked on my server.
The full answer is too complicated. The short answer is: "model weight type" is not "compute type".
Sorry, I'm not here to argue. I just want to point out that you mentioned "Native FP16 support on almost all GPUs", which many people read as meaning they can run the model with an FP16 compute type.
Z-Image does not support the FP16 compute type because the activation values exceed FP16's maximum range and will overflow to NaN in FP16 compute mode, not because the weights aren't FP16.
Model weights will always be loaded in the compute type, no matter what storage type the model file uses: safetensors, FP8, GGUF Q8, etc.
@reakaakasky Thanks for pushing back on this – I did some deeper research and you're right.
I found the actual GitHub issue for Z-Image (#14) which confirms that FP16 inference produces black images due to NaN values from activation overflow. The key distinction you made between "weight type" and "compute type" is exactly what I was missing.
The technical reality:
FP16 max value is ~65,504, while BF16 goes up to 3.39e+38
Diffusion transformers like Flux/Z-Image have progressively increasing activation scales that exceed FP16's range
ComfyUI automatically casts to BF16 for computation regardless of weight dtype (as shown in the logs: model weight dtype X, manual cast: torch.bfloat16)
So yeah, my description "Native FP16 support on almost all GPUs" is misleading – it suggests users can run the model in FP16 compute mode, which they can't. The FP16 weights get cast to BF16 anyway during inference.
I'll update the description to clarify this. Appreciate you taking the time to explain it 👍