    T5xxl Google FLAN from FP32 (CLIP ONLY) - FP16

    T5xxl Google FLAN from FP32

    NOTE: It appears that when the decoder blocks are pruned from the model, it loses the ability to interact with FLUX or SD 3.5. When I merged the models using the full 42GB file, this did not happen.
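    A quick way to check whether a given export still contains decoder weights is to list the tensor names in the safetensors file without loading any data. This is a minimal sketch, assuming the Hugging Face T5 key naming ("encoder." / "decoder." prefixes); the file name is a placeholder.

        # List tensor names only; no weights are loaded into memory.
        from safetensors import safe_open

        PATH = "t5xxlGoogleFLANFromFP32_fp16.safetensors"  # placeholder path

        with safe_open(PATH, framework="pt") as f:
            keys = list(f.keys())

        decoder_keys = [k for k in keys if k.startswith("decoder.")]
        encoder_keys = [k for k in keys if k.startswith("encoder.")]
        print(f"{len(encoder_keys)} encoder tensors, {len(decoder_keys)} decoder tensors")
        # A decoder-pruned export reports zero decoder tensors.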

    Trimmed Sizes:

    1. FP32 18GB (42GB with Decoder Weights)

    2. FP16/BF16 9GB

    3. FP8 4.5GB (I do not recommend the FP8 version; it seems to have lost too much precision; see the down-casting sketch after this list)
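    For reference, down-casting is a straightforward tensor-by-tensor conversion. This is a minimal sketch of an FP32-to-FP16 pass using safetensors and PyTorch; the file names are placeholders, and it assumes the checkpoint fits in RAM.

        # Cast every floating-point tensor to FP16; leave others untouched.
        import torch
        from safetensors.torch import load_file, save_file

        SRC = "t5xxl_flan_fp32.safetensors"  # placeholder FP32 input
        DST = "t5xxl_flan_fp16.safetensors"  # FP16 output, roughly half the size

        state = load_file(SRC)
        half = {
            name: t.to(torch.float16) if t.is_floating_point() else t
            for name, t in state.items()
        }
        save_file(half, DST)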

    I created a tool to extract the FLAN T5.
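    The tool itself is not published on this page. As a rough illustration of the idea, the sketch below keeps only the encoder and shared-embedding tensors from a full T5 checkpoint and drops the decoder; key prefixes assume the Hugging Face T5 naming scheme, and file names are placeholders.

        # Keep encoder and shared-embedding weights; drop everything else.
        from safetensors.torch import load_file, save_file

        SRC = "flan-t5-xxl-full.safetensors"     # placeholder full checkpoint
        DST = "flan-t5-xxl-encoder.safetensors"  # encoder-only output

        state = load_file(SRC)
        encoder_only = {
            name: tensor
            for name, tensor in state.items()
            if name.startswith(("encoder.", "shared."))
        }
        save_file(encoder_only, DST)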

    • I have a seed-to-seed image comparison for FLAN T5

    • The speed increase may come from reduced VRAM load and/or the smaller file size, which shortens load time when CPU offloading is used

    • All models were built from the full FP32 FLAN model

    • I have had issues with GGUF quantization using either the T5 header or the T5 decoder header, and have been unable to quantize the model to GGUF (a GGUF writer sketch follows this list)
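    For context on the header issue, this is a minimal sketch of packing tensors into a GGUF container with the gguf Python package. The architecture string ("t5" versus "t5encoder") is my assumption for what the header choice refers to; a file that llama.cpp can actually load also needs architecture-specific metadata keys that are omitted here, and the tensor shown is a placeholder.

        import numpy as np
        import gguf

        # "t5encoder" targets encoder-only loaders; "t5" targets the full model.
        writer = gguf.GGUFWriter("t5xxl-encoder.gguf", arch="t5encoder")

        # Placeholder tensor; a real converter iterates the state dict and
        # maps each safetensors key to its GGUF tensor name.
        writer.add_tensor("token_embd.weight",
                          np.zeros((32128, 4096), dtype=np.float16))

        writer.write_header_to_file()
        writer.write_kv_data_to_file()
        writer.write_tensors_to_file()
        writer.close()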

    Per the Apache 2.0 license, FLAN is attributed to Google.


    Type
    Checkpoint
    Base Model
    Other

    Details

    Downloads
    57
    Platform
    CivitAI
    Platform Status
    Available
    Created
    11/4/2024
    Updated
    9/30/2025
    Deleted
    -

    Files

    t5xxlGoogleFLANFromFP32_fp16.safetensors