Flux Dev
Run the Flux Dev model with limited VRAM in 8-bit mode. It's possible but impractical, since the downloads alone come to roughly 40 GB.
Setup
pip install accelerate diffusers optimum-quanto transformers sentencepiece
In int4 mode there are places where the pre-trained weights in fp16 overflow, resulting in a blank image.
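As a small illustration of the overflow mentioned above: fp16 cannot hold finite values larger than 65504. The sketch below uses only the Python standard library, and the helper name `fits_in_fp16` is hypothetical (it is not part of Quanto or diffusers). It only shows where the fp16 range ends; it does not reproduce the int4 code path or the blank image itself.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if a finite `value` can be stored in fp16 without overflowing."""
    try:
        # The 'e' format packs a Python float as a 2-byte half-precision float
        # and raises OverflowError when the value is out of range.
        struct.pack('<e', value)
    except OverflowError:
        return False
    return True


if __name__ == '__main__':
    print(fits_in_fp16(FP16_MAX))   # at the edge of the fp16 range
    print(fits_in_fp16(70000.0))    # beyond the fp16 range
```

A check like this could be run over suspicious intermediate values when debugging a blank output, assuming they have been pulled back to Python floats first.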
Inference
from diffusers import FluxPipeline, FluxTransformer2DModel
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
import torch
from transformers import T5EncoderModel
class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

if __name__ == '__main__':
    # Duct tape for Quanto support: T5EncoderModel is not an auto class,
    # so give it the from_config hook that Quanto calls when loading.
    T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)
    # Load the quantized text encoder and transformer, then unwrap the underlying models.
    t5 = T5Model.from_pretrained('./flux-t5')._wrapped
    transformer = Flux2DModel.from_pretrained('./flux-fp8')._wrapped
    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev',
                                        text_encoder_2=t5,
                                        transformer=transformer)
    # Moves one whole sub-model at a time to the GPU, only while its forward pass runs.
    pipe.enable_model_cpu_offload()
    image = pipe('cat playing piano', num_inference_steps=10, output_type='pil').images[0]
    image.save('cat.png')
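The `./flux-t5` and `./flux-fp8` directories loaded above have to exist before inference. How they were produced is not stated here, so the following is only a sketch of one way they might be created with Quanto's `quantize` and `save_pretrained`. The function name `quantize_and_save`, the choice of `qfloat8` weights for both models, the fp16 load dtype, and the `text_encoder_2` / `transformer` subfolder names are assumptions, not the author's confirmed procedure.

```python
def quantize_and_save(repo='black-forest-labs/FLUX.1-dev',
                      t5_dir='./flux-t5',
                      transformer_dir='./flux-fp8'):
    """Quantize the T5 encoder and the Flux transformer, then save both to disk."""
    # Heavy libraries are imported here so that importing this file stays cheap.
    import torch
    from diffusers import FluxTransformer2DModel
    from optimum.quanto import qfloat8
    from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
    from transformers import T5EncoderModel

    class Flux2DModel(QuantizedDiffusersModel):
        base_class = FluxTransformer2DModel

    class T5Model(QuantizedTransformersModel):
        auto_class = T5EncoderModel

    # Assumption: both models are loaded in fp16 and quantized to 8-bit float weights.
    t5 = T5EncoderModel.from_pretrained(
        repo, subfolder='text_encoder_2', torch_dtype=torch.float16)
    T5Model.quantize(t5, weights=qfloat8).save_pretrained(t5_dir)
    del t5  # free host memory before loading the much larger transformer

    transformer = FluxTransformer2DModel.from_pretrained(
        repo, subfolder='transformer', torch_dtype=torch.float16)
    Flux2DModel.quantize(transformer, weights=qfloat8).save_pretrained(transformer_dir)


if __name__ == '__main__':
    quantize_and_save()
```

Running this downloads the full-precision weights once, so it needs the disk space and system RAM for the unquantized models even though inference afterwards does not.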
Disclaimer
Use of this code and the copy of documentation requires citation and attribution to the author via a link to their Hugging Face profile in all resulting work.