Flux Dev

Run the Flux Dev model with limited VRAM in 8-bit mode. It's possible, but impractical, since the downloads alone are "only" 40 GB.

Setup

pip install accelerate diffusers optimum-quanto transformers sentencepiece

In int4 mode there are places where the pre-trained fp16 weights overflow, resulting in a blank image, so stick to 8-bit.
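
The inference script below loads pre-quantized copies of the transformer and the T5 text encoder from the local directories ./flux-fp8 and ./flux-t5. Here is a minimal sketch of one way to produce them, assuming optimum-quanto's QuantizedDiffusersModel/QuantizedTransformersModel quantize and save_pretrained API with qfloat8 weights; adjust to match your installed version.

import torch
from diffusers import FluxTransformer2DModel
from optimum.quanto import qfloat8
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
from transformers import T5EncoderModel

class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

if __name__ == '__main__':
    # Quantize the transformer to 8-bit float weights and save it locally.
    transformer = FluxTransformer2DModel.from_pretrained(
        'black-forest-labs/FLUX.1-dev', subfolder='transformer', torch_dtype=torch.float16)
    Flux2DModel.quantize(transformer, weights=qfloat8).save_pretrained('./flux-fp8')
    # Same treatment for the T5 text encoder.
    t5 = T5EncoderModel.from_pretrained(
        'black-forest-labs/FLUX.1-dev', subfolder='text_encoder_2', torch_dtype=torch.float16)
    T5Model.quantize(t5, weights=qfloat8).save_pretrained('./flux-t5')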

Inference

from diffusers import FluxPipeline, FluxTransformer2DModel
from optimum.quanto.models import QuantizedDiffusersModel, QuantizedTransformersModel
import torch
from transformers import T5EncoderModel

# Thin wrappers that tell Quanto which base class to rebuild on load.
class Flux2DModel(QuantizedDiffusersModel):
    base_class = FluxTransformer2DModel

class T5Model(QuantizedTransformersModel):
    auto_class = T5EncoderModel

if __name__ == '__main__':
    # Duct tape for Quanto support: Quanto rebuilds the model via from_config,
    # so patch it to return an fp16 module instead of the fp32 default.
    T5EncoderModel.from_config = lambda c: T5EncoderModel(c).to(dtype=torch.float16)
    # Load the pre-quantized weights, then unwrap back to the plain modules.
    t5 = T5Model.from_pretrained('./flux-t5')._wrapped
    transformer = Flux2DModel.from_pretrained('./flux-fp8')._wrapped
    pipe = FluxPipeline.from_pretrained('black-forest-labs/FLUX.1-dev',
                                        text_encoder_2=t5,
                                        transformer=transformer)
    # Move one whole model at a time to the GPU, only for the duration of its
    # forward pass; everything else waits in CPU RAM.
    pipe.enable_model_cpu_offload()
    image = pipe('cat playing piano', num_inference_steps=10, output_type='pil').images[0]
    image.save('cat.png')
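
If model-level offload still exceeds your VRAM, diffusers also offers pipe.enable_sequential_cpu_offload() as a drop-in replacement for the call above; it streams weights to the GPU submodule by submodule, which shrinks the VRAM peak further at a large cost in speed.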

Disclaimer

Use of this code and of the accompanying documentation requires citation and attribution to the author, via a link to their Hugging Face profile, in all resulting work.
