@singhsidhukuldeep on Hugging Face

"Hold your pixels" 🚦... SD3 is here 🌟

🚀 Performance Enhancements: Stable Diffusion 3 surpasses other text-to-image models like DALL·E 3 in typography and prompt adherence.

🏗️ New Architecture: Introduces the Multimodal Diffusion Transformer (MMDiT), which uses separate sets of weights for image and text representations while letting the two exchange information through attention, improving text understanding and spelling.
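To make the MMDiT idea concrete, here is a minimal sketch in plain Python, not the paper's implementation. It assumes single-head attention and leaves out normalization, modulation, and the MLP layers; `make_weights` and `joint_attention` are hypothetical names chosen for illustration. Each modality gets its own Q/K/V projections, and the two token sequences are concatenated only for the attention step.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def make_weights(dim, rng):
    """Hypothetical per-modality Q/K/V projection matrices (random, untrained)."""
    return {name: [[rng.gauss(0.0, 0.5) for _ in range(dim)] for _ in range(dim)]
            for name in ("q", "k", "v")}

def joint_attention(text_tokens, image_tokens, text_w, image_w):
    """Project each modality with its own weights, then attend over both jointly."""
    # List concatenation stacks the text rows on top of the image rows.
    q = matmul(text_tokens, text_w["q"]) + matmul(image_tokens, image_w["q"])
    k = matmul(text_tokens, text_w["k"]) + matmul(image_tokens, image_w["k"])
    v = matmul(text_tokens, text_w["v"]) + matmul(image_tokens, image_w["v"])
    scale = 1.0 / math.sqrt(len(q[0]))
    weights = [softmax([scale * sum(a * b for a, b in zip(qi, kj)) for kj in k])
               for qi in q]
    out = matmul(weights, v)
    n_text = len(text_tokens)
    # Split the joint output back into the two streams.
    return out[:n_text], out[n_text:]

rng = random.Random(0)
dim = 4
text_w, image_w = make_weights(dim, rng), make_weights(dim, rng)
text = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(3)]
image = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(5)]
text_out, image_out = joint_attention(text, image, text_w, image_w)
print(len(text_out), len(image_out))  # 3 5
```

Because every query attends over both sequences, changing the text tokens changes the image outputs too, even though the projection weights are never shared.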

⚡ Efficiency Improvements: Uses a rectified flow formulation, which connects data and noise along a straight path so sampling can take fewer steps, and fits within the memory constraints of common GPUs.
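A rough sketch of what "rectified flow" means, assuming the usual straight-line convention z_t = (1 - t) * x0 + t * noise with a velocity target of noise - x0. The function names are hypothetical, and the "oracle" velocity stands in for a trained network purely to show the mechanics.

```python
import random

def interpolate(x0, noise, t):
    """Point on the straight path from data x0 (t=0) to noise (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]

def velocity_target(x0, noise):
    """Constant velocity along the straight path: d z_t / d t = noise - x0."""
    return [b - a for a, b in zip(x0, noise)]

def euler_sample(velocity_fn, noise, num_steps):
    """Integrate from t=1 (noise) back to t=0 (data) with fixed Euler steps."""
    z = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        v = velocity_fn(z, t)
        z = [zi - dt * vi for zi, vi in zip(z, v)]
    return z

# Toy check: with the exact (oracle) velocity, the path is perfectly straight,
# so a single Euler step already lands on the data sample.
rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
noise = [rng.gauss(0.0, 1.0) for _ in x0]
oracle = lambda z, t: velocity_target(x0, noise)  # assumption: stands in for a trained model
recovered = euler_sample(oracle, noise, num_steps=1)
```

A learned velocity field is only approximately straight, so real samplers still take multiple steps; the point is that straighter paths tolerate coarser step sizes.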

📈 Scalability: Demonstrates scaling capabilities with models ranging up to 8 billion parameters, showing improvements in model performance without saturation.

🔧 Flexible Text Encoders: Offers a flexible approach to text encoding, maintaining performance even when the largest T5 text encoder is removed for less memory-intensive operations.

While they discuss experiments on 2B and 8B parameter models, no word on open weights 🤐

Paper: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2403.03206)
@StabilityAI
