singhsidhukuldeep
posted an update Jun 3
"Hold your pixels" 🚦... SD3 is here 🌟

🚀 Performance Enhancements: Stable Diffusion 3 surpasses other text-to-image models like DALL·E 3 in typography and prompt adherence.

πŸ—οΈ New Architecture: Introduces the Multimodal Diffusion Transformer (MMDiT) that separately processes image and language data, enhancing text understanding and spelling.

⚡ Efficiency Improvements: Features a rectified flow formulation for more efficient image generation, fitting within the memory constraints of common GPUs.

📈 Scalability: Demonstrates scaling capabilities with models ranging up to 8 billion parameters, showing improvements in model performance without saturation.

🔧 Flexible Text Encoders: Offers a flexible approach to text encoding, maintaining performance even when the largest T5 text encoder is removed for less memory-intensive operations.
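
For anyone wondering what "rectified flow" means in practice, here is a minimal NumPy sketch of the idea as I understand it: data and noise are joined by a straight line, and the model learns the constant velocity along that line. The function names (`interpolate`, `velocity_target`, `euler_sample`) and the oracle velocity are my own illustration, not code from the paper or from Stability AI.

```python
import numpy as np

def interpolate(x0, noise, t):
    """Straight-line path: data x0 at t=0, pure noise at t=1."""
    return (1.0 - t) * x0 + t * noise

def velocity_target(x0, noise):
    """Velocity along the straight path, d z_t / dt = noise - x0.
    A trained network would regress onto this target."""
    return noise - x0

def euler_sample(velocity_fn, noise, num_steps=4):
    """Integrate from t=1 (noise) back to t=0 (data) with Euler steps."""
    z = noise.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt
        z = z - dt * velocity_fn(z, t)
    return z

# Toy check with a hypothetical "oracle" that already knows the true velocity.
rng = np.random.default_rng(0)
x0 = rng.normal(size=(2, 4))      # stand-in for image latents
noise = rng.normal(size=(2, 4))   # Gaussian noise sample
oracle = lambda z, t: velocity_target(x0, noise)
recovered = euler_sample(oracle, noise, num_steps=1)
```

Because the path is a straight line, the oracle recovers `x0` in a single Euler step; a learned velocity is only approximately straight, so real sampling still takes several steps.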

While they discuss experiments on 2B and 8B parameter models, no word on open weights 🤐

Paper: Scaling Rectified Flow Transformers for High-Resolution Image Synthesis (2403.03206)
@StabilityAI