Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published about 20 hours ago • 1 • 1
EuroLLM: Multilingual Language Models for Europe Paper • 2409.16235 • Published about 16 hours ago • 6 • 1
MaskBit: Embedding-free Image Generation via Bit Tokens Paper • 2409.16211 • Published about 16 hours ago • 4 • 1
MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling Paper • 2409.16160 • Published about 17 hours ago • 5 • 1
Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation Paper • 2409.16283 • Published about 14 hours ago • 4 • 1
MaterialFusion: Enhancing Inverse Rendering with Material Diffusion Priors Paper • 2409.15273 • Published 1 day ago • 7 • 2
Phantom of Latent for Large Language and Vision Models Paper • 2409.14713 • Published 2 days ago • 20 • 2
MaskedMimic: Unified Physics-Based Character Control Through Masked Motion Inpainting Paper • 2409.14393 • Published 3 days ago • 6 • 2
Self-Supervised Audio-Visual Soundscape Stylization Paper • 2409.14340 • Published 3 days ago • 1 • 2
Prithvi WxC: Foundation Model for Weather and Climate Paper • 2409.13598 • Published 5 days ago • 25 • 2
Imagine yourself: Tuning-Free Personalized Image Generation Paper • 2409.13346 • Published 5 days ago • 57 • 5
Colorful Diffuse Intrinsic Image Decomposition in the Wild Paper • 2409.13690 • Published 5 days ago • 10 • 3
V^3: Viewing Volumetric Videos on Mobiles via Streamable 2D Dynamic Gaussians Paper • 2409.13648 • Published 5 days ago • 8 • 2
Portrait Video Editing Empowered by Multimodal Generative Priors Paper • 2409.13591 • Published 5 days ago • 12 • 2
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published 6 days ago • 107 • 9
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution Paper • 2409.12961 • Published 6 days ago • 22 • 2
LVCD: Reference-based Lineart Video Colorization with Diffusion Models Paper • 2409.12960 • Published 6 days ago • 20 • 5
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion Paper • 2409.12957 • Published 6 days ago • 16 • 2
3DGS-LM: Faster Gaussian-Splatting Optimization with Levenberg-Marquardt Paper • 2409.12892 • Published 6 days ago • 5 • 2
StoryMaker: Towards Holistic Consistent Characters in Text-to-image Generation Paper • 2409.12576 • Published 6 days ago • 14 • 2
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published 6 days ago • 44 • 4
FlexiTex: Enhancing Texture Generation with Visual Guidance Paper • 2409.12431 • Published 6 days ago • 9 • 3
Denoising Reuse: Exploiting Inter-frame Motion Consistency for Efficient Video Latent Generation Paper • 2409.12532 • Published 6 days ago • 5 • 2
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published 7 days ago • 28 • 3
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published 7 days ago • 63 • 2
Takin: A Cohort of Superior Quality Zero-shot Speech Generation Models Paper • 2409.12139 • Published 7 days ago • 11 • 4
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think Paper • 2409.11355 • Published 8 days ago • 25 • 2
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer Paper • 2409.10819 • Published 8 days ago • 15 • 3
Phidias: A Generative Model for Creating 3D Content from Text, Image, and 3D Conditions with Reference-Augmented Diffusion Paper • 2409.11406 • Published 8 days ago • 22 • 2
SplatFields: Neural Gaussian Splats for Sparse 3D and 4D Reconstruction Paper • 2409.11211 • Published 8 days ago • 7 • 2
Agile Continuous Jumping in Discontinuous Terrains Paper • 2409.10923 • Published 8 days ago • 10 • 2
OSV: One Step is Enough for High-Quality Image to Video Generation Paper • 2409.11367 • Published 8 days ago • 12 • 2
jina-embeddings-v3: Multilingual Embeddings With Task LoRA Paper • 2409.10173 • Published 9 days ago • 20 • 2
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published 11 days ago • 44 • 2
A Diffusion Approach to Radiance Field Relighting using Multi-Illumination Synthesis Paper • 2409.08947 • Published 12 days ago • 11 • 2
InstantDrag: Improving Interactivity in Drag-based Image Editing Paper • 2409.08857 • Published 12 days ago • 29 • 2
DrawingSpinUp: 3D Animation from Single Character Drawings Paper • 2409.08615 • Published 12 days ago • 14 • 2
Apollo: Band-sequence Modeling for High-Quality Audio Restoration Paper • 2409.08514 • Published 12 days ago • 8 • 2
Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection Paper • 2409.08513 • Published 12 days ago • 10 • 2
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published 13 days ago • 9 • 4