Collections
Collections including paper arxiv:2401.17270
- Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models
  Paper • 2404.07973 • Published • 30
- COCONut: Modernizing COCO Segmentation
  Paper • 2404.08639 • Published • 26
- GLIGEN: Open-Set Grounded Text-to-Image Generation
  Paper • 2301.07093 • Published • 3
- Grounded Language-Image Pre-training
  Paper • 2112.03857 • Published • 3

- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 32
- Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities
  Paper • 2401.14405 • Published • 11
- Improving fine-grained understanding in image-text pre-training
  Paper • 2401.09865 • Published • 15
- CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
  Paper • 2404.15653 • Published • 26

- Masked Audio Generation using a Single Non-Autoregressive Transformer
  Paper • 2401.04577 • Published • 41
- YOLO-World: Real-Time Open-Vocabulary Object Detection
  Paper • 2401.17270 • Published • 32
- LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation
  Paper • 2402.05054 • Published • 25