Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 25
Article Fine-tuning Florence-2 - Microsoft's Cutting-edge Vision Language Models Jun 24 • 168
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Paper • 2402.19427 • Published Feb 29 • 52
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 133
Zephyr ORPO Collection Models and datasets to align LLMs with Odds Ratio Preference Optimisation (ORPO). Recipes here: https://github.com/huggingface/alignment-handbook • 3 items • Updated Apr 12 • 16
Vision Language Models Papers 🖼️💬📝 Collection Papers about vision-language models, most important ones are on top of the list. • 27 items • Updated Apr 30 • 32
LLM Leaderboard best models ❤️🔥 Collection A daily updated list of the best-evaluated models on the LLM leaderboard. • 264 items • Updated Jun 22 • 395
DistilBERT release Collection Original DistilBERT model, with checkpoints obtained via teacher-student learning from the original BERT checkpoints. • 6 items • Updated Apr 17 • 13
Meta Llama 3 Collection This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated Aug 2 • 674
🐶 IDEFICS 🐶 Collection A collection assembling all the models and spaces related to IDEFICS. • 6 items • Updated Apr 15 • 7
Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community Apr 15 • 160
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets, and demo related to its creation. • 11 items • Updated May 6 • 88
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 123
Chart-based Reasoning: Transferring Capabilities from LLMs to VLMs Paper • 2403.12596 • Published Mar 19 • 9
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding Paper • 2403.12895 • Published Mar 19 • 29
Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset Paper • 2403.09029 • Published Mar 14 • 54
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Paper • 2402.14804 • Published Feb 22 • 2
From screenshots to HTML Collection WebSight is a dataset of 823,000 HTML/CSS code samples representing synthetically generated English websites, each accompanied by a corresponding screenshot. • 4 items • Updated Apr 15 • 17
Design2Code: How Far Are We From Automating Front-End Engineering? Paper • 2403.03163 • Published Mar 5 • 93
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling Paper • 2402.06118 • Published Feb 9 • 13
Gemini vs GPT-4V: A Preliminary Comparison and Combination of Vision-Language Models Through Qualitative Cases Paper • 2312.15011 • Published Dec 22, 2023 • 15
G-LLaVA: Solving Geometric Problem with Multi-Modal Large Language Model Paper • 2312.11370 • Published Dec 18, 2023 • 19
MegaBlocks: Efficient Sparse Training with Mixture-of-Experts Paper • 2211.15841 • Published Nov 29, 2022 • 7
OneLLM: One Framework to Align All Modalities with Language Paper • 2312.03700 • Published Dec 6, 2023 • 20
ShareGPT4V: Improving Large Multi-Modal Models with Better Captions Paper • 2311.12793 • Published Nov 21, 2023 • 18
Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models Paper • 2311.06607 • Published Nov 11, 2023 • 3
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 26
Handbook v0.1 models and datasets Collection Models and datasets for v0.1 of the alignment handbook • 6 items • Updated Nov 10, 2023 • 24
In-Context Pretraining: Language Modeling Beyond Document Boundaries Paper • 2310.10638 • Published Oct 16, 2023 • 28
Image-to-Text Models 📝 Collection This collection contains image captioning and OCR models. • 15 items • Updated Sep 19, 2023 • 5
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images Paper • 2303.07274 • Published Mar 13, 2023 • 2
Efficient Streaming Language Models with Attention Sinks Paper • 2309.17453 • Published Sep 29, 2023 • 13
AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model Paper • 2309.16058 • Published Sep 27, 2023 • 55
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack Paper • 2309.15807 • Published Sep 27, 2023 • 32
Foundation Models for Vision 🧩 Collection Foundation models for computer vision. • 24 items • Updated Mar 11 • 17
OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models Paper • 2308.01390 • Published Aug 2, 2023 • 31
MMBench: Is Your Multi-modal Model an All-around Player? Paper • 2307.06281 • Published Jul 12, 2023 • 5
Llama 2: Open Foundation and Fine-Tuned Chat Models Paper • 2307.09288 • Published Jul 18, 2023 • 239
Retentive Network: A Successor to Transformer for Large Language Models Paper • 2307.08621 • Published Jul 17, 2023 • 170
M^3IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning Paper • 2306.04387 • Published Jun 7, 2023 • 8
Flamingo: a Visual Language Model for Few-Shot Learning Paper • 2204.14198 • Published Apr 29, 2022 • 13
Secrets of RLHF in Large Language Models Part I: PPO Paper • 2307.04964 • Published Jul 11, 2023 • 27
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs? Paper • 2307.02469 • Published Jul 5, 2023 • 12
OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents Paper • 2306.16527 • Published Jun 21, 2023 • 47