FP8 LLMs for vLLM - a neuralmagic Collection

Accurate FP8-quantized models by Neural Magic, ready for use with vLLM!
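Below is a minimal sketch of loading one of these checkpoints with vLLM's offline `LLM` / `SamplingParams` API. It assumes that vLLM is installed, that a GPU with FP8 support is available, and that the Hub model ID can be downloaded. The `generate` helper and its defaults are illustrative only and are not part of vLLM.

```python
# Minimal sketch: offline inference with one FP8 checkpoint from this
# collection via vLLM's Python API (LLM, SamplingParams).
# Assumptions: vLLM is installed, an FP8-capable GPU is present, and the
# model ID below is reachable on the Hugging Face Hub.
# `generate` is a hypothetical helper written for this example.

MODEL_ID = "neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8"


def generate(prompts, model_id=MODEL_ID, max_tokens=64):
    """Load an FP8 checkpoint with vLLM and return one completion per prompt."""
    # Imported lazily so this module can be imported on machines without vLLM.
    from vllm import LLM, SamplingParams

    # No quantization flag is passed: vLLM typically picks up the
    # quantization config stored alongside the checkpoint.
    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate(prompts, params)
    # Each RequestOutput holds a list of completions; take the first one.
    return [out.outputs[0].text for out in outputs]


if __name__ == "__main__":
    for text in generate(["What is FP8 quantization?"]):
        print(text)
```

Swapping `MODEL_ID` for another entry below should work the same way, provided your GPU has enough memory for that model.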

neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8
Text Generation • Updated Aug 22 • 938 downloads • 28 likes

neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
Text Generation • Updated Aug 23 • 16.1k downloads • 27 likes

neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8
Text Generation • Updated Aug 23 • 51.2k downloads • 26 likes

neuralmagic/Phi-3-medium-128k-instruct-FP8
Text Generation • Updated Aug 12 • 34.4k downloads • 5 likes

neuralmagic/Mistral-Nemo-Instruct-2407-FP8
Text Generation • Updated Jul 19 • 2.39k downloads • 13 likes

neuralmagic/Meta-Llama-3-8B-Instruct-FP8
Text Generation • Updated Jul 18 • 9.57k downloads • 17 likes

neuralmagic/Meta-Llama-3.1-405B-Instruct-FP8-dynamic
Text Generation • Updated Aug 22 • 173 downloads • 13 likes

neuralmagic/Meta-Llama-3-70B-Instruct-FP8
Text Generation • Updated Jul 18 • 2.07k downloads • 10 likes

neuralmagic/Mixtral-8x7B-Instruct-v0.1-FP8
Text Generation • Updated Jul 18 • 942 downloads • 2 likes

neuralmagic/Meta-Llama-3-8B-Instruct-FP8-KV
Text Generation • Updated Jun 19 • 14.6k downloads • 6 likes

neuralmagic/Meta-Llama-3-70B-Instruct-FP8-KV
Text Generation • Updated Jun 26 • 160 downloads • 2 likes

neuralmagic/Mixtral-8x22B-Instruct-v0.1-FP8
Text Generation • Updated Aug 12 • 396 downloads

neuralmagic/Qwen2-72B-Instruct-FP8
Text Generation • Updated Jul 18 • 959 downloads • 9 likes

neuralmagic/Qwen2-7B-Instruct-FP8
Text Generation • Updated Jul 18 • 409 downloads • 1 like

neuralmagic/Qwen2-1.5B-Instruct-FP8
Text Generation • Updated Jul 18 • 82 downloads

neuralmagic/Qwen2-0.5B-Instruct-FP8
Text Generation • Updated Jul 18 • 168 downloads • 2 likes

neuralmagic/Mistral-7B-Instruct-v0.3-FP8
Text Generation • Updated Jul 18 • 481 downloads • 2 likes

neuralmagic/Llama-2-7b-chat-hf-FP8
Text Generation • Updated Jul 18 • 266 downloads

neuralmagic/Phi-3-mini-128k-instruct-FP8
Text Generation • Updated Aug 12 • 274 downloads

neuralmagic/gemma-2-9b-it-FP8
Text Generation • Updated Jul 18 • 526 downloads • 5 likes

neuralmagic/Qwen2-57B-A14B-Instruct-FP8
Text Generation • Updated Jul 18 • 240 downloads • 1 like

neuralmagic/DeepSeek-Coder-V2-Lite-Instruct-FP8
Text Generation • Updated Jul 18 • 2.2k downloads • 4 likes

neuralmagic/DeepSeek-Coder-V2-Lite-Base-FP8
Text Generation • Updated Jul 18 • 118 downloads

neuralmagic/DeepSeek-Coder-V2-Base-FP8
Text Generation • Updated Jul 22 • 18 downloads

neuralmagic/DeepSeek-Coder-V2-Instruct-FP8
Text Generation • Updated Jul 22 • 3.88k downloads • 6 likes

neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8-dynamic
Text Generation • Updated Aug 23 • 5.21k downloads • 5 likes

neuralmagic/Meta-Llama-3.1-70B-Instruct-FP8-dynamic
Text Generation • Updated Aug 23 • 1.43k downloads • 2 likes

neuralmagic/Meta-Llama-3.1-8B-FP8
Text Generation • Updated Aug 13 • 1.06k downloads • 5 likes

neuralmagic/Meta-Llama-3.1-70B-FP8
Text Generation • Updated Aug 13 • 168 downloads

neuralmagic/starcoder2-15b-FP8
Text Generation • Updated Aug 1 • 56 downloads

neuralmagic/starcoder2-3b-FP8
Text Generation • Updated Aug 1 • 12 downloads

neuralmagic/starcoder2-7b-FP8
Text Generation • Updated Aug 1 • 6 downloads

neuralmagic/Meta-Llama-3.1-405B-FP8
Text Generation • Updated Aug 13 • 7 downloads

neuralmagic/gemma-2-2b-it-FP8
Updated Aug 13 • 381 downloads • 1 like