---
license: llama3
tags:
- llama
- llama-3
- meta
- facebook
- gguf
---

Directly converted and quantized into GGUF with `llama.cpp` (release tag: b2843) from Meta's 'Meta-Llama-3' repo on Hugging Face. The original LLaMA 3 model files cloned from the Meta HF repo are also included. (https://huggingface.co./meta-llama/Meta-Llama-3-70B)

If you have issues downloading the models from Meta or converting models for `llama.cpp`, feel free to download this one!

### How to use `gguf-split` / model sharding

Demo: https://github.com/ggerganov/llama.cpp/discussions/6404

## Perplexity table on LLaMA 3 70B

Lower perplexity is better. (credit to: [dranger003](https://github.com/ggerganov/llama.cpp/pull/6745#issuecomment-2093892514))

| Quantization | Size (GiB) | Perplexity (wiki.test) | Delta vs. F16 |
|--------------|------------|------------------------|---------------|
| IQ1_S        | 14.29      | 9.8655 +/- 0.0625      | 248.51%       |
| IQ1_M        | 15.60      | 8.5193 +/- 0.0530      | 201.94%       |
| IQ2_XXS      | 17.79      | 6.6705 +/- 0.0405      | 135.64%       |
| IQ2_XS       | 19.69      | 5.7486 +/- 0.0345      | 103.07%       |
| IQ2_S        | 20.71      | 5.5215 +/- 0.0318      | 95.05%        |
| Q2_K_S       | 22.79      | 5.4334 +/- 0.0325      | 91.94%        |
| IQ2_M        | 22.46      | 4.8959 +/- 0.0276      | 72.35%        |
| Q2_K         | 24.56      | 4.7763 +/- 0.0274      | 68.73%        |
| IQ3_XXS      | 25.58      | 3.9671 +/- 0.0211      | 40.14%        |
| IQ3_XS       | 27.29      | 3.7210 +/- 0.0191      | 31.45%        |
| Q3_K_S       | 28.79      | 3.6502 +/- 0.0192      | 28.95%        |
| IQ3_S        | 28.79      | 3.4698 +/- 0.0174      | 22.57%        |
| IQ3_M        | 29.74      | 3.4402 +/- 0.0171      | 21.53%        |
| Q3_K_M       | 31.91      | 3.3617 +/- 0.0172      | 18.75%        |
| Q3_K_L       | 34.59      | 3.3016 +/- 0.0168      | 16.63%        |
| IQ4_XS       | 35.30      | 3.0310 +/- 0.0149      | 7.07%         |
| IQ4_NL       | 37.30      | 3.0261 +/- 0.0149      | 6.90%         |
| Q4_K_S       | 37.58      | 3.0050 +/- 0.0148      | 6.15%         |
| Q4_K_M       | 39.60      | 2.9674 +/- 0.0146      | 4.83%         |
| Q5_K_S       | 45.32      | 2.8843 +/- 0.0141      | 1.89%         |
| Q5_K_M       | 46.52      | 2.8656 +/- 0.0139      | 1.23%         |
| Q6_K         | 53.91      | 2.8441 +/- 0.0138      | 0.47%         |
| Q8_0         | 69.83      | 2.8316 +/- 0.0138      | 0.03%         |
| F16          | 131.43     | 2.8308 +/- 0.0138      | 0.00%         |

## Where to send questions or comments about the model

Instructions on how to provide feedback or comments on the model can be found in the model [README](https://github.com/meta-llama/llama3). For more technical information about generation parameters and recipes for how to use Llama 3 in applications, please go [here](https://github.com/meta-llama/llama-recipes).

## License

See the License file for Meta Llama 3 [here](https://llama.meta.com/llama3/license/) and the Acceptable Use Policy [here](https://llama.meta.com/llama3/use-policy/).
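
## Quick usage sketch

As a quick way to sanity-check a downloaded quant, here is a minimal sketch using llama-cpp-python (a Python binding for `llama.cpp`, installable with `pip install llama-cpp-python`). The file name, context size, and GPU offload settings below are placeholders to adapt to your setup; sharded downloads can first be recombined with llama.cpp's `gguf-split` tool as described in the sharding demo linked above.

```python
# A minimal sketch, assuming llama-cpp-python is installed and the chosen
# quant (e.g. Q4_K_M) is available as a single local GGUF file.
# The path below is hypothetical; point it at the file you actually downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3-70B-Q4_K_M.gguf",  # hypothetical local path
    n_ctx=8192,        # Llama 3 context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

# This is the base (non-instruct) model, so use a plain completion prompt.
output = llm(
    "GGUF is a file format for",
    max_tokens=64,
    temperature=0.7,
)
print(output["choices"][0]["text"])
```

The same files also work directly with the `llama.cpp` CLI and server; the Python route above is just one convenient way to verify that a download loads and generates.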