---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
tags:
- mistral
- instruct
- gguf
- imatrix
base_model: mistralai/Mistral-Nemo-Instruct-2407
---

# Quant Infos

## Updated for all recent llama.cpp fixes (final logit soft capping + sliding window + tokenizer)

- needs [#8604](https://github.com/ggerganov/llama.cpp/pull/8604) & latest master with the tekken tokenizer fixes applied
- quants done with an importance matrix for improved quantization loss
- ggufs & imatrix requantized from the hf bf16 weights
- wide coverage of gguf quant types, from `Q8_0` down to `IQ1_S`
- experimental custom quant types
  - `_L` with `--output-tensor-type f16 --token-embedding-type f16` (same as bartowski's)
- imatrix generated with [this](https://gist.github.com/bartowski1182/eb213dccb3571f863da82e99418f81e8) multi-purpose dataset by [bartowski](https://huggingface.co/bartowski):

```
./imatrix -m $model_name-bf16.gguf -f calibration_datav3.txt -o $model_name.imatrix
```
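
The resulting imatrix file is then fed into llama.cpp's quantization tool. A rough sketch of that step (the `llama-quantize` binary name, the `IQ4_XS` / `Q4_K` quant choices, and the `$model_name` variable are illustrative assumptions, not the exact commands used here):

```shell
# Quantize using the importance matrix produced above; --imatrix guides
# the quantizer with activation statistics to reduce quantization loss.
./llama-quantize --imatrix $model_name.imatrix \
    $model_name-bf16.gguf $model_name-IQ4_XS.gguf IQ4_XS

# For the experimental `_L` variants, additionally keep the output and
# token-embedding tensors in f16 (assumed naming convention):
./llama-quantize --imatrix $model_name.imatrix \
    --output-tensor-type f16 --token-embedding-type f16 \
    $model_name-bf16.gguf $model_name-Q4_K_L.gguf Q4_K_M
```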

# Original Model Card:

TODO