TheBloke committed
Commit
39878e0
1 Parent(s): 644be67

Update README.md

Files changed (1): README.md +16 -10
README.md CHANGED
@@ -30,16 +30,19 @@ This model requires the following prompt template:
 
 ## CHOICE OF MODELS
 
-Two sets of models are provided:
+Three sets of models are provided:
 
-* Groupsize = 1024
+* Groupsize = None
   * Should work reliably in 24GB VRAM
+  * Uses --act-order for the best possible inference quality given its lack of group_size.
+* Groupsize = 1024
+  * Theoretically higher inference accuracy
+  * May OOM on long context lengths in 24GB VRAM
 * Groupsize = 128
   * Optimal setting for highest inference quality
-  * But may require more than 24GB VRAM, depending on response length
-  * In my testing it ran out of VRAM on a 24GB card around 1500 tokens returned.
+  * Will definitely need more than 24GB VRAM on longer context lengths (1000-1500+ tokens returned)
 
-For each model, two versions are available:
+For the 128g and 1024g models, two versions are available:
 * `compat.no-act-order.safetensor`
   * Works with all versions of GPTQ-for-LLaMa, including the version in text-generation-webui one-click-installers
 * `latest.act-order.safetensors`
@@ -50,10 +53,13 @@ For each model, two versions are available:
 
 I have used branches to separate the models. This means you can clone the branch you want and not get model files you don't need.
 
-* Branch: **main** = groupsize 1024, `compat.no-act-order.safetensor` file
-* Branch: **1024-latest** = groupsize 1024, `latest.no-act-order.safetensor` file
-* Branch: **128-compat** = groupsize 128, `compat.no-act-order.safetensor` file
-* Branch: **128-latest** = groupsize 128, `latest.no-act-order.safetensor` file
+If you have 24GB VRAM you are strongly recommended to use the file in `main`, with group_size = None. This is fully compatible and won't OOM.
+
+* Branch: **main** = groupsize None, `OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit.safetensors` file
+* Branch: **1024-compat** = groupsize 1024, `compat.no-act-order.safetensors` file
+* Branch: **1024-latest** = groupsize 1024, `latest.act-order.safetensors` file
+* Branch: **128-compat** = groupsize 128, `compat.no-act-order.safetensors` file
+* Branch: **128-latest** = groupsize 128, `latest.act-order.safetensors` file
 
 ![branches](https://i.imgur.com/PdiHnLxm.png)
 
@@ -68,7 +74,7 @@ Open the text-generation-webui UI as normal.
 5. Click the **Refresh** icon next to **Model** in the top left.
 6. In the **Model drop-down**: choose the model you just downloaded, `OpenAssistant-SFT-7-Llama-30B-GPTQ`.
 7. If you see an error in the bottom right, ignore it - it's temporary.
-8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = 1024`, `model_type = Llama`
+8. Fill out the `GPTQ parameters` on the right: `Bits = 4`, `Groupsize = None`, `model_type = Llama`
 9. Click **Save settings for this model** in the top right.
 10. Click **Reload the Model** in the top right.
 11. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
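Since each variant lives on its own branch, you can fetch just the one you want by passing the branch name as the download revision. A minimal sketch using `huggingface_hub` (the repo id is assumed from the model name in the diff above; the `local_dir` path is purely illustrative):

```python
# Minimal sketch: fetch only the 128g act-order variant by downloading its branch.
# The repo id is assumed from the model name in the README; adjust as needed.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ",  # assumed repo id
    revision="128-latest",  # branch name from the branch list above
    local_dir="models/OpenAssistant-SFT-7-Llama-30B-GPTQ",  # illustrative target dir
)
```

The same idea works with plain `git clone` by specifying a single branch, which avoids pulling the multi-gigabyte model files from the branches you don't need.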
 
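For use outside text-generation-webui, the quantization settings above (`Bits = 4` and the chosen groupsize) are properties of the checkpoint itself, so a programmatic loader mainly needs to be pointed at the right file. A hedged sketch using the AutoGPTQ library, which is not the loader this README describes; the `model_basename` is assumed from the `main`-branch file name in the branch list:

```python
# Sketch only: load the 4-bit GPTQ checkpoint with AutoGPTQ rather than the
# webui's GPTQ-for-LLaMa loader. model_basename is assumed from the branch list.
from auto_gptq import AutoGPTQForCausalLM
from transformers import AutoTokenizer

repo = "TheBloke/OpenAssistant-SFT-7-Llama-30B-GPTQ"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoGPTQForCausalLM.from_quantized(
    repo,
    model_basename="OpenAssistant-SFT-7-Llama-30B-GPTQ-4bit",  # main branch, groupsize None
    use_safetensors=True,
    device="cuda:0",
)
```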
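Because the groupsize choice is largely a VRAM trade-off, it can also help to watch free device memory while generating long responses. A small illustrative check with PyTorch (the API is standard; the reporting format is arbitrary):

```python
import torch

# Report free vs. total VRAM so you can judge headroom before a long generation.
free_b, total_b = torch.cuda.mem_get_info()
print(f"VRAM free: {free_b / 2**30:.1f} GiB of {total_b / 2**30:.1f} GiB")
```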