Update README.md
README.md

@@ -33,7 +33,8 @@ Two sets of models are provided:
 * Groupsize = 1024
 * Should work reliably in 24GB VRAM
 * Groupsize = 128
-*
+* Optimal setting for highest inference quality
+* But may require more than 24GB VRAM, depending on response length
 * In my testing it ran out of VRAM on a 24GB card around 1500 tokens returned.

 For each model, two versions are available:
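The groupsize/VRAM trade-off described in the changed bullets can be illustrated with a back-of-envelope estimate. This is a hedged sketch, not the repository's code: it assumes 4-bit GPTQ weights with one fp16 scale and one packed zero-point per quantization group, and it counts weight storage only (activations and the KV cache, which grow with response length, are what actually exhaust VRAM mid-generation).

```python
# Back-of-envelope estimate of GPTQ weight storage (weights only).
# Assumptions (not from the README): 4-bit weights, fp16 scales
# (2 bytes) and packed zero-points stored once per quantization group.
def gptq_weight_bytes(n_params: int, groupsize: int, bits: int = 4) -> int:
    weight_bytes = n_params * bits // 8   # packed quantized weights
    n_groups = n_params // groupsize      # one scale + zero per group
    scale_bytes = n_groups * 2            # fp16 scale per group
    zero_bytes = n_groups * bits // 8     # packed zero-point per group
    return weight_bytes + scale_bytes + zero_bytes

# For a hypothetical 30B-parameter model, groupsize 128 stores 8x as
# many scales/zeros as groupsize 1024 -- finer groups buy quality at
# the cost of extra per-group metadata on top of the packed weights.
for gs in (1024, 128):
    gib = gptq_weight_bytes(30_000_000_000, gs) / 2**30
    print(f"groupsize={gs}: ~{gib:.2f} GiB of quantized weights")
```

The difference in weight storage alone is modest; the "ran out of VRAM around 1500 tokens" observation is driven mainly by the KV cache growing with each generated token on top of this baseline.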