---
license: unknown
---

ehartford/WizardLM-7B-Uncensored quantized to 8-bit GPTQ with group size 128 and true sequential quantization; act-order is disabled.

For most uses this probably isn't what you want.
For 4-bit GPTQ quantizations, see TheBloke/WizardLM-7B-uncensored-GPTQ.

Quantized using AutoGPTQ with the following config:

```python
config: dict = dict(
    quantize_config=dict(
        model_file_base_name='WizardLM-7B-Uncensored',
        bits=8,                 # 8-bit weights
        desc_act=False,         # act-order disabled
        group_size=128,
        true_sequential=True,   # quantize layers within each block in order
    ),
    use_safetensors=True,       # save weights as .safetensors
)
```

See quantize.py for the full script.
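That script is not reproduced here; the following is a minimal sketch of the quantization flow it likely follows, assuming AutoGPTQ's standard `BaseQuantizeConfig`/`AutoGPTQForCausalLM` API. The calibration example and output directory are placeholders, not the actual data used:

```python
# Hedged sketch of an AutoGPTQ quantization script; the calibration
# text and output directory below are placeholders.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

model_id = "ehartford/WizardLM-7B-Uncensored"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)

quantize_config = BaseQuantizeConfig(
    model_file_base_name="WizardLM-7B-Uncensored",
    bits=8,
    desc_act=False,
    group_size=128,
    true_sequential=True,
)

model = AutoGPTQForCausalLM.from_pretrained(model_id, quantize_config)

# GPTQ needs calibration examples; a real script would use a larger dataset.
examples = [
    tokenizer("AutoGPTQ quantizes weights using calibration data.", return_tensors="pt")
]

model.quantize(examples)
model.save_quantized("WizardLM-7B-Uncensored-8bit", use_safetensors=True)
```

With `true_sequential=True`, the layers inside each Transformer block are quantized one after another against the outputs of the already-quantized layers before them; `desc_act=False` skips activation-order reordering, which mainly helps at lower bit widths.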

Tested for compatibility with:

- WSL with the GPTQ-for-Llama triton branch.

The AutoGPTQ loader should read its configuration automatically from quantize_config.json.
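For example, loading with AutoGPTQ might look like this (the model path below is a placeholder):

```python
# Hedged loading sketch; quantize_config.json is read automatically.
from auto_gptq import AutoGPTQForCausalLM

model = AutoGPTQForCausalLM.from_quantized(
    "path/to/WizardLM-7B-Uncensored-8bit",  # placeholder path or repo id
    use_safetensors=True,
    device="cuda:0",
)
```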
For GPTQ-for-Llama, use the following configuration when loading:

```
wbits: 8
groupsize: 128
model_type: llama
```