---
pipeline_tag: text-generation
tags:
- openvino
- mpt
- sparse
- quantization
library_name: "OpenVINO"
---

See the benchmark scripts in this repo. First, install the dependencies:

```bash
pip install deepsparse-nightly[llm]==1.6.0.20231120
pip install openvino==2023.3.0
```
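After installing, it can be worth confirming that the pinned versions actually resolved. A minimal stdlib sketch (this helper is my own, not part of this repo's scripts; the distribution names are the ones pinned above):

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Distribution names as pinned in the install commands above.
for pkg in ("deepsparse-nightly", "openvino"):
    print(pkg, "->", installed_version(pkg) or "not installed")
```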

## Benchmarking

1. Clone this repo.
2. Concatenate the split fp32 IR weight file:

   ```bash
   cd ./models/neuralmagic/mpt-7b-gsm8k-pt/fp32
   cat openvino_model.bin.part-a* > openvino_model.bin
   ```

3. Reproduce the Neural Magic paper numbers with DeepSparse: `deepsparse_reproduce.bash`
4. Run the OpenVINO `benchmark_app` benchmarks: `benchmarkapp_*.bash`
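The concatenation step can also be done without `cat` (e.g. on Windows). A minimal Python sketch of the same operation; the function name and glob handling are my own, not from this repo:

```python
import glob
import os
import shutil

def reassemble(parts_glob, output_path):
    """Concatenate split weight files, sorted by name, into one file.

    Python equivalent of: cat openvino_model.bin.part-a* > openvino_model.bin
    Returns the number of bytes written.
    """
    parts = sorted(glob.glob(parts_glob))  # part-aa, part-ab, ... in order
    if not parts:
        raise FileNotFoundError(f"no files match {parts_glob!r}")
    with open(output_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return os.path.getsize(output_path)
```

The sorted glob matters: `split`-style suffixes (`part-aa`, `part-ab`, ...) must be appended in lexicographic order for the reassembled `.bin` to be valid.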

## Generating these IRs

https://github.com/yujiepan-work/24h1-sparse-quantized-llm-ov