
Valerie v0.1 Model Card

Overview

Valerie v0.1 is a custom language model created using llama.cpp (commit: 532c173) with a context length of 256 tokens, an embedding length of 256, 8 attention heads, and 16 layers. The model was pretrained on a dataset of female V's dialogue from Cyberpunk 2077, extracted using the Voice Over Subtitle Map mod.

Model Information

Full sampling

| Model name | Adam iteration | Model filename | Vocabulary size |
| --- | --- | --- | --- |
| Valerie v0.1 Checkpoint | 1750 | chk-valerie-v0.1-256x32-1750.gguf | 32,000 |
| Valerie v0.1 Model | 1750 | ggml-valerie-v0.1-256x32-f32-1750.gguf | 32,000 |

The ggml-valerie-v0.1-256x32-f32-1750.gguf release represents a single epoch over all 51,443 samples, completed in roughly 1,750 Adam iterations, and took approximately 3 hours to train.

Repeat sampling

| Model name | Adam iteration | Model filename | Vocabulary size |
| --- | --- | --- | --- |
| Valerie v0.1 Checkpoint | 3600 | chk-valerie-v0.1-256x32-LATEST.gguf | 32,000 |
| Valerie v0.1 Model | 3600 | ggml-valerie-v0.1-256x32-f32-LATEST.gguf | 32,000 |

The ggml-valerie-v0.1-256x32-f32-LATEST.gguf release represents two epochs over all 51,443 samples, completed in roughly 3,600 Adam iterations, and took approximately 6 hours to train.

Files and versions

  • ggml-vocab-mistral.gguf: Extracted Mistral 7B model vocabulary.
  • ggml-valerie-v0.1-256x32-f32-1750.gguf: The pretrained model at Adam iteration 1750.
  • ggml-valerie-v0.1-256x32-f32-LATEST.gguf: The latest pretrained model checkpoint, currently iteration 3600.
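
To confirm a file's metadata (architecture, context length, vocabulary size) before use, the gguf Python package that ships with llama.cpp includes a dump utility; a minimal sketch, assuming the file sits in the current directory:

```bash
# Inspect GGUF metadata and tensor info.
# Assumes the pip package `gguf` (from llama.cpp's gguf-py) provides the
# gguf-dump console script; run it against any of the files above.
pip install gguf
gguf-dump ggml-valerie-v0.1-256x32-f32-LATEST.gguf
```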

Settings

  • Vocabulary size: 32,000
  • Context length: 256 tokens
  • Embedding length: 256
  • Heads: 8
  • Layers: 16
  • Batch size: 32
  • Seed: 1
  • Checkpoint interval: every 50 iterations
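
These settings correspond to llama.cpp's train-text-from-scratch example, the pretraining tool available around the referenced commit. Below is a hedged sketch of the kind of invocation that would produce checkpoints with the filenames listed above, assuming that tool; the training-data filename and thread count are illustrative, not taken from this card.

```bash
# Hedged reconstruction of the pretraining run; train-text-from-scratch
# replaces the literal ITERATION in output filenames with the iteration
# number (or LATEST), matching the filenames listed above.
# --train-data and -t are illustrative assumptions.
./train-text-from-scratch \
    --vocab-model models/ggml-vocab-mistral.gguf \
    --ctx 256 --embd 256 --head 8 --layer 16 \
    -b 32 --seed 1 --save-every 50 \
    --checkpoint-in  chk-valerie-v0.1-256x32-LATEST.gguf \
    --checkpoint-out chk-valerie-v0.1-256x32-ITERATION.gguf \
    --model-out ggml-valerie-v0.1-256x32-f32-ITERATION.gguf \
    --train-data valerie-dialogue.txt \
    -t 8
```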

Usage

To use Valerie v0.1, follow these steps:

  1. Clone the llama.cpp repository:

```bash
git clone https://github.com/ggerganov/llama.cpp
```

Refer to the llama.cpp README.md for more information about building. You can build with plain CPU support or with OpenBLAS; CUDA, ROCm, Vulkan, and other backends are also available.

Arch Linux example:

```bash
# CPU build using the OpenBLAS backend on Arch Linux
sudo pacman -S openblas openblas64
make LLAMA_OPENBLAS=1
```
  2. Download the latest model:

```bash
wget 'https://huggingface.co./teleprint-me/cyberpunk-valerie-v0.1/resolve/main/ggml-valerie-v0.1-256x32-f32-LATEST.gguf?download=true' \
    -O ggml-valerie-v0.1-256x32-f32-LATEST.gguf
```

This will download the latest available base model.
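
Alternatively, the same file can be fetched with the huggingface_hub CLI; the repo id below is inferred from the download URL above:

```bash
# Assumption: the repo id is teleprint-me/cyberpunk-valerie-v0.1 (from the URL above).
pip install -U huggingface_hub
huggingface-cli download teleprint-me/cyberpunk-valerie-v0.1 \
    ggml-valerie-v0.1-256x32-f32-LATEST.gguf --local-dir .
```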

  3. Perform inference with the latest model checkpoint:

```bash
./main -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf --color -e -s 1 -c 256
```

Note that -c is set to 256 here to match the model's training context length.
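
To sanity-check a checkpoint beyond eyeballing samples, llama.cpp's perplexity tool can score the model on held-out text; a minimal sketch, where valerie-heldout.txt is a placeholder for any plain-text evaluation file:

```bash
# Score held-out text; lower perplexity is better. The file name is an assumption.
./perplexity -m models/valerie/v0.1/ggml-valerie-v0.1-256x32-f32-LATEST.gguf \
    -f valerie-heldout.txt -c 256
```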

Benchmarks

Performance metrics for v0.1 at iteration 3600 on the CPU, BLAS, and Vulkan backends.

llama-bench

| model | size | params | backend | threads | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama ?B all F32 | 114.53 MiB | 30.02 M | CPU | 8 | pp 512 | 12781.37 ± 2258.61 |
| llama ?B all F32 | 114.53 MiB | 30.02 M | CPU | 8 | tg 128 | 410.74 ± 6.13 |
| llama ?B all F32 | 114.53 MiB | 30.02 M | BLAS | 8 | pp 512 | 233.53 ± 1.56 |
| llama ?B all F32 | 114.53 MiB | 30.02 M | BLAS | 8 | tg 128 | 391.63 ± 14.02 |
| llama ?B all F32 | 114.53 MiB | 30.02 M | Vulkan | 99 | pp 512 | 18779.40 ± 111.01 |
| llama ?B all F32 | 114.53 MiB | 30.02 M | Vulkan | 99 | tg 128 | 96.25 ± 0.46 |

build: ab0dee5 (2686)
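
Numbers of this shape come from an invocation along the following lines (a sketch; the exact flags for the run above are not recorded on this card):

```bash
# pp 512 / tg 128 benchmark with 8 threads, matching the table's test column.
./llama-bench -m ggml-valerie-v0.1-256x32-f32-LATEST.gguf -p 512 -n 128 -t 8
```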

batched-bench - CPU

| PP | TG | B | N_KV | T_PP (s) | S_PP (t/s) | T_TG (s) | S_TG (t/s) | T (s) | S (t/s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 128 | 128 | 1 | 256 | 0.009 | 14365.88 | 0.345 | 370.86 | 0.354 | 723.06 |
| 128 | 128 | 2 | 512 | 0.022 | 11514.42 | 0.377 | 679.29 | 0.399 | 1282.90 |
| 128 | 128 | 4 | 1024 | 0.052 | 9811.44 | 0.438 | 1168.69 | 0.490 | 2088.60 |
| 128 | 128 | 8 | 2048 | 0.093 | 11067.40 | 0.745 | 1373.82 | 0.838 | 2444.24 |
| 128 | 256 | 1 | 384 | 0.011 | 11861.74 | 0.705 | 363.37 | 0.715 | 536.83 |
| 128 | 256 | 2 | 768 | 0.022 | 11649.60 | 0.768 | 666.97 | 0.790 | 972.62 |
| 128 | 256 | 4 | 1536 | 0.050 | 10252.10 | 0.912 | 1122.94 | 0.962 | 1596.95 |
| 256 | 128 | 1 | 384 | 0.021 | 12028.94 | 0.345 | 370.85 | 0.366 | 1047.94 |
| 256 | 128 | 2 | 768 | 0.049 | 10351.80 | 0.404 | 633.82 | 0.453 | 1694.02 |
| 256 | 128 | 4 | 1536 | 0.118 | 8688.72 | 0.484 | 1058.15 | 0.602 | 2552.70 |
| 256 | 256 | 1 | 512 | 0.022 | 11477.76 | 0.715 | 357.83 | 0.738 | 694.02 |
| 256 | 256 | 2 | 1024 | 0.050 | 10263.61 | 0.822 | 622.72 | 0.872 | 1174.20 |
| 256 | 256 | 4 | 2048 | 0.092 | 11089.45 | 0.990 | 1033.97 | 1.083 | 1891.58 |
| 512 | 128 | 1 | 640 | 0.050 | 10235.70 | 0.372 | 344.35 | 0.422 | 1517.52 |
| 512 | 128 | 2 | 1280 | 0.093 | 10987.83 | 0.445 | 575.12 | 0.538 | 2377.77 |
| 512 | 256 | 1 | 768 | 0.050 | 10208.56 | 0.783 | 326.97 | 0.833 | 921.85 |
| 512 | 256 | 2 | 1536 | 0.091 | 11216.51 | 0.925 | 553.26 | 1.017 | 1510.73 |

main: n_kv_max = 2048, n_batch = 2048, n_ubatch = 512, is_pp_shared = 0, n_gpu_layers = 999, n_threads = 8, n_threads_batch = 8
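
The parameter line above follows batched-bench's calling convention of that era; a hedged reconstruction of the run, with the PP, TG, and B lists read off the table rows:

```bash
# model  n_kv_max  n_batch  n_ubatch  is_pp_shared  n_gpu_layers  PP  TG  B
# Positional argument order is an assumption based on the batched-bench
# README; it may differ between llama.cpp versions.
./batched-bench ggml-valerie-v0.1-256x32-f32-LATEST.gguf \
    2048 2048 512 0 999 128,256,512 128,256 1,2,4,8
```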

Citations

When using Valerie v0.1 in your research, please remember to cite the following:

Contributors

Austin (teleprint-me) - Created and trained Valerie v0.1 using llama.cpp and the referenced dataset.

Community

Join the community of language-model enthusiasts and researchers: share your knowledge, ask questions, and collaborate on projects related to creating custom models with llama.cpp.

License

Valerie v0.1 is released under the CC-BY-NC-SA-3.0 license. You are free to use, modify, and redistribute this model for non-commercial purposes, but you must provide attribution to the original authors and release any derived works under the same license.
