RichardErkhov committed on
Commit
7338423
1 Parent(s): 9fd7449

uploaded readme

Files changed (1)
  1. README.md +120 -0
README.md ADDED
@@ -0,0 +1,120 @@
+ Quantization made by Richard Erkhov.
+
+ [Github](https://github.com/RichardErkhov)
+
+ [Discord](https://discord.gg/pvy7H8DZMG)
+
+ [Request more models](https://github.com/RichardErkhov/quant_request)
+
+
+ mega-ar-126m-4k - bnb 8bits
+ - Model creator: https://huggingface.co/BEE-spoke-data/
+ - Original model: https://huggingface.co/BEE-spoke-data/mega-ar-126m-4k/
+
+
+
+
+ Original model description:
+ ---
+ license: apache-2.0
+ datasets:
+ - JeanKaddour/minipile
+ - BEE-spoke-data/wikipedia-20230901.en-deduped
+ - BEE-spoke-data/knowledge-inoc-concat-v1
+ language:
+ - en
+ inference:
+   parameters:
+     max_new_tokens: 64
+     do_sample: true
+     temperature: 0.8
+     repetition_penalty: 1.05
+     no_repeat_ngram_size: 4
+     epsilon_cutoff: 0.0006
+     renormalize_logits: true
+ widget:
+ - text: My name is El Microondas the Wise, and
+   example_title: El Microondas
+ - text: Kennesaw State University is a public
+   example_title: Kennesaw State University
+ - text: >-
+     Bungie Studios is an American video game developer. They are most famous
+     for developing the award winning Halo series of video games. They also
+     made Destiny. The studio was founded
+   example_title: Bungie
+ - text: The Mona Lisa is a world-renowned painting created by
+   example_title: Mona Lisa
+ - text: >-
+     The Harry Potter series, written by J.K. Rowling, begins with the book
+     titled
+   example_title: Harry Potter Series
+ - text: >-
+     Question: I have cities, but no houses. I have mountains, but no trees. I
+     have water, but no fish. What am I?
+
+     Answer:
+   example_title: Riddle
+ - text: The process of photosynthesis involves the conversion of
+   example_title: Photosynthesis
+ - text: >-
+     Jane went to the store to buy some groceries. She picked up apples,
+     oranges, and a loaf of bread. When she got home, she realized she forgot
+   example_title: Story Continuation
+ - text: >-
+     Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
+     and another train leaves Station B at 10:00 AM and travels at 80 mph, when
+     will they meet if the distance between the stations is 300 miles?
+
+     To determine
+   example_title: Math Problem
+ - text: In the context of computer programming, an algorithm is
+   example_title: Algorithm Definition
+ pipeline_tag: text-generation
+ ---
+
+
+ # BEE-spoke-data/mega-ar-126m-4k
+
+
+ This may not be the _best_ language model, but it is a language model! It's interesting for several reasons, not the least of which is that it's not technically a transformer.
+
+ Details:
+
+ - 768 hidden size, 12 layers
+ - no MEGA chunking, 4096 context length
+ - EMA dimension 16, shared dimension 192
+ - tokenizer: GPT NeoX
+ - train-from-scratch
+
+
+ For more info on MEGA (_& what some of the params above mean_), check out the [model docs](https://huggingface.co/docs/transformers/main/en/model_doc/mega#mega) or the [original paper](https://arxiv.org/abs/2209.10655)
+
+ ## Usage
+
+ Usage is the same as any other small textgen model.
+
+ Given the model's small size and architecture, it's probably best to leverage its longer context by adding input context to "see more" rather than "generate more".
+
+ ## evals
+
+ Initial data:
+
+ `hf-causal-experimental (pretrained=BEE-spoke-data/mega-ar-126m-4k,revision=main,trust_remote_code=True,dtype='float'), limit: None, provide_description: False, num_fewshot: 0, batch_size: 4`
+
+ | Task |Version| Metric | Value | |Stderr|
+ |--------------|------:|--------|------:|---|-----:|
+ |arc_easy | 0|acc | 0.4415|± |0.0102|
+ | | |acc_norm| 0.3969|± |0.0100|
+ |boolq | 1|acc | 0.5749|± |0.0086|
+ |lambada_openai| 0|ppl |94.9912|± |3.9682|
+ | | |acc | 0.2408|± |0.0060|
+ |openbookqa | 0|acc | 0.1660|± |0.0167|
+ | | |acc_norm| 0.2780|± |0.0201|
+ |piqa | 0|acc | 0.5974|± |0.0114|
+ | | |acc_norm| 0.5914|± |0.0115|
+ |winogrande | 0|acc | 0.4830|± |0.0140|
+
+
+
+ ---
+
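
The Usage section of the added README says the model works like any other small text-generation model but gives no snippet. The following is a minimal sketch, not the author's own code. It assumes `transformers` and `torch` are installed in a version that can still load MEGA checkpoints, and that the original `BEE-spoke-data/mega-ar-126m-4k` repository is used (the name of the quantized repository does not appear in this commit). The sampling values are copied from the `inference.parameters` block of the card, and `trust_remote_code=True` mirrors the eval command line. The helper names `build_generation_kwargs` and `generate` are hypothetical.

```python
# Sampling settings copied from the `inference.parameters` block of the model card.
CARD_GENERATION_PARAMS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.8,
    "repetition_penalty": 1.05,
    "no_repeat_ngram_size": 4,
    "epsilon_cutoff": 0.0006,
    "renormalize_logits": True,
}

# Assumption: the original (unquantized) checkpoint named in the README.
MODEL_ID = "BEE-spoke-data/mega-ar-126m-4k"


def build_generation_kwargs(**overrides):
    """Return a copy of the card's sampling settings with optional overrides."""
    kwargs = dict(CARD_GENERATION_PARAMS)
    kwargs.update(overrides)
    return kwargs


def generate(prompt, **overrides):
    """Generate a continuation of `prompt` using the card's sampling settings.

    Downloads the checkpoint on first use, so it needs network access.
    """
    # Imported lazily so the settings above can be reused without transformers.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # trust_remote_code=True follows the eval command in the README.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **build_generation_kwargs(**overrides))
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("The Mona Lisa is a world-renowned painting created by")` would sample a continuation for one of the card's widget prompts. In line with the README's advice, a long prompt with a small `max_new_tokens` override (for example `generate(long_context, max_new_tokens=16)`) is likely a better fit for this model than a short prompt with a long generation.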