+---
+language:
+- en
+license: mit
+library_name: transformers
+inference:
+  parameters:
+    max_new_tokens: 64
+    do_sample: true
+    temperature: 0.1
+    repetition_penalty: 10
+    no_repeat_ngram_size: 4
+    eta_cutoff: 0.0006
+    renormalize_logits: true
+widget:
+- text: My name is El Microondas the Wise, and
+  example_title: El Microondas
+- text: Kennesaw State University is a public
+  example_title: Kennesaw State University
+- text: >-
+    Bungie Studios is an American video game developer. They are most famous for
+    developing the award winning Halo series of video games. They also made
+    Destiny. The studio was founded
+  example_title: Bungie
+- text: The Mona Lisa is a world-renowned painting created by
+  example_title: Mona Lisa
+- text: >-
+    The Harry Potter series, written by J.K. Rowling, begins with the book
+    titled
+  example_title: Harry Potter Series
+- text: >-
+    Question: I have cities, but no houses. I have mountains, but no trees. I
+    have water, but no fish. What am I?
+    Answer:
+  example_title: Riddle
+- text: The process of photosynthesis involves the conversion of
+  example_title: Photosynthesis
+- text: >-
+    Jane went to the store to buy some groceries. She picked up apples, oranges,
+    and a loaf of bread. When she got home, she realized she forgot
+  example_title: Story Continuation
+- text: >-
+    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+    they meet if the distance between the stations is 300 miles?
+    To determine
+  example_title: Math Problem
+- text: In the context of computer programming, an algorithm is
+  example_title: Algorithm Definition
+pipeline_tag: text-generation
+model-index:
+- name: nano-phi-192M-v0.1
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 24.15
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 29.99
+      name: normalized accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 25.46
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 44.3
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 51.54
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 0
+      name: accuracy
+    source:
+      url: >-
+        https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=kenhktsui/nano-phi-115M-v0.1
+      name: Open LLM Leaderboard
+datasets:
+- kenhktsui/minipile_quality_score_v1
+- kenhktsui/simple_wikipedia_LM_quality_score_v1
+- kenhktsui/refinedweb-3m_quality_score_v1
+- kenhktsui/TM-DATA_quality_score_v1
+- kenhktsui/openwebtext_quality_score_v1
+- HuggingFaceTB/cosmopedia
+---
+# Model Card for nano-phi-192M-v0.1
+This is a follow-up to [kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1).
+The model is not aligned.
+Major differences from the 115M model:
+- larger tokenizer vocabulary
+- [HuggingFaceTB/cosmopedia](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia) added to the training data
+- training tokens: 19B (vs. 7B)
+## How to use
+To use the model, you need `transformers` version 4.37.2 or later:
+```bash
+pip install "transformers>=4.37.2"
+```
+```python
+# Use a pipeline as a high-level helper
+from transformers import pipeline
+pipe = pipeline("text-generation", model="kenhktsui/nano-phi-192M-v0.1")
+pipe("I am a machine learning researcher. I work on", max_new_tokens=50, repetition_penalty=10.0)
+```
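The YAML front matter of this card also lists the sampling settings used by the hosted inference widget. A minimal sketch that applies the same settings through `generate`, assuming the checkpoint loads with the standard `transformers` Auto classes (as the `pipeline` example above implies). The names `WIDGET_GENERATION_KWARGS` and `generate` are illustrative, not part of the model's API:

```python
# Sampling settings copied from the `inference.parameters` block of this card's
# YAML front matter. They are what the hosted widget uses, not hard requirements.
WIDGET_GENERATION_KWARGS = {
    "max_new_tokens": 64,
    "do_sample": True,
    "temperature": 0.1,
    "repetition_penalty": 10.0,
    "no_repeat_ngram_size": 4,
    "eta_cutoff": 0.0006,
    "renormalize_logits": True,
}


def generate(prompt: str, model_id: str = "kenhktsui/nano-phi-192M-v0.1") -> str:
    """Continue `prompt` using the widget's sampling settings (illustrative helper)."""
    # Imported inside the function so the settings above can be inspected
    # without transformers/torch installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **WIDGET_GENERATION_KWARGS)
    # For a decoder-only model the output includes the prompt tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For example, `generate("The process of photosynthesis involves the conversion of")` returns the prompt followed by up to 64 sampled tokens. A `temperature` of 0.1 keeps sampling close to greedy, while `eta_cutoff` prunes low-probability tokens before sampling.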
+## Model and training details
+- model:
+  - hidden_size: 768
+  - num_key_value_heads: 8 (grouped query attention)
+  - num_attention_heads: 24
+  - num_hidden_layers: 6
+  - context length: 1024
+  - total params: 192M
+- training:
+  - global steps: 36,000
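The numbers listed above imply a few derived quantities that are not stated in the card. The sketch below works them out under explicit assumptions: `head_dim = hidden_size / num_attention_heads` (the usual convention), a KV cache stored in fp16, and the 19B training tokens spread evenly over the 36,000 global steps. It also shows how grouped query attention maps query heads onto the shared key/value heads in the common implementation. The helper `kv_head_for` is hypothetical and used only for illustration:

```python
# Values copied from the model card.
HIDDEN_SIZE = 768
NUM_ATTENTION_HEADS = 24
NUM_KEY_VALUE_HEADS = 8
NUM_HIDDEN_LAYERS = 6
CONTEXT_LENGTH = 1024
TRAINING_TOKENS = 19e9  # "19B", so anything derived from it is approximate
GLOBAL_STEPS = 36_000

# Assumption: per-head dimension is hidden_size / num_attention_heads.
head_dim = HIDDEN_SIZE // NUM_ATTENTION_HEADS  # 32

# Grouped query attention: each key/value head is shared by a group of query heads.
queries_per_kv_head = NUM_ATTENTION_HEADS // NUM_KEY_VALUE_HEADS  # 3


def kv_head_for(query_head: int) -> int:
    """Hypothetical helper: index of the key/value head used by a query head."""
    return query_head // queries_per_kv_head


# KV cache: one key and one value vector per KV head, per layer, per token.
kv_values_per_token = 2 * NUM_HIDDEN_LAYERS * NUM_KEY_VALUE_HEADS * head_dim  # 3072
kv_cache_bytes_fp16 = kv_values_per_token * CONTEXT_LENGTH * 2  # 2 bytes per fp16 value
# With one KV head per query head (no GQA) the cache would be 3x larger.
kv_cache_bytes_fp16_no_gqa = kv_cache_bytes_fp16 * queries_per_kv_head

# Rough average tokens per step, assuming an even spread over all steps.
tokens_per_step = TRAINING_TOKENS / GLOBAL_STEPS  # about 527,778
sequences_per_step = tokens_per_step / CONTEXT_LENGTH  # about 515 full-length sequences

print(f"head_dim={head_dim}, KV cache per {CONTEXT_LENGTH}-token sequence (fp16): "
      f"{kv_cache_bytes_fp16 / 2**20:.0f} MiB, ~{sequences_per_step:.0f} sequences/step")
```

Under these assumptions a full 1024-token context needs about 6 MiB of KV cache per sequence, versus about 18 MiB if every query head had its own key/value head.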
+## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+| Metric               |kenhktsui/nano-phi-192M-v0.1  |[kenhktsui/nano-phi-115M-v0.1](https://huggingface.co/kenhktsui/nano-phi-115M-v0.1)|[microsoft/phi-2](https://huggingface.co/microsoft/phi-2) (Reproduced)|
+|-----------------------|---------------------------|---------------------------|---------------------------|
+| Avg.                 |29.24   | 28.68  |61.53 |
+| ARC (25-shot)        |24.15   | 21.93  |61.52 |
+| HellaSwag (10-shot)  | 29.99  | 27.87  |75.13 |
+| MMLU (5-shot)        |25.46   | 25.30  |58.23 |
+| TruthfulQA (0-shot)  |44.30   | 46.01  |44.46 |
+| Winogrande (5-shot)  |51.54   | 50.99  |74.51 |
+| GSM8K (5-shot)       |0.0     |  0.0   |55.34  |
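The `Avg.` row appears to be the unweighted mean of the six benchmark scores. A quick sketch that recomputes it; the dictionary simply copies the values from the table:

```python
# Scores from the leaderboard table, in the order:
# ARC, HellaSwag, MMLU, TruthfulQA, Winogrande, GSM8K.
SCORES = {
    "kenhktsui/nano-phi-192M-v0.1": [24.15, 29.99, 25.46, 44.30, 51.54, 0.0],
    "kenhktsui/nano-phi-115M-v0.1": [21.93, 27.87, 25.30, 46.01, 50.99, 0.0],
    "microsoft/phi-2 (reproduced)": [61.52, 75.13, 58.23, 44.46, 74.51, 55.34],
}


def leaderboard_average(scores: list[float]) -> float:
    """Unweighted mean of the benchmark scores, rounded to two decimals."""
    return round(sum(scores) / len(scores), 2)


for model, scores in SCORES.items():
    print(f"{model}: {leaderboard_average(scores)}")
```

Because every benchmark counts equally, GSM8K at 0.0 pulls both small models' averages down by the same amount it would any other score.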
+Details (raw lm-evaluation-harness output per benchmark):
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
+|  Task  |Version| Metric |Value |   |Stderr|
+|--------|------:|--------|-----:|---|-----:|
+|arc_easy|      0|acc     |0.4596|±  |0.0102|
+|        |       |acc_norm|0.4070|±  |0.0101|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 25, batch_size: 8
+|    Task     |Version| Metric |Value |   |Stderr|
+|-------------|------:|--------|-----:|---|-----:|
+|arc_challenge|      0|acc     |0.1911|±  |0.0115|
+|             |       |acc_norm|0.2415|±  |0.0125|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 10, batch_size: 8
+|  Task   |Version| Metric |Value |   |Stderr|
+|---------|------:|--------|-----:|---|-----:|
+|hellaswag|      0|acc     |0.2833|±  |0.0045|
+|         |       |acc_norm|0.2999|±  |0.0046|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 0, batch_size: 8
+|    Task     |Version|Metric|Value |   |Stderr|
+|-------------|------:|------|-----:|---|-----:|
+|truthfulqa_mc|      1|mc1   |0.2583|±  |0.0153|
+|             |       |mc2   |0.4430|±  |0.0152|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
+|                      Task                       |Version| Metric |Value |   |Stderr|
+|-------------------------------------------------|------:|--------|-----:|---|-----:|
+|hendrycksTest-abstract_algebra                   |      1|acc     |0.2200|±  |0.0416|
+|                                                 |       |acc_norm|0.2200|±  |0.0416|
+|hendrycksTest-anatomy                            |      1|acc     |0.2593|±  |0.0379|
+|                                                 |       |acc_norm|0.2593|±  |0.0379|
+|hendrycksTest-astronomy                          |      1|acc     |0.1711|±  |0.0306|
+|                                                 |       |acc_norm|0.1711|±  |0.0306|
+|hendrycksTest-business_ethics                    |      1|acc     |0.2400|±  |0.0429|
+|                                                 |       |acc_norm|0.2400|±  |0.0429|
+|hendrycksTest-clinical_knowledge                 |      1|acc     |0.2566|±  |0.0269|
+|                                                 |       |acc_norm|0.2566|±  |0.0269|
+|hendrycksTest-college_biology                    |      1|acc     |0.2639|±  |0.0369|
+|                                                 |       |acc_norm|0.2639|±  |0.0369|
+|hendrycksTest-college_chemistry                  |      1|acc     |0.1800|±  |0.0386|
+|                                                 |       |acc_norm|0.1800|±  |0.0386|
+|hendrycksTest-college_computer_science           |      1|acc     |0.3300|±  |0.0473|
+|                                                 |       |acc_norm|0.3300|±  |0.0473|
+|hendrycksTest-college_mathematics                |      1|acc     |0.3000|±  |0.0461|
+|                                                 |       |acc_norm|0.3000|±  |0.0461|
+|hendrycksTest-college_medicine                   |      1|acc     |0.2023|±  |0.0306|
+|                                                 |       |acc_norm|0.2023|±  |0.0306|
+|hendrycksTest-college_physics                    |      1|acc     |0.2843|±  |0.0449|
+|                                                 |       |acc_norm|0.2843|±  |0.0449|
+|hendrycksTest-computer_security                  |      1|acc     |0.2200|±  |0.0416|
+|                                                 |       |acc_norm|0.2200|±  |0.0416|
+|hendrycksTest-conceptual_physics                 |      1|acc     |0.2511|±  |0.0283|
+|                                                 |       |acc_norm|0.2511|±  |0.0283|
+|hendrycksTest-econometrics                       |      1|acc     |0.2807|±  |0.0423|
+|                                                 |       |acc_norm|0.2807|±  |0.0423|
+|hendrycksTest-electrical_engineering             |      1|acc     |0.2897|±  |0.0378|
+|                                                 |       |acc_norm|0.2897|±  |0.0378|
+|hendrycksTest-elementary_mathematics             |      1|acc     |0.2804|±  |0.0231|
+|                                                 |       |acc_norm|0.2804|±  |0.0231|
+|hendrycksTest-formal_logic                       |      1|acc     |0.2143|±  |0.0367|
+|                                                 |       |acc_norm|0.2143|±  |0.0367|
+|hendrycksTest-global_facts                       |      1|acc     |0.1700|±  |0.0378|
+|                                                 |       |acc_norm|0.1700|±  |0.0378|
+|hendrycksTest-high_school_biology                |      1|acc     |0.3226|±  |0.0266|
+|                                                 |       |acc_norm|0.3226|±  |0.0266|
+|hendrycksTest-high_school_chemistry              |      1|acc     |0.2759|±  |0.0314|
+|                                                 |       |acc_norm|0.2759|±  |0.0314|
+|hendrycksTest-high_school_computer_science       |      1|acc     |0.2700|±  |0.0446|
+|                                                 |       |acc_norm|0.2700|±  |0.0446|
+|hendrycksTest-high_school_european_history       |      1|acc     |0.2606|±  |0.0343|
+|                                                 |       |acc_norm|0.2606|±  |0.0343|
+|hendrycksTest-high_school_geography              |      1|acc     |0.3081|±  |0.0329|
+|                                                 |       |acc_norm|0.3081|±  |0.0329|
+|hendrycksTest-high_school_government_and_politics|      1|acc     |0.3627|±  |0.0347|
+|                                                 |       |acc_norm|0.3627|±  |0.0347|
+|hendrycksTest-high_school_macroeconomics         |      1|acc     |0.2641|±  |0.0224|
+|                                                 |       |acc_norm|0.2641|±  |0.0224|
+|hendrycksTest-high_school_mathematics            |      1|acc     |0.2630|±  |0.0268|
+|                                                 |       |acc_norm|0.2630|±  |0.0268|
+|hendrycksTest-high_school_microeconomics         |      1|acc     |0.3403|±  |0.0308|
+|                                                 |       |acc_norm|0.3403|±  |0.0308|
+|hendrycksTest-high_school_physics                |      1|acc     |0.3113|±  |0.0378|
+|                                                 |       |acc_norm|0.3113|±  |0.0378|
+|hendrycksTest-high_school_psychology             |      1|acc     |0.2716|±  |0.0191|
+|                                                 |       |acc_norm|0.2716|±  |0.0191|
+|hendrycksTest-high_school_statistics             |      1|acc     |0.4491|±  |0.0339|
+|                                                 |       |acc_norm|0.4491|±  |0.0339|
+|hendrycksTest-high_school_us_history             |      1|acc     |0.2402|±  |0.0300|
+|                                                 |       |acc_norm|0.2402|±  |0.0300|
+|hendrycksTest-high_school_world_history          |      1|acc     |0.2363|±  |0.0277|
+|                                                 |       |acc_norm|0.2363|±  |0.0277|
+|hendrycksTest-human_aging                        |      1|acc     |0.2197|±  |0.0278|
+|                                                 |       |acc_norm|0.2197|±  |0.0278|
+|hendrycksTest-human_sexuality                    |      1|acc     |0.2824|±  |0.0395|
+|                                                 |       |acc_norm|0.2824|±  |0.0395|
+|hendrycksTest-international_law                  |      1|acc     |0.2479|±  |0.0394|
+|                                                 |       |acc_norm|0.2479|±  |0.0394|
+|hendrycksTest-jurisprudence                      |      1|acc     |0.2037|±  |0.0389|
+|                                                 |       |acc_norm|0.2037|±  |0.0389|
+|hendrycksTest-logical_fallacies                  |      1|acc     |0.2393|±  |0.0335|
+|                                                 |       |acc_norm|0.2393|±  |0.0335|
+|hendrycksTest-machine_learning                   |      1|acc     |0.1875|±  |0.0370|
+|                                                 |       |acc_norm|0.1875|±  |0.0370|
+|hendrycksTest-management                         |      1|acc     |0.2039|±  |0.0399|
+|                                                 |       |acc_norm|0.2039|±  |0.0399|
+|hendrycksTest-marketing                          |      1|acc     |0.1795|±  |0.0251|
+|                                                 |       |acc_norm|0.1795|±  |0.0251|
+|hendrycksTest-medical_genetics                   |      1|acc     |0.3000|±  |0.0461|
+|                                                 |       |acc_norm|0.3000|±  |0.0461|
+|hendrycksTest-miscellaneous                      |      1|acc     |0.2644|±  |0.0158|
+|                                                 |       |acc_norm|0.2644|±  |0.0158|
+|hendrycksTest-moral_disputes                     |      1|acc     |0.2225|±  |0.0224|
+|                                                 |       |acc_norm|0.2225|±  |0.0224|
+|hendrycksTest-moral_scenarios                    |      1|acc     |0.2726|±  |0.0149|
+|                                                 |       |acc_norm|0.2726|±  |0.0149|
+|hendrycksTest-nutrition                          |      1|acc     |0.2353|±  |0.0243|
+|                                                 |       |acc_norm|0.2353|±  |0.0243|
+|hendrycksTest-philosophy                         |      1|acc     |0.2283|±  |0.0238|
+|                                                 |       |acc_norm|0.2283|±  |0.0238|
+|hendrycksTest-prehistory                         |      1|acc     |0.2099|±  |0.0227|
+|                                                 |       |acc_norm|0.2099|±  |0.0227|
+|hendrycksTest-professional_accounting            |      1|acc     |0.2411|±  |0.0255|
+|                                                 |       |acc_norm|0.2411|±  |0.0255|
+|hendrycksTest-professional_law                   |      1|acc     |0.2458|±  |0.0110|
+|                                                 |       |acc_norm|0.2458|±  |0.0110|
+|hendrycksTest-professional_medicine              |      1|acc     |0.3897|±  |0.0296|
+|                                                 |       |acc_norm|0.3897|±  |0.0296|
+|hendrycksTest-professional_psychology            |      1|acc     |0.2141|±  |0.0166|
+|                                                 |       |acc_norm|0.2141|±  |0.0166|
+|hendrycksTest-public_relations                   |      1|acc     |0.1818|±  |0.0369|
+|                                                 |       |acc_norm|0.1818|±  |0.0369|
+|hendrycksTest-security_studies                   |      1|acc     |0.2490|±  |0.0277|
+|                                                 |       |acc_norm|0.2490|±  |0.0277|
+|hendrycksTest-sociology                          |      1|acc     |0.2537|±  |0.0308|
+|                                                 |       |acc_norm|0.2537|±  |0.0308|
+|hendrycksTest-us_foreign_policy                  |      1|acc     |0.2900|±  |0.0456|
+|                                                 |       |acc_norm|0.2900|±  |0.0456|
+|hendrycksTest-virology                           |      1|acc     |0.1807|±  |0.0300|
+|                                                 |       |acc_norm|0.1807|±  |0.0300|
+|hendrycksTest-world_religions                    |      1|acc     |0.1813|±  |0.0295|
+|                                                 |       |acc_norm|0.1813|±  |0.0295|
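The MMLU (5-shot) score of 25.46 in the leaderboard table appears to be the unweighted (macro) average of the 57 per-subject `acc` values in the table above. A quick check, with the values copied in table order:

```python
# Per-subject `acc` values from the 5-shot MMLU table, in table order
# (abstract_algebra first, world_religions last).
MMLU_SUBJECT_ACC = [
    0.2200, 0.2593, 0.1711, 0.2400, 0.2566, 0.2639, 0.1800, 0.3300, 0.3000, 0.2023,
    0.2843, 0.2200, 0.2511, 0.2807, 0.2897, 0.2804, 0.2143, 0.1700, 0.3226, 0.2759,
    0.2700, 0.2606, 0.3081, 0.3627, 0.2641, 0.2630, 0.3403, 0.3113, 0.2716, 0.4491,
    0.2402, 0.2363, 0.2197, 0.2824, 0.2479, 0.2037, 0.2393, 0.1875, 0.2039, 0.1795,
    0.3000, 0.2644, 0.2225, 0.2726, 0.2353, 0.2283, 0.2099, 0.2411, 0.2458, 0.3897,
    0.2141, 0.1818, 0.2490, 0.2537, 0.2900, 0.1807, 0.1813,
]

# Macro average: every subject counts equally, regardless of its question count.
mmlu_macro_avg = sum(MMLU_SUBJECT_ACC) / len(MMLU_SUBJECT_ACC)
print(f"{len(MMLU_SUBJECT_ACC)} subjects, macro-average acc = {mmlu_macro_avg:.2%}")
```

A micro average weighted by question count would differ, because subjects such as professional_law have far more questions than others.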
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
+|   Task   |Version|Metric|Value |   |Stderr|
+|----------|------:|------|-----:|---|-----:|
+|winogrande|      0|acc   |0.5154|±  | 0.014|
+hf-causal-experimental (pretrained=/content/lm-evaluation-harness/artifacts/model-9gh18vfl:v25,use_accelerate=false,trust_remote_code=True), limit: None, provide_description: False, num_fewshot: 5, batch_size: 8
+|Task |Version|Metric|Value|   |Stderr|
+|-----|------:|------|----:|---|-----:|
+|gsm8k|      0|acc   |    0|±  |     0|
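The `Stderr` column in these tables appears consistent with the binomial standard error of a proportion, sqrt(p(1 - p) / n). A minimal check, assuming the standard evaluation split sizes given in the code comment (these counts are not stated in this card). It also converts each reported stderr into a normal-approximation 95% interval:

```python
import math

# (score, reported stderr, number of evaluated examples).
# Assumption: example counts are the standard split sizes
# (ARC-Challenge test 1,172; HellaSwag validation 10,042; Winogrande XL validation 1,267).
RESULTS = {
    "arc_challenge acc_norm": (0.2415, 0.0125, 1172),
    "hellaswag acc_norm": (0.2999, 0.0046, 10042),
    "winogrande acc": (0.5154, 0.014, 1267),
}


def binomial_stderr(p: float, n: int) -> float:
    """Standard error of a proportion p measured on n independent examples."""
    return math.sqrt(p * (1 - p) / n)


def ci95(p: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval around p."""
    return p - 1.96 * se, p + 1.96 * se


for task, (score, reported_se, n) in RESULTS.items():
    lo, hi = ci95(score, reported_se)
    print(f"{task}: stderr {binomial_stderr(score, n):.4f} (reported {reported_se}), "
          f"95% CI [{lo:.3f}, {hi:.3f}]")
```

Under these assumptions the Winogrande interval spans roughly 0.488 to 0.543, so it includes 0.5, the chance rate for its two-option format. The ARC-Challenge `acc_norm` interval (about 0.217 to 0.266) likewise includes 0.25, chance for its mostly four-option questions.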