trollek committed
Commit dc4ebd8
Parent: e8f0ad8

Update README.md

Files changed (1)
1. README.md +95 -3
README.md CHANGED
@@ -1,3 +1,95 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ base_model: h2oai/h2o-danube2-1.8b-base
+ datasets:
+ - migtissera/Tess-v1.5
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - llama-factory
+ - unsloth
+ ---
+ # h2o-danube2 with ChatML template
+
+ This model was first fine-tuned with [BAdam](https://arxiv.org/abs/2404.02827 "BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models") on [migtissera/Tess-v1.5](https://huggingface.co/datasets/migtissera/Tess-v1.5) using LLaMA-Factory.
+
+ ## Template
+
+ ```jinja
+ <|im_start|>system
+ {{system}}<|im_end|>
+ <|im_start|>user
+ {{instruction}}<|im_end|>
+ <|im_start|>assistant
+ {{response}}<|im_end|>
+ ```
+
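+ Below is a minimal usage sketch (not part of the original card) of how this ChatML template is typically applied through `tokenizer.apply_chat_template` in `transformers`; the repository path is a placeholder and should be replaced with this model's actual id.
+
+ ```python
+ # Minimal sketch, assuming the tokenizer ships the ChatML chat template shown above.
+ # "path/to/this/model" is a placeholder, not the actual repository id.
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ model_id = "path/to/this/model"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
+
+ messages = [
+     {"role": "system", "content": "You are a helpful assistant."},
+     {"role": "user", "content": "Explain BAdam in one sentence."},
+ ]
+
+ # apply_chat_template renders the <|im_start|>/<|im_end|> structure shown above.
+ input_ids = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ output_ids = model.generate(input_ids, max_new_tokens=128)
+ print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```
+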
+ ## BAdam config
+
+ ```yaml
+ ### model
+ model_name_or_path: danube2-base-chatml
+
+ ### method
+ stage: sft
+ do_train: true
+ finetuning_type: full
+ use_badam: true
+ badam_switch_mode: ascending
+ badam_switch_interval: 50
+ badam_verbose: 1
+ badam_start_block: 6
+ seed: 720
+
+ ### dataset
+ dataset: tess15
+ template: hermes_chatml
+ cutoff_len: 8192
+ overwrite_cache: false
+ preprocessing_num_workers: 12
+
+ ### output
+ output_dir: tess15-chatml-badam
+ logging_steps: 5
+ save_steps: 1
+ save_strategy: epoch
+ plot_loss: true
+ overwrite_output_dir: false
+
+ ### train
+ per_device_train_batch_size: 2
+ gradient_accumulation_steps: 4
+ learning_rate: 0.00001
+ num_train_epochs: 1
+ lr_scheduler_type: constant_with_warmup
+ warmup_ratio: 0.01
+ bf16: true
+ flash_attn: fa2
+
+ ### eval
+ val_size: 0.01
+ per_device_eval_batch_size: 1
+ eval_strategy: steps
+ eval_steps: 1000
+ ```
+
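+ As a rough illustration (not part of the original card, and a simplification of LLaMA-Factory's actual BAdam implementation), the sketch below shows what `badam_switch_mode: ascending`, `badam_switch_interval: 50` and `badam_start_block: 6` amount to: only one block of layers is trainable at a time, and the active block moves upward every 50 optimizer steps, starting from block 6. Adam states are then only needed for the active block, which is what makes full-parameter tuning memory efficient.
+
+ ```python
+ # Toy sketch of BAdam-style ascending block switching.
+ # Hypothetical helpers, not the LLaMA-Factory implementation.
+ import torch
+
+ def active_block_index(step, num_blocks, switch_interval=50, start_block=6):
+     """Block to train at a given optimizer step (ascending mode, wrapping around)."""
+     return (start_block + step // switch_interval) % num_blocks
+
+ def freeze_all_but(blocks, active_idx):
+     """Only the active block keeps requires_grad=True (and hence Adam states)."""
+     for i, block in enumerate(blocks):
+         for p in block.parameters():
+             p.requires_grad_(i == active_idx)
+
+ # Dummy "blocks" standing in for the model's transformer layers.
+ blocks = torch.nn.ModuleList([torch.nn.Linear(8, 8) for _ in range(24)])
+ for step in (0, 49, 50, 500):
+     idx = active_block_index(step, len(blocks))
+     freeze_all_but(blocks, idx)
+     print(f"step {step}: training block {idx}")
+ ```
+
+ In practice a config like the one above is passed to LLaMA-Factory's training entry point (e.g. `llamafactory-cli train <config>.yaml`).
+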
+ ### BAdam training results
+
+ | Training Loss | Epoch | Step | Validation Loss |
+ |:-------------:|:------:|:-----:|:---------------:|
+ | 0.8017 | 0.0643 | 1000 | 0.6820 |
+ | 0.6167 | 0.1287 | 2000 | 0.6610 |
+ | 0.6161 | 0.1930 | 3000 | 0.6496 |
+ | 0.6322 | 0.2574 | 4000 | 0.6423 |
+ | 0.5127 | 0.3217 | 5000 | 0.6366 |
+ | 0.61 | 0.3860 | 6000 | 0.6312 |
+ | 0.6758 | 0.4504 | 7000 | 0.6266 |
+ | 0.5901 | 0.5147 | 8000 | 0.6215 |
+ | 0.5163 | 0.5791 | 9000 | 0.6197 |
+ | 0.6043 | 0.6434 | 10000 | 0.6175 |
+ | 0.5056 | 0.7077 | 11000 | 0.6153 |
+ | 0.5772 | 0.7721 | 12000 | 0.6126 |
+ | 0.6692 | 0.8364 | 13000 | 0.6107 |
+ | 0.5262 | 0.9008 | 14000 | 0.6066 |
+ | 0.6386 | 0.9651 | 15000 | 0.6056 |