antalvdb committed
Commit 2912c8b
1 Parent(s): 19633b4

Upload 40 files
README.md CHANGED
@@ -9,28 +9,38 @@ model-index:
  # bart-base-spelling-nl

- This model is a Dutch fine-tuned version of [facebook/bart-base](https://huggingface.co/facebook/bart-base).
+ This model is a Dutch fine-tuned version of
+ [facebook/bart-base](https://huggingface.co/facebook/bart-base).

  It achieves the following results on the evaluation set:
- - Loss: 0.0276
+
+ - Loss: 0.0217
  - Cer: 0.0147
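For context, Cer is the character error rate between the model's corrections and the reference text. A minimal sketch of how such a score can be computed, assuming the Hugging Face `evaluate` package's "cer" metric (the training script may compute it differently):

```python
# Hedged sketch: character error rate (CER) via the `evaluate` package's
# "cer" metric (backed by jiwer). Strings here are illustrative.
import evaluate

cer_metric = evaluate.load("cer")

predictions = ["Gisteren heb ik een boek gelezen."]  # model output
references = ["Gisteren heb ik een boek gelezen."]   # gold correction

# 0.0 is a perfect match; this card reports ~0.0147 on the eval set.
print(cer_metric.compute(predictions=predictions, references=references))
```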

  ## Model description

- This is a text-to-text fine-tuned version of [facebook/bart-base](https://huggingface.co/facebook/bart-base) trained on spelling correction. It leans on the excellent work by Oliver Guhr ([github](https://github.com/oliverguhr/spelling), [huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)). Training was performed on an AWS EC2 instance (g5.xlarge) on a single GPU in about 4 hours.
+ This is a text-to-text fine-tuned version of
+ [facebook/bart-base](https://huggingface.co/facebook/bart-base)
+ trained on spelling correction. It leans on the excellent work by
+ Oliver Guhr ([github](https://github.com/oliverguhr/spelling),
+ [huggingface](https://huggingface.co/oliverguhr/spelling-correction-english-base)). Training
+ was performed on an AWS EC2 instance (g5.xlarge) on a single GPU.
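As a usage sketch, the checkpoint can be called through the `text2text-generation` pipeline; the repo id `antalvdb/bart-base-spelling-nl` is assumed from this card's location, and the misspelled input is illustrative:

```python
# Hedged sketch of inference; any BART seq2seq checkpoint is called
# the same way through the pipeline API.
from transformers import pipeline

fix_spelling = pipeline(
    "text2text-generation",
    model="antalvdb/bart-base-spelling-nl",
)

# A deliberately misspelled Dutch sentence in, a corrected one out.
print(fix_spelling("Ik hep gistren een boek gelesen.", max_length=1024))
```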

  ## Intended uses & limitations

- The intended use for this model is to be a component of the [Valkuil.net](https://valkuil.net) context-sensitive spelling checker. A next version of the model will be trained on more data.
+ The intended use for this model is to be a component of the
+ [Valkuil.net](https://valkuil.net) context-sensitive spelling
+ checker. A next version of the model will be trained on more data.

  ## Training and evaluation data

- The model was trained on a Dutch dataset composed of 300,000 lines of text from three public Dutch sources, downloaded from the [Opus corpus](https://opus.nlpl.eu/):
-
- - nl-europarlv7.100k.txt
- - nl-opensubtitles2016.100k.txt
- - nl-wikipedia.100k.txt
+ The model was trained on a Dutch dataset composed of 1,500,000 lines of
+ text from three public Dutch sources, downloaded from the [Opus
+ corpus](https://opus.nlpl.eu/):
+
+ - nl-europarlv7.100k.txt (500,000 lines)
+ - nl-opensubtitles2016.100k.txt (500,000 lines)
+ - nl-wikipedia.100k.txt (500,000 lines)
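The card does not document how the misspelled inputs were produced from these clean lines. Following the general recipe of Guhr's spelling-correction work, one plausible approach (an assumption, not a description of the actual pipeline) is to corrupt each line with random character edits and train on the (noisy, clean) pairs:

```python
# Illustrative sketch only: build (noisy, clean) training pairs by
# injecting random character-level deletions, substitutions and
# insertions. The actual noise model for this checkpoint is undocumented.
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def corrupt(line: str, p: float = 0.05) -> str:
    out = []
    for ch in line:
        r = random.random()
        if r < p / 3:
            continue                             # drop the character
        if r < 2 * p / 3:
            out.append(random.choice(ALPHABET))  # replace it
            continue
        if r < p:
            out.append(random.choice(ALPHABET))  # insert a stray character
        out.append(ch)
    return "".join(out)

# One training example: corrupted line as input, original as target.
clean = "Gisteren heb ik een boek gelezen."
print(corrupt(clean), "->", clean)
```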

  ## Training procedure

@@ -51,24 +61,99 @@ The following hyperparameters were used during training:

  | Training Loss | Epoch | Step | Validation Loss | Cer |
  |:-------------:|:-----:|:-----:|:---------------:|:------:|
- | 0.1617 | 0.11 | 1000 | 0.0986 | 0.9241 |
- | 0.1326 | 0.21 | 2000 | 0.0676 | 0.9240 |
- | 0.09 | 0.32 | 3000 | 0.0586 | 0.9241 |
- | 0.0891 | 0.43 | 4000 | 0.0530 | 0.9240 |
- | 0.0753 | 0.54 | 5000 | 0.0491 | 0.9239 |
- | 0.069 | 0.64 | 6000 | 0.0459 | 0.9238 |
- | 0.0615 | 0.75 | 7000 | 0.0435 | 0.9238 |
- | 0.0494 | 0.86 | 8000 | 0.0409 | 0.9237 |
- | 0.0671 | 0.97 | 9000 | 0.0388 | 0.9238 |
- | 0.0425 | 1.07 | 10000 | 0.0367 | 0.9237 |
- | 0.0394 | 1.18 | 11000 | 0.0356 | 0.9237 |
- | 0.0399 | 1.29 | 12000 | 0.0344 | 0.9236 |
- | 0.0375 | 1.4 | 13000 | 0.0333 | 0.9235 |
- | 0.0409 | 1.5 | 14000 | 0.0315 | 0.9237 |
- | 0.0291 | 1.61 | 15000 | 0.0304 | 0.9236 |
- | 0.0268 | 1.72 | 16000 | 0.0293 | 0.9236 |
- | 0.0309 | 1.83 | 17000 | 0.0284 | 0.9235 |
- | 0.0362 | 1.93 | 18000 | 0.0276 | 0.9235 |
+ | 0.2546 | 0.02 | 1000 | 0.1801 | 0.9245 |
+ | 0.1646 | 0.04 | 2000 | 0.1203 | 0.9243 |
+ | 0.1456 | 0.06 | 3000 | 0.1016 | 0.9242 |
+ | 0.1204 | 0.09 | 4000 | 0.0849 | 0.9242 |
+ | 0.1226 | 0.11 | 5000 | 0.0736 | 0.9241 |
+ | 0.1049 | 0.13 | 6000 | 0.0680 | 0.9240 |
+ | 0.1071 | 0.15 | 7000 | 0.0671 | 0.9241 |
+ | 0.1038 | 0.17 | 8000 | 0.0615 | 0.9240 |
+ | 0.0815 | 0.19 | 9000 | 0.0575 | 0.9240 |
+ | 0.0828 | 0.21 | 10000 | 0.0572 | 0.9241 |
+ | 0.0851 | 0.24 | 11000 | 0.0533 | 0.9241 |
+ | 0.0787 | 0.26 | 12000 | 0.0529 | 0.9241 |
+ | 0.0795 | 0.28 | 13000 | 0.0518 | 0.9239 |
+ | 0.0864 | 0.3 | 14000 | 0.0492 | 0.9239 |
+ | 0.0806 | 0.32 | 15000 | 0.0471 | 0.9239 |
+ | 0.0808 | 0.34 | 16000 | 0.0483 | 0.9238 |
+ | 0.071 | 0.36 | 17000 | 0.0469 | 0.9239 |
+ | 0.0661 | 0.38 | 18000 | 0.0446 | 0.9239 |
+ | 0.0641 | 0.41 | 19000 | 0.0437 | 0.9239 |
+ | 0.0686 | 0.43 | 20000 | 0.0428 | 0.9238 |
+ | 0.0597 | 0.45 | 21000 | 0.0431 | 0.9238 |
+ | 0.0585 | 0.47 | 22000 | 0.0417 | 0.9238 |
+ | 0.0675 | 0.49 | 23000 | 0.0406 | 0.9238 |
+ | 0.0678 | 0.51 | 24000 | 0.0395 | 0.9238 |
+ | 0.0581 | 0.53 | 25000 | 0.0393 | 0.9238 |
+ | 0.0569 | 0.56 | 26000 | 0.0371 | 0.9239 |
+ | 0.0632 | 0.58 | 27000 | 0.0378 | 0.9238 |
+ | 0.0589 | 0.6 | 28000 | 0.0377 | 0.9238 |
+ | 0.0511 | 0.62 | 29000 | 0.0366 | 0.9237 |
+ | 0.0651 | 0.64 | 30000 | 0.0358 | 0.9239 |
+ | 0.0594 | 0.66 | 31000 | 0.0356 | 0.9238 |
+ | 0.054 | 0.68 | 32000 | 0.0368 | 0.9238 |
+ | 0.0498 | 0.71 | 33000 | 0.0353 | 0.9238 |
+ | 0.0559 | 0.73 | 34000 | 0.0337 | 0.9238 |
+ | 0.0502 | 0.75 | 35000 | 0.0341 | 0.9238 |
+ | 0.0588 | 0.77 | 36000 | 0.0339 | 0.9239 |
+ | 0.0487 | 0.79 | 37000 | 0.0338 | 0.9237 |
+ | 0.0489 | 0.81 | 38000 | 0.0333 | 0.9236 |
+ | 0.0493 | 0.83 | 39000 | 0.0331 | 0.9237 |
+ | 0.0481 | 0.85 | 40000 | 0.0323 | 0.9237 |
+ | 0.0444 | 0.88 | 41000 | 0.0318 | 0.9237 |
+ | 0.0446 | 0.9 | 42000 | 0.0311 | 0.9238 |
+ | 0.0469 | 0.92 | 43000 | 0.0311 | 0.9237 |
+ | 0.0525 | 0.94 | 44000 | 0.0312 | 0.9237 |
+ | 0.042 | 0.96 | 45000 | 0.0312 | 0.9236 |
+ | 0.0541 | 0.98 | 46000 | 0.0304 | 0.9237 |
+ | 0.0417 | 1.0 | 47000 | 0.0293 | 0.9238 |
+ | 0.0369 | 1.03 | 48000 | 0.0305 | 0.9237 |
+ | 0.0357 | 1.05 | 49000 | 0.0297 | 0.9237 |
+ | 0.0394 | 1.07 | 50000 | 0.0296 | 0.9237 |
+ | 0.0343 | 1.09 | 51000 | 0.0288 | 0.9237 |
+ | 0.037 | 1.11 | 52000 | 0.0286 | 0.9237 |
+ | 0.0367 | 1.13 | 53000 | 0.0281 | 0.9237 |
+ | 0.0336 | 1.15 | 54000 | 0.0287 | 0.9236 |
+ | 0.0331 | 1.18 | 55000 | 0.0275 | 0.9237 |
+ | 0.0381 | 1.2 | 56000 | 0.0276 | 0.9237 |
+ | 0.0438 | 1.22 | 57000 | 0.0269 | 0.9237 |
+ | 0.0319 | 1.24 | 58000 | 0.0274 | 0.9236 |
+ | 0.0364 | 1.26 | 59000 | 0.0265 | 0.9237 |
+ | 0.0402 | 1.28 | 60000 | 0.0262 | 0.9237 |
+ | 0.0341 | 1.3 | 61000 | 0.0259 | 0.9237 |
+ | 0.0346 | 1.32 | 62000 | 0.0258 | 0.9237 |
+ | 0.0378 | 1.35 | 63000 | 0.0258 | 0.9236 |
+ | 0.0372 | 1.37 | 64000 | 0.0253 | 0.9237 |
+ | 0.0375 | 1.39 | 65000 | 0.0248 | 0.9237 |
+ | 0.0336 | 1.41 | 66000 | 0.0246 | 0.9236 |
+ | 0.031 | 1.43 | 67000 | 0.0246 | 0.9237 |
+ | 0.0344 | 1.45 | 68000 | 0.0248 | 0.9236 |
+ | 0.0307 | 1.47 | 69000 | 0.0244 | 0.9236 |
+ | 0.0293 | 1.5 | 70000 | 0.0239 | 0.9237 |
+ | 0.0406 | 1.52 | 71000 | 0.0235 | 0.9236 |
+ | 0.0273 | 1.54 | 72000 | 0.0235 | 0.9236 |
+ | 0.0316 | 1.56 | 73000 | 0.0234 | 0.9235 |
+ | 0.0308 | 1.58 | 74000 | 0.0229 | 0.9236 |
+ | 0.0291 | 1.6 | 75000 | 0.0229 | 0.9236 |
+ | 0.0325 | 1.62 | 76000 | 0.0229 | 0.9236 |
+ | 0.0347 | 1.65 | 77000 | 0.0224 | 0.9237 |
+ | 0.0268 | 1.67 | 78000 | 0.0226 | 0.9237 |
+ | 0.0279 | 1.69 | 79000 | 0.0219 | 0.9236 |
+ | 0.0247 | 1.71 | 80000 | 0.0220 | 0.9235 |
+ | 0.0259 | 1.73 | 81000 | 0.0215 | 0.9236 |
+ | 0.0294 | 1.75 | 82000 | 0.0217 | 0.9235 |
+ | 0.0267 | 1.77 | 83000 | 0.0217 | 0.9236 |
+ | 0.0273 | 1.79 | 84000 | 0.0213 | 0.9236 |
+ | 0.0242 | 1.82 | 85000 | 0.0213 | 0.9236 |
+ | 0.0254 | 1.84 | 86000 | 0.0210 | 0.9236 |
+ | 0.0273 | 1.86 | 87000 | 0.0209 | 0.9236 |
+ | 0.0261 | 1.88 | 88000 | 0.0210 | 0.9235 |
+ | 0.0244 | 1.9 | 89000 | 0.0206 | 0.9235 |
+ | 0.0256 | 1.92 | 90000 | 0.0206 | 0.9235 |
+ | 0.0283 | 1.94 | 91000 | 0.0205 | 0.9235 |
+ | 0.0255 | 1.97 | 92000 | 0.0204 | 0.9235 |
+ | 0.022 | 1.99 | 93000 | 0.0203 | 0.9235 |

  ### Framework versions
all_results.json CHANGED
@@ -1,14 +1,14 @@
  {
    "epoch": 2.0,
-   "eval_cer": 0.014650276279943154,
-   "eval_loss": 0.02764066681265831,
-   "eval_runtime": 1893.5756,
-   "eval_samples": 1998,
-   "eval_samples_per_second": 1.055,
-   "eval_steps_per_second": 0.264,
-   "train_loss": 0.07157236041639212,
-   "train_runtime": 20459.2275,
-   "train_samples": 297945,
-   "train_samples_per_second": 29.126,
-   "train_steps_per_second": 0.91
+   "eval_cer": 0.014659309693522676,
+   "eval_loss": 0.02169678919017315,
+   "eval_runtime": 1986.5865,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 1.007,
+   "eval_steps_per_second": 0.252,
+   "train_loss": 0.056187623144469706,
+   "train_runtime": 103918.6177,
+   "train_samples": 1497617,
+   "train_samples_per_second": 28.823,
+   "train_steps_per_second": 0.901
  }
checkpoint-82000/config.json ADDED
@@ -0,0 +1,75 @@
+ {
+   "_name_or_path": "facebook/bart-base",
+   "activation_dropout": 0.1,
+   "activation_function": "gelu",
+   "add_bias_logits": false,
+   "add_final_layer_norm": false,
+   "architectures": [
+     "BartForConditionalGeneration"
+   ],
+   "attention_dropout": 0.1,
+   "bos_token_id": 0,
+   "classif_dropout": 0.1,
+   "classifier_dropout": 0.0,
+   "d_model": 768,
+   "decoder_attention_heads": 12,
+   "decoder_ffn_dim": 3072,
+   "decoder_layerdrop": 0.0,
+   "decoder_layers": 6,
+   "decoder_start_token_id": 2,
+   "dropout": 0.1,
+   "early_stopping": true,
+   "encoder_attention_heads": 12,
+   "encoder_ffn_dim": 3072,
+   "encoder_layerdrop": 0.0,
+   "encoder_layers": 6,
+   "eos_token_id": 2,
+   "forced_bos_token_id": 0,
+   "forced_eos_token_id": 2,
+   "gradient_checkpointing": false,
+   "id2label": {
+     "0": "LABEL_0",
+     "1": "LABEL_1",
+     "2": "LABEL_2"
+   },
+   "init_std": 0.02,
+   "is_encoder_decoder": true,
+   "label2id": {
+     "LABEL_0": 0,
+     "LABEL_1": 1,
+     "LABEL_2": 2
+   },
+   "max_position_embeddings": 1024,
+   "model_type": "bart",
+   "no_repeat_ngram_size": 3,
+   "normalize_before": false,
+   "normalize_embedding": true,
+   "num_beams": 4,
+   "num_hidden_layers": 6,
+   "pad_token_id": 1,
+   "scale_embedding": false,
+   "task_specific_params": {
+     "summarization": {
+       "length_penalty": 1.0,
+       "max_length": 128,
+       "min_length": 12,
+       "num_beams": 4
+     },
+     "summarization_cnn": {
+       "length_penalty": 2.0,
+       "max_length": 142,
+       "min_length": 56,
+       "num_beams": 4
+     },
+     "summarization_xsum": {
+       "length_penalty": 1.0,
+       "max_length": 62,
+       "min_length": 11,
+       "num_beams": 6
+     }
+   },
+   "torch_dtype": "float32",
+   "transformers_version": "4.27.3",
+   "use_cache": true,
+   "vocab_size": 50265
+ }
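For context, the generation-related fields in this config (`num_beams`, `no_repeat_ngram_size`, `early_stopping`) act as decoding defaults when nothing is overridden at call time; a sketch, with the repo id assumed as elsewhere on this page:

```python
# Hedged sketch: generate() falls back to the defaults stored in the
# config above (beam search with 4 beams, no 3-gram repeats, early stop).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("antalvdb/bart-base-spelling-nl")
model = AutoModelForSeq2SeqLM.from_pretrained("antalvdb/bart-base-spelling-nl")

batch = tok("Ik hep een fout gemakt.", return_tensors="pt")
out = model.generate(**batch)  # picks up num_beams=4 etc. from the config
print(tok.decode(out[0], skip_special_tokens=True))
```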
checkpoint-82000/generation_config.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "bos_token_id": 0,
+   "decoder_start_token_id": 2,
+   "early_stopping": true,
+   "eos_token_id": 2,
+   "forced_bos_token_id": 0,
+   "forced_eos_token_id": 2,
+   "no_repeat_ngram_size": 3,
+   "num_beams": 4,
+   "pad_token_id": 1,
+   "transformers_version": "4.27.3"
+ }
checkpoint-82000/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-82000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6a17d88093e87e8df0ff5f3e7559d515043395f93bc941a9e93067faee3618b3
+ size 1115515845
checkpoint-82000/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:427ae75ad33d4a0d44dbd33ed8368148f0c0ed8b6366f6fd4ab811b1a25ab783
+ size 557971229
checkpoint-82000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c6a01dd88118397bc75ab8c812ee7e28032d0d487e03cf0e766aa7893e6e7929
+ size 14511
checkpoint-82000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1a9e9da03fe2ab551482336c6a68c260e028a256d73563c50c618db94187b071
+ size 627
checkpoint-82000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "unk_token": "<unk>"
+ }
checkpoint-82000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-82000/tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
+ {
+   "add_prefix_space": false,
+   "bos_token": "<s>",
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "errors": "replace",
+   "mask_token": "<mask>",
+   "model_max_length": 1024,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "special_tokens_map_file": null,
+   "tokenizer_class": "BartTokenizer",
+   "trim_offsets": true,
+   "unk_token": "<unk>"
+ }
checkpoint-82000/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-82000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d6ba773586049a2d1ccba9567f6c8e5b0d83b854a413d7f1ad1c48cba750cc8
+ size 3707
checkpoint-82000/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-93000/config.json ADDED
@@ -0,0 +1,75 @@
(content identical to checkpoint-82000/config.json above)
checkpoint-93000/generation_config.json ADDED
@@ -0,0 +1,12 @@
(content identical to checkpoint-82000/generation_config.json above)
checkpoint-93000/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-93000/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b50ad2733a051bc14cd7e16494888b38f19f8a9c9a56b1850b2e36507ccf9e0
+ size 1115515845
checkpoint-93000/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:161b33e875a701144c482652fcd9bf42bd95d8f8c62264f149e6592d589d0372
+ size 557971229
checkpoint-93000/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:68eae258137b074f016a23b3f048fc516226660e5fef5976eaf3abd8c0554780
+ size 14575
checkpoint-93000/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6bccca33aedde9bac44f8596dcca6d8fddca315e60a7ba5d9ec28fbb46f1feb3
+ size 627
checkpoint-93000/special_tokens_map.json ADDED
@@ -0,0 +1,15 @@
(content identical to checkpoint-82000/special_tokens_map.json above)
checkpoint-93000/tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-93000/tokenizer_config.json ADDED
@@ -0,0 +1,15 @@
(content identical to checkpoint-82000/tokenizer_config.json above)
checkpoint-93000/trainer_state.json ADDED
The diff for this file is too large to render. See raw diff
 
checkpoint-93000/training_args.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2d6ba773586049a2d1ccba9567f6c8e5b0d83b854a413d7f1ad1c48cba750cc8
+ size 3707
checkpoint-93000/vocab.json ADDED
The diff for this file is too large to render. See raw diff
 
eval_results.json CHANGED
@@ -1,9 +1,9 @@
  {
    "epoch": 2.0,
-   "eval_cer": 0.014650276279943154,
-   "eval_loss": 0.02764066681265831,
-   "eval_runtime": 1893.5756,
-   "eval_samples": 1998,
-   "eval_samples_per_second": 1.055,
-   "eval_steps_per_second": 0.264
+   "eval_cer": 0.014659309693522676,
+   "eval_loss": 0.02169678919017315,
+   "eval_runtime": 1986.5865,
+   "eval_samples": 2000,
+   "eval_samples_per_second": 1.007,
+   "eval_steps_per_second": 0.252
  }
pytorch_model.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:80ec838026fb8064635d2b995bbdaee79908374ba00bf32c931e3f2c58ae7740
+ oid sha256:427ae75ad33d4a0d44dbd33ed8368148f0c0ed8b6366f6fd4ab811b1a25ab783
  size 557971229
tokenizer.json CHANGED
@@ -1,6 +1,11 @@
  {
    "version": "1.0",
-   "truncation": null,
+   "truncation": {
+     "direction": "Right",
+     "max_length": 1024,
+     "strategy": "LongestFirst",
+     "stride": 0
+   },
    "padding": null,
    "added_tokens": [
      {
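The newly stored truncation block pins the fast tokenizer to the model's 1024-token window, clipping from the right. A small sketch of the effect, repo id assumed as above:

```python
# Hedged sketch: with the saved truncation settings, enabling truncation
# clips over-long inputs to 1024 tokens from the right.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("antalvdb/bart-base-spelling-nl")
ids = tok("woord " * 2000, truncation=True)["input_ids"]
print(len(ids))  # 1024
```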
train_results.json CHANGED
@@ -1,8 +1,8 @@
  {
    "epoch": 2.0,
-   "train_loss": 0.07157236041639212,
-   "train_runtime": 20459.2275,
-   "train_samples": 297945,
-   "train_samples_per_second": 29.126,
-   "train_steps_per_second": 0.91
+   "train_loss": 0.056187623144469706,
+   "train_runtime": 103918.6177,
+   "train_samples": 1497617,
+   "train_samples_per_second": 28.823,
+   "train_steps_per_second": 0.901
  }
trainer_state.json CHANGED
The diff for this file is too large to render. See raw diff
 
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:57b1e317a1539f18fba060b518401be911171d875d5c3512b6969c88a588025d
+ oid sha256:2d6ba773586049a2d1ccba9567f6c8e5b0d83b854a413d7f1ad1c48cba750cc8
  size 3707