---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_05_beta_DPO
results: []
---
# mistralit2_1000_STEPS_1e5_05_beta_DPO
This model is a version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.2) fine-tuned with Direct Preference Optimization (DPO) on an undocumented preference dataset.
It achieves the following results on the evaluation set:
- Loss: 5.5986
- Rewards/chosen: -33.3448
- Rewards/rejected: -30.3545
- Rewards/accuracies: 0.3363
- Rewards/margins: -2.9903
- Logps/rejected: -89.2815
- Logps/chosen: -90.0755
- Logits/rejected: -5.1087
- Logits/chosen: -5.1087
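For context on the metrics above: in TRL's DPO implementation, `Rewards/chosen` and `Rewards/rejected` are the implicit rewards, i.e. the beta-scaled log-probability ratio between the policy and the frozen reference model (this is the standard DPO definition, not something stated in the original card):

$$
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

`Rewards/margins` is the mean difference between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of pairs where the chosen response receives the higher reward.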
## Model description
This checkpoint is mistralai/Mistral-7B-Instruct-v0.2 further trained with Direct Preference Optimization (DPO) via the TRL library for 1,000 steps at a learning rate of 1e-5. The `05_beta` in the model name suggests a DPO beta of 0.5, though this is not recorded elsewhere in the card.
## Intended uses & limitations
More information needed
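Pending documentation from the author, the snippet below is a minimal inference sketch. The Hub repo id is assumed from the account and model name and is not confirmed by this card:

```python
# Minimal inference sketch; the repo id below is an assumption derived from
# the model name, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e5_05_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct models ship an [INST]-style chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```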
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
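A minimal sketch of how these settings map onto TRL's `DPOTrainer` (API as of the TRL releases contemporary with Transformers 4.38). The preference dataset id is a placeholder and `beta=0.5` is inferred from the model name; neither is confirmed by the card:

```python
# Sketch of a matching DPO run; the dataset id and beta value are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_05_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective batch size 8, as listed above
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # TRL builds a frozen reference copy when None
    args=args,
    beta=0.5,             # assumed from "05_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```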
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 4.4348 | 0.1 | 50 | 3.7220 | -2.6729 | -1.1384 | 0.3473 | -1.5345 | -30.8491 | -28.7316 | -2.8934 | -2.8934 |
| 4.416 | 0.2 | 100 | 5.6628 | -13.9947 | -11.4311 | 0.3495 | -2.5636 | -51.4346 | -51.3752 | -2.4257 | -2.4257 |
| 7.1807 | 0.29 | 150 | 6.1960 | -21.8487 | -19.7985 | 0.3912 | -2.0502 | -68.1695 | -67.0832 | -3.2442 | -3.2442 |
| 8.769 | 0.39 | 200 | 6.0561 | -21.3584 | -19.1294 | 0.3758 | -2.2290 | -66.8312 | -66.1026 | -4.4565 | -4.4565 |
| 5.5309 | 0.49 | 250 | 6.0913 | -20.7922 | -18.6223 | 0.3736 | -2.1699 | -65.8170 | -64.9702 | -4.1868 | -4.1868 |
| 6.2196 | 0.59 | 300 | 6.0358 | -20.9943 | -18.6957 | 0.3604 | -2.2986 | -65.9639 | -65.3744 | -4.7911 | -4.7911 |
| 7.3358 | 0.68 | 350 | 5.9206 | -20.4631 | -18.2147 | 0.3626 | -2.2484 | -65.0017 | -64.3120 | -4.5065 | -4.5064 |
| 5.2999 | 0.78 | 400 | 5.9833 | -20.5954 | -18.3858 | 0.3736 | -2.2096 | -65.3440 | -64.5766 | -4.7346 | -4.7346 |
| 5.6113 | 0.88 | 450 | 6.0483 | -21.5104 | -19.3027 | 0.3780 | -2.2077 | -67.1778 | -66.4067 | -4.7235 | -4.7235 |
| 8.3581 | 0.98 | 500 | 6.0375 | -21.5757 | -19.3093 | 0.3648 | -2.2663 | -67.1911 | -66.5372 | -4.8755 | -4.8755 |
| 5.1376 | 1.07 | 550 | 6.1562 | -22.4771 | -20.0938 | 0.3648 | -2.3834 | -68.7599 | -68.3401 | -5.0881 | -5.0881 |
| 4.99 | 1.17 | 600 | 6.2114 | -23.4624 | -20.9043 | 0.3516 | -2.5581 | -70.3811 | -70.3107 | -5.6293 | -5.6293 |
| 4.5013 | 1.27 | 650 | 6.0015 | -28.8725 | -26.2780 | 0.3451 | -2.5945 | -81.1283 | -81.1308 | -4.9022 | -4.9023 |
| 5.5617 | 1.37 | 700 | 5.9849 | -29.5633 | -26.7992 | 0.3341 | -2.7642 | -82.1707 | -82.5125 | -4.9606 | -4.9606 |
| 5.267 | 1.46 | 750 | 5.9932 | -23.0310 | -20.5708 | 0.3582 | -2.4603 | -69.7140 | -69.4479 | -5.3521 | -5.3521 |
| 4.6177 | 1.56 | 800 | 5.5949 | -24.3540 | -21.9341 | 0.3538 | -2.4199 | -72.4406 | -72.0939 | -5.2290 | -5.2290 |
| 4.9479 | 1.66 | 850 | 6.0029 | -35.5381 | -32.2485 | 0.3363 | -3.2896 | -93.0695 | -94.4621 | -5.1637 | -5.1638 |
| 4.4494 | 1.76 | 900 | 5.6465 | -33.6541 | -30.6022 | 0.3253 | -3.0518 | -89.7769 | -90.6940 | -5.2063 | -5.2063 |
| 4.0125 | 1.86 | 950 | 5.6068 | -33.2845 | -30.2969 | 0.3363 | -2.9877 | -89.1661 | -89.9549 | -5.1204 | -5.1205 |
| 5.5487 | 1.95 | 1000 | 5.5986 | -33.3448 | -30.3545 | 0.3363 | -2.9903 | -89.2815 | -90.0755 | -5.1087 | -5.1087 |
### Framework versions
- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2