---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
  - trl
  - dpo
  - generated_from_trainer
model-index:
  - name: mistralit2_1000_STEPS_1e5_05_beta_DPO
    results: []
---

# mistralit2_1000_STEPS_1e5_05_beta_DPO

This model is a fine-tuned version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) on an unknown dataset.
It achieves the following results on the evaluation set:

- Loss: 5.5986
- Rewards/chosen: -33.3448
- Rewards/rejected: -30.3545
- Rewards/accuracies: 0.3363
- Rewards/margins: -2.9903
- Logps/rejected: -89.2815
- Logps/chosen: -90.0755
- Logits/rejected: -5.1087
- Logits/chosen: -5.1087
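
These metrics follow TRL's standard DPO logging, based on the implicit reward of Rafailov et al. (2023): each completion is scored by how far the policy has moved from the reference model on it, scaled by the DPO temperature β.

$$
r_\theta(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)},
\qquad
\text{Rewards/margins} = r_\theta(x, y_w) - r_\theta(x, y_l)
$$

`Rewards/accuracies` is the fraction of evaluation pairs where the chosen completion's implicit reward exceeds the rejected one's. The numbers above are internally consistent (-33.3448 - (-30.3545) = -2.9903), and an accuracy below 0.5 means the tuned policy assigns the rejected completion a higher implicit reward more often than the chosen one.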

## Model description

More information needed

## Intended uses & limitations

More information needed
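
No usage guidance is documented yet. As a minimal inference sketch, the checkpoint should load like any other Mistral-Instruct fine-tune; the repository id below is assumed from the model name and may differ.

```python
# Minimal inference sketch; the repo id is assumed from the model name.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "tsavage68/mistralit2_1000_STEPS_1e5_05_beta_DPO"  # assumption

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Mistral-Instruct v0.2 uses the [INST] chat format; the tokenizer's chat
# template applies it for us.
messages = [{"role": "user", "content": "What does DPO fine-tuning change about a model?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```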

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:

- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
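
For context, a rough reconstruction of how these settings map onto TRL's `DPOTrainer` is sketched below. The preference dataset is not documented in this card, so a placeholder is used, and β = 0.5 is only inferred from the "05_beta" in the model name.

```python
# Hypothetical reconstruction of the training setup. The dataset is unknown
# (placeholder below) and beta=0.5 is inferred from the model name.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# DPOTrainer expects prompt/chosen/rejected columns; the real dataset is unknown.
train_dataset = Dataset.from_dict({
    "prompt": ["<your prompt here>"],
    "chosen": ["<preferred completion>"],
    "rejected": ["<dispreferred completion>"],
})

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_05_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # 4 x 2 = total train batch size of 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
    remove_unused_columns=False,    # DPOTrainer needs the raw text columns
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,  # TRL builds a frozen copy of the model as the reference
    args=args,
    beta=0.5,        # assumption: "05_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```

The Adam betas and epsilon listed above are the `TrainingArguments` defaults, so they need no explicit flags.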

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 4.4348        | 0.1   | 50   | 3.7220          | -2.6729        | -1.1384          | 0.3473             | -1.5345         | -30.8491       | -28.7316     | -2.8934         | -2.8934       |
| 4.416         | 0.2   | 100  | 5.6628          | -13.9947       | -11.4311         | 0.3495             | -2.5636         | -51.4346       | -51.3752     | -2.4257         | -2.4257       |
| 7.1807        | 0.29  | 150  | 6.1960          | -21.8487       | -19.7985         | 0.3912             | -2.0502         | -68.1695       | -67.0832     | -3.2442         | -3.2442       |
| 8.769         | 0.39  | 200  | 6.0561          | -21.3584       | -19.1294         | 0.3758             | -2.2290         | -66.8312       | -66.1026     | -4.4565         | -4.4565       |
| 5.5309        | 0.49  | 250  | 6.0913          | -20.7922       | -18.6223         | 0.3736             | -2.1699         | -65.8170       | -64.9702     | -4.1868         | -4.1868       |
| 6.2196        | 0.59  | 300  | 6.0358          | -20.9943       | -18.6957         | 0.3604             | -2.2986         | -65.9639       | -65.3744     | -4.7911         | -4.7911       |
| 7.3358        | 0.68  | 350  | 5.9206          | -20.4631       | -18.2147         | 0.3626             | -2.2484         | -65.0017       | -64.3120     | -4.5065         | -4.5064       |
| 5.2999        | 0.78  | 400  | 5.9833          | -20.5954       | -18.3858         | 0.3736             | -2.2096         | -65.3440       | -64.5766     | -4.7346         | -4.7346       |
| 5.6113        | 0.88  | 450  | 6.0483          | -21.5104       | -19.3027         | 0.3780             | -2.2077         | -67.1778       | -66.4067     | -4.7235         | -4.7235       |
| 8.3581        | 0.98  | 500  | 6.0375          | -21.5757       | -19.3093         | 0.3648             | -2.2663         | -67.1911       | -66.5372     | -4.8755         | -4.8755       |
| 5.1376        | 1.07  | 550  | 6.1562          | -22.4771       | -20.0938         | 0.3648             | -2.3834         | -68.7599       | -68.3401     | -5.0881         | -5.0881       |
| 4.99          | 1.17  | 600  | 6.2114          | -23.4624       | -20.9043         | 0.3516             | -2.5581         | -70.3811       | -70.3107     | -5.6293         | -5.6293       |
| 4.5013        | 1.27  | 650  | 6.0015          | -28.8725       | -26.2780         | 0.3451             | -2.5945         | -81.1283       | -81.1308     | -4.9022         | -4.9023       |
| 5.5617        | 1.37  | 700  | 5.9849          | -29.5633       | -26.7992         | 0.3341             | -2.7642         | -82.1707       | -82.5125     | -4.9606         | -4.9606       |
| 5.267         | 1.46  | 750  | 5.9932          | -23.0310       | -20.5708         | 0.3582             | -2.4603         | -69.7140       | -69.4479     | -5.3521         | -5.3521       |
| 4.6177        | 1.56  | 800  | 5.5949          | -24.3540       | -21.9341         | 0.3538             | -2.4199         | -72.4406       | -72.0939     | -5.2290         | -5.2290       |
| 4.9479        | 1.66  | 850  | 6.0029          | -35.5381       | -32.2485         | 0.3363             | -3.2896         | -93.0695       | -94.4621     | -5.1637         | -5.1638       |
| 4.4494        | 1.76  | 900  | 5.6465          | -33.6541       | -30.6022         | 0.3253             | -3.0518         | -89.7769       | -90.6940     | -5.2063         | -5.2063       |
| 4.0125        | 1.86  | 950  | 5.6068          | -33.2845       | -30.2969         | 0.3363             | -2.9877         | -89.1661       | -89.9549     | -5.1204         | -5.1205       |
| 5.5487        | 1.95  | 1000 | 5.5986          | -33.3448       | -30.3545         | 0.3363             | -2.9903         | -89.2815       | -90.0755     | -5.1087         | -5.1087       |

### Framework versions

- Transformers 4.38.2
- Pytorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2