---
license: apache-2.0
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- generated_from_trainer
model-index:
- name: openhermes-mistral-2.5-7b-dpo-test
  results: []
---

# openhermes-mistral-2.5-7b-dpo-test

This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4487
- Rewards/chosen: -0.2951
- Rewards/rejected: -2.2421
- Rewards/accuracies: 0.875
- Rewards/margins: 1.9470
- Logps/rejected: -257.4751
- Logps/chosen: -204.3027
- Logits/rejected: -3.0752
- Logits/chosen: -3.0485

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1645        | 0.01  | 10   | 0.5339          | 0.3993         | -0.1483          | 0.6875             | 0.5476          | -236.5374      | -197.3593    | -3.1575         | -3.1872       |
| 0.0519        | 0.01  | 20   | 0.5521          | 0.2239         | -0.4486          | 0.625              | 0.6725          | -239.5405      | -199.1127    | -3.1969         | -3.2456       |
| 0.1618        | 0.01  | 30   | 0.5866          | -0.0538        | -0.8893          | 0.5625             | 0.8355          | -243.9472      | -201.8902    | -3.2286         | -3.2525       |
| 0.1752        | 0.02  | 40   | 0.5943          | -0.2184        | -1.2057          | 0.5                | 0.9873          | -247.1112      | -203.5360    | -3.2201         | -3.2477       |
| 0.3811        | 0.03  | 50   | 0.6973          | -0.6180        | -1.8146          | 0.5                | 1.1966          | -253.2001      | -207.5316    | -3.1943         | -3.2034       |
| 1.158         | 0.03  | 60   | 0.6347          | -0.4710        | -1.7363          | 0.5625             | 1.2653          | -252.4173      | -206.0622    | -3.1655         | -3.1197       |
| 0.8751        | 0.04  | 70   | 0.6103          | -0.4061        | -1.5966          | 0.5625             | 1.1905          | -251.0201      | -205.4132    | -3.1360         | -3.0544       |
| 0.7811        | 0.04  | 80   | 0.6405          | -0.4774        | -1.6574          | 0.5625             | 1.1799          | -251.6278      | -206.1260    | -3.1337         | -3.0492       |
| 1.4305        | 0.04  | 90   | 0.6257          | -0.4784        | -1.6184          | 0.5625             | 1.1399          | -251.2379      | -206.1361    | -3.1251         | -3.0489       |
| 0.5478        | 0.05  | 100  | 0.6191          | -0.5317        | -1.7067          | 0.5625             | 1.1750          | -252.1214      | -206.6691    | -3.1207         | -3.0753       |
| 0.6344        | 0.06  | 110  | 0.5691          | -0.4827        | -1.7734          | 0.5625             | 1.2907          | -252.7882      | -206.1789    | -3.1075         | -3.0806       |
| 0.5405        | 0.06  | 120  | 0.5337          | -0.4681        | -2.1739          | 0.8125             | 1.7058          | -256.7935      | -206.0332    | -3.1124         | -3.0733       |
| 0.7848        | 0.07  | 130  | 0.5390          | -0.5288        | -2.3789          | 0.8125             | 1.8501          | -258.8436      | -206.6404    | -3.1019         | -3.0628       |
| 1.3119        | 0.07  | 140  | 0.4753          | -0.3276        | -2.0907          | 0.875              | 1.7631          | -255.9614      | -204.6279    | -3.0904         | -3.0648       |
| 0.3636        | 0.07  | 150  | 0.4555          | -0.2566        | -2.0064          | 0.625              | 1.7498          | -255.1179      | -203.9175    | -3.0804         | -3.0640       |
| 0.427         | 0.08  | 160  | 0.4614          | -0.2900        | -2.0804          | 0.625              | 1.7904          | -255.8585      | -204.2518    | -3.0721         | -3.0518       |
| 0.8971        | 0.09  | 170  | 0.4629          | -0.3117        | -2.1791          | 0.875              | 1.8673          | -256.8448      | -204.4694    | -3.0711         | -3.0468       |
| 0.6219        | 0.09  | 180  | 0.4560          | -0.3042        | -2.2114          | 0.875              | 1.9073          | -257.1686      | -204.3934    | -3.0743         | -3.0485       |
| 0.7551        | 0.1   | 190  | 0.4520          | -0.3007        | -2.2400          | 0.875              | 1.9392          | -257.4540      | -204.3593    | -3.0755         | -3.0481       |
| 1.0917        | 0.1   | 200  | 0.4487          | -0.2951        | -2.2421          | 0.875              | 1.9470          | -257.4751      | -204.3027    | -3.0752         | -3.0485       |

### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1
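
As a reading aid for the reward columns in the training results: assuming these metrics follow the usual DPO logging convention, Rewards/margins is the mean reward of the chosen responses minus that of the rejected ones. The sketch below checks that relationship on three rows copied from the table. The helper names `reward_margin` and `check_rows` are hypothetical and are not part of any training library.

```python
# Selected evaluation rows copied from the training results table:
# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
ROWS = [
    (10, 0.3993, -0.1483, 0.5476),
    (100, -0.5317, -1.7067, 1.1750),
    (200, -0.2951, -2.2421, 1.9470),
]


def reward_margin(chosen: float, rejected: float) -> float:
    """Assumed definition: mean chosen reward minus mean rejected reward."""
    return chosen - rejected


def check_rows(rows, tol=1e-3):
    """Return the steps whose reported margin differs from chosen - rejected by more than tol."""
    return [
        step
        for step, chosen, rejected, reported in rows
        if abs(reward_margin(chosen, rejected) - reported) > tol
    ]


if __name__ == "__main__":
    for step, chosen, rejected, reported in ROWS:
        computed = reward_margin(chosen, rejected)
        print(f"step {step}: computed {computed:.4f}, reported {reported:.4f}")
    print("mismatching steps:", check_rows(ROWS))
```

The tolerance only absorbs the four-decimal rounding used in the table. The same check can be run on any other row by appending it to `ROWS`.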