---
license: apache-2.0
base_model: mistralai/Mistral-7B-Instruct-v0.2
tags:
- trl
- dpo
- generated_from_trainer
model-index:
- name: mistralit2_1000_STEPS_1e5_05_beta_DPO
results: []
---
# mistralit2_1000_STEPS_1e5_05_beta_DPO
This model is a version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co./mistralai/Mistral-7B-Instruct-v0.2) fine-tuned with Direct Preference Optimization (DPO) on an undocumented preference dataset.
It achieves the following results on the evaluation set:
- Loss: 5.5986
- Rewards/chosen: -33.3448
- Rewards/rejected: -30.3545
- Rewards/accuracies: 0.3363
- Rewards/margins: -2.9903
- Logps/rejected: -89.2815
- Logps/chosen: -90.0755
- Logits/rejected: -5.1087
- Logits/chosen: -5.1087
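For context on the metrics above: in TRL's DPO implementation, `Rewards/chosen` and `Rewards/rejected` are the implicit rewards, i.e. the beta-scaled log-probability ratio between the policy and the frozen reference model (this is the standard DPO definition, not something stated in the original card):

$$
r_\theta(x, y) = \beta \, \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

`Rewards/margins` is the mean difference between the chosen and rejected rewards, and `Rewards/accuracies` is the fraction of pairs where the chosen response receives the higher reward.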
## Model description
This checkpoint is mistralai/Mistral-7B-Instruct-v0.2 further trained with Direct Preference Optimization (DPO) via the TRL library for 1,000 steps at a learning rate of 1e-5. The `05_beta` in the model name suggests a DPO beta of 0.5, though this is not recorded elsewhere in the card.
## Intended uses & limitations
More information needed
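Pending documentation from the author, the snippet below is a minimal inference sketch. The Hub repo id is assumed from the account and model name and is not confirmed by this card:

```python
# Minimal inference sketch; the repo id below is an assumption derived from
# the model name, not confirmed by this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_1e5_05_beta_DPO"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Mistral-Instruct models ship an [INST]-style chat template.
messages = [{"role": "user", "content": "Explain DPO in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```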
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
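A minimal sketch of how these settings map onto TRL's `DPOTrainer` (API as of the TRL releases contemporary with Transformers 4.38). The preference dataset id is a placeholder and `beta=0.5` is inferred from the model name; neither is confirmed by the card:

```python
# Sketch of a matching DPO run; the dataset id and beta value are assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical preference dataset with "prompt", "chosen", "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_1e5_05_beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective batch size 8, as listed above
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,       # TRL builds a frozen reference copy when None
    args=args,
    beta=0.5,             # assumed from "05_beta" in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```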
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 4.4348 | 0.1 | 50 | 3.7220 | -2.6729 | -1.1384 | 0.3473 | -1.5345 | -30.8491 | -28.7316 | -2.8934 | -2.8934 |
| 4.416 | 0.2 | 100 | 5.6628 | -13.9947 | -11.4311 | 0.3495 | -2.5636 | -51.4346 | -51.3752 | -2.4257 | -2.4257 |
| 7.1807 | 0.29 | 150 | 6.1960 | -21.8487 | -19.7985 | 0.3912 | -2.0502 | -68.1695 | -67.0832 | -3.2442 | -3.2442 |
| 8.769 | 0.39 | 200 | 6.0561 | -21.3584 | -19.1294 | 0.3758 | -2.2290 | -66.8312 | -66.1026 | -4.4565 | -4.4565 |
| 5.5309 | 0.49 | 250 | 6.0913 | -20.7922 | -18.6223 | 0.3736 | -2.1699 | -65.8170 | -64.9702 | -4.1868 | -4.1868 |
| 6.2196 | 0.59 | 300 | 6.0358 | -20.9943 | -18.6957 | 0.3604 | -2.2986 | -65.9639 | -65.3744 | -4.7911 | -4.7911 |
| 7.3358 | 0.68 | 350 | 5.9206 | -20.4631 | -18.2147 | 0.3626 | -2.2484 | -65.0017 | -64.3120 | -4.5065 | -4.5064 |
| 5.2999 | 0.78 | 400 | 5.9833 | -20.5954 | -18.3858 | 0.3736 | -2.2096 | -65.3440 | -64.5766 | -4.7346 | -4.7346 |
| 5.6113 | 0.88 | 450 | 6.0483 | -21.5104 | -19.3027 | 0.3780 | -2.2077 | -67.1778 | -66.4067 | -4.7235 | -4.7235 |
| 8.3581 | 0.98 | 500 | 6.0375 | -21.5757 | -19.3093 | 0.3648 | -2.2663 | -67.1911 | -66.5372 | -4.8755 | -4.8755 |
| 5.1376 | 1.07 | 550 | 6.1562 | -22.4771 | -20.0938 | 0.3648 | -2.3834 | -68.7599 | -68.3401 | -5.0881 | -5.0881 |
| 4.99 | 1.17 | 600 | 6.2114 | -23.4624 | -20.9043 | 0.3516 | -2.5581 | -70.3811 | -70.3107 | -5.6293 | -5.6293 |
| 4.5013 | 1.27 | 650 | 6.0015 | -28.8725 | -26.2780 | 0.3451 | -2.5945 | -81.1283 | -81.1308 | -4.9022 | -4.9023 |
| 5.5617 | 1.37 | 700 | 5.9849 | -29.5633 | -26.7992 | 0.3341 | -2.7642 | -82.1707 | -82.5125 | -4.9606 | -4.9606 |
| 5.267 | 1.46 | 750 | 5.9932 | -23.0310 | -20.5708 | 0.3582 | -2.4603 | -69.7140 | -69.4479 | -5.3521 | -5.3521 |
| 4.6177 | 1.56 | 800 | 5.5949 | -24.3540 | -21.9341 | 0.3538 | -2.4199 | -72.4406 | -72.0939 | -5.2290 | -5.2290 |
| 4.9479 | 1.66 | 850 | 6.0029 | -35.5381 | -32.2485 | 0.3363 | -3.2896 | -93.0695 | -94.4621 | -5.1637 | -5.1638 |
| 4.4494 | 1.76 | 900 | 5.6465 | -33.6541 | -30.6022 | 0.3253 | -3.0518 | -89.7769 | -90.6940 | -5.2063 | -5.2063 |
| 4.0125 | 1.86 | 950 | 5.6068 | -33.2845 | -30.2969 | 0.3363 | -2.9877 | -89.1661 | -89.9549 | -5.1204 | -5.1205 |
| 5.5487 | 1.95 | 1000 | 5.5986 | -33.3448 | -30.3545 | 0.3363 | -2.9903 | -89.2815 | -90.0755 | -5.1087 | -5.1087 |
### Framework versions
- Transformers 4.38.2
- PyTorch 2.0.0+cu117
- Datasets 2.18.0
- Tokenizers 0.15.2