
mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO

This model is a fine-tuned version of mistralai/Mistral-7B-Instruct-v0.2 on an unknown dataset. It achieves the following results on the evaluation set (a sketch of how these DPO metrics are derived follows the list):

  • Loss: 3.8137
  • Rewards/chosen: -13.6944
  • Rewards/rejected: -12.2044
  • Rewards/accuracies: 0.3495
  • Rewards/margins: -1.4900
  • Logps/rejected: -69.2538
  • Logps/chosen: -69.0338
  • Logits/rejected: -5.2668
  • Logits/chosen: -5.2668
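
For readers unfamiliar with these DPO metrics, the following is a minimal sketch of how they are typically computed. The beta value of 0.3 is an assumption inferred from the "03_Beta" suffix in the model name, and the log-probability values are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of how the DPO eval metrics above are typically derived.
# The log-probabilities below are hypothetical; in practice they are the summed
# token log-probs of each completion under the policy and a frozen reference model.
beta = 0.3  # assumed from the "03_Beta" suffix in the model name

policy_chosen_logps = torch.tensor([-230.1])
policy_rejected_logps = torch.tensor([-240.7])
ref_chosen_logps = torch.tensor([-184.5])
ref_rejected_logps = torch.tensor([-200.0])

# Rewards/chosen and Rewards/rejected: beta-scaled log-ratios vs. the reference.
rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

# Rewards/margins: chosen reward minus rejected reward.
margins = rewards_chosen - rewards_rejected

# DPO loss and Rewards/accuracies (fraction of pairs where chosen beats rejected).
loss = -F.logsigmoid(margins).mean()
accuracy = (margins > 0).float().mean()
```

Note that a negative Rewards/margins, as reported here, means the rejected completions receive higher implicit rewards than the chosen ones on the evaluation set, consistent with Rewards/accuracies below 0.5.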

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the training-script sketch after the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 8
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
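
As a rough guide, the sketch below shows how these hyperparameters would map onto trl's DPOTrainer (for trl versions contemporary with Transformers 4.38.2). It is not the author's actual training script: the dataset name is a placeholder, and beta=0.3 is assumed from the model name.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base = "mistralai/Mistral-7B-Instruct-v0.2"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

# Placeholder: the actual preference dataset is not documented.
# DPOTrainer expects "prompt", "chosen", and "rejected" columns.
train_dataset = load_dataset("your/preference-dataset", split="train")

args = TrainingArguments(
    output_dir="mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=2,  # effective train batch size: 8
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

trainer = DPOTrainer(
    model,
    ref_model=None,  # trl keeps a frozen copy of the model as the reference
    args=args,
    beta=0.3,        # assumed from the "03_Beta" suffix in the model name
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```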

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 4.2261        | 0.1   | 50   | 2.5021          | -2.9115        | -1.7672          | 0.3516             | -1.1443         | -34.4631       | -33.0910     | -3.0003         | -3.0001       |
| 4.0353        | 0.2   | 100  | 4.3015          | -16.2009       | -14.9302         | 0.3912             | -1.2707         | -78.3397       | -77.3888     | -1.4166         | -1.4166       |
| 4.1344        | 0.29  | 150  | 3.8834          | -13.4989       | -12.1622         | 0.3846             | -1.3367         | -69.1129       | -68.3820     | -3.1652         | -3.1652       |
| 6.0597        | 0.39  | 200  | 3.8687          | -13.7714       | -12.6321         | 0.3956             | -1.1392         | -70.6795       | -69.2904     | -3.3126         | -3.3126       |
| 3.4133        | 0.49  | 250  | 3.7600          | -12.9111       | -11.6593         | 0.3736             | -1.2517         | -67.4368       | -66.4227     | -3.5276         | -3.5276       |
| 3.8331        | 0.59  | 300  | 3.7138          | -12.6732       | -11.3367         | 0.3582             | -1.3365         | -66.3615       | -65.6299     | -4.3713         | -4.3713       |
| 4.4899        | 0.68  | 350  | 3.6843          | -12.5529       | -11.2259         | 0.3736             | -1.3270         | -65.9920       | -65.2288     | -4.2730         | -4.2730       |
| 3.2404        | 0.78  | 400  | 3.6913          | -12.6760       | -11.3481         | 0.3692             | -1.3279         | -66.3993       | -65.6391     | -4.4066         | -4.4066       |
| 3.4317        | 0.88  | 450  | 3.7402          | -12.8394       | -11.5008         | 0.3714             | -1.3386         | -66.9084       | -66.1840     | -4.7568         | -4.7568       |
| 5.1385        | 0.98  | 500  | 3.7270          | -12.8543       | -11.4815         | 0.3582             | -1.3728         | -66.8442       | -66.2336     | -4.8716         | -4.8716       |
| 3.1946        | 1.07  | 550  | 3.7911          | -13.4302       | -11.9891         | 0.3626             | -1.4411         | -68.5361       | -68.1532     | -5.0836         | -5.0836       |
| 3.0812        | 1.17  | 600  | 3.9012          | -14.2400       | -12.6930         | 0.3538             | -1.5470         | -70.8825       | -70.8524     | -5.8038         | -5.8038       |
| 3.1908        | 1.27  | 650  | 3.8805          | -14.0486       | -12.5350         | 0.3429             | -1.5136         | -70.3556       | -70.2144     | -5.0640         | -5.0640       |
| 3.5745        | 1.37  | 700  | 3.8088          | -13.5700       | -12.0845         | 0.3429             | -1.4855         | -68.8541       | -68.6191     | -5.0789         | -5.0789       |
| 3.3361        | 1.46  | 750  | 3.7803          | -13.3782       | -11.9205         | 0.3604             | -1.4577         | -68.3074       | -67.9799     | -5.1590         | -5.1590       |
| 3.0339        | 1.56  | 800  | 3.7887          | -13.4369       | -11.9712         | 0.3538             | -1.4657         | -68.4765       | -68.1755     | -5.1745         | -5.1745       |
| 3.5519        | 1.66  | 850  | 3.8024          | -13.5450       | -12.0641         | 0.3473             | -1.4809         | -68.7860       | -68.5357     | -5.1629         | -5.1629       |
| 3.2271        | 1.76  | 900  | 3.8138          | -13.6946       | -12.2043         | 0.3495             | -1.4903         | -69.2534       | -69.0344     | -5.2650         | -5.2650       |
| 3.2287        | 1.86  | 950  | 3.8140          | -13.6951       | -12.2047         | 0.3495             | -1.4904         | -69.2548       | -69.0363     | -5.2679         | -5.2679       |
| 4.9599        | 1.95  | 1000 | 3.8137          | -13.6944       | -12.2044         | 0.3495             | -1.4900         | -69.2538       | -69.0338     | -5.2668         | -5.2668       |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.0.0+cu117
  • Datasets 2.18.0
  • Tokenizers 0.15.2
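
A minimal inference sketch, assuming the model inherits the Mistral-Instruct chat template from its base model; `device_map="auto"` additionally requires the `accelerate` package.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tsavage68/mistralit2_1000_STEPS_rate_1e5_03_Beta_DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # weights are stored in FP16
    device_map="auto",          # requires `accelerate`
)

messages = [{"role": "user", "content": "Explain DPO fine-tuning in one paragraph."}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=200, do_sample=True, temperature=0.7)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```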