---
license: apache-2.0
base_model: teknium/OpenHermes-2.5-Mistral-7B
tags:
- generated_from_trainer
model-index:
- name: openhermes-mistral-2.5-7b-dpo-test
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# openhermes-mistral-2.5-7b-dpo-test

This model is a fine-tuned version of [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co./teknium/OpenHermes-2.5-Mistral-7B) on an unspecified dataset.
It achieves the following results on the evaluation set:
- Loss: 0.4487
- Rewards/chosen: -0.2951
- Rewards/rejected: -2.2421
- Rewards/accuracies: 0.875
- Rewards/margins: 1.9470
- Logps/rejected: -257.4751
- Logps/chosen: -204.3027
- Logits/rejected: -3.0752
- Logits/chosen: -3.0485
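
These reward statistics follow the usual DPO convention (an assumption based on the metric names; the card does not say which trainer produced them): each response is scored by the scaled log-probability ratio between the fine-tuned policy and the frozen reference model,

$$
r_\theta(x, y) = \beta \left( \log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x) \right),
$$

so `Rewards/margins` is the mean chosen-minus-rejected reward gap over evaluation pairs, and `Rewards/accuracies` is the fraction of pairs for which the chosen response receives the higher reward. The DPO temperature `beta` (commonly 0.1) is not reported in this card.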

## Model description

This checkpoint is a test run of Direct Preference Optimization (DPO) applied to [teknium/OpenHermes-2.5-Mistral-7B](https://huggingface.co./teknium/OpenHermes-2.5-Mistral-7B), trained for 200 steps; beyond the training log below, no further details have been provided.

## Intended uses & limitations

More information needed
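
No usage guidance is given beyond the base model's card. As a rough sketch only, the checkpoint can presumably be loaded with the standard `transformers` text-generation API; the repository id below is a placeholder, and the ChatML-style prompt format is assumed from the OpenHermes-2.5 base model rather than stated here.

```python
# Sketch: load the checkpoint and generate a reply. The repo id is a placeholder,
# and the ChatML prompt format is assumed from the base model, not this card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-username/openhermes-mistral-2.5-7b-dpo-test"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumes a GPU with bf16 support
    device_map="auto",
)

prompt = (
    "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n"
    "<|im_start|>user\nExplain DPO in one sentence.<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```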

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 2
- training_steps: 200
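
The card does not state which training stack produced these numbers, but the reward metrics match the logging of TRL's `DPOTrainer` from the same era as the framework versions listed at the bottom of this card. Below is a minimal sketch, assuming a TRL 0.7-style API, of how the hyperparameters above would map onto such a run; the `beta`, sequence lengths, and dataset are assumptions, not values reported here.

```python
# Sketch only: maps the hyperparameters above onto TRL's DPOTrainer (TRL ~0.7 API).
# beta, max_length/max_prompt_length, and the dataset are assumptions, not from this card.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

base_id = "teknium/OpenHermes-2.5-Mistral-7B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Any preference dataset with "prompt", "chosen", "rejected" columns would fit here.
dataset = load_dataset("your-org/your-preference-dataset")  # placeholder

training_args = TrainingArguments(
    output_dir="openhermes-mistral-2.5-7b-dpo-test",
    learning_rate=1e-4,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=8,
    lr_scheduler_type="linear",
    warmup_steps=2,
    max_steps=200,
    seed=42,
    logging_steps=10,
    evaluation_strategy="steps",
    eval_steps=10,
)

trainer = DPOTrainer(
    model=model,
    ref_model=None,           # TRL builds a frozen reference copy when None
    args=training_args,
    beta=0.1,                 # assumed DPO temperature; not reported in the card
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,
    max_length=1024,          # assumed sequence lengths
    max_prompt_length=512,
)
trainer.train()
```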

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.1645        | 0.01  | 10   | 0.5339          | 0.3993         | -0.1483          | 0.6875             | 0.5476          | -236.5374      | -197.3593    | -3.1575         | -3.1872       |
| 0.0519        | 0.01  | 20   | 0.5521          | 0.2239         | -0.4486          | 0.625              | 0.6725          | -239.5405      | -199.1127    | -3.1969         | -3.2456       |
| 0.1618        | 0.01  | 30   | 0.5866          | -0.0538        | -0.8893          | 0.5625             | 0.8355          | -243.9472      | -201.8902    | -3.2286         | -3.2525       |
| 0.1752        | 0.02  | 40   | 0.5943          | -0.2184        | -1.2057          | 0.5                | 0.9873          | -247.1112      | -203.5360    | -3.2201         | -3.2477       |
| 0.3811        | 0.03  | 50   | 0.6973          | -0.6180        | -1.8146          | 0.5                | 1.1966          | -253.2001      | -207.5316    | -3.1943         | -3.2034       |
| 1.158         | 0.03  | 60   | 0.6347          | -0.4710        | -1.7363          | 0.5625             | 1.2653          | -252.4173      | -206.0622    | -3.1655         | -3.1197       |
| 0.8751        | 0.04  | 70   | 0.6103          | -0.4061        | -1.5966          | 0.5625             | 1.1905          | -251.0201      | -205.4132    | -3.1360         | -3.0544       |
| 0.7811        | 0.04  | 80   | 0.6405          | -0.4774        | -1.6574          | 0.5625             | 1.1799          | -251.6278      | -206.1260    | -3.1337         | -3.0492       |
| 1.4305        | 0.04  | 90   | 0.6257          | -0.4784        | -1.6184          | 0.5625             | 1.1399          | -251.2379      | -206.1361    | -3.1251         | -3.0489       |
| 0.5478        | 0.05  | 100  | 0.6191          | -0.5317        | -1.7067          | 0.5625             | 1.1750          | -252.1214      | -206.6691    | -3.1207         | -3.0753       |
| 0.6344        | 0.06  | 110  | 0.5691          | -0.4827        | -1.7734          | 0.5625             | 1.2907          | -252.7882      | -206.1789    | -3.1075         | -3.0806       |
| 0.5405        | 0.06  | 120  | 0.5337          | -0.4681        | -2.1739          | 0.8125             | 1.7058          | -256.7935      | -206.0332    | -3.1124         | -3.0733       |
| 0.7848        | 0.07  | 130  | 0.5390          | -0.5288        | -2.3789          | 0.8125             | 1.8501          | -258.8436      | -206.6404    | -3.1019         | -3.0628       |
| 1.3119        | 0.07  | 140  | 0.4753          | -0.3276        | -2.0907          | 0.875              | 1.7631          | -255.9614      | -204.6279    | -3.0904         | -3.0648       |
| 0.3636        | 0.07  | 150  | 0.4555          | -0.2566        | -2.0064          | 0.625              | 1.7498          | -255.1179      | -203.9175    | -3.0804         | -3.0640       |
| 0.427         | 0.08  | 160  | 0.4614          | -0.2900        | -2.0804          | 0.625              | 1.7904          | -255.8585      | -204.2518    | -3.0721         | -3.0518       |
| 0.8971        | 0.09  | 170  | 0.4629          | -0.3117        | -2.1791          | 0.875              | 1.8673          | -256.8448      | -204.4694    | -3.0711         | -3.0468       |
| 0.6219        | 0.09  | 180  | 0.4560          | -0.3042        | -2.2114          | 0.875              | 1.9073          | -257.1686      | -204.3934    | -3.0743         | -3.0485       |
| 0.7551        | 0.1   | 190  | 0.4520          | -0.3007        | -2.2400          | 0.875              | 1.9392          | -257.4540      | -204.3593    | -3.0755         | -3.0481       |
| 1.0917        | 0.1   | 200  | 0.4487          | -0.2951        | -2.2421          | 0.875              | 1.9470          | -257.4751      | -204.3027    | -3.0752         | -3.0485       |


### Framework versions

- Transformers 4.34.1
- Pytorch 2.1.0+cu121
- Datasets 2.14.6
- Tokenizers 0.14.1