Trofish committed
Commit 075db61 • 1 Parent(s): ba40edd

Update README.md

Files changed (1)
  1. README.md +33 -40
README.md CHANGED
@@ -1,34 +1,27 @@
  2023 Sungkyunkwan University Summer Intensive Industry-Academia Cooperation Project VAIV
- ### Github : https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM
-
  ## GPT-based natural (Friendly) and ethical (Harmless) everyday conversational chatbot model

- # Project Goal
- Implement a natural and ethical Korean everyday conversational chatbot model based on GPT-NEOX
  - Self-Instruct: data augmentation using GPT4
  - RLHF (Reinforcement Learning from Human Feedback): reinforcement learning that reflects human preferences
  - DeepSpeed: a new memory optimization technique for large-scale distributed deep learning
-
- # Development Details
- Task 1: Building datasets for each reinforcement learning stage
- Task 2: SFT model Fine-tuning (https://huggingface.co/Trofish/KULLM-SFT-v2)
- Task 3: Implementing Reward model ver1,2,3
- Task 4: Implementing the final model with RLHF and DeepSpeedChat (https://huggingface.co/Trofish/KULLM-RLHF)

  # Task1. Building datasets for each reinforcement learning stage
- ![image](https://github.com/VAIV-2023/VAIV2023/assets/79634774/a4988abd-c6fd-4fc2-8e53-9a02240e2275)
- ![image](https://github.com/VAIV-2023/VAIV2023/assets/79634774/dae49a1e-a834-463c-9f95-34cf254fdaeb)
- ## Considerations in Dataset Selection
- - **Assembled datasets for improving everyday conversation and hate speech handling, plus general task datasets to keep the chatbot model's performance on general tasks from dropping during training**
-
- - **National Institute of Korean Language everyday conversation dataset:** contains natural responses to everyday conversation, with correct spelling, no slang, ungrammatical sentences, or initial-consonant abbreviations, and diverse conversations by topic
-
- - **AI Hub hate speech dataset:** contains diverse hate speech by category, such as hate, discrimination, sexual content, violence, and crime
-
- - **General task datasets**
- - Evol-Instruct dataset: contains complex, logical prompts and answers across diverse fields
- - Self-Instruct dataset: data augmented from high-quality seed data written directly by people
- - RLHF Korean translation dataset: the dataset released by DeepSpeedChat, translated into Korean

  # Task2. SFT model Fine-tuning
  ## Baseline Model
@@ -46,8 +39,6 @@
  ![image](https://github.com/VAIV-2023/VAIV2023/assets/79634774/a994a960-db7c-4e75-a11a-d7755d372722)
  * G-Eval: https://arxiv.org/abs/2303.16634

- ## Final SFT Model
- - https://huggingface.co/Trofish/KULLM-SFT-v2

  # Task3-1. Implementing Reward Model ver1
  ## Baseline Model
@@ -64,33 +55,35 @@
  - The G-Eval evaluation prompt was varied by dataset type
  - ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/7d7117d0-02e9-42dd-8ce3-5244cf726bf8)
  ## Reward v1 Model Finetuning
- - ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/da4d9b15-ec91-44bb-84d9-f28aeffd16ad)
  - According to the InstructGPT paper, a Reward model's performance degrades sharply when it overfits, so the number of epochs was set to 1
  - Other hyper-parameters such as batch size and learning rate are reported to have little effect on performance
  - Total training time: 4 minutes on a Colab A100 40GB

  ## Reward v1 Model Evaluation
- - ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/f4af0b7d-af47-4881-8adf-d14be43c0eb1)
  - Reward Model Template
- - **"아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요. \n\n ### 명령어:\n{prompt}\n\n ### 응답:\n"** (in English: "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {prompt} ### Response:")

- # Task3-2. Implementing Reward Model ver2,3
- ## RewardModel ver1 Issues
- - The implemented Reward model performs poorly (Accuracy 0.65)
- - When the Reward model was used for Step3 training, the model misidentified non-hateful input as hate speech and answered accordingly

- ## Resolving the Issues (Reward Model ver2,3)
- - ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/99c7fd6c-448e-4780-9573-0ef51b8e3183)
  - Added Evol-instruct data to improve evaluation performance on General Task answers
- - When two answers were generated with the SFT model, the Chosen and Rejected answers differed too little for the model to learn; to prevent this, answers were generated with two models **(ChatGPT, SFT)**
- - Training with hate speech data (Ver2) caused answers to be generated abnormally after Step3 training, so the hate speech data was removed before training (Ver3)
  - RM-ver1 used GPT4 for Chosen/Rejected labeling, but because of resource constraints only part of the data was labeled by humans
- - Everyday conversation and hate speech datasets
  - Neither ChatGPT nor SFT consistently produced high-quality answers, so humans labeled the data directly
  - RLHF Korean translation and Evol-Instruct datasets
- - ChatGPT consistently produced high-quality answers, so labeling with ChatGPT as Chosen and SFT as Rejected was perfor
- ## Reward Model ver2,3 Evaluation
- ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/7889398a-86dc-4b03-8300-64b772d49887)

  # Task4. Implementing the final model with RLHF and DeepSpeedChat
  - Used DeepSpeedChat, which applies DeepSpeed, a new memory optimization technique for large-scale distributed deep learning made by Microsoft, to the RLHF process
  2023 Sungkyunkwan University Summer Intensive Industry-Academia Cooperation Project VAIV
  ## GPT-based natural (Friendly) and ethical (Harmless) everyday conversational chatbot model
+ ### Github : https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM

+ # Research Background and Purpose
+ Implement a natural and ethical Korean everyday conversational chatbot model based on GPT-NEOX(Polyglot-ko)
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/18bb1ab4-8924-4b43-b538-1e6529297217)
+
+ # Development Details
  - Self-Instruct: data augmentation using GPT4
  - RLHF (Reinforcement Learning from Human Feedback): reinforcement learning that reflects human preferences
  - DeepSpeed: a new memory optimization technique for large-scale distributed deep learning
+
+ - Task 1: Building datasets for each reinforcement learning stage
+ - Task 2: SFT model Instruction-tuning
+ - Task 3: Implementing Reward model ver1,2,3
+ - Task 4: Implementing the final model with RLHF and DeepSpeedChat (https://huggingface.co/Trofish/KULLM-RLHF)

  # Task1. Building datasets for each reinforcement learning stage
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/4bb56e36-0c49-4d15-a2c6-2824867419a8)
+ ![Screenshot 2024-06-18 at 11 05 55 AM](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/2f637065-fa25-4402-b319-113ff4c6e1a9)
+ ![Screenshot 2024-06-18 at 11 06 08 AM](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/2a6c2e9b-1292-43b9-b5e7-5ced3643988d)
+
+

  # Task2. SFT model Fine-tuning
  ## Baseline Model
  ![image](https://github.com/VAIV-2023/VAIV2023/assets/79634774/a994a960-db7c-4e75-a11a-d7755d372722)
  * G-Eval: https://arxiv.org/abs/2303.16634


  # Task3-1. Implementing Reward Model ver1
  ## Baseline Model
  - The G-Eval evaluation prompt was varied by dataset type
  - ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/7d7117d0-02e9-42dd-8ce3-5244cf726bf8)
  ## Reward v1 Model Finetuning
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/da4d9b15-ec91-44bb-84d9-f28aeffd16ad)
  - According to the InstructGPT paper, a Reward model's performance degrades sharply when it overfits, so the number of epochs was set to 1
  - Other hyper-parameters such as batch size and learning rate are reported to have little effect on performance
  - Total training time: 4 minutes on a Colab A100 40GB
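
For context, the InstructGPT paper trains its reward model with a pairwise ranking loss over (Chosen, Rejected) answers. The sketch below illustrates that loss and a pairwise accuracy metric in plain Python, assuming scalar reward scores have already been computed. The function names are hypothetical and not taken from this repository, and whether the README's accuracy figure is computed exactly this way is an assumption.

```python
import math


def pairwise_ranking_loss(chosen_scores, rejected_scores):
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over all pairs."""
    losses = []
    for r_c, r_r in zip(chosen_scores, rejected_scores):
        margin = r_c - r_r
        # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return sum(losses) / len(losses)


def pairwise_accuracy(chosen_scores, rejected_scores):
    """Fraction of pairs where the chosen answer scores strictly higher."""
    pairs = list(zip(chosen_scores, rejected_scores))
    return sum(1 for r_c, r_r in pairs if r_c > r_r) / len(pairs)
```

The loss shrinks as the chosen answer's score moves above the rejected answer's score, which is why near-identical answer pairs give the model little to learn from.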

  ## Reward v1 Model Evaluation
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/c21be612-b26d-4a1c-a1e2-6a99442660da)
+
  - Reward Model Template
+ - "아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요. \n\n ### 명령어:\n{prompt}\n\n ### 응답:\n" (in English: "Below is an instruction that describes a task. Write a response that appropriately completes the request. ### Instruction: {prompt} ### Response:")
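
As an illustration of how such a template might be applied, the sketch below fills the `{prompt}` slot and appends a candidate answer to form one reward-model input. The template text is copied from the README with its `\n` sequences read as newlines; the function name and the choice to concatenate the answer directly after the template are assumptions.

```python
# Template text from the README; "\n" sequences are interpreted as newlines.
REWARD_TEMPLATE = (
    "아래는 작업을 설명하는 명령어입니다. 요청을 적절히 완료하는 응답을 작성하세요. "
    "\n\n ### 명령어:\n{prompt}\n\n ### 응답:\n"
)


def build_reward_input(prompt: str, answer: str) -> str:
    """Hypothetical helper: fill the template, then append the answer to score."""
    return REWARD_TEMPLATE.format(prompt=prompt) + answer


if __name__ == "__main__":
    print(build_reward_input("오늘 날씨 어때?", "맑고 따뜻해요."))
```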

+ # Task3-2. Implementing Reward Model ver2
+ ## Reward Model ver1 Issues
+ - The implemented Reward Model performs poorly (Accuracy 0.65)
+ - When Reward Model ver1 was used for Step3 training, the model misidentified non-hateful input as hate speech and answered accordingly

+ ## Resolving the Issues
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/6f4f0665-a8c7-4903-a626-f37018b7e4c9)
+ - When two answers were generated with the SFT model (Ver1), the Chosen and Rejected answers differed too little for the model to learn; to prevent this, answers were generated with two models **(ChatGPT, SFT)** (Ver2)
  - Added Evol-instruct data to improve evaluation performance on General Task answers
+ - All datasets used for training were filtered to remove samples of 15 tokens or fewer or with a cosine similarity of 0.5 or higher
+ - Training with hate speech data (Ver1) caused answers to be generated abnormally after Step3 reinforcement learning, so the hate speech data was removed before training (Ver2)
  - RM-ver1 used GPT4 for Chosen/Rejected labeling, but because of resource constraints only part of the data was labeled by humans
+ - Everyday conversation dataset
  - Neither ChatGPT nor SFT consistently produced high-quality answers, so humans labeled the data directly
  - RLHF Korean translation and Evol-Instruct datasets
+ - ChatGPT consistently produced high-quality answers, so labeling with ChatGPT as Chosen and SFT as Rejected was performed
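
The filtering rule above (15 tokens or fewer, cosine similarity of 0.5 or higher) could look roughly like the following sketch. Several details are assumptions, since the README does not state them: the similarity is taken between the Chosen and Rejected answers of a pair, tokens are whitespace-separated words rather than model tokenizer output, the vectors are simple word counts, and a pair is dropped if either condition holds. All names are hypothetical.

```python
import math
from collections import Counter

MIN_TOKENS = 15       # answers with this many tokens or fewer are dropped
MAX_SIMILARITY = 0.5  # pairs at or above this similarity are dropped


def tokenize(text):
    """Whitespace split; a stand-in for the real tokenizer (assumption)."""
    return text.split()


def cosine_similarity(tokens_a, tokens_b):
    """Cosine similarity between word-count vectors of two token lists."""
    counts_a, counts_b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def keep_pair(chosen, rejected):
    """Keep a pair only if both answers are long enough and dissimilar enough."""
    tok_c, tok_r = tokenize(chosen), tokenize(rejected)
    if len(tok_c) <= MIN_TOKENS or len(tok_r) <= MIN_TOKENS:
        return False
    return cosine_similarity(tok_c, tok_r) < MAX_SIMILARITY


def filter_pairs(pairs):
    """Apply keep_pair to a list of (chosen, rejected) answer pairs."""
    return [(c, r) for c, r in pairs if keep_pair(c, r)]
```

Dropping highly similar pairs targets the ver1 problem noted above, where Chosen and Rejected answers were too close for the reward model to learn a preference.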
+ ## Reward Model ver2 Evaluation
+ ![image](https://github.com/VAIV-2023/RLHF-Korean-Friendly-LLM/assets/79634774/834cb645-7909-464b-b072-635aaac8eeff)

  # Task4. Implementing the final model with RLHF and DeepSpeedChat
  - Used DeepSpeedChat, which applies DeepSpeed, a new memory optimization technique for large-scale distributed deep learning made by Microsoft, to the RLHF process