---
language:
- en
tags:
- QA
license: cc-by-4.0
datasets:
- BoolQ
- CommonSenseQA
- DROP
- DuoRC
- HellaSWAG
- HotpotQA
- HybridQA
- NarrativeQA
- NaturalQuestionsShort
- NewsQA
- QAMR
- RACE
- SearchQA
- SIQA
- SQuAD
- TriviaQA-web
metrics:
- Accuracy
- Precision
- Recall
- F1
- MRR
- R@3
- R@5
---

BERT for Sequence Classification trained on the QA dataset prediction task.
- Input: a question.
- Output: the dataset the question comes from.

Original paper: TWEAC: Transformer with Extendable QA Agent Classifiers
https://arxiv.org/abs/2104.07081

Datasets used for training:
```python
list_datasets = [
    'BoolQ', 'CommonSenseQA', 'DROP', 'DuoRC', 'HellaSWAG', 'HotpotQA',
    'HybridQA', 'NarrativeQA', 'NaturalQuestionsShort', 'NewsQA', 'QAMR',
    'RACE', 'SearchQA', 'SIQA', 'SQuAD', 'TriviaQA-web'
]
```
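
A minimal usage sketch with the 🤗 Transformers library is shown below. The repository id is a placeholder (the card does not state it), and the label names are assumed to be exposed through `model.config.id2label` in the same order as `list_datasets` above.

```python
# Minimal sketch: replace MODEL_ID with this model's actual Hub repository id.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_ID = "path/to/this-model"  # placeholder, not the real repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

question = "Who wrote the novel that the film Blade Runner is based on?"
inputs = tokenizer(question, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Highest-scoring class -> predicted source dataset
# (assumes model.config.id2label holds the dataset names).
pred_id = logits.argmax(dim=-1).item()
print(model.config.id2label[pred_id])
```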

Results for all datasets:
- Accuracy: 0.7919096825783123
- Precision: 0.731586272892176
- Recall: 0.7919096825783123
- F1: 0.7494425609552463
- MRR: 0.8720871733637521
- R@3: 0.9438690810655046
- R@5: 0.9745318608004427
- Queries/second: 6052.33538824659
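
MRR and R@k are ranking metrics: for each question, the candidate datasets are ranked by classifier score, MRR averages the reciprocal rank of the true dataset, and R@k is the fraction of questions whose true dataset appears in the top k. The sketch below is illustrative only, not the evaluation code used for the numbers above; `scores` and `labels` are hypothetical arrays.

```python
import numpy as np

def mrr_and_recall_at_k(scores, labels, k=3):
    """scores: (n_queries, n_classes) classifier scores; labels: true class indices."""
    # Candidate classes sorted by descending score for each query.
    order = np.argsort(-scores, axis=1)
    # Rank of the true class within each row (1 = best).
    ranks = np.array([np.where(order[i] == labels[i])[0][0] + 1
                      for i in range(len(labels))])
    mrr = np.mean(1.0 / ranks)
    recall_at_k = np.mean(ranks <= k)
    return mrr, recall_at_k

# Hypothetical example: 2 questions, 3 candidate datasets.
scores = np.array([[0.1, 0.7, 0.2],
                   [0.5, 0.3, 0.2]])
labels = np.array([1, 2])
print(mrr_and_recall_at_k(scores, labels, k=2))  # (0.666..., 0.5)
```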


Results per dataset:
```json
{
        "BoolQ": {
            "accuracy": 0.998776758409786,
            "mrr": 0.999388379204893,
            "r@3": 1.0,
            "r@5": 1.0,
            "query_per_second": 6978.947907596168,
            "precision": 0.8649364406779662,
            "recall": 0.998776758409786,
            "f1": 0.9270508089696281
        },
        "CommonSenseQA": {
            "accuracy": 0.9247135842880524,
            "mrr": 0.9476358338878795,
            "r@3": 0.9705400981996727,
            "r@5": 0.9705400981996727,
            "query_per_second": 5823.984138936813,
            "precision": 0.442443226311668,
            "recall": 0.9247135842880524,
            "f1": 0.5985169491525425
        },
        "DROP": {
            "accuracy": 0.9075083892617449,
            "mrr": 0.9378200367399193,
            "r@3": 0.9609899328859061,
            "r@5": 0.9786073825503355,
            "query_per_second": 6440.988897129248,
            "precision": 0.8636726546906187,
            "recall": 0.9075083892617449,
            "f1": 0.8850480670893842
        },
        "DuoRC": {
            "accuracy": 0.5555803405457654,
            "mrr": 0.7368963429107307,
            "r@3": 0.9092125808610305,
            "r@5": 0.9596996059186557,
            "query_per_second": 6853.643198794893,
            "precision": 0.646814404432133,
            "recall": 0.5555803405457654,
            "f1": 0.5977360905563778
        },
        "HellaSWAG": {
            "accuracy": 0.998406691894045,
            "mrr": 0.9990705702715262,
            "r@3": 1.0,
            "r@5": 1.0,
            "query_per_second": 3091.5012960785157,
            "precision": 0.9974134500596896,
            "recall": 0.998406691894045,
            "f1": 0.9979098238280083
        },
        "HotpotQA": {
            "accuracy": 0.7414435784479837,
            "mrr": 0.8435804344945315,
            "r@3": 0.9325652321247034,
            "r@5": 0.973568281938326,
            "query_per_second": 4972.668019223381,
            "precision": 0.7352150537634409,
            "recall": 0.7414435784479837,
            "f1": 0.7383161801923401
        },
        "HybridQA": {
            "accuracy": 0.7934218118869013,
            "mrr": 0.8806947764680021,
            "r@3": 0.964800923254472,
            "r@5": 0.9930755914598961,
            "query_per_second": 4886.494046259562,
            "precision": 0.7198952879581152,
            "recall": 0.7934218118869013,
            "f1": 0.7548723579467472
        },
        "NarrativeQA": {
            "accuracy": 0.5623756749076442,
            "mrr": 0.7416681781060867,
            "r@3": 0.9011082693947144,
            "r@5": 0.9580373212086767,
            "query_per_second": 7081.067049796865,
            "precision": 0.5623224095472628,
            "recall": 0.5623756749076442,
            "f1": 0.5623490409661377
        },
        "NaturalQuestionsShort": {
            "accuracy": 0.7985353692739171,
            "mrr": 0.8743599435345307,
            "r@3": 0.9439077594266126,
            "r@5": 0.9774072919912745,
            "query_per_second": 7136.590426649795,
            "precision": 0.7963020509633313,
            "recall": 0.7985353692739171,
            "f1": 0.7974171464135678
        },
        "NewsQA": {
            "accuracy": 0.5375118708452041,
            "mrr": 0.71192075967717,
            "r@3": 0.855650522317189,
            "r@5": 0.939696106362773,
            "query_per_second": 7193.851409052092,
            "precision": 0.18757249378624688,
            "recall": 0.5375118708452041,
            "f1": 0.2780985136961061
        },
        "QAMR": {
            "accuracy": 0.6658497602557272,
            "mrr": 0.7969741223377345,
            "r@3": 0.9207778369738945,
            "r@5": 0.973361747469366,
            "query_per_second": 7321.775044800525,
            "precision": 0.8654525309881587,
            "recall": 0.6658497602557272,
            "f1": 0.7526421968624852
        },
        "RACE": {
            "accuracy": 0.8771538617474154,
            "mrr": 0.917901778042666,
            "r@3": 0.9489154672613015,
            "r@5": 0.9693898236367322,
            "query_per_second": 6952.225120744351,
            "precision": 0.8767983789260385,
            "recall": 0.8771538617474154,
            "f1": 0.8769760843129306
        },
        "SearchQA": {
            "accuracy": 0.9762073027090695,
            "mrr": 0.9865069592101393,
            "r@3": 0.9972909305064782,
            "r@5": 0.9984687868080094,
            "query_per_second": 4031.0193826035634,
            "precision": 0.9870191735143503,
            "recall": 0.9762073027090695,
            "f1": 0.9815834665719192
        },
        "SIQA": {
            "accuracy": 0.9969293756397134,
            "mrr": 0.9977823268509042,
            "r@3": 0.9979529170931423,
            "r@5": 1.0,
            "query_per_second": 6711.547709005977,
            "precision": 0.9329501915708812,
            "recall": 0.9969293756397134,
            "f1": 0.9638792676892627
        },
        "SQuAD": {
            "accuracy": 0.550628092881614,
            "mrr": 0.7164538452390565,
            "r@3": 0.8660068519223448,
            "r@5": 0.9366197183098591,
            "query_per_second": 7033.420124363291,
            "precision": 0.48613678373382624,
            "recall": 0.550628092881614,
            "f1": 0.5163766175814368
        },
        "TriviaQA-web": {
            "accuracy": 0.7855124582584125,
            "mrr": 0.8647404868442627,
            "r@3": 0.9321859748266119,
            "r@5": 0.9640380169535063,
            "query_per_second": 4327.642440910395,
            "precision": 0.7404358353510896,
            "recall": 0.7855124582584125,
            "f1": 0.7623083634550667
        }
}
```