MayFarhat committed
Commit
a5cc9a0
1 Parent(s): 607e54d

Initial commit

Files changed (4)
  1. README.md +42 -3
  2. config.json +16 -0
  3. pytorch_model.bin +3 -0
  4. tokenizer.json +0 -0
README.md CHANGED
@@ -1,3 +1,42 @@
- ---
- license: odc-by
- ---
+ ---
+ tags:
+ - arabic
+ - text-generation
+ - language-model
+ license: apache-2.0
+ ---
+
+ # Model summary
+
+ This model was trained on 25B tokens of the ArabicWeb dataset V1, tokenized with the [AraGPT-2](https://huggingface.co/aubmindlab/aragpt2-base) tokenizer. It has 900 million parameters, a context length of 1024 tokens, and uses the Mamba2 architecture.
+ * License: Apache-2.0
+ * Languages: Arabic
+
+ ## Model Description
+
+ The ArabicWeb Ablation Model V1 is trained on a diverse corpus of Arabic text, including news articles, art and entertainment content, and encyclopedia entries, which makes it suitable for a variety of Arabic text generation tasks. For more details, see the accompanying blog post.
+
+ - **Model Type**: Language Model
+ - **Architecture**: Mamba2
+ - **Training Data**: ArabicWeb dataset V1
+ - **Training Objective**: Text generation
+
+ ## Usage
+
+ This model was trained primarily to assess the quality of the ArabicWeb dataset and is designed for Arabic text generation. Note that it is an ablation model and was not instruction-tuned; its primary intended use is comparing its performance with models trained under the same configuration on different versions of the dataset. A minimal loading-and-generation sketch follows.
+
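+ The snippet below is a sketch, not an official example: it assumes the checkpoint loads with the `mamba_ssm` package's `MambaLMHeadModel` (the usual loader for checkpoints in this config format) and that prompts are tokenized with the AraGPT-2 tokenizer named above; the repo id is a placeholder.
+
+ ```python
+ # Sketch only: assumes `mamba-ssm` (with its CUDA kernels) and `transformers`
+ # are installed; the repo id below is a placeholder, not a confirmed path.
+ import torch
+ from transformers import AutoTokenizer
+ from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel
+
+ tokenizer = AutoTokenizer.from_pretrained("aubmindlab/aragpt2-base")
+ model = MambaLMHeadModel.from_pretrained(
+     "<this-repo-id>",  # placeholder for this model's Hub id
+     device="cuda",
+     dtype=torch.bfloat16,
+ )
+
+ prompt = "العلم نور"  # any Arabic prompt
+ input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")
+
+ # Base (non-instruction-tuned) model: plain continuation, no chat template.
+ out = model.generate(
+     input_ids=input_ids,
+     max_length=256,
+     temperature=0.7,
+     top_k=50,
+     top_p=0.9,
+ )
+ print(tokenizer.decode(out[0], skip_special_tokens=True))
+ ```
+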
+ ## Training
+ ### Model
+
+ * Architecture: Mamba2
+ * Pretraining tokens: 25B
+ * Scheduler: Cosine (see the sketch after this list)
+ * d_model: 2304
+ * d_intermediate: 0
+ * n_layer: 18
+
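+ The card states only the schedule shape ("Cosine"); the peak learning rate, warmup length, and step count in the sketch below are purely illustrative placeholders, not the actual training settings.
+
+ ```python
+ # Illustrative warmup + cosine-decay schedule. Every number here is a
+ # placeholder; the card does not disclose the real hyperparameters.
+ import math
+ import torch
+
+ model = torch.nn.Linear(8, 8)  # stand-in for the real model
+ opt = torch.optim.AdamW(model.parameters(), lr=3e-4)  # placeholder peak LR
+
+ warmup_steps, total_steps = 1_000, 100_000  # placeholders
+
+ def lr_lambda(step: int) -> float:
+     if step < warmup_steps:
+         return step / max(1, warmup_steps)  # linear warmup to peak LR
+     t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
+     return 0.5 * (1.0 + math.cos(math.pi * t))  # cosine decay toward 0
+
+ sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
+ ```
+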
+ ### Hardware
+ * Platform: HPE Cray node
+ * Hardware: 8 NVIDIA H100 GPUs
+ * Cloud Provider: Orange Cloud Avenue
+
config.json ADDED
@@ -0,0 +1,16 @@
+ {
+   "d_model": 2304,
+   "d_intermediate": 0,
+   "n_layer": 18,
+   "vocab_size": 64000,
+   "ssm_cfg": {
+     "layer": "Mamba2"
+   },
+   "attn_layer_idx": [],
+   "attn_cfg": {},
+   "rms_norm": true,
+   "residual_in_fp32": true,
+   "fused_add_norm": true,
+   "pad_vocab_size_multiple": 16,
+   "tie_embeddings": false
+ }
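The field names here match `mamba_ssm`'s `MambaConfig` dataclass one for one, so, under that assumption, the architecture can be rebuilt straight from this file; a sketch:

```python
# Sketch, assuming the config follows mamba_ssm's MambaConfig schema
# (the field names above match that dataclass exactly).
import json

from mamba_ssm.models.config_mamba import MambaConfig
from mamba_ssm.models.mixer_seq_simple import MambaLMHeadModel

with open("config.json") as f:
    config = MambaConfig(**json.load(f))

model = MambaLMHeadModel(config)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.0f}M parameters")  # should land near the stated 900M
```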
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5bb4048359371c16336e3c3af8a1cdac3ba2e826cdf70aa14673967fa6d87703
+ size 3529626334
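This is a Git LFS pointer (spec v1): `oid` is the SHA-256 digest of the real weight file and `size` is its byte count. At 3,529,626,334 bytes (~3.5 GB), the file is about what 900 million float32 parameters occupy, consistent with the model summary. An optional sketch for checking a downloaded copy against the pointer:

```python
# Optional integrity check: hash the downloaded weights and compare the
# digest with the oid recorded in the LFS pointer above.
import hashlib

def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

expected = "5bb4048359371c16336e3c3af8a1cdac3ba2e826cdf70aa14673967fa6d87703"
assert sha256_file("pytorch_model.bin") == expected, "hash mismatch"
```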
tokenizer.json ADDED
The diff for this file is too large to render.