ashaduzzaman committed on
Commit 3b7aa24
1 Parent(s): 7b0861a

Update README.md

Files changed (1)
  1. README.md +91 -31
README.md CHANGED
@@ -8,58 +8,118 @@ metrics:
  model-index:
  - name: vit-finetuned-food101
  results: []
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # vit-finetuned-food101

- This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on an unknown dataset.
- It achieves the following results on the evaluation set:
- - Loss: 1.6262
- - Accuracy: 0.896

- ## Model description

- More information needed

- ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

- ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - learning_rate: 5e-05
- - train_batch_size: 16
- - eval_batch_size: 16
- - seed: 42
- - gradient_accumulation_steps: 4
- - total_train_batch_size: 64
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: linear
- - lr_scheduler_warmup_ratio: 0.1
- - num_epochs: 3

- ### Training results

  | Training Loss | Epoch | Step | Validation Loss | Accuracy |
- |:-------------:|:-----:|:----:|:---------------:|:--------:|
  | 2.7649 | 0.992 | 62 | 2.5733 | 0.831 |
  | 1.888 | 2.0 | 125 | 1.7770 | 0.883 |
  | 1.6461 | 2.976 | 186 | 1.6262 | 0.896 |

- ### Framework versions

- - Transformers 4.42.4
- - Pytorch 2.4.0+cu121
- - Datasets 2.21.0
- - Tokenizers 0.19.1
  model-index:
  - name: vit-finetuned-food101
  results: []
+ datasets:
+ - ethz/food101
+ pipeline_tag: image-classification
  ---

  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
+
+ # Model Card: ViT Fine-tuned on Food-101
+
+ ## Model Overview
+
+ This model is a fine-tuned version of [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k) on the Food-101 dataset. It uses the Vision Transformer (ViT) architecture for image classification, specifically for recognizing and categorizing food items.
+
+ ### Model Details
+ - **Model Type**: Vision Transformer (ViT)
+ - **Base Model**: [google/vit-base-patch16-224-in21k](https://huggingface.co/google/vit-base-patch16-224-in21k)
+ - **Fine-tuning Dataset**: Food-101
+ - **Number of Labels**: 101 (corresponding to different food categories)
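+
+ A quick way to verify the label space is to load the checkpoint and inspect its config. This is a minimal sketch (it assumes the hosted checkpoint ships its `id2label` mapping, as Trainer-exported ViT checkpoints normally do):
+
+ ```python
+ from transformers import AutoModelForImageClassification
+
+ # Load the fine-tuned checkpoint from the Hugging Face Hub
+ model = AutoModelForImageClassification.from_pretrained(
+     "ashaduzzaman/vit-finetuned-food101"
+ )
+
+ # The config should report the 101 Food-101 classes
+ print(model.config.num_labels)                   # expected: 101
+ print(list(model.config.id2label.values())[:3])  # first few label names
+ ```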
+
+ ## Performance
+
+ The model achieves the following results on the evaluation set:
+ - **Loss**: 1.6262
+ - **Accuracy**: 89.6%
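+
+ The reported accuracy is standard top-1 classification accuracy on the held-out split. If you want to spot-check it yourself, a rough sketch (assuming the checkpoint's label names match the `food101` dataset's label names) is:
+
+ ```python
+ import evaluate
+ from datasets import load_dataset
+ from transformers import pipeline
+
+ metric = evaluate.load("accuracy")
+ ds = load_dataset("food101", split="validation[:100]")  # small slice for illustration
+ clf = pipeline("image-classification", model="ashaduzzaman/vit-finetuned-food101")
+
+ # Map predicted label names back to the dataset's integer label ids
+ label2id = {name: i for i, name in enumerate(ds.features["label"].names)}
+ preds = [label2id[clf(img)[0]["label"]] for img in ds["image"]]
+ print(metric.compute(predictions=preds, references=ds["label"]))
+ ```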
+
+ ## Intended Uses & Limitations
+
+ ### Intended Use Cases
+ - **Image Classification**: This model is designed to classify images into one of 101 food categories, making it suitable for applications such as food recognition in diet tracking, restaurant menu analysis, or food-related search engines.
+
+ ### Limitations
+ - **Dataset Bias**: The model's performance may degrade on food images that differ significantly from the Food-101 dataset, such as non-Western cuisines or images captured under non-standard conditions.
+ - **Generalization**: While the model performs well on Food-101, its ability to generalize to other food-related tasks or datasets is not guaranteed.
+ - **Input Size**: The model expects input images of 224x224 pixels; images of other sizes should be resized accordingly (see the preprocessing sketch below).
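+
+ In practice you rarely need to resize by hand: the image processor paired with the checkpoint handles resizing and normalization. A minimal sketch, assuming the preprocessor config was pushed with the model (the local filename is hypothetical):
+
+ ```python
+ from PIL import Image
+ from transformers import AutoImageProcessor
+
+ processor = AutoImageProcessor.from_pretrained("ashaduzzaman/vit-finetuned-food101")
+ image = Image.open("some_food_photo.jpg")  # hypothetical local file
+
+ # Resizes and normalizes to the 224x224 input the ViT expects
+ inputs = processor(images=image, return_tensors="pt")
+ print(inputs["pixel_values"].shape)  # expected: torch.Size([1, 3, 224, 224])
+ ```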
+
+ ## Training and Evaluation Data
+
+ The model was fine-tuned on the Food-101 dataset, which consists of 101,000 images across 101 different food categories. Each category contains 1,000 images, with 750 used for training and 250 for testing. The dataset includes diverse food items but may be skewed towards certain cuisines or food types.
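+
+ These counts are easy to confirm from the Hub copy of the dataset; a short sketch:
+
+ ```python
+ from datasets import load_dataset
+
+ ds = load_dataset("food101")  # sizeable download on first use
+ print(ds)  # train: 75,750 examples (750 x 101); validation: 25,250 examples (250 x 101)
+ print(ds["train"].features["label"].num_classes)  # expected: 101
+ ```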
+
+ ## Training Procedure
+
+ ### Training Hyperparameters
+
+ The model was fine-tuned using the following hyperparameters:
+ - **Learning Rate**: 5e-05
+ - **Train Batch Size**: 16
+ - **Eval Batch Size**: 16
+ - **Seed**: 42
+ - **Gradient Accumulation Steps**: 4
+ - **Total Train Batch Size**: 64
+ - **Optimizer**: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ - **Learning Rate Scheduler**: Linear with a warmup ratio of 0.1
+ - **Number of Epochs**: 3
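+
+ For reference, these settings map onto a `transformers` `TrainingArguments` configuration roughly like the one below. This is a reconstruction from the listed values (assuming a single training device), not the exact training script:
+
+ ```python
+ from transformers import TrainingArguments
+
+ training_args = TrainingArguments(
+     output_dir="vit-finetuned-food101",
+     learning_rate=5e-5,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     seed=42,
+     gradient_accumulation_steps=4,  # effective batch size: 16 * 4 = 64
+     lr_scheduler_type="linear",     # linear decay is the Trainer default
+     warmup_ratio=0.1,
+     num_train_epochs=3,
+ )
+ ```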
+
+ ### Training Results
+
  | Training Loss | Epoch | Step | Validation Loss | Accuracy |
+ |---------------|-------|------|-----------------|----------|
  | 2.7649 | 0.992 | 62 | 2.5733 | 0.831 |
  | 1.888 | 2.0 | 125 | 1.7770 | 0.883 |
  | 1.6461 | 2.976 | 186 | 1.6262 | 0.896 |
+
+ ### Framework Versions
+ - **Transformers**: 4.42.4
+ - **PyTorch**: 2.4.0+cu121
+ - **Datasets**: 2.21.0
+ - **Tokenizers**: 0.19.1
+
+ ## Inference Example
+
+ To run inference with this model, load an image and classify it as follows:
+
+ ```python
+ from io import BytesIO
+
+ import requests
+ from PIL import Image
+ from transformers import pipeline
+
+ # Load a sample image from the internet
+ image_url = "https://example.com/path-to-your-image.jpg"  # Replace with your image URL
+ response = requests.get(image_url)
+ image = Image.open(BytesIO(response.content))
+
+ # Load the fine-tuned model for image classification
+ classifier = pipeline(
+     "image-classification",
+     model="ashaduzzaman/vit-finetuned-food101"
+ )
+
+ # Run inference
+ result = classifier(image)
+ print(result)
+ ```
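+
+ The pipeline returns a list of `{"label": ..., "score": ...}` dictionaries sorted by score (top 5 by default), so `result[0]["label"]` is the model's best guess.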
+
+ ## Ethical Considerations
+
+ - **Bias**: The Food-101 dataset primarily consists of popular Western dishes, which may introduce bias in the model’s predictions for non-Western food items.
+ - **Privacy**: When using this model in applications, ensure that the images are sourced ethically and that privacy considerations are respected.
+
+ ## Citation
+
+ If you use this model in your work, please cite it as:
+
+ ```bibtex
+ @misc{vit_finetuned_food101,
+   author = {A. Shaduzzaman},
+   title = {ViT Fine-tuned on Food-101},
+   year = {2024},
+   url = {https://huggingface.co/ashaduzzaman/vit-finetuned-food101},
+ }
+ ```