---
license: cc-by-nc-nd-4.0
datasets:
- openslr
- mozilla-foundation/common_voice_13_0
language:
- gl
pipeline_tag: automatic-speech-recognition
tags:
- ITG
- PyTorch
- Transformers
- whisper
- whisper-small
---

# Whisper Small Galician

## Description

This is a fine-tuned version of the [openai/whisper-small](https://huggingface.co./openai/whisper-small) pre-trained model for ASR in Galician.

---

## Dataset

We combined two datasets:

1. The [OpenSLR Galician](https://huggingface.co./datasets/openslr/viewer/SLR77) dataset, available in the OpenSLR repository.
2. The [Common Voice 13 Galician](https://huggingface.co./datasets/mozilla-foundation/common_voice_13_0/viewer/gl) dataset, available in the Common Voice repository.

---

## Example inference script

### Check this example script to run our model in inference mode

```python
import librosa
import torch
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq

filename = "demo.wav"  # change this line to the name of your audio file
sample_rate = 16_000   # Whisper expects 16 kHz audio

# Load the processor and the fine-tuned model
processor = AutoProcessor.from_pretrained('ITG/whisper-small-gl')
model = AutoModelForSpeechSeq2Seq.from_pretrained('ITG/whisper-small-gl')

# Run on GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)

with torch.no_grad():
    # Load the audio file, resampling it to 16 kHz
    speech_array, _ = librosa.load(filename, sr=sample_rate)
    # Extract log-Mel input features and move them to the model's device
    inputs = processor(speech_array, sampling_rate=sample_rate, return_tensors="pt").to(device)
    input_features = inputs.input_features
    # Generate transcription token IDs and decode them to text
    generated_ids = model.generate(inputs=input_features, max_length=225)
    decode_output = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
    print(f"ASR Galician whisper-small output: {decode_output}")
```

---

## Fine-tuning hyper-parameters

| **Hyper-parameter**         | **Value** |
|:---------------------------:|:---------:|
| Training batch size         | 16        |
| Evaluation batch size       | 8         |
| Learning rate               | 1e-5      |
| Gradient checkpointing      | true      |
| Gradient accumulation steps | 1         |
| Max training epochs         | 100       |
| Max steps                   | 4000      |
| Generate max length         | 225       |
| Warmup training steps (%)   | 12.5%     |
| FP16                        | true      |
| Metric for best model       | wer       |
| Greater is better           | false     |

## Fine-tuning in a different dataset or style

If you're interested in fine-tuning your own Whisper model, we suggest starting from the [openai/whisper-small model](https://huggingface.co./openai/whisper-small). You may also find the Transformers step-by-step guide for [fine-tuning Whisper on multilingual ASR datasets](https://huggingface.co./blog/fine-tune-whisper) to be a valuable resource. This guide served as a helpful reference during the training of this Galician whisper-small model!
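
For reference, the hyper-parameters in the table above map onto the Transformers `Seq2SeqTrainingArguments` roughly as in the sketch below. This is a minimal, illustrative configuration rather than our exact training script: the output directory name is hypothetical, the 500 warmup steps are derived from 12.5% of the 4000 max steps, and the evaluation strategy and best-model reloading settings are assumptions that the fine-tuning guide linked above also uses.

```python
from transformers import Seq2SeqTrainingArguments

# Minimal sketch of training arguments matching the table above.
# "./whisper-small-gl" is a hypothetical output directory.
training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-gl",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=8,
    learning_rate=1e-5,
    gradient_checkpointing=True,
    gradient_accumulation_steps=1,
    num_train_epochs=100,
    max_steps=4000,                 # takes precedence over num_train_epochs
    generation_max_length=225,
    warmup_steps=500,               # 12.5% of max_steps (assumption)
    fp16=True,
    evaluation_strategy="steps",    # assumption: evaluate periodically during training
    predict_with_generate=True,     # needed so WER can be computed on generated text
    metric_for_best_model="wer",
    greater_is_better=False,
    load_best_model_at_end=True,    # assumption: keep the checkpoint with the lowest WER
)
```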