
Whisper Small Personal

This model is a fine-tuned version of openai/whisper-small, trained on personal speech recordings collected with Mimic Record Studio. It is designed for automatic speech recognition (ASR) and reaches a Word Error Rate (WER) of roughly 7.9% (0.0794) on the validation split of the custom dataset.

Model Details

  • Model Type: Whisper Small
  • Training Dataset: Personal recordings using Mimic Record Studio
  • Framework: PyTorch
  • Language: Primarily fine-tuned on [insert language(s)]
  • Batch Size: 16 (gradient accumulation steps: 1)
  • Learning Rate: 1e-5
  • Mixed Precision: FP16
  • Evaluation Strategy: Steps (every 1000 steps)
  • WER on Validation Set: 0.079441 (≈ 7.9%), best checkpoint at step 2000

Hyperparameters

  • Max Training Steps: 4000
  • Warmup Steps: 500
  • Gradient Checkpointing: Enabled
  • Evaluation: Performed every 1000 steps
  • Logging: Every 25 steps to TensorBoard
  • Metric for Best Model: Word Error Rate (WER)
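
For reference, a Seq2SeqTrainingArguments configuration consistent with the values listed in this card might look like the sketch below. The output_dir is a placeholder, per_device_eval_batch_size and generation_max_length mirror the Training Procedure section, and argument names can differ slightly between transformers releases:

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-personal",  # placeholder output directory
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    gradient_checkpointing=True,
    fp16=True,
    eval_strategy="steps",            # "evaluation_strategy" on older transformers releases
    eval_steps=1000,
    save_steps=1000,
    per_device_eval_batch_size=8,
    predict_with_generate=True,
    generation_max_length=225,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,          # lower WER is better
    push_to_hub=True,
)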

Usage

You can use this model for ASR tasks by loading it directly from the Hugging Face Model Hub:

import librosa
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("luluw/whisper-small-personal")
model = WhisperForConditionalGeneration.from_pretrained("luluw/whisper-small-personal")

# Example inference: load the audio at the 16 kHz sampling rate Whisper expects
audio, sr = librosa.load("path_to_audio_file.wav", sr=16000)

# Convert the waveform to log-mel input features and generate a transcription
input_features = processor(audio, sampling_rate=sr, return_tensors="pt").input_features
generated_ids = model.generate(input_features)
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
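
For quick transcription, the same checkpoint can also be used through the high-level pipeline API. This is a minimal sketch; the audio path is a placeholder:

from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="luluw/whisper-small-personal")
print(asr("path_to_audio_file.wav")["text"])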

Training Procedure

The model was trained using the following setup:

  • Training Batch Size: 16
  • Gradient Accumulation: 1
  • Evaluation Batch Size: 8
  • Max Generation Length: 225 tokens
  • Learning Rate: 1e-5
  • Mixed Precision: FP16
  • Optimizer: AdamW

The best model was selected based on the Word Error Rate (WER), and the final model was pushed to the Hugging Face Model Hub.
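
The run itself can be assembled with Seq2SeqTrainer, as in the minimal sketch below. Here training_args refers to the configuration sketched under Hyperparameters, and train_dataset, eval_dataset, data_collator, and compute_metrics are placeholders that have to be prepared from the personal recordings:

from transformers import Seq2SeqTrainer, WhisperForConditionalGeneration, WhisperProcessor

# Fine-tuning starts from the openai/whisper-small base checkpoint
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
processor = WhisperProcessor.from_pretrained("openai/whisper-small")

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,               # Seq2SeqTrainingArguments sketched above
    train_dataset=train_dataset,      # placeholder: preprocessed Mimic Record Studio recordings
    eval_dataset=eval_dataset,        # placeholder: held-out validation split
    data_collator=data_collator,      # placeholder: padding collator for Whisper features/labels
    compute_metrics=compute_metrics,  # placeholder: decodes predictions and returns {"wer": ...}
    tokenizer=processor.feature_extractor,  # "processing_class" on newer transformers releases
)

trainer.train()        # AdamW is the Trainer default optimizer
trainer.push_to_hub()  # publishes the best checkpoint to the Hugging Face Hub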

Results

Step | Training Loss | Validation Loss | WER
500  | 0.089500      | 0.229938        | 0.101203
1000 | 0.005200      | 0.215078        | 0.087757
1500 | 0.000400      | 0.222333        | 0.080502
2000 | 0.000200      | 0.226987        | 0.079441
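
The WER values are fractions of word-level errors, so 0.079441 corresponds to roughly 7.9%. As a minimal sketch, WER can be computed with the evaluate library on toy strings:

import evaluate

wer_metric = evaluate.load("wer")

# Toy example: one substituted word ("jumps" vs. "jumped") out of nine reference words
predictions = ["the quick brown fox jumps over the lazy dog"]
references = ["the quick brown fox jumped over the lazy dog"]

print(wer_metric.compute(predictions=predictions, references=references))  # 0.111...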

Model Format

The checkpoint is published in Safetensors format and contains roughly 242M parameters stored as F32 tensors.