XLS-R-300m-FTSpeech

Model description

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m, trained on the FTSpeech dataset, which contains 1,800 hours of transcribed speeches from the Danish parliament.
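As a minimal sketch of how a fine-tuned checkpoint like this one might be used for transcription, the snippet below wraps the `automatic-speech-recognition` pipeline from the `transformers` library. It assumes the repository id `saattrupdan/wav2vec2-xls-r-300m-ftspeech` as shown on the model page, that `transformers` and a PyTorch backend are installed, and that ffmpeg is available for decoding audio files. The function name `transcribe` is hypothetical and chosen for illustration.

```python
# Assumption: repository id as shown on the model page.
MODEL_ID = "saattrupdan/wav2vec2-xls-r-300m-ftspeech"


def transcribe(audio_path: str) -> str:
    """Transcribe a Danish audio file and return the recognised text.

    The import is done lazily so that this module can be loaded even
    when `transformers` is not installed.
    """
    from transformers import pipeline

    # Build a speech recognition pipeline around the fine-tuned checkpoint.
    # The model weights are downloaded on first use.
    asr = pipeline("automatic-speech-recognition", model=MODEL_ID)

    # For a file path, the pipeline decodes the audio (via ffmpeg) and
    # resamples it to the rate the model's feature extractor expects.
    result = asr(audio_path)
    return result["text"]
```

Calling `transcribe("speech.wav")` with a local recording (the file name here is a placeholder) would return the transcription as a string. Whether the 5-gram language model is applied during decoding depends on the repository contents and the local environment, which this sketch does not cover.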

Performance

The model achieves the following word error rates (WER, in percent; lower is better), both without a language model (LM) and with a 5-gram LM:

Dataset	WER without LM	WER with 5-gram LM
Danish part of Common Voice 8.0	20.48	17.91
Alvenir test set	15.46	13.84
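To make the metric concrete, here is a small sketch of how a corpus-level WER can be computed: the word-level edit distance between reference and hypothesis, summed over all utterances and divided by the total number of reference words. This is a generic illustration, not the evaluation script behind the scores above; in particular, it assumes plain whitespace tokenisation and no text normalisation. The helper `relative_reduction` is likewise hypothetical and simply expresses the gain from the language model as a percentage of the score without it.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(references, hypotheses):
    """Corpus-level WER in percent, assuming whitespace tokenisation."""
    errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_words = ref.split()
        errors += edit_distance(ref_words, hyp.split())
        total_words += len(ref_words)
    return 100.0 * errors / total_words


def relative_reduction(wer_without_lm, wer_with_lm):
    """Relative WER reduction in percent."""
    return 100.0 * (wer_without_lm - wer_with_lm) / wer_without_lm


# One deleted word out of four reference words.
print(wer(["jeg har en hund"], ["jeg har hund"]))  # 25.0

# Relative gain from the 5-gram LM, using the scores in the table above.
print(round(relative_reduction(20.48, 17.91), 1))  # 12.5 (Common Voice 8.0)
print(round(relative_reduction(15.46, 13.84), 1))  # 10.5 (Alvenir)
```

On the reported scores, the language model reduces WER by roughly 12.5% relative on Common Voice 8.0 and roughly 10.5% relative on the Alvenir test set.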

License

Use of this model must comply with the license issued by the Danish Parliament.

Model size

315M parameters, stored as F32 tensors in the Safetensors format.
