Since the current model is only a pre-trained wav2vec2 model for Vietnamese audio, with no ASR fine-tuning, it won't work for alignment tasks. To make it work as expected, I recommend changing to a fine-tuned ASR version.
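If it helps, here is a rough way to tell the two kinds of checkpoint apart before trying alignment. This is only a sketch under an assumption: that the checkpoint ships a Hugging Face-style `config.json` whose `architectures` field names the model class (typically `Wav2Vec2ForPreTraining` for a pre-trained-only checkpoint and `Wav2Vec2ForCTC` for a fine-tuned ASR one). The helper name `looks_ctc_fine_tuned` and the two example dicts are made up for illustration, not taken from any real model.

```python
def looks_ctc_fine_tuned(config: dict) -> bool:
    """Heuristic check: does this config name a CTC head?

    Assumes a Hugging Face-style config dict with an "architectures" list.
    A missing or empty list is treated as "not fine-tuned".
    """
    architectures = config.get("architectures") or []
    return any(name.endswith("ForCTC") for name in architectures)


# Hypothetical configs, written by hand for illustration only.
pretrained_cfg = {"architectures": ["Wav2Vec2ForPreTraining"]}
fine_tuned_cfg = {"architectures": ["Wav2Vec2ForCTC"]}

print(looks_ctc_fine_tuned(pretrained_cfg))
print(looks_ctc_fine_tuned(fine_tuned_cfg))
```

In practice you would load the dict from the checkpoint's `config.json` with `json.load`. A positive result only suggests the checkpoint has a CTC output layer; it does not guarantee the model aligns well.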