whisperx-rocm-api

Files

Nguyen Binh b1c8ac7de6 Change alignment model for Vietnamese language

The current model is a wav2vec2 model that was only pre-trained on Vietnamese audio, so it won't work for alignment tasks. To make alignment work as expected, I recommend changing to a fine-tuned ASR version.

2025-10-03 09:41:03 +02:00

assets

move model to assets

2024-12-14 22:53:53 -06:00

vads

Remove unused code in Vad class

2025-07-01 09:06:04 +02:00

__init__.py

fix: remove DiarizationPipeline from public API

2025-05-03 09:25:59 +02:00

__main__.py

feat: enhance diarization with optional output of speaker embeddings

2025-06-24 15:01:09 +02:00

alignment.py

Change alignment model for Vietnamese language

2025-10-03 09:41:03 +02:00

asr.py

fix(asr): load VAD model on correct CUDA device (#835)

2025-07-02 08:07:59 +02:00

audio.py

refactor: update import statements to use explicit module paths across multiple files

2025-03-25 16:24:21 +01:00

conjunctions.py

refactor: add type hints

2025-01-05 11:48:24 +01:00

diarize.py

style: minor code formatting

2025-06-24 15:01:09 +02:00

SubtitlesProcessor.py

refactor: update import statements to use explicit module paths across multiple files

2025-03-25 16:24:21 +01:00

transcribe.py

fix: speaker embedding bug (#1178)

2025-06-25 13:55:20 +02:00

types.py

feat: add SegmentData type for temporary processing during alignment

2025-01-13 10:45:50 +01:00

utils.py

Remove duplicated item

2025-04-12 11:08:15 +02:00