94 Commits

Author SHA1 Message Date
Barabazs
c8f7597345 feat: add hotwords argument to CLI for improved recognition of rare terms 2025-10-17 09:21:56 -06:00
Barabazs
a51ae7a81a feat: add centralized logging to replace ad-hoc print statements (#1254)
* feat: add logging utility functions

* feat: add logging setup and log level argument to CLI

* feat: integrate logging across modules
2025-10-10 08:41:06 +02:00
Barabazs
3b1b9a8c4d refactor: rename types.py to schema.py to avoid stdlib conflict 2025-10-09 14:25:58 +02:00
3manifold
64e307cc29 chore: remove redundant variable & improve load_model function documentation (#1197)
* Remove redundant variable

* Improve function documentation
2025-10-09 09:32:02 +02:00
Barabazs
ffedc5cdf0 fix: speaker embedding bug (#1178)
* fix: improve handling of speaker embeddings in transcribe_task

* chore: bump version to 3.4.1
2025-06-25 13:55:20 +02:00
Radu-Sebastian Amarie
1631c3040f feat: enhance diarization with optional output of speaker embeddings
- Updated DiarizationPipeline to include a return_embeddings parameter for optional speaker embeddings.
- Modified assign_word_speakers to accept and process speaker embeddings.
- Updated CLI to support --speaker_embeddings flag for JSON output.
- Ensured backward compatibility for existing functionality.
2025-06-24 15:01:09 +02:00
bog
b343241253 feat: add diarize_model arg to CLI (#1101) 2025-05-31 13:32:31 +02:00
Barabazs
7d36b832f9 refactor: update CLI entry point 2025-05-03 09:25:59 +02:00
Barabazs
ac0c8bd79a feat: add version and Python version arguments to CLI 2025-05-01 11:08:54 +02:00
Barabazs
e7712f496e refactor: update import statements to use explicit module paths across multiple files 2025-03-25 16:24:21 +01:00
philmcmahon
7b3c9ce629 Add models_cache_only param 2025-01-27 12:16:37 +00:00
3manifold
79eb8fa53d Accept alternative VAD methods. Extend to use Silero VAD. 2025-01-06 13:41:46 +01:00
Barabazs
9a8967f27e refactor: add type hints 2025-01-05 11:48:24 +01:00
Abhishek Sharma
51da22771f feat: add verbose output (#759)
Co-authored-by: Abhishek Sharma <abhishek@zipteams.com>
Co-authored-by: Barabazs <31799121+Barabazs@users.noreply.github.com>
2025-01-01 13:07:52 +01:00
canoalberto
942c336b8f Fixes --model_dir path 2023-12-27 14:03:54 -05:00
Mahmoud Ashraf
f865dfe710 fix typo 2023-12-04 17:38:50 +03:00
amosal
afd5ef1d58 FIX warnings for word options 2023-10-31 18:55:35 +01:00
Max Bain
c6fe379d9e Merge pull request #517 from jkukul/support-language-names-as-parameters
Support language names in `--language` parameter.
2023-10-25 11:16:30 -07:00
Jakub Kukul
14a7cab8eb Pass patience and beam_size to faster-whisper. 2023-10-14 13:51:29 +02:00
Jakub Kukul
1001a055db Support language names in --language. 2023-10-10 13:55:47 +02:00
Max Bain
ffd6167b26 Merge pull request #473 from sorgfresser/fix-faster-whisper-threads 2023-09-19 16:53:34 -07:00
Simon Sorg
0ae0d49d1d add faster whisper threading 2023-09-14 11:47:51 +02:00
陳鈞
5223de2a41 fix: UnboundLocalError: local variable 'align_language' referenced before assignment 2023-08-30 01:11:09 +08:00
陳鈞
f505702dc7 chore(writer): Join words without spaces for ja, zh
fix #248, fix #310
2023-08-30 01:11:09 +08:00
Max Bain
9647f60fca Merge branch 'main' into add-merge-chunk-size-as-argument 2023-08-29 10:05:05 -06:00
陳鈞
eb771cf56d feat: Add merge chunks chunk_size as arguments.
Suggested in https://github.com/m-bain/whisperX/issues/200#issuecomment-1666507780
2023-08-29 23:09:02 +08:00
awerks
cb3ed4ab9d Update transcribe.py 2023-08-16 16:22:29 +02:00
Mark Berger
48e7caad77 Update transcribe.py -> small change in batch_size description
Changed the description of the `batch_size` parameter.
2023-07-24 11:45:38 +02:00
Max Bain
d39c1b2319 add "aud" to output_format 2023-06-07 11:48:49 +01:00
Max Bain
b026407fd9 Merge branch 'v3' of https://github.com/m-bain/whisperX into v3
Conflicts:
	whisperx/asr.py
2023-06-05 15:30:02 +01:00
Max Bain
a323cff654 --suppress_numerals option, ensures non-numerical words, for wav2vec2 alignment 2023-06-05 15:27:42 +01:00
Simon
74b98ebfaa ensure device_index not None 2023-05-20 13:11:30 +02:00
Simon
53396adb21 add device_index 2023-05-20 13:02:46 +02:00
Max Bain
fd8f1003cf add translate, fix word_timestamp error 2023-05-13 12:14:06 +01:00
Max Bain
4603f010a5 update readme, setup, add option to return char_timestamps 2023-05-07 20:28:33 +01:00
Max Bain
24008aa1ed fix long segments, break into sentences using nltk, improve align logic, improve diarize (sentence-based) 2023-05-07 15:32:58 +01:00
Max Bain
07361ba1d7 add device to dia pipeline @sorgfresser 2023-05-05 11:53:51 +01:00
Max Bain
4e2ac4e4e9 torch2.0, remove compile for now, round times to 3 decimals 2023-05-04 20:38:13 +01:00
Simon
d8f0ef4a19 Set diarization device manually 2023-05-04 16:25:34 +02:00
Prashanth Ellina
601c91140f references #202, attempt to fix speaker diarization failing in v3 2023-04-30 17:33:24 +00:00
Max Bain
0efad26066 pass compute_type 2023-04-24 21:26:44 +01:00
Max Bain
2a29f0ec6a add compute types 2023-04-24 21:24:22 +01:00
Max Bain
558d980535 v3 init 2023-04-24 21:08:43 +01:00
invisprints
bb15c9428f optimize the inference loop 2023-04-09 15:58:55 +08:00
dev-nomi
4146e56d5b Added vad_filter type 2023-04-05 17:11:29 +05:00
Max Bain
11a78d7ced handle tmp wav file better 2023-04-01 00:06:40 +01:00
Max Bain
b9ca701d69 .wav conversion, handle audio with no detected speech 2023-03-31 23:02:38 +01:00
Max Bain
d0fa028045 fix tfile naming 2023-03-30 19:24:42 +01:00
Max Bain
ae4a9de307 add vad model external dl 2023-03-30 18:57:55 +01:00
Max Bain
18b63d46e2 skeleton v2 2023-03-30 05:31:57 +01:00