Add OpenAI-compatible API and Docker deployment

- Add FastAPI-based API in whisperx/api/
- Implement transcription endpoint compatible with OpenAI
- Add Dockerfile and docker-compose.yml for easy deployment
- Update README with Docker instructions
- Add new script whisperx-serve for running the API
2026-05-13 01:37:47 +03:00
parent d154f4b39b
commit c1fcb3f57c
8 changed files with 238 additions and 1 deletions


@@ -111,6 +111,46 @@ uv sync --all-extras --dev
You may also need to install ffmpeg, Rust, etc. Follow the OpenAI instructions at https://github.com/openai/whisper#setup.
## Docker Deployment 🐳
For easy deployment with GPU support, use Docker Compose:
### Prerequisites
- Docker and Docker Compose installed
- A ROCm-compatible AMD GPU or a CUDA-capable NVIDIA GPU
- For AMD ROCm, ensure the ROCm drivers are installed on the host
### Steps
1. Clone the repository:
```bash
git clone https://github.com/m-bain/whisperX.git
cd whisperX
```
2. Build and run the container:
```bash
docker-compose up --build
```
Once the container is running, the API is available at `http://localhost:8000`.
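Because the container may take a while to start, a script that calls the API right after `docker-compose up` can wait for the port first. The following is a minimal sketch, not part of this commit: `wait_for_port` is a hypothetical helper, and an open port only shows that the server process is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host="localhost", port=8000, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; host and port mirror the
    `http://localhost:8000` address given in the README.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    if wait_for_port():
        print("API port is open")
    else:
        print("Timed out waiting for the API port")
```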
### Environment Variables
- `WHISPERX_MODEL`: Model size (default: `large-v2`)
- `WHISPERX_DEVICE`: `cuda` or `cpu` (default: `cuda`)
- `WHISPERX_COMPUTE_TYPE`: `float16` or `float32` (default: `float16`)
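To illustrate how these variables and their defaults fit together, here is a small sketch of reading them in Python. The `ServeSettings` class and `load_settings` function are hypothetical names chosen for this example; the actual code in `whisperx/api/` is not shown in this diff and may be structured differently.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServeSettings:
    """Hypothetical container for the three documented variables."""
    model: str
    device: str
    compute_type: str


def load_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return ServeSettings(
        model=env.get("WHISPERX_MODEL", "large-v2"),
        device=env.get("WHISPERX_DEVICE", "cuda"),
        compute_type=env.get("WHISPERX_COMPUTE_TYPE", "float16"),
    )


if __name__ == "__main__":
    print(load_settings())
```

On a CPU-only machine, for example, setting `WHISPERX_DEVICE=cpu` and `WHISPERX_COMPUTE_TYPE=float32` would override the two GPU-oriented defaults.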
### API Usage
The API is compatible with OpenAI's transcription endpoint:
```bash
# -F makes curl send multipart/form-data and set the Content-Type (with boundary) itself
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "language=en"
```
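The same request can be sent from Python. The sketch below uses only the standard library and mirrors the form fields of the curl call; `build_multipart` and `transcribe` are hypothetical helper names, and parsing the response as JSON is an assumption based on the default response format of OpenAI's endpoint, not something this diff confirms.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, base_url="http://localhost:8000", language="en"):
    """POST an audio file to the transcription endpoint shown in the README."""
    audio = Path(path)
    body, content_type = build_multipart(
        {"model": "whisper-1", "language": language},
        "file",
        audio.name,
        audio.read_bytes(),
    )
    request = urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the server answers with a JSON document.
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(transcribe("audio.wav"))
```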
### Speaker Diarization
To **enable speaker diarization**, pass a Hugging Face access token (read scope), which you can generate [here](https://huggingface.co/settings/tokens), after the `--hf_token` argument, and accept the user agreements for the following models: [Segmentation](https://huggingface.co/pyannote/segmentation-3.0) and [Speaker-Diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1). If you choose to use Speaker-Diarization 2.x, follow the requirements [here](https://huggingface.co/pyannote/speaker-diarization) instead.