Add OpenAI-compatible API and Docker deployment

- Add FastAPI-based API in whisperx/api/
- Implement transcription endpoint compatible with OpenAI
- Added Dockerfile and docker-compose.yml for easy deployment
- Updated README with Docker instructions
- Added new script whisperx-serve for running the API
2026-05-13 01:37:47 +03:00
parent d154f4b39b
commit c1fcb3f57c
8 changed files with 238 additions and 1 deletions
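
The README change in this commit documents the transcription endpoint with a `curl` call. As a rough illustration of the same request from Python, the sketch below builds (but does not send) an equivalent multipart POST using only the standard library. The helper name `build_transcription_request` and its parameters are hypothetical and not part of this commit; the URL path and the `file`, `model`, and `language` form fields are taken from the README example.

```python
import uuid
import urllib.request


def build_transcription_request(audio_bytes, filename,
                                base_url="http://localhost:8000",
                                model="whisper-1", language="en"):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions.

    Hypothetical helper for illustration; not part of the whisperx package.
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields, mirroring the -F flags of the curl example.
    for name, value in (("model", model), ("language", language)):
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode()
        )

    # The audio payload, sent under the form field name "file".
    parts.append(
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + audio_bytes + b"\r\n"
    )

    # Closing boundary marks the end of the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())

    return urllib.request.Request(
        url=f"{base_url}/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

With the container running, the request could be sent with `urllib.request.urlopen(req)`. The shape of the response is not shown in this commit excerpt, so the sketch makes no assumption about it.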
--- a/README.md
+++ b/README.md
@@ -111,6 +111,46 @@ uv sync --all-extras --dev

 You may also need to install ffmpeg, rust etc. Follow openAI instructions here https://github.com/openai/whisper#setup.

+## Docker Deployment 🐳
+
+For easy deployment with GPU support, use Docker Compose:
+
+### Prerequisites
+- Docker and Docker Compose installed
+- ROCm compatible GPU (AMD) or NVIDIA GPU with CUDA
+- For AMD ROCm, ensure the ROCm drivers are installed on the host
+
+### Steps
+
+1. Clone the repository:
+```bash
+git clone https://github.com/m-bain/whisperX.git
+cd whisperX
+```
+
+2. Build and run the container:
+```bash
+docker-compose up --build
+```
+
+The API will be available at `http://localhost:8000`
+
+### Environment Variables
+- `WHISPERX_MODEL`: Model size (default: large-v2)
+- `WHISPERX_DEVICE`: cuda or cpu (default: cuda)
+- `WHISPERX_COMPUTE_TYPE`: float16 or float32 (default: float16)
+
+### API Usage
+The API is compatible with OpenAI's transcription endpoint:
+
+```bash
+curl -X POST http://localhost:8000/v1/audio/transcriptions \
+  -H "Content-Type: multipart/form-data" \
+  -F "file=@audio.wav" \
+  -F "model=whisper-1" \
+  -F "language=en"
+```
+
 ### Speaker Diarization

 To **enable Speaker Diarization**, include your Hugging Face access token (read) that you can generate from [Here](https://huggingface.co/settings/tokens) after the `--hf_token` argument and accept the user agreement for the following models: [Segmentation](https://huggingface.co/pyannote/segmentation-3.0) and [Speaker-Diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1) (if you choose to use Speaker-Diarization 2.x, follow requirements [here](https://huggingface.co/pyannote/speaker-diarization) instead.)