Add OpenAI-compatible API and Docker deployment

- Add FastAPI-based API in whisperx/api/
- Implement transcription endpoint compatible with OpenAI
- Added Dockerfile and docker-compose.yml for easy deployment
- Updated README with Docker instructions
- Added new script whisperx-serve for running the API
40
README.md
@@ -111,6 +111,46 @@ uv sync --all-extras --dev
You may also need to install ffmpeg, Rust, etc. Follow the OpenAI instructions at https://github.com/openai/whisper#setup.

## Docker Deployment 🐳

For easy deployment with GPU support, use Docker Compose:

### Prerequisites
- Docker and Docker Compose installed
- A ROCm-compatible AMD GPU or an NVIDIA GPU with CUDA
- For AMD ROCm, ensure the ROCm drivers are installed on the host

### Steps

1. Clone the repository:
```bash
git clone https://github.com/m-bain/whisperX.git
cd whisperX
```

2. Build and run the container:
```bash
docker-compose up --build
```

The API will be available at `http://localhost:8000`

### Environment Variables
- `WHISPERX_MODEL`: Model size (default: large-v2)
- `WHISPERX_DEVICE`: cuda or cpu (default: cuda)
- `WHISPERX_COMPUTE_TYPE`: float16 or float32 (default: float16)

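As a rough sketch of how these variables might be overridden on a CPU-only host, the snippet below writes a `.env` file, which Docker Compose reads from the project directory for variable substitution. It assumes the repository's `docker-compose.yml` forwards these variables into the container, which this README does not confirm. The model name `small` is only an illustrative choice.

```bash
# Write CPU-friendly overrides to a .env file next to docker-compose.yml.
# Assumption: docker-compose.yml passes these variables through to the container.
cat > .env <<'EOF'
WHISPERX_MODEL=small
WHISPERX_DEVICE=cpu
WHISPERX_COMPUTE_TYPE=float32
EOF

# Show what was written before starting the stack.
cat .env

# Then rebuild and start the container as in the steps above:
# docker-compose up --build
```

If the compose file hard-codes these values instead, edit its `environment:` section directly.
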
### API Usage
The API is compatible with OpenAI's transcription endpoint:

```bash
curl -X POST http://localhost:8000/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@audio.wav" \
  -F "model=whisper-1" \
  -F "language=en"
```

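To use the transcript in a script, the response can be piped through a small helper. This is a minimal sketch that assumes the server returns OpenAI's default JSON shape, `{"text": "..."}`. The `extract_text` helper is hypothetical, and the sample payload is made up rather than real server output.

```bash
# Extract the "text" field from a JSON transcription response read on stdin.
# Assumption: the server returns OpenAI's default JSON shape, {"text": "..."}.
extract_text() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["text"])'
}

# Offline check with a made-up sample payload (not real server output):
printf '%s' '{"text": "Hello world."}' | extract_text

# Against a running server:
# curl -s http://localhost:8000/v1/audio/transcriptions \
#   -F "file=@audio.wav" -F "model=whisper-1" | extract_text
```

The helper uses `python3` rather than `jq` only because Python is already required by this project.
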
### Speaker Diarization

To **enable Speaker Diarization**, pass a Hugging Face access token (read), which you can generate [here](https://huggingface.co/settings/tokens), after the `--hf_token` argument, and accept the user agreements for the following models: [Segmentation](https://huggingface.co/pyannote/segmentation-3.0) and [Speaker-Diarization-3.1](https://huggingface.co/pyannote/speaker-diarization-3.1). If you choose to use Speaker-Diarization 2.x, follow the requirements [here](https://huggingface.co/pyannote/speaker-diarization) instead.