# WhisperX ROCm Fork
This is a fork of [m-bain/whisperX](https://github.com/m-bain/whisperX) configured for AMD ROCm GPUs.
## Tested Configuration
- **OS**: Debian Sid (trixie/forky)
- **GPU**: AMD Radeon RX 7700 XT (gfx1101, 12GB VRAM)
- **ROCm**: 7.1.1
- **Python**: 3.10
- **PyTorch**: 2.11.0+rocm7.0 (nightly)
- **CTranslate2**: 4.6.2 (ROCm build from [paralin/ctranslate2-rocm](https://github.com/paralin/ctranslate2-rocm))
## Prerequisites
1. ROCm 7.1+ installed at `/opt/rocm`
2. CTranslate2 built with ROCm support (see [paralin/ctranslate2-rocm](https://github.com/paralin/ctranslate2-rocm))
## Installation (Debian Sid)
### 1. Clone and setup
```bash
git clone https://github.com/paralin/whisperX-rocm.git ~/whisperx
cd ~/whisperx
git checkout rocm
```
### 2. Create venv with uv
```bash
uv venv
uv pip install -e .
```
### 3. Install ROCm PyTorch
```bash
uv pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/rocm7.0
```
### 4. Install ROCm CTranslate2
First build CTranslate2 with ROCm support (see [paralin/ctranslate2-rocm](https://github.com/paralin/ctranslate2-rocm)), then:
```bash
# Remove PyPI ctranslate2 (has CUDA binaries)
rm -rf .venv/lib/python3.10/site-packages/ctranslate2*
# Install ROCm build
export CTRANSLATE2_ROOT=/usr/local
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
uv pip install --reinstall pybind11 ~/ctranslate2/python
```
## Environment Variables
These must be set before running whisperx:
```bash
export HSA_OVERRIDE_GFX_VERSION=11.0.1 # for gfx1101
export AMDGPU_TARGETS=gfx1101
export ROCM_PATH=/opt/rocm
export HIP_VISIBLE_DEVICES=0
export ROCR_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH=/usr/local/lib:/opt/rocm/lib:/opt/rocm/lib/llvm/lib:$LD_LIBRARY_PATH
```
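If whisperx is launched from a Python script rather than an interactive shell, the same variables can be assembled in code. The following is a minimal sketch, not part of this repository: the helper names `build_rocm_env` and `run_whisperx` are hypothetical, and the defaults simply mirror the values listed above for a gfx1101 card.

```python
import os
import subprocess


def build_rocm_env(base_env=None, gfx_version="11.0.1",
                   gfx_target="gfx1101", rocm_path="/opt/rocm"):
    """Return a copy of base_env with the ROCm variables from this section set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    env["AMDGPU_TARGETS"] = gfx_target
    env["ROCM_PATH"] = rocm_path
    env["HIP_VISIBLE_DEVICES"] = "0"
    env["ROCR_VISIBLE_DEVICES"] = "0"
    # Prepend the CTranslate2 and ROCm library directories, keeping any existing path.
    lib_dirs = ["/usr/local/lib", f"{rocm_path}/lib", f"{rocm_path}/lib/llvm/lib"]
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        lib_dirs.append(existing)
    env["LD_LIBRARY_PATH"] = ":".join(lib_dirs)
    return env


def run_whisperx(args, cwd=None):
    """Run `uv run whisperx <args>` with the ROCm environment applied (hypothetical wrapper)."""
    return subprocess.run(["uv", "run", "whisperx", *args],
                          env=build_rocm_env(), cwd=cwd, check=True)
```

Building the environment as a plain dictionary keeps the parent process untouched and makes it easy to swap in a different `gfx_version` for another card.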
## Usage
```bash
# Set environment (add to ~/.bashrc for convenience)
export HSA_OVERRIDE_GFX_VERSION=11.0.1
export ROCM_PATH=/opt/rocm
export HIP_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH=/usr/local/lib:/opt/rocm/lib:/opt/rocm/lib/llvm/lib:$LD_LIBRARY_PATH
cd ~/whisperx
uv run whisperx audio.wav \
--language en \
--model large-v3 \
--compute_type float16 \
--device cuda \
--batch_size 8 \
--vad_method silero \
--output_dir ./output \
--output_format all
```
Note: `--device cuda` is correct on AMD hardware. The ROCm builds of PyTorch and CTranslate2 keep the `cuda` device name, and ROCm's HIP layer maps the CUDA-style API calls onto the AMD GPU.
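Because the device is called `cuda` on both vendors, it can be useful to confirm which backend a PyTorch build actually targets. The sketch below relies on `torch.version.hip`, which is a version string on ROCm builds and `None` otherwise; the helper name `describe_backend` is hypothetical and takes the imported module as an argument so it can be exercised without a GPU.

```python
def describe_backend(torch_module):
    """Report which GPU backend a PyTorch build targets."""
    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    hip_version = getattr(torch_module.version, "hip", None)
    if hip_version:
        return f"ROCm/HIP {hip_version}"
    cuda_version = getattr(torch_module.version, "cuda", None)
    if cuda_version:
        return f"CUDA {cuda_version}"
    return "CPU-only build"
```

After `import torch`, calling `describe_backend(torch)` should report a ROCm/HIP version on a correctly installed setup.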
## Verify Installation
```bash
# Check PyTorch sees the GPU
python -c "import torch; print('CUDA:', torch.cuda.is_available()); print('Device:', torch.cuda.get_device_name(0))"
# Check CTranslate2
python -c "import ctranslate2; print(ctranslate2.__version__); print(ctranslate2.get_supported_compute_types('cuda'))"
```
Expected output (the order of the compute types may differ):
```
CUDA: True
Device: AMD Radeon RX 7700 XT
4.6.2
{'int8_float16', 'int8_bfloat16', 'bfloat16', 'int8_float32', 'int8', 'float16', 'float32'}
```
## GPU Architecture
Set `HSA_OVERRIDE_GFX_VERSION` based on your GPU:
| GPU | Architecture | HSA_OVERRIDE_GFX_VERSION |
|-----|--------------|--------------------------|
| RX 7900 XTX/XT | gfx1100 | 11.0.0 |
| RX 7800 XT | gfx1101 | 11.0.1 |
| RX 7700 XT | gfx1101 | 11.0.1 |
| RX 7600 | gfx1102 | 11.0.2 |
| RX 6900/6800 | gfx1030 | 10.3.0 |
| RX 6700 XT | gfx1031 | 10.3.0 |
| RX 6600 | gfx1032 | 10.3.2 |
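Most rows in the table follow one pattern: the last two characters of the architecture name are the minor version and stepping, and everything between `gfx` and those two characters is the major version. The helper below is a small sketch of that conversion; the name `gfx_to_override` is hypothetical, and it only restates the naming pattern. It does not check whether ROCm ships kernels for the resulting target.

```python
def gfx_to_override(arch):
    """Convert an architecture name such as 'gfx1101' to the form '11.0.1'."""
    if not arch.startswith("gfx") or len(arch) < 6:
        raise ValueError(f"unexpected architecture name: {arch!r}")
    digits = arch[3:]
    major = int(digits[:-2])         # e.g. '11' in gfx1101
    minor = int(digits[-2], 16)      # single hex digit
    stepping = int(digits[-1], 16)   # single hex digit
    return f"{major}.{minor}.{stepping}"
```

For example, `gfx_to_override("gfx1030")` yields `10.3.0`, matching the gfx1030 row above.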
## Upstream
- Original: [m-bain/whisperX](https://github.com/m-bain/whisperX)
## Known Issues
### Memory Access Fault on Long Audio
When transcribing longer audio files (>60s), you may encounter:
```
Memory access fault by GPU node-1 (Agent handle: 0x...) on address 0x...
Reason: Page not present or supervisor privilege.
```
**Status**: Under investigation. Clips of about 60 s transcribe without error at roughly 28x realtime with the `small` model.
**Workaround**: Process the audio in chunks, or run long files on the CPU (`--device cpu --compute_type int8`).
**Working example** (first 60s):
```python
from faster_whisper import WhisperModel
model = WhisperModel("small", device="cuda", compute_type="float16")
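segments, info = model.transcribe("audio.wav", language="en", clip_timestamps=[0, 60])
# segments is a lazy generator; transcription only runs while it is iterated
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")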
```
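The chunking workaround could be sketched as follows, reusing the `clip_timestamps` argument from the example above with one `transcribe` call per window. This is an untested assumption rather than a confirmed fix: the helper names `chunk_windows` and `transcribe_in_chunks` are hypothetical, the audio duration must be supplied by the caller, and fixed 60 s boundaries may split words.

```python
def chunk_windows(duration_s, chunk_s=60.0):
    """Split [0, duration_s] into consecutive (start, end) windows of at most chunk_s seconds."""
    if chunk_s <= 0:
        raise ValueError("chunk_s must be positive")
    windows = []
    start = 0.0
    while start < duration_s:
        end = min(start + chunk_s, duration_s)
        windows.append((start, end))
        start = end
    return windows


def transcribe_in_chunks(model, audio_path, duration_s, chunk_s=60.0):
    """Transcribe one window at a time and return the segment texts in order."""
    texts = []
    for start, end in chunk_windows(duration_s, chunk_s):
        segments, _info = model.transcribe(
            audio_path, language="en", clip_timestamps=[start, end]
        )
        # Consume the lazy generator so this window is fully decoded before the next call.
        texts.extend(segment.text for segment in segments)
    return texts
```

Here `model` is expected to be a `WhisperModel` instance like the one created above; whether per-window calls actually avoid the memory access fault on a given setup would need to be verified.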
**Search terms for updates**:
- `"Memory access fault by GPU node" "Page not present or supervisor privilege" ROCm 7.1 PyTorch site:github.com`
- `"Memory access fault" ROCm CTranslate2 faster-whisper gfx1101`
This may be related to:
- ROCm 7.1.1 + PyTorch nightly (2.11.0+rocm7.0) incompatibility
- GPU memory fragmentation with longer sequences
- HIP/ROCm memory management issues with certain operations
**Related issue**: https://github.com/ROCm/ROCm/issues/5616 (gfx1103, same error pattern)