- worked on environment variables

- added a maximum audio file size limit (50 MB by default, which is a lot)
- fixed docker-compose.yml, so the service can now be deployed with a single command
- wrote a detailed guide on deploying via Docker on Debian
2025-09-06 16:37:02 +03:00
parent f718da13d6
commit d8c27b1cbb
5 changed files with 254 additions and 108 deletions
--- a/.env.example
+++ b/.env.example
@@ -0,0 +1,25 @@
+# Device settings
+ASR_DEVICE=rocm
+ASR_MODELS_ROOT=/app/models
+
+# Server settings
+PORT=9854
+HOST=0.0.0.0
+LOG_LEVEL=info
+
+# Model settings
+DEFAULT_MODEL=turbo
+DEFAULT_LANGUAGE=ru
+
+# Processing settings
+AUDIO_SPEEDUP=1.25
+MAX_UPLOAD_SIZE_MB=50  # upload limit in megabytes
+
+# Cache settings
+CACHE_DIR=/app/cache
+ENABLE_CACHE=true
+
+# Security settings
+CORS_ORIGINS=*
+MAX_REQUESTS_PER_MINUTE=60
+
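As an illustration of how the variables above might be consumed, here is a minimal sketch of a settings loader. It is not part of this commit: `Settings` and `load_settings` are hypothetical names, the defaults are copied from `.env.example`, and the sketch assumes each value arrives without a trailing inline comment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    # Hypothetical container for a subset of the variables in .env.example.
    port: int
    log_level: str
    default_model: str
    audio_speedup: float
    max_upload_size_bytes: int


def load_settings(env=None) -> Settings:
    """Read settings from a mapping (os.environ by default), falling back to the .env.example defaults."""
    env = os.environ if env is None else env
    # MAX_UPLOAD_SIZE_MB is expressed in megabytes; convert it to bytes once here.
    max_upload_mb = int(env.get("MAX_UPLOAD_SIZE_MB", "50"))
    return Settings(
        port=int(env.get("PORT", "9854")),
        log_level=env.get("LOG_LEVEL", "info"),
        default_model=env.get("DEFAULT_MODEL", "turbo"),
        audio_speedup=float(env.get("AUDIO_SPEEDUP", "1.25")),
        max_upload_size_bytes=max_upload_mb * 1024 * 1024,
    )
```

Parsing everything in one place means a missing variable falls back to a default instead of raising a `TypeError` at request time.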
--- a/README.MD
+++ b/README.MD
@@ -0,0 +1,113 @@
+# ASR service based on Whisper + AMD ROCm
+
+A speech recognition service that runs on AMD GPUs via ROCm.
+
+## Requirements
+
+- An AMD GPU with ROCm support (RX 5000, RX 6000 and newer)
+- ROCm installed
+- Docker + Compose plugin
+- Debian-based Linux (the instructions assume it)
+- At least 8 GB of RAM
+- At least 2 GB of free disk space, plus space for ROCm and the models (turbo is ~3.5 GB)
+
+## Installation and launch
+
+### 1. Cloning the repository
+
+```bash
+git clone https://github.com/SlavaVlad/simple-asr-server.git ./asr
+cd asr
+```
+
+### 2. Building the Docker image
+
+Before starting, you can adjust the environment variables in docker-compose.yml, for example the default model.
+```bash
+docker compose up -d
+```
+This command builds the image and starts the service.
+
+## Managing API keys
+
+On first start, the service automatically creates a `keys.txt` file containing a randomly generated API key.
+
+### Adding new keys
+1. Open the `keys.txt` file
+2. Add the new keys, one per line
+3. Restart the service
+
+Example contents of `keys.txt`:
+```
+f7a9c2b3d4e5a6b7c8d9e0f1a2b3c4d5
+e8b9a2c3d4f5g6h7i8j9k0l1m2n3o4p5
+```
+
+## Using the API
+
+### Speech recognition
+
+```bash
+curl -X POST "http://localhost:9854/transcribe" \
+     -H "x-api-key: YOUR_KEY" \
+     -F "file=@path_to_audio_file" \
+     -F "model_name=turbo" # (optional, defaults to turbo)
+```
+
+### API response
+
+```json
+{
+    "transcription": [...],
+    "text": "recognized text",
+    "metrics": {
+        "processing_time_seconds": 1.23,
+        "characters_per_second": 45.6,
+        "audio_realtime_ratio": 2.34,
+        "audio_duration": 5.67,
+        "text_length": 89
+    }
+}
+```
+
+```text
+
+```
+
+## Performance metrics
+
+The API returns the following metrics:
+- `processing_time_seconds`: processing time in seconds
+- `characters_per_second`: processing speed (characters per second)
+- `audio_realtime_ratio`: ratio of audio duration to processing time
+- `audio_duration`: audio duration in seconds
+- `text_length`: number of characters in the recognized text
+
+## Supported formats
+
+All audio formats that FFmpeg can handle are supported. Files are automatically converted to the required format.
+
+## Troubleshooting
+
+### Checking service status
+```bash
+curl http://localhost:9854/docs
+```
+
+### Common problems
+
+1. **GPU access error**
+   ```bash
+   sudo usermod -a -G video,render $USER
+   sudo reboot
+   ```
+
+2. **403 error**
+   - Check that the API key is correct
+   - Make sure the key has been added to keys.txt
+   - Restart the service
+
+## License
+
+MIT License
+
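The metric definitions in the README above can be sketched as follows. This is an assumption about how the values relate to each other, based only on the README descriptions; the actual `TranscriptionMetrics` class in `app.py` is not shown in this diff, and `compute_metrics` is a hypothetical helper.

```python
def compute_metrics(audio_duration: float, processing_time_seconds: float, text: str) -> dict:
    """Derive the README metrics from the audio duration, the processing time and the recognized text."""
    if processing_time_seconds <= 0:
        raise ValueError("processing_time_seconds must be positive")
    text_length = len(text)
    return {
        "processing_time_seconds": processing_time_seconds,
        # Characters of recognized text produced per second of processing.
        "characters_per_second": text_length / processing_time_seconds,
        # Values above 1 mean the audio was processed faster than real time.
        "audio_realtime_ratio": audio_duration / processing_time_seconds,
        "audio_duration": audio_duration,
        "text_length": text_length,
    }


# 10 s of audio processed in 2 s, yielding 90 characters of text.
example = compute_metrics(10.0, 2.0, "a" * 90)
```

With these inputs `characters_per_second` is 45.0 and `audio_realtime_ratio` is 5.0.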
--- a/README.md
+++ b/README.md
@@ -1,104 +0,0 @@
-BASED ON https://github.com/salute-developers/GigaAM
-
-# Simple ASR Server
-
-This project provides a RESTful API for audio transcription using a Whisper model. The API is built with FastAPI and runs in a Docker container.
-
-## Prerequisites
-
-Before you begin, ensure you have the following installed:
-
-*   [Docker](https://docs.docker.com/get-docker/)
-*   [Docker Compose](https://docs.docker.com/compose/install/)
-
-## Project Structure
-
-```
-.
-├── app.py              # Main application file with FastAPI endpoint
-├── docker-compose.yml  # Docker Compose configuration
-├── Dockerfile          # Dockerfile for building the application image
-├── model/              # Directory for Whisper model files
-└── requirements.txt    # Python dependencies
-```
-
-## Setup
-
-1.  **Clone the repository:**
-
-    ```bash
-    git clone https://github.com/SlavaVlad/simple-asr-server
-    cd simple-asr-server
-    ```
-3.  **Add API keys:**
-
-    Create a `keys.txt` file in the root of the project and add your API keys, one per line.
-
-## Building and Running the Project
-
-You can build and run the project using Docker Compose.
-
-1.  **Build the Docker image:**
-
-    ```bash
-    docker-compose build
-    ```
-
-2.  **Run the container:**
-
-    ```bash
-    docker-compose up
-    ```
-
-    The application will be available at `http://0.0.0.0:9854`.
-
-## API Endpoint
-
-### POST /transcribe
-
-This endpoint accepts an audio file and returns the transcription.
-
-*   **URL:** `/transcribe`
-*   **Method:** `POST`
-*   **Headers:**
-    *   `X-API-Key`: Your API key.
-*   **Form Data:**
-    *   `file`: The audio file to be transcribed.
-
-**Example using `curl`:**
-
-```bash
-curl -X POST "http://localhost:9854/transcribe" \
-     -H "X-API-Key: YOUR_API_KEY" \
-     -F "file=@/path/to/your/audio.wav"
-```
-
-**Successful Response (200 OK):**
-
-```json
-{
-  "transcription": [
-    {
-      "start_time": 0.0,
-      "end_time": 2.5,
-      "transcription": "Hello world."
-    }
-  ],
-  "text": "Hello world. ",
-  "metrics": {
-    "processing_time": 5.2,
-    "rtf": 0.5,
-    "word_rate": 2.0
-  }
-}
-```
-
-**Error Response (401 Unauthorized):**
-
-If the API key is missing or invalid.
-
-```json
-{
-  "detail": "Invalid API Key"
-}
-```
--- a/app.py
+++ b/app.py
@@ -105,6 +105,74 @@ def get_audio_duration(file_path: str) -> float:
        return 0.0


+@app.post("/transcribe/simple")
+async def transcribe_simple(
+        file: UploadFile = File(...),
+        token: str = Depends(api_key_header),
+        model_name: str = "turbo"
+):
+    # Token validation
+    if token not in get_keys():
+        logger.warning(f"Invalid token attempt: {token}")
+        if not token:
+            raise HTTPException(status_code=401, detail="Unauthorized. x-api-key header is missing or empty.")
+        raise HTTPException(status_code=403, detail="Forbidden. Invalid API key.")
+
+    logger.info(f"Processing file: {file.filename} with model: {model_name}")
+
+    if (file.size or 0) > int(os.getenv("MAX_UPLOAD_SIZE_MB", "50")) * 1024 * 1024:
+        raise HTTPException(status_code=400, detail=f'File size exceeds {os.getenv("MAX_UPLOAD_SIZE_MB", "50")}MB limit')
+
+    # Save uploaded file
+    temp_input_path = f"/tmp/input_{file.filename}"
+    temp_output_path = f"/tmp/converted_{file.filename}.wav"
+
+    try:
+        with open(temp_input_path, "wb") as f:
+            f.write(await file.read())
+
+        # Convert audio if needed
+        logger.debug("Converting audio file")
+        if not convert_audio(temp_input_path, temp_output_path):
+            raise HTTPException(status_code=400, detail="Audio conversion failed")
+
+        # Get audio duration before speed up
+        original_duration = get_audio_duration(temp_input_path)
+
+        # Transcribe
+        logger.info("Starting transcription")
+        if original_duration > 30:
+            logger.info("Audio duration > 30 seconds, using transcribe_longform")
+            transcription_result = model.transcribe_longform(
+                temp_output_path
+            )
+        else:
+            logger.info("Audio duration <= 30 seconds, using transcribe")
+            transcription_result = model.transcribe(
+                temp_output_path
+            )
+
+        full_text = ""
+        for part in transcription_result:
+            if part["transcription"].strip() != "":
+                full_text += part["transcription"].strip() + " "
+
+        result = full_text
+
+        return result
+
+    except Exception as e:
+        logger.error(f"Transcription failed: {str(e)}")
+        raise HTTPException(status_code=500, detail=str(e))
+
+    finally:
+        # Cleanup temporary files
+        if os.path.exists(temp_input_path):
+            os.remove(temp_input_path)
+        if os.path.exists(temp_output_path):
+            os.remove(temp_output_path)
+
+
@app.post("/transcribe")
 async def transcribe_audio(
        file: UploadFile = File(...),
@@ -117,6 +185,10 @@ async def transcribe_audio(
        raise HTTPException(status_code=403, detail="Forbidden")

    logger.info(f"Processing file: {file.filename} with model: {model_name}")
+
+    if (file.size or 0) > int(os.getenv("MAX_UPLOAD_SIZE_MB", "50")) * 1024 * 1024:
+        raise HTTPException(status_code=400, detail=f'File size exceeds {os.getenv("MAX_UPLOAD_SIZE_MB", "50")}MB limit')
+
    metrics = TranscriptionMetrics()

    # Save uploaded file
@@ -139,6 +211,24 @@ async def transcribe_audio(
        logger.info("Starting transcription")
        if original_duration > 30:
            logger.info("Audio duration > 30 seconds, using transcribe_longform")
+            cmd = [
+                'ffmpeg', '-i', temp_input_path,
+                '-filter:a', f'atempo={os.getenv("AUDIO_SPEEDUP", 1.25)}',
+                '-ar', '16000',
+                '-ac', '1',
+                '-c:a', 'pcm_s16le',
+                temp_output_path,
+                '-y'
+            ]
+            logger.debug(f"Running FFmpeg command: {' '.join(cmd)}")
+            log = subprocess.run(cmd, check=True, capture_output=True)
+            logger.info("Audio sped up for longform transcription")
+            # FFmpeg writes its regular log to stderr, so non-empty stderr is not an error;
+            # real failures raise CalledProcessError because check=True.
+            if log.stderr:
+                logger.debug(f"FFmpeg stderr log: {log.stderr.decode()}")
+            logger.debug(f"FFmpeg stdout log: {log.stdout.decode()}")
+
            transcription_result = model.transcribe_longform(
                temp_output_path
            )
@@ -182,7 +272,8 @@ async def transcribe_audio(
 def main():
    import uvicorn
    get_keys()
-    uvicorn.run(app, host="0.0.0.0", port=9854, log_level="debug")
+    uvicorn.run(app, host="0.0.0.0", port=9854, log_level=os.getenv("LOG_LEVEL", "info"))
+

 if __name__ == "__main__":
    main()
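Both endpoints above build their temporary paths directly from the client-supplied `file.filename`, so two uploads with the same name share a path and a name containing directory components can point outside `/tmp`. A possible alternative is sketched below; `make_temp_paths` is a hypothetical helper, not part of this commit, and it assumes only the file extension needs to survive.

```python
import os
import tempfile
import uuid
from typing import Optional, Tuple


def make_temp_paths(filename: Optional[str], tmp_dir: Optional[str] = None) -> Tuple[str, str]:
    """Return (input_path, output_path) with unique names that ignore the client-supplied directory and stem."""
    tmp_dir = tmp_dir or tempfile.gettempdir()
    # Drop any directory components, then keep only the extension of the uploaded name.
    base = os.path.basename(filename or "")
    ext = os.path.splitext(base)[1]
    # A random hex token avoids collisions between concurrent uploads.
    unique = uuid.uuid4().hex
    input_path = os.path.join(tmp_dir, f"input_{unique}{ext}")
    output_path = os.path.join(tmp_dir, f"converted_{unique}.wav")
    return input_path, output_path
```

The handler could then call `temp_input_path, temp_output_path = make_temp_paths(file.filename)` and keep the existing cleanup in `finally` unchanged.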
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -1,6 +1,27 @@
 services:
-  whisper-app:
+  asr-server:
    build: .
    ports:
-      - "9854:9854"
-    command: ["python", "app.py"]
+      - "${PORT:-9854}:9854"
+    environment:
+      - HOST=${HOST:-0.0.0.0}
+      - PORT=${PORT:-9854}
+      - DEFAULT_MODEL=${DEFAULT_MODEL:-turbo}
+      - MODEL_DOWNLOAD_ROOT=${MODEL_DOWNLOAD_ROOT:-/app/models}
+      - KEYS_FILE=${KEYS_FILE:-/app/data/keys.txt}
+      - HSA_OVERRIDE_GFX_VERSION=${HSA_OVERRIDE_GFX_VERSION:-10.3.0}
+      - LOG_LEVEL=${LOG_LEVEL:-info}
+      - AUDIO_SPEEDUP=${AUDIO_SPEEDUP:-1.25}
+    volumes:
+      - ./models:/app/models
+      - ./data:/app/data
+    devices:
+      - /dev/kfd:/dev/kfd
+      - /dev/dri:/dev/dri
+    restart: unless-stopped
+    healthcheck:
+      test: ["CMD", "curl", "-f", "http://localhost:9854/health"]
+      interval: 30s
+      timeout: 10s
+      retries: 3
+      start_period: 60s