
Speaches Integration Guide

Complete guide for setting up self-hosted transcription with Speaches.

Why Speaches?

  • 💰 Zero Cost: no API fees, unlimited transcriptions for free
  • 🔒 Complete Privacy: 100% offline, audio never leaves your machine
  • ⚡ Same Speed: the base model performs nearly identically to OpenAI (3.7s vs 3.8s)
  • 🎯 High Accuracy: 91-100% text similarity with OpenAI, depending on the model

Quick Setup

Step 1: Create Docker Compose File

Create docker-compose.speaches.yml:

services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    ports:
      - "8000:8000"
    volumes:
      - ./hf-cache:/home/ubuntu/.cache/huggingface/hub
    environment:
      - STT_MODEL_TTL=-1  # Keep model in memory
      - WHISPER__INFERENCE_DEVICE=cpu
      - WHISPER__COMPUTE_TYPE=int8
      - WHISPER__CPU_THREADS=8
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]

Step 2: Start Speaches

docker compose -f docker-compose.speaches.yml up -d

First startup: Downloads model (~140MB for base) - takes 1-2 minutes
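
Once the container is up, you can confirm the service is reachable before wiring up the client. The /health endpoint below is the same one the compose healthcheck polls; the /v1/models listing is part of the OpenAI-compatible API that Speaches exposes:

# Liveness check (same endpoint as the compose healthcheck)
curl -f http://localhost:8000/health

# List the models the server knows about via the OpenAI-compatible API
curl -s http://localhost:8000/v1/models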

Step 3: Configure Voice Transcriber

Edit config:

nano ~/.config/voice-transcriber/config.json

Update:

{
  "language": "fr",
  "formatterEnabled": false,
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "url": "http://localhost:8000/v1",
      "apiKey": "none",
      "model": "Systran/faster-whisper-base"
    }
  }
}

Step 4: Restart Application

# Restart Voice Transcriber
voice-transcriber

Done! First transcription will auto-download the model.
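
To sanity-check the whole pipeline without the desktop app, you can post a short audio file straight to the OpenAI-compatible transcription endpoint. A minimal sketch, assuming a sample.wav file in the current directory (the language field is optional):

curl -s http://localhost:8000/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=Systran/faster-whisper-base" \
  -F "language=fr"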

Available Models

Model    | Size   | Memory  | Speed | Accuracy | Use Case
tiny     | 75 MB  | ~273 MB | ⚡⚡⚡   | ⭐⭐       | Quick testing
base     | 142 MB | ~388 MB | ⚡⚡    | ⭐⭐⭐      | Recommended
small    | 466 MB | ~852 MB | ⚡     | ⭐⭐⭐⭐     | Better accuracy
medium   | 1.5 GB | ~2.1 GB | 🐢    | ⭐⭐⭐⭐⭐    | High accuracy
large-v3 | 2.9 GB | ~3.9 GB | 🐢🐢   | ⭐⭐⭐⭐⭐    | Maximum accuracy

Recommendation: Base Model

  • Comparable speed to OpenAI (0.97x)
  • 91% accuracy - excellent for daily use
  • Low resource usage (~400MB RAM)
  • Zero cost
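
To check how the chosen model behaves on your machine, compare the container's live memory footprint against the figures above and see how much disk the downloaded models occupy in the cache directory from Step 1. A quick sketch, assuming the ./hf-cache mount and compose file from the setup:

# Live RAM usage of the speaches container
docker stats --no-stream $(docker compose -f docker-compose.speaches.yml ps -q speaches)

# Models downloaded so far and their disk footprint
du -sh ./hf-cache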

Performance Comparison

Benchmark: 30 seconds of French audio, transcribed on a remote server (8 CPU cores / 8 GB RAM)

Real-World Performance Benchmark:

Model    | OpenAI Whisper | Speaches (CPU) | Speed Ratio        | Text Similarity
tiny     | 1.98s          | 2.81s          | 0.70x (comparable) | 92.4%
base     | 3.70s          | 3.81s          | 0.97x (comparable) | 91.4%
small    | 2.23s          | 7.15s          | 0.31x (3x slower)  | 97.4%
medium   | 3.70s          | 25.82s         | 0.14x (7x slower)  | 96.1%
large-v3 | 2.55s          | 30.80s         | 0.08x (12x slower) | 100.0%

Key Insights:

  • Base model: Nearly identical speed to OpenAI (0.97x), 91% accuracy - best for daily use
  • Small model: Excellent 97% accuracy, acceptable 3x slowdown
  • Medium/Large: Maximum quality (96-100%) but significantly slower (7-12x)

Recommendations:

  • For speed & cost: Use base model - nearly identical speed to OpenAI, 91% accuracy, zero cost
  • For accuracy: Use small model - excellent 97% accuracy, acceptable 3x slower
  • For maximum quality: Use medium or large-v3 - 96-100% accuracy but significantly slower (7-12x)

Performance Context

Performance tested on remote server (8 CPU cores, 8GB RAM). GPU acceleration would significantly improve medium/large model speeds (5-10x faster). Tiny and base models are CPU-optimized and run efficiently without GPU.
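
You can reproduce a rough version of this benchmark on your own hardware by timing a request against the local endpoint. A sketch assuming a ~30-second sample.wav like the clip used above; absolute numbers will vary with CPU, model, and audio length:

time curl -s -o /dev/null http://localhost:8000/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=Systran/faster-whisper-base"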

Changing Models

Edit config to use different model:

{
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "model": "Systran/faster-whisper-small"
    }
  }
}

Available models:

  • Systran/faster-whisper-tiny
  • Systran/faster-whisper-base ⭐
  • Systran/faster-whisper-small
  • Systran/faster-whisper-medium
  • Systran/faster-whisper-large-v3

Restart application for changes to take effect.
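
If you prefer to switch models from the command line, the same edit can be scripted. A minimal sketch, assuming jq is installed and the config lives at the path used in Step 3:

CONFIG=~/.config/voice-transcriber/config.json

# Point the speaches backend at the small model
jq '.transcription.speaches.model = "Systran/faster-whisper-small"' "$CONFIG" > "$CONFIG.tmp" \
  && mv "$CONFIG.tmp" "$CONFIG"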

GPU Acceleration

For significantly faster processing with medium/large models:

services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda  # GPU image
    runtime: nvidia
    environment:
      - WHISPER__INFERENCE_DEVICE=cuda
      - WHISPER__COMPUTE_TYPE=float16

Requirements: NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit installed on the host
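
Before switching to the CUDA image, it is worth confirming that Docker can actually reach the GPU. A quick check, assuming the NVIDIA driver and Container Toolkit are installed (the toolkit normally makes nvidia-smi available inside GPU-enabled containers):

# Host side: driver and NVIDIA runtime visible to Docker
nvidia-smi
docker info | grep -i nvidia

# Container side: GPU visible from inside speaches
docker compose -f docker-compose.speaches.yml exec speaches nvidia-smi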

Troubleshooting

Model Download Fails

# Check logs
docker compose -f docker-compose.speaches.yml logs -f speaches

Solution: Ensure the machine has an internet connection for the initial model download

Service Not Responding

# Check health
curl http://localhost:8000/health

# Restart service
docker compose -f docker-compose.speaches.yml restart

Out of Memory

Solution: Use a smaller model or increase the Docker memory limit

services:
  speaches:
    deploy:
      resources:
        limits:
          memory: 4G  # Increase memory
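
To confirm memory is really the problem before changing anything, you can ask Docker whether the container was killed by the OOM killer:

# True if the last exit was an out-of-memory kill
docker inspect --format '{{.State.OOMKilled}}' $(docker compose -f docker-compose.speaches.yml ps -q speaches)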

Advanced Configuration

Custom Whisper Parameters

environment:
  - WHISPER__BEAM_SIZE=5
  - WHISPER__BEST_OF=5
  - WHISPER__TEMPERATURE=0.0

Remote Speaches Server

Run Speaches on a VPS and connect remotely:

{
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "url": "https://your-server.com/v1",
      "apiKey": "your-api-key",
      "model": "Systran/faster-whisper-base"
    }
  }
}
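
Before pointing the app at a remote server, a quick connectivity check from the client machine saves debugging later. A sketch using the placeholder URL and key from the config above; if your deployment enforces an API key (for example via a reverse proxy in front of Speaches), pass it as an OpenAI-style Bearer token:

# Replace the URL and key with your own values
curl -s -H "Authorization: Bearer your-api-key" https://your-server.com/v1/models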

Cost Comparison

OpenAI Whisper:

  • $0.006 per minute
  • 100 hours = $36/month
  • No local resources needed

Speaches (Self-Hosted):

  • $0 transcription cost
  • VPS: ~$5-10/month (optional)
  • Requires local/VPS resources

Break-Even Point

At roughly 100 hours of transcription per month, OpenAI costs about $36 while a Speaches VPS costs $5-10, so self-hosting is clearly more cost-effective at that volume. If Speaches runs on hardware you already own, it is cheaper from the first minute.
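
The comparison is easy to recompute for your own volume. A minimal sketch using the per-minute rate quoted above (the VPS figure only applies if you host Speaches remotely):

HOURS=100          # monthly transcription volume
RATE=0.006         # OpenAI Whisper, USD per minute

awk -v h="$HOURS" -v r="$RATE" 'BEGIN { printf "OpenAI:   %.2f USD/month\n", h * 60 * r }'
echo "Speaches: 0 USD in API fees + ~5-10 USD/month if hosted on a VPS"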

Next Steps