# Speaches Integration Guide

Complete guide for setting up self-hosted transcription with Speaches.

## Why Speaches?
- 💰 **Zero Cost** - no API fees; unlimited transcriptions for free
- 🔒 **Complete Privacy** - 100% offline; audio never leaves your machine
- ⚡ **Same Speed** - the base model performs on par with OpenAI (3.7s vs 3.8s)
- 🎯 **High Accuracy** - 91-100% text similarity with OpenAI, depending on model
## Quick Setup

### Step 1: Create Docker Compose File

Create `docker-compose.speaches.yml`:
```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cpu
    ports:
      - "8000:8000"
    volumes:
      - ./hf-cache:/home/ubuntu/.cache/huggingface/hub
    environment:
      - STT_MODEL_TTL=-1  # Keep model in memory
      - WHISPER__INFERENCE_DEVICE=cpu
      - WHISPER__COMPUTE_TYPE=int8
      - WHISPER__CPU_THREADS=8
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/health"]
```
### Step 2: Start Speaches

The first startup downloads the model (~140 MB for base) and takes 1-2 minutes.
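To bring the service up, assuming the compose file from Step 1 is in the current directory:

```shell
# Start Speaches in the background
docker compose -f docker-compose.speaches.yml up -d

# Confirm the service is up (exits non-zero until it is ready)
curl -f http://localhost:8000/health
```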
### Step 3: Configure Voice Transcriber

Edit your config and update:
```json
{
  "language": "fr",
  "formatterEnabled": false,
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "url": "http://localhost:8000/v1",
      "apiKey": "none",
      "model": "Systran/faster-whisper-base"
    }
  }
}
```
### Step 4: Restart Application

✅ Done! The first transcription will auto-download the model.
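You can also verify the setup end to end by calling the OpenAI-compatible endpoint directly; this is a sketch assuming the service from Step 2 is running on `localhost:8000`, and `sample.wav` stands in for any local audio file:

```shell
curl http://localhost:8000/v1/audio/transcriptions \
  -F "file=@sample.wav" \
  -F "model=Systran/faster-whisper-base"
```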
## Available Models

| Model | Size | Memory | Speed | Accuracy | Use Case |
|---|---|---|---|---|---|
| tiny | 75 MB | ~273 MB | ⚡⚡⚡ | ⭐⭐ | Quick testing |
| base ⭐ | 142 MB | ~388 MB | ⚡⚡ | ⭐⭐⭐ | Recommended |
| small | 466 MB | ~852 MB | ⚡ | ⭐⭐⭐⭐ | Better accuracy |
| medium | 1.5 GB | ~2.1 GB | 🐢 | ⭐⭐⭐⭐⭐ | High accuracy |
| large-v3 | 2.9 GB | ~3.9 GB | 🐢🐢 | ⭐⭐⭐⭐⭐ | Maximum accuracy |
**Recommendation: Base Model**

- Comparable speed to OpenAI (0.97x)
- 91% accuracy - excellent for daily use
- Low resource usage (~400 MB RAM)
- Zero cost
## Performance Comparison

Benchmark: 30s French audio on a remote server (8 CPU / 8 GB RAM).

| Model | OpenAI Whisper | Speaches (CPU) | Speed Ratio | Text Similarity |
|---|---|---|---|---|
| tiny | 1.98s | 2.81s | 0.70x (comparable) | 92.4% |
| base ⭐ | 3.70s | 3.81s | 0.97x (comparable) | 91.4% |
| small | 2.23s | 7.15s | 0.31x (3x slower) | 97.4% |
| medium | 3.70s | 25.82s | 0.14x (7x slower) | 96.1% |
| large-v3 | 2.55s | 30.80s | 0.08x (12x slower) | 100.0% |
**Key Insights:**

- Base model: nearly identical speed to OpenAI (0.97x) with 91% accuracy - best for daily use
- Small model: excellent 97% accuracy with an acceptable 3x slowdown
- Medium/Large: maximum quality (96-100%) but significantly slower (7-12x)
**Recommendations:**

- For speed & cost: use the `base` model - nearly identical speed to OpenAI, 91% accuracy, zero cost
- For accuracy: use the `small` model - excellent 97% accuracy, acceptable 3x slowdown
- For maximum quality: use `medium` or `large-v3` - 96-100% accuracy but significantly slower (7-12x)
**Performance Context**

Performance was tested on a remote server (8 CPU cores, 8 GB RAM). GPU acceleration would significantly improve medium/large model speeds (5-10x faster). The tiny and base models are CPU-optimized and run efficiently without a GPU.
## Changing Models

Edit the config to use a different model:

```json
{
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "model": "Systran/faster-whisper-small"
    }
  }
}
```
Available models:

- `Systran/faster-whisper-tiny`
- `Systran/faster-whisper-base` ⭐
- `Systran/faster-whisper-small`
- `Systran/faster-whisper-medium`
- `Systran/faster-whisper-large-v3`

Restart the application for changes to take effect.
## GPU Acceleration

For significantly faster processing with the medium/large models:

```yaml
services:
  speaches:
    image: ghcr.io/speaches-ai/speaches:latest-cuda  # GPU image
    runtime: nvidia
    environment:
      - WHISPER__INFERENCE_DEVICE=cuda
      - WHISPER__COMPUTE_TYPE=float16
```

Requirements: an NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit (needed for `runtime: nvidia`).
## Troubleshooting

### Model Download Fails

Solution: ensure an internet connection is available for the initial download.
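If the container itself cannot reach the internet, one workaround is to pre-download the model into the host cache directory mounted in Step 1. This sketch assumes the `huggingface-cli` tool (from the `huggingface_hub` Python package) is installed on the host:

```shell
# Fetch the model files into ./hf-cache, which the compose file
# mounts at the container's HuggingFace hub cache path
huggingface-cli download Systran/faster-whisper-base --cache-dir ./hf-cache
```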
### Service Not Responding

```shell
# Check health
curl http://localhost:8000/health

# Restart service
docker compose -f docker-compose.speaches.yml restart
```
### Out of Memory

Solution: use a smaller model or increase the Docker memory limit.
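One way to adjust the limit is directly in the compose file; `mem_limit` is a standard Docker Compose option, and `2g` here is an illustrative value sized for the base/small models (see the memory column in the table above):

```yaml
services:
  speaches:
    mem_limit: 2g  # raise for medium/large models
```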
## Advanced Configuration

### Custom Whisper Parameters
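Whisper behavior is tuned through environment variables in the compose file. The sketch below restates the Step 1 settings with comments on the alternatives; values shown are illustrative, not required:

```yaml
services:
  speaches:
    environment:
      - STT_MODEL_TTL=-1               # -1 keeps the model loaded indefinitely
      - WHISPER__INFERENCE_DEVICE=cpu  # or "cuda" with the GPU image
      - WHISPER__COMPUTE_TYPE=int8     # int8 is fastest on CPU; float16 suits GPU
      - WHISPER__CPU_THREADS=8         # match your available core count
```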
### Remote Speaches Server

Run Speaches on a VPS and connect remotely:
```json
{
  "transcription": {
    "backend": "speaches",
    "speaches": {
      "url": "https://your-server.com/v1",
      "apiKey": "your-api-key",
      "model": "Systran/faster-whisper-base"
    }
  }
}
```
## Cost Comparison

**OpenAI Whisper:**

- $0.006 per minute - 100 hours = $36/month
- No local resources needed

**Speaches (Self-Hosted):**

- $0 transcription cost
- VPS: ~$5-10/month (optional)
- Requires local/VPS resources
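The per-hour figures above can be sanity-checked with a quick calculation:

```shell
# OpenAI cost per hour of audio: $0.006/min * 60 min
awk 'BEGIN { printf "$%.2f per audio hour\n", 0.006 * 60 }'
# → $0.36 per audio hour

# Monthly cost for 100 hours of audio
awk 'BEGIN { printf "$%.0f for 100 hours\n", 0.006 * 60 * 100 }'
# → $36 for 100 hours
```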
**Break-Even Point**

At $0.006/minute, OpenAI costs $0.36 per hour of audio, so a $5-10/month VPS pays for itself after roughly 14-28 hours of transcription per month; on existing local hardware, Speaches is cheaper from the first minute.
Next Steps¶
- Whisper Models Comparison - Detailed model benchmarks
- Transcription Backends - Backend comparison
- Configuration Guide - Advanced settings
Need Help?