Voice Transcriber Documentation
Voice Transcriber
Lightweight desktop voice-to-text transcription with OpenAI Whisper and system tray integration
Overview
Voice Transcriber is a lightweight desktop application that provides seamless voice-to-text conversion with system tray integration. Record audio with a single click, and transcribed text is automatically copied to your clipboard.
- System Tray Integration: Click to record, with visual state feedback (green = idle, red = recording, purple = processing)
- Multilingual Support: French, English, Spanish, German, and Italian, with strong language enforcement
- AI-Powered: OpenAI Whisper transcription with optional GPT text formatting
- Self-Hosted Option: Run 100% offline with Speaches for zero cost and complete privacy
Key Features
- System Tray Integration: Click to record, visual state feedback
- High-Quality Recording: Audio capture using arecord on Linux
- Multilingual Support: French, English, Spanish, German, Italian
- Text Formatting: Optional GPT-based grammar improvement
- Clipboard Integration: Automatic result copying
- Self-Hosted Option: Run 100% offline with Speaches
- Privacy-Focused: No persistent audio storage, and fully local processing when using the self-hosted Speaches backend
Quick Start
Get started in under 5 minutes:
# Clone the repository
git clone https://github.com/Nouuu/voice-transcriber.git
cd voice-transcriber
# Check system dependencies (Bun, arecord, xsel)
make check-system-deps
# Install Bun dependencies
make install
# Initialize configuration file
make init-config
# Configure your OpenAI API key
nano ~/.config/voice-transcriber/config.json
# Run the application
make run
Next Steps
- Installation Guide - Detailed setup instructions
- Configuration - Configure languages and backends
- Basic Usage - Learn how to use the app
How It Works
sequenceDiagram
participant User
participant SystemTray
participant AudioRecorder
participant MP3Encoder
participant Backend as Whisper API<br/>(OpenAI or Speaches)
participant GPT as ChatGPT<br/>(optional)
participant Clipboard
User->>SystemTray: Click tray icon
SystemTray->>SystemTray: State: RECORDING (🔴)
SystemTray->>AudioRecorder: Start recording
AudioRecorder->>AudioRecorder: Capture audio (arecord)
User->>SystemTray: Click again to stop
SystemTray->>SystemTray: State: PROCESSING (🟣)
AudioRecorder->>AudioRecorder: Save WAV file
AudioRecorder->>MP3Encoder: Convert to MP3
Note over MP3Encoder: Compress audio<br/>~75% size reduction<br/>(mono 16kHz 64kbps)
MP3Encoder-->>AudioRecorder: MP3 file
AudioRecorder->>Backend: Upload MP3
Backend->>Backend: Transcribe audio
Backend-->>AudioRecorder: Return text
opt Formatting Enabled
AudioRecorder->>GPT: Format text
GPT-->>AudioRecorder: Formatted text
end
AudioRecorder->>Clipboard: Copy text
Clipboard-->>User: Paste transcription
SystemTray->>SystemTray: State: IDLE (🟢)
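The processing half of this flow can be sketched in TypeScript as a single function that chains the stages. This is a minimal illustration, not the project's actual code: the `Stages` interface and every function name in it are hypothetical, and each stage is injected so the sketch runs without audio hardware or network access.

```typescript
// Hypothetical stage signatures; the real application may be structured differently.
interface Stages {
  encodeMp3(wav: Uint8Array): Promise<Uint8Array>;
  transcribe(mp3: Uint8Array): Promise<string>;
  format(text: string): Promise<string>;
  copyToClipboard(text: string): Promise<void>;
}

// Runs the steps from the diagram in order:
// encode to MP3, transcribe, optionally format, then copy to the clipboard.
async function processRecording(
  wav: Uint8Array,
  stages: Stages,
  formattingEnabled: boolean,
): Promise<string> {
  const mp3 = await stages.encodeMp3(wav);
  const rawText = await stages.transcribe(mp3);
  // The formatting stage is skipped entirely when it is disabled.
  const text = formattingEnabled ? await stages.format(rawText) : rawText;
  await stages.copyToClipboard(text);
  return text;
}
```

Injecting the stages keeps the backend choice (OpenAI or Speaches) out of the orchestration logic, so either one could sit behind the same `transcribe` signature.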
Key Steps:
- Audio Capture - Records in WAV format (CD quality: 44.1kHz, 16-bit)
- MP3 Compression - Converts to mono 16kHz 64kbps MP3 (~75% size reduction)
- Transcription - Sends to OpenAI Whisper or self-hosted Speaches
- Optional Formatting - Improves grammar/punctuation with ChatGPT (if enabled)
- Clipboard - Automatically copies result for instant pasting
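The size figures above can be sanity-checked with a little arithmetic. The sketch below compares uncompressed PCM bitrates with the 64 kbps MP3 target. The function names are illustrative, and the channel counts are assumptions, since the text does not say whether capture is mono or stereo.

```typescript
// Bitrate of uncompressed PCM audio in kilobits per second.
function pcmKbps(sampleRateHz: number, bitsPerSample: number, channels: number): number {
  return (sampleRateHz * bitsPerSample * channels) / 1000;
}

// Fractional size reduction when re-encoding at a fixed target bitrate.
function reduction(sourceKbps: number, targetKbps: number): number {
  return 1 - targetKbps / sourceKbps;
}

const mp3Kbps = 64;

// Assumed case 1: 16 kHz, 16-bit, mono PCM.
const mono16k = pcmKbps(16_000, 16, 1);
console.log(mono16k, reduction(mono16k, mp3Kbps)); // 256 0.75

// Assumed case 2: 44.1 kHz, 16-bit, stereo PCM (CD audio).
const cd = pcmKbps(44_100, 16, 2);
console.log(cd, reduction(cd, mp3Kbps).toFixed(3)); // 1411.2 0.955
```

Under these assumptions, the ~75% figure corresponds to a 16 kHz, 16-bit mono source, while a CD-quality stereo capture encoded at 64 kbps would shrink by roughly 95%.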
System Tray Menu
Right-click the tray icon to access:
- Start Recording - Begin voice capture
- Stop Recording - End recording and transcribe
- Exit - Quit the application
Menu items are automatically enabled/disabled based on current state.
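One way to model this behavior is a small state machine in which the menu is derived from the current tray state. The sketch below is an assumption about how such logic could look, not the application's actual implementation: `TrayState`, `TrayEvent`, `nextState`, and `menuFor` are hypothetical names, and ignoring clicks during processing is a design choice made here for illustration.

```typescript
type TrayState = "IDLE" | "RECORDING" | "PROCESSING";
type TrayEvent = "CLICK" | "DONE";

// A click toggles recording; DONE marks the end of transcription.
// Any other combination (for example, a click while processing)
// leaves the state unchanged.
function nextState(state: TrayState, event: TrayEvent): TrayState {
  if (state === "IDLE" && event === "CLICK") return "RECORDING";
  if (state === "RECORDING" && event === "CLICK") return "PROCESSING";
  if (state === "PROCESSING" && event === "DONE") return "IDLE";
  return state;
}

interface MenuState {
  startEnabled: boolean;
  stopEnabled: boolean;
  exitEnabled: boolean;
}

// Derive which menu items are clickable from the current state.
function menuFor(state: TrayState): MenuState {
  return {
    startEnabled: state === "IDLE",
    stopEnabled: state === "RECORDING",
    exitEnabled: true, // assumption: Exit stays available in every state
  };
}

// Walk through one full record-and-transcribe cycle.
let state: TrayState = "IDLE";
for (const event of ["CLICK", "CLICK", "DONE"] as TrayEvent[]) {
  state = nextState(state, event);
  console.log(state, menuFor(state));
}
```

Deriving the menu from the state, rather than toggling items by hand, keeps the two from drifting out of sync.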
Popular Use Cases
- Note Taking: Record meetings, lectures, or brainstorming sessions with automatic transcription
- Message Dictation: Quickly dictate messages, emails, or social media posts
- Language Learning: Practice pronunciation and see transcriptions in multiple languages
- Accessibility: Voice-to-text for users with typing difficulties
Documentation Structure
- Installation, configuration, and first-run setup
- Basic usage, language support, and troubleshooting
- Architecture, development guide, and API reference
- Self-hosted setup, Whisper models, and local inference
Community and Support
- GitHub Repository: nouuu/voice-transcriber
- npm Package: voice-transcriber
- Issues: Report a bug or request a feature
- Discussions: GitHub Discussions
License
This project is licensed under the MIT License.
Built with ❤️ using Bun, TypeScript, and OpenAI