Basic Usage Guide
Learn how to use Voice Transcriber effectively for everyday voice-to-text transcription.
System Tray Interface
Voice Transcriber operates through a system tray icon with three visual states:
- IDLE: Green circle. Ready to record. Click to start recording.
- RECORDING: Red circle. Currently recording. Speak into the microphone, then click to stop.
- PROCESSING: Purple circle. Transcribing audio. Wait for completion; the result is copied to the clipboard.
Recording Audio
Start Recording
Method 1: Click tray icon
- Left-click the system tray icon (green circle)
- Icon changes to red
- Start speaking into your microphone
Method 2: Context menu
- Right-click the system tray icon
- Select "Start Recording"
- Icon changes to red
- Start speaking
Microphone Position
For best results, position your microphone 15-30 cm from your mouth
Stop Recording
Method 1: Click tray icon again
- Left-click the system tray icon (red circle)
- Icon changes to purple (processing)
- Wait for transcription
Method 2: Context menu
- Right-click the system tray icon
- Select "Stop Recording"
- Icon changes to purple (processing)
Automatic Clipboard Copy
Once processing completes:
- Icon returns to green (idle)
- Transcribed text is automatically copied to your clipboard
- Paste anywhere with Ctrl+V (Linux) or Cmd+V (macOS)
Example Workflow
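The recording cycle described above (idle, recording, processing, back to idle) can be sketched as a small state machine. This is an illustrative Python sketch, not the application's actual code: `TrayState`, `on_click`, and `on_transcription_done` are hypothetical names, and ignoring clicks while processing is an assumption.

```python
from enum import Enum


class TrayState(Enum):
    """Tray icon states, with the icon colour as the value."""
    IDLE = "green"
    RECORDING = "red"
    PROCESSING = "purple"


# A click (or the matching menu item) moves IDLE -> RECORDING -> PROCESSING.
CLICK_TRANSITIONS = {
    TrayState.IDLE: TrayState.RECORDING,
    TrayState.RECORDING: TrayState.PROCESSING,
}


def on_click(state: TrayState) -> TrayState:
    """Return the state after a left-click (assumed to be ignored while processing)."""
    return CLICK_TRANSITIONS.get(state, state)


def on_transcription_done(state: TrayState) -> TrayState:
    """Processing ends on its own: the text is on the clipboard and the icon turns green."""
    return TrayState.IDLE if state is TrayState.PROCESSING else state


state = TrayState.IDLE
state = on_click(state)               # start recording (icon turns red)
state = on_click(state)               # stop recording (icon turns purple)
state = on_transcription_done(state)  # transcription copied, back to green
```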
Context Menu Options
Right-click the tray icon for available actions:
🎤 Voice Transcriber
├── 🎙️ Start Recording
├── ⏹️ Stop Recording
├── ⚙️ Open Config
├── 🔄 Reload Config
└── ❌ Exit
Menu Actions
- Start Recording
  - Begins audio capture (enabled only when idle)
  - Same as left-click when idle
- Stop Recording
  - Ends recording and starts transcription (enabled only while recording)
  - Same as left-click when recording
- Open Config
  - Opens the configuration file in your default text editor
  - Always available
- Reload Config
  - Reloads the configuration without restarting the application
  - Only available when idle (not recording or processing)
- Exit
  - Exits the application gracefully
Menu Behavior
Menu items automatically enable/disable based on state:
Menu Item | Idle (🟢) | Recording (🔴) | Processing (🟣) |
---|---|---|---|
Start Recording | ✅ Enabled | ❌ Disabled | ❌ Disabled |
Stop Recording | ❌ Disabled | ✅ Enabled | ❌ Disabled |
Open Config | ✅ Enabled | ✅ Enabled | ✅ Enabled |
Reload Config | ✅ Enabled | ❌ Disabled | ❌ Disabled |
Exit | ✅ Enabled | ✅ Enabled | ✅ Enabled |
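The enable/disable rules in this table amount to a lookup from state to the set of enabled menu items. The following Python sketch mirrors the table; `ENABLED_ITEMS` and `is_enabled` are hypothetical names chosen for illustration and are not taken from the application.

```python
# Menu items that are enabled in each tray state, copied from the table above.
ENABLED_ITEMS = {
    "idle": {"Start Recording", "Open Config", "Reload Config", "Exit"},
    "recording": {"Stop Recording", "Open Config", "Exit"},
    "processing": {"Open Config", "Exit"},
}


def is_enabled(item: str, state: str) -> bool:
    """Return True if the menu item can be clicked in the given state."""
    return item in ENABLED_ITEMS[state]


for state in ("idle", "recording", "processing"):
    print(state, sorted(ENABLED_ITEMS[state]))
```

Keeping the rules in one table-like structure makes it easy to check that Open Config and Exit stay available in every state.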
Configuration Management
You can now manage configuration directly from the system tray menu without restarting the application.
Quick Configuration Workflow
- Open Config: Right-click tray icon → "Open Config"
- Edit: Make your changes in the text editor
- Save: Save the configuration file
- Reload: Right-click tray icon → "Reload Config" (when idle)
When to Use Reload Config
Reload Config is useful when you want to:
- Switch between transcription backends (OpenAI ↔ Speaches)
- Test different language settings
- Update API keys
- Change transcription or formatting prompts
- Enable/disable the formatter
Live Configuration Updates
Changes take effect immediately after reload - no need to restart the application!
Reload Restrictions
Reload Config is disabled during:
- Recording (🔴): Would interrupt audio capture
- Processing (🟣): Would interfere with transcription
Wait for the icon to turn green (idle) before reloading configuration.
Configuration Reload Safety
The reload process includes automatic safety checks:
- Validation: Configuration is validated before applying
- Rollback: Previous configuration is restored if reload fails
- Service reinitialization: All services are properly restarted with new settings
If reload fails, you'll see an error message and the previous configuration will be restored automatically.
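The validate, apply, and roll back sequence described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the application's implementation: the `Config` fields, the validation rules, the lowercase backend names, and the `App` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Hypothetical fields; the real configuration file may differ.
    language: str
    backend: str


def validate(config: Config) -> None:
    """Raise ValueError if the candidate configuration is unusable (assumed rules)."""
    if not config.language:
        raise ValueError("language must not be empty")
    if config.backend not in {"openai", "speaches"}:
        raise ValueError(f"unknown backend: {config.backend}")


class App:
    def __init__(self, config: Config) -> None:
        validate(config)
        self.config = config
        self.restarts = 0

    def _restart_services(self) -> None:
        # Stand-in for reinitializing the transcription and formatting services.
        self.restarts += 1

    def reload(self, candidate: Config) -> bool:
        """Apply the candidate configuration, restoring the previous one on failure."""
        previous = self.config
        try:
            validate(candidate)        # 1. validate before applying
            self.config = candidate    # 2. apply
            self._restart_services()   # 3. reinitialize services
            return True
        except Exception as error:
            self.config = previous     # rollback
            print(f"Reload failed, previous configuration restored: {error}")
            return False


app = App(Config(language="en", backend="openai"))
app.reload(Config(language="fr", backend="speaches"))  # succeeds
app.reload(Config(language="", backend="openai"))      # fails and rolls back
```

Holding on to the previous configuration until the new one is fully applied is what makes the rollback possible.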
Debug Mode
Enable debug mode by starting the application with the --debug flag to get detailed logging and performance metrics.
What Debug Mode Shows
Debug mode provides detailed information about:
- File sizes: WAV and MP3 file sizes with compression ratios
- Audio format: Sample rate, channels, conversion details
- Processing times: Breakdown of upload, processing, and response times
- Transcription details: Character count, duration metrics
Example Debug Output
2025-10-11T10:30:15.123Z [DEBUG] WAV file size: 2.45 MB (2569216 bytes)
2025-10-11T10:30:15.125Z [DEBUG] WAV format: 2 channel(s), 44100 Hz sample rate
2025-10-11T10:30:15.234Z [DEBUG] MP3 file size: 0.62 MB (650240 bytes)
2025-10-11T10:30:15.234Z [DEBUG] Compression ratio: 74.7% size reduction
2025-10-11T10:30:15.234Z [DEBUG] WAV to MP3 conversion completed in 0.11 seconds
2025-10-11T10:30:16.789Z [INFO] OpenAI transcription completed in 1.55s
2025-10-11T10:30:16.789Z [DEBUG] └─ Estimated breakdown: upload ~0.47s, processing ~0.93s, receive ~0.16s
2025-10-11T10:30:16.789Z [DEBUG] └─ Transcription length: 142 characters
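The size figures in this output can be checked by hand. The short Python sketch below recomputes them from the byte counts in the log; the helper names are illustrative. Note that the "MB" values in the log match a division by 1024 × 1024 rather than by 1,000,000.

```python
# Byte counts taken from the example debug output above.
wav_bytes = 2_569_216
mp3_bytes = 650_240


def to_mib(size_in_bytes: int) -> float:
    """Convert bytes to the megabyte figure shown in the log (1024 * 1024 bytes)."""
    return size_in_bytes / (1024 * 1024)


def size_reduction_percent(original: int, compressed: int) -> float:
    """Percentage by which the compressed file is smaller than the original."""
    return (1 - compressed / original) * 100


print(f"WAV file size: {to_mib(wav_bytes):.2f} MB")
print(f"MP3 file size: {to_mib(mp3_bytes):.2f} MB")
print(f"Compression ratio: {size_reduction_percent(wav_bytes, mp3_bytes):.1f}% size reduction")
```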
When to Use Debug Mode
Debug Mode Use Cases
- Troubleshooting: Identify where delays occur in the transcription pipeline
- Performance analysis: Understand audio compression effectiveness
- Quality verification: Check audio format and processing details
- Benchmarking: Compare different backends or configurations
Disabling Debug Mode
Run the application without the --debug flag.
Best Practices
For Accuracy
Improve Transcription Quality
- Speak clearly at a moderate pace
- Minimize background noise when possible
- Use a quality microphone for best results
- Pause between sentences for better formatting
For Efficiency
Maximize Productivity
- Use keyboard shortcuts (if your desktop environment supports global hotkeys)
- Record in chunks for long dictations (easier to review)
- Review transcriptions before using (especially for technical content)
- Customize language settings for consistent results
For Multilingual Use
Language Consistency
- Set primary language in config file
- Reload the configuration (or restart) after language changes so the new prompts take effect
- Avoid language mixing for best accuracy
- See Language Support for details
Common Use Cases
Note Taking
Perfect for: Meeting notes, lecture notes, brainstorming sessions
Tips:
- Record entire meeting or lecture
- Transcribe in segments for easier review
- Edit transcription afterward for clarity
Message Dictation
Perfect for: Emails, chat messages, social media posts
Tips:
- Speak naturally but with clear punctuation
- Enable formatter for proper capitalization
- Review before sending
Content Creation
Perfect for: Blog posts, articles, scripts
Tips:
- Outline first, then dictate sections
- Use short recordings for easier editing
- Transcribe ideas quickly without typing
Accessibility
Perfect for: Users with typing difficulties, RSI, or mobility issues
Tips:
- Configure comfortable microphone position
- Use voice commands in conjunction with transcription
- Combine with accessibility tools in your OS
Performance Expectations
Transcription Speed
Recording Length | Processing Time | Notes |
---|---|---|
5-10 seconds | 1-2 seconds | Near-instant |
30 seconds | 2-4 seconds | Very fast |
1 minute | 3-6 seconds | Fast |
5 minutes | 10-20 seconds | Moderate |
10+ minutes | 20-40 seconds | Longer wait |
Processing Time Factors
- OpenAI API response time
- Audio file size and compression
- Internet connection speed
- GPT formatting (adds 1-2 seconds if enabled)
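As a worked example of combining these figures, the Python sketch below adds the formatter overhead to a row of the table. It assumes the table's processing times exclude GPT formatting, which the guide does not state explicitly; `PROCESSING_SECONDS` and `expected_wait` are illustrative names.

```python
# (low, high) processing times in seconds, copied from the table above.
PROCESSING_SECONDS = {
    "30 seconds": (2, 4),
    "1 minute": (3, 6),
    "5 minutes": (10, 20),
}

# GPT formatting adds 1-2 seconds when enabled.
FORMATTER_SECONDS = (1, 2)


def expected_wait(recording_length: str, formatter_enabled: bool) -> tuple[int, int]:
    """Return the (low, high) expected wait in seconds for a recording length."""
    low, high = PROCESSING_SECONDS[recording_length]
    if formatter_enabled:
        low += FORMATTER_SECONDS[0]
        high += FORMATTER_SECONDS[1]
    return low, high


low, high = expected_wait("1 minute", formatter_enabled=True)
print(f"Expect roughly {low}-{high} seconds")
```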
Accuracy Expectations
Content Type | Expected Accuracy | Notes |
---|---|---|
Clear speech | 95-98% | Excellent |
Technical terms | 85-95% | Good, may need review |
Accented speech | 80-95% | Varies by accent |
Noisy environment | 70-85% | Reduced accuracy |
Mixed languages | 75-90% | See language support |
Keyboard Shortcuts (Future Feature)
Coming Soon
Global keyboard shortcuts are planned for a future release. Currently, Wayland security restrictions prevent global hotkey registration. This feature will be available when Wayland adds proper hotkey support.
Next Steps
- Language Support - Multilingual transcription
- Transcription Backends - OpenAI vs Speaches
- Troubleshooting - Common issues and solutions