Voice Transcriber Documentation

🎤 Voice Transcriber

Lightweight desktop voice-to-text transcription with OpenAI Whisper and system tray integration


Overview

Voice Transcriber is a lightweight desktop application that converts speech to text from the system tray. Click the tray icon to start recording and click again to stop; the transcribed text is copied to your clipboard automatically.

  • System Tray Integration


    Click to record, visual state feedback (green=idle, red=recording, purple=processing)

  • Multilingual Support


    French, English, Spanish, German, Italian with strong language enforcement

  • AI-Powered


    OpenAI Whisper transcription with optional GPT text formatting

  • Self-Hosted Option


    Run 100% offline with Speaches - zero cost, complete privacy

Key Features

  • 🎯 System Tray Integration: Click to record, visual state feedback
  • 🎙️ High-Quality Recording: Audio capture using arecord on Linux
  • 🌍 Multilingual Support: French, English, Spanish, German, Italian
  • ✍️ Text Formatting: Optional GPT-based grammar improvement
  • 📋 Clipboard Integration: Automatic result copying
  • 🏠 Self-Hosted Option: Run 100% offline with Speaches
  • 🔒 Privacy-Focused: No persistent audio storage, local processing

Quick Start

Get started in under 5 minutes:

# Clone the repository
git clone https://github.com/Nouuu/voice-transcriber.git
cd voice-transcriber

# One-command setup (checks deps, installs, creates config)
make setup

# Configure your OpenAI API key
nano ~/.config/voice-transcriber/config.json

# Run the application
make run

Alternatively, run each setup step individually:

# Clone the repository
git clone https://github.com/Nouuu/voice-transcriber.git
cd voice-transcriber

# Check system dependencies (Bun, arecord, xsel)
make check-system-deps

# Install Bun dependencies
make install

# Initialize configuration file
make init-config

# Configure your OpenAI API key
nano ~/.config/voice-transcriber/config.json

# Run the application
make run

How It Works

sequenceDiagram
    participant User
    participant SystemTray
    participant AudioRecorder
    participant MP3Encoder
    participant Backend as Whisper API<br/>(OpenAI or Speaches)
    participant GPT as ChatGPT<br/>(optional)
    participant Clipboard

    User->>SystemTray: Click tray icon
    SystemTray->>SystemTray: State: RECORDING (🔴)
    SystemTray->>AudioRecorder: Start recording
    AudioRecorder->>AudioRecorder: Capture audio (arecord)
    User->>SystemTray: Click again to stop
    SystemTray->>SystemTray: State: PROCESSING (🟣)
    AudioRecorder->>AudioRecorder: Save WAV file
    AudioRecorder->>MP3Encoder: Convert to MP3
    Note over MP3Encoder: Compress audio<br/>~75% size reduction<br/>(mono 16kHz 64kbps)
    MP3Encoder-->>AudioRecorder: MP3 file
    AudioRecorder->>Backend: Upload MP3
    Backend->>Backend: Transcribe audio
    Backend-->>AudioRecorder: Return text
    opt Formatting Enabled
        AudioRecorder->>GPT: Format text
        GPT-->>AudioRecorder: Formatted text
    end
    AudioRecorder->>Clipboard: Copy text
    Clipboard-->>User: Paste transcription
    SystemTray->>SystemTray: State: IDLE (🟢)
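
The sequence above can be sketched as a single asynchronous function. This is a minimal illustration, not the project's actual code: the `PipelineDeps` interface, `processRecording`, and every method name are hypothetical, chosen only to mirror the participants in the diagram. Restoring the idle state on failure is also an assumption.

```typescript
// Hypothetical interfaces: names are illustrative, not the project's API.
type TrayState = "IDLE" | "RECORDING" | "PROCESSING";

interface PipelineDeps {
  encodeToMp3(wavPath: string): Promise<string>; // resolves to the MP3 file path
  transcribe(mp3Path: string): Promise<string>;  // OpenAI Whisper or Speaches
  formatText?(text: string): Promise<string>;    // optional GPT formatting step
  copyToClipboard(text: string): Promise<void>;
}

// Runs the steps that follow "Click again to stop" in the diagram.
async function processRecording(
  wavPath: string,
  deps: PipelineDeps,
  onState: (state: TrayState) => void,
): Promise<string> {
  onState("PROCESSING");
  try {
    const mp3Path = await deps.encodeToMp3(wavPath);
    const raw = await deps.transcribe(mp3Path);
    // Formatting is skipped entirely when no formatter is configured.
    const text = deps.formatText ? await deps.formatText(raw) : raw;
    await deps.copyToClipboard(text);
    return text;
  } finally {
    // Assumption: the tray returns to idle even if a step throws.
    onState("IDLE");
  }
}
```

Passing the steps in as dependencies keeps the sketch independent of any particular backend, so the same flow applies whether transcription goes to OpenAI or to a self-hosted Speaches instance.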

Key Steps:

  1. Audio Capture - Records in WAV format (CD quality: 44.1kHz, 16-bit)
  2. MP3 Compression - Converts to mono 16kHz 64kbps MP3 (~75% size reduction)
  3. Transcription - Sends to OpenAI Whisper or self-hosted Speaches
  4. Optional Formatting - Improves grammar/punctuation with ChatGPT (if enabled)
  5. Clipboard - Automatically copies result for instant pasting
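
The size reduction quoted in step 2 can be sanity-checked with a little bitrate arithmetic. The sketch below is only an estimate: it ignores WAV headers and MP3 framing overhead, and the helper names `pcmKbps` and `sizeReduction` are made up for this example. Under these assumptions, 75% is exactly the saving relative to a 16 kHz mono 16-bit source; measured against a 44.1 kHz mono source the saving would be larger. Which baseline the project uses is not stated here.

```typescript
// Uncompressed PCM bitrate in kbit/s: sample rate x bit depth x channels.
function pcmKbps(sampleRateHz: number, bitDepth: number, channels: number): number {
  return (sampleRateHz * bitDepth * channels) / 1000;
}

// Fraction of the size saved when going from sourceKbps to targetKbps.
function sizeReduction(sourceKbps: number, targetKbps: number): number {
  return 1 - targetKbps / sourceKbps;
}

const mp3Kbps = 64;                          // target MP3 bitrate from step 2
const wav16kMono = pcmKbps(16000, 16, 1);    // 256 kbit/s
const wav44kMono = pcmKbps(44100, 16, 1);    // 705.6 kbit/s

console.log(sizeReduction(wav16kMono, mp3Kbps)); // 0.75
console.log(sizeReduction(wav44kMono, mp3Kbps)); // roughly 0.91
```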

System Tray Menu

Right-click the tray icon to access:

  • 🎙️ Start Recording - Begin voice capture
  • ⏹️ Stop Recording - End recording and transcribe
  • ❌ Exit - Exit the application

Menu items are automatically enabled/disabled based on current state.
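
One way to picture this state-dependent behaviour is a pure function from tray state to menu availability. The sketch below is an assumption-laden illustration rather than the application's code: `menuFor` and `MenuAvailability` are hypothetical names, and the choices to disable both recording items while processing and to keep Exit always enabled are guesses. The icon colours follow the state feedback described in the overview (green = idle, red = recording, purple = processing).

```typescript
type TrayState = "IDLE" | "RECORDING" | "PROCESSING";

interface MenuAvailability {
  iconColor: "green" | "red" | "purple";
  startRecording: boolean;
  stopRecording: boolean;
  exit: boolean;
}

// Hypothetical mapping from tray state to icon colour and enabled menu items.
function menuFor(state: TrayState): MenuAvailability {
  switch (state) {
    case "IDLE":
      return { iconColor: "green", startRecording: true, stopRecording: false, exit: true };
    case "RECORDING":
      return { iconColor: "red", startRecording: false, stopRecording: true, exit: true };
    case "PROCESSING":
      // Assumption: neither recording action is available while transcribing.
      return { iconColor: "purple", startRecording: false, stopRecording: false, exit: true };
  }
}

console.log(menuFor("RECORDING").stopRecording); // true
```

Keeping the mapping in one function means the tray only has to call it on each state change and apply the result.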

Use Cases

  • 📝 Note Taking

    Record meetings, lectures, or brainstorming sessions with automatic transcription

  • 💬 Message Dictation

    Quickly dictate messages, emails, or social media posts

  • 🌐 Language Learning

    Practice pronunciation and see transcriptions in multiple languages

  • ♿ Accessibility

    Voice-to-text for users with typing difficulties

Documentation Structure

  • Getting Started

    Installation, configuration, and first-run setup

  • User Guide

    Basic usage, language support, and troubleshooting

  • Development

    Architecture, development guide, and API reference

  • Advanced

    Self-hosted setup, whisper models, and local inference

License

This project is licensed under the MIT License.


Built with ❤️ using Bun, TypeScript, and OpenAI