Voice Transcriber - API Reference¶

Overview¶

This document provides detailed API reference for all services and interfaces in the Voice Transcriber application. All services follow consistent patterns with simple interfaces and standardized error handling.

Common Patterns¶

Result Interface¶

All service methods return a consistent result interface:

interface ServiceResult {
  success: boolean;
  error?: string;
  // Additional data fields specific to the operation
}

Error Handling¶

Services never throw exceptions in normal operation
All errors are returned via the result interface
Error messages are user-friendly and actionable
Logging is handled internally by each service

Core Application¶

VoiceTranscriberApp¶

Main application class that orchestrates all services.

class VoiceTranscriberApp {
  constructor(configPath?: string)

  // Lifecycle Methods
  initialize(): Promise<{ success: boolean; error?: string }>
  shutdown(): Promise<void>

  // Private Event Handlers (called by system tray)
  private handleRecordingStart(): Promise<void>
  private handleRecordingStop(): Promise<void>
  private handleQuit(): Promise<void>
  private processAudioFile(filePath: string): Promise<void>
}

Methods¶

initialize() - Loads configuration with setup wizard if needed - Initializes all services with proper dependency injection - Sets up system tray with event callbacks - Returns initialization result

shutdown() - Stops any active recording - Cleanly shuts down system tray - Performs cleanup operations

Configuration Service¶

Config¶

Manages application configuration with user-friendly setup.

interface ConfigData {
  openaiApiKey: string;
  formatterEnabled: boolean;
}

class Config {
  openaiApiKey: string;
  formatterEnabled: boolean;

  constructor(configPath?: string)

  // Configuration Methods
  load(): Promise<void>
  loadWithSetup(): Promise<void>
  save(): Promise<void>

  // Private Methods
  private setupWizard(): Promise<void>
  private promptForApiKey(): Promise<string>
  private getUserConfigPath(): string
  private getUserConfigDir(): string
}

Methods¶

load() - Loads configuration from JSON file - Uses defaults if file doesn't exist or is invalid - Silent failure with default values

loadWithSetup() - Loads configuration - Runs setup wizard for first-time users - Creates config directory and file as needed

save() - Saves current configuration to JSON file - Creates config directory if needed - Overwrites existing configuration

setupWizard() - Interactive first-run setup - Prompts for OpenAI API key - Creates initial configuration file

Configuration Paths¶

Default: ~/.config/voice-transcriber/config.json
Custom: Provided via constructor parameter

System Tray Service¶

SystemTrayService¶

Manages system tray integration with visual state feedback.

enum TrayState {
  IDLE = "idle",
  RECORDING = "recording",
  PROCESSING = "processing"
}

interface TrayConfig {
  callbacks: {
    onRecordingStart: () => void;
    onRecordingStop: () => void;
    onQuit: () => void;
  };
}

interface TrayResult {
  success: boolean;
  error?: string;
}

class SystemTrayService {
  constructor(config: TrayConfig, systrayConstructor?: typeof SysTray)

  // Public Methods
  initialize(): Promise<TrayResult>
  setState(state: TrayState): Promise<TrayResult>
  shutdown(): Promise<TrayResult>

  // Private Methods
  private getIconBase64(state: TrayState): string
  private getTooltip(state: TrayState): string
}

Methods¶

initialize() - Creates system tray with menu items - Sets up click event handlers - Waits for tray to be ready - Returns initialization result

setState(state: TrayState) - Updates tray icon based on application state - Modifies menu item availability - Updates tooltip text - Handles icon recreation for state changes

shutdown() - Cleanly destroys system tray - Releases system resources

States and Icons¶

IDLE: Green circle - Ready to record
RECORDING: Red circle - Actively recording
PROCESSING: Purple circle - Transcribing audio

🎤 Start Recording: Enabled when IDLE
⏹️ Stop Recording: Enabled when RECORDING
❌ Exit: Always enabled

Audio Recording Service¶

AudioRecorder¶

Handles system audio capture using Linux arecord.

interface AudioRecorderConfig {
  tempDir?: string;
}

interface RecordingResult {
  success: boolean;
  filePath?: string;
  error?: string;
}

class AudioRecorder {
  constructor(config?: AudioRecorderConfig)

  // Public Methods
  startRecording(): Promise<RecordingResult>
  stopRecording(): Promise<RecordingResult>
  isRecording(): boolean
}

Methods¶

startRecording() - Creates temporary directory if needed - Generates timestamped filename - Spawns arecord process with CD quality settings - Returns recording result with file path

stopRecording() - Sends SIGTERM to arecord process - Cleans up process references - Returns result with final file path

isRecording() - Returns true if recording process is active - Used for state validation

Audio Format¶

Format: WAV (CD quality)
Sample Rate: 44.1kHz
Bit Depth: 16-bit
Channels: Stereo
Device: ALSA default input

File Management¶

Location: /tmp/transcriber/recording-{timestamp}.wav
Naming: ISO timestamp with safe characters
Cleanup: Manual cleanup required (handled by main app)

Transcription Service¶

TranscriptionService¶

Converts audio files to text using OpenAI Whisper API.

interface TranscriptionConfig {
  apiKey: string;
  language?: string;
  prompt?: string;
}

interface TranscriptionResult {
  success: boolean;
  text?: string;
  error?: string;
}

class TranscriptionService {
  constructor(config: TranscriptionConfig)

  // Public Methods
  transcribe(filePath: string): Promise<TranscriptionResult>
}

Methods¶

transcribe(filePath: string) - Validates audio file exists - Creates read stream for file upload - Calls OpenAI Whisper API with optimized settings - Returns transcribed text or error

Configuration¶

Default Settings:

{
  language: undefined,  // Auto-detect French/English
  prompt: "Please transcribe this audio exactly as spoken, preserving the original language. The speaker may mix French and English in the same sentence. Keep technical terms in their original language (English), but preserve French sentence structure and grammar. Do not translate between languages."
}

API Parameters: - Model: whisper-1 (OpenAI's production model) - Language: Auto-detect if undefined - Prompt: Enhanced for French/English mixed speech

Multilingual Support¶

Auto-Detection: Automatic language identification
Mixed Speech: Preserves French/English code-switching
Technical Terms: Keeps English technical vocabulary
Grammar: Maintains original language sentence structure

Text Formatting Service¶

FormatterService¶

Optional text enhancement using ChatGPT API.

interface FormatterConfig {
  apiKey: string;
  enabled: boolean;
  language: string;
  prompt?: string;
}

interface FormatResult {
  success: boolean;
  text?: string;
  error?: string;
}

class FormatterService {
  constructor(config: FormatterConfig)

  // Public Methods
  formatText(text: string, language: string): Promise<FormatResult>
}

Methods¶

formatText(text: string, language: string) - Returns original text if formatting disabled - Validates input text is not empty - Calls ChatGPT API for text enhancement - Uses language-specific prompt to preserve original language - Returns formatted text or error

Configuration¶

Default Settings:

{
  prompt: null,  // Uses auto-generated language-aware prompt
  enabled: true,
  language: "en" // Supported: en, fr, es, de, it
}

Auto-generated prompts (when prompt is null): - Maintains the specified language - Prevents translation to other languages - Preserves original meaning and tone

API Parameters: - Model: gpt-3.5-turbo (fast and cost-effective) - Temperature: 0.3 (consistent, low-creativity output) - Max Tokens: 1000 (sufficient for typical transcriptions)

Text Enhancement¶

Grammar: Corrects grammatical errors
Punctuation: Adds proper punctuation
Language Preservation: Maintains original language (French/English/Spanish/German/Italian)
Structure: Improves text structure and flow

Clipboard Service¶

ClipboardService¶

Cross-platform clipboard operations.

interface ClipboardResult {
  success: boolean;
  error?: string;
}

class ClipboardService {
  // Public Methods
  writeText(text: string): Promise<ClipboardResult>
}

Methods¶

writeText(text: string) - Validates input text is not empty - Writes text to system clipboard - Returns operation result

Platform Support¶

Linux: Uses clipboardy with xsel/xclip backend
Windows: Native Windows clipboard API
macOS: Native macOS clipboard API

Logging Service¶

Logger¶

Simple console-based logging utility.

interface Logger {
  info(message: string): void
  error(message: string): void
}

const logger: Logger

Methods¶

info(message: string) - Logs informational messages to console - Includes timestamp and formatted output

error(message: string) - Logs error messages to console - Includes timestamp and error formatting

Log Levels¶

INFO: General application flow and status
ERROR: Errors and exceptions

Output Format¶

[TIMESTAMP] [LEVEL] MESSAGE

Error Codes and Messages¶

Common Error Patterns¶

Configuration Errors: - "OpenAI API key not configured" - "Config file could not be loaded"

Audio Recording Errors: - "Already recording" - "Not recording" - "Failed to start recording: {details}"

Transcription Errors: - "Audio file does not exist" - "No transcription text received" - "Failed to transcribe audio: {details}"

System Tray Errors: - "System tray not initialized" - "Failed to initialize: {details}"

Clipboard Errors: - "Text cannot be empty" - "Failed to write to clipboard: {details}"

API Rate Limiting¶

The application does not implement automatic retry logic. Rate limiting is handled by: - Using conservative API call patterns - Single transcription per recording session - Optional formatting (can be disabled)

Network Error Handling¶

Network errors are returned as operation failures: - Connection timeouts - Invalid API keys - Service unavailable - Rate limit exceeded

Usage Examples¶

Basic Application Lifecycle¶

// Initialize application
const app = new VoiceTranscriberApp();
const result = await app.initialize();

if (!result.success) {
  console.error(result.error);
  process.exit(1);
}

// Application runs via system tray events
// Shutdown when needed
await app.shutdown();

Custom Configuration¶

// Load custom config path
const config = new Config('/path/to/custom/config.json');
await config.load();

// Modify settings
config.formatterEnabled = false;
await config.save();

// Use with application
const app = new VoiceTranscriberApp('/path/to/custom/config.json');

Manual Service Usage¶

// Direct transcription service usage
const transcriber = new TranscriptionService({
  apiKey: 'your-api-key'
});

const result = await transcriber.transcribe('/path/to/audio.wav');
if (result.success) {
  console.log('Transcription:', result.text);
}

Error Handling Pattern¶

// Consistent error handling across all services
const result = await service.someOperation();

if (!result.success) {
  logger.error(`Operation failed: ${result.error}`);
  // Handle error appropriately
  return;
}

// Use result.data if operation succeeded
console.log('Success:', result.data);

Voice Transcriber - API Reference¶

Overview¶

Common Patterns¶

Result Interface¶

Error Handling¶

Core Application¶

VoiceTranscriberApp¶

Methods¶

Configuration Service¶

Config¶

Methods¶

Configuration Paths¶

System Tray Service¶

SystemTrayService¶

Methods¶

States and Icons¶

Menu Items¶

Audio Recording Service¶

AudioRecorder¶

Methods¶

Audio Format¶

File Management¶

Transcription Service¶

TranscriptionService¶

Methods¶

Configuration¶

Multilingual Support¶

Text Formatting Service¶

FormatterService¶

Methods¶

Configuration¶

Text Enhancement¶

Clipboard Service¶

ClipboardService¶

Methods¶

Platform Support¶

Logging Service¶

Logger¶

Methods¶

Log Levels¶

Output Format¶

Error Codes and Messages¶

Common Error Patterns¶

API Rate Limiting¶

Network Error Handling¶

Usage Examples¶

Basic Application Lifecycle¶

Custom Configuration¶

Manual Service Usage¶

Error Handling Pattern¶