Gryt

Voice Communication

High-quality audio with noise suppression and echo cancellation

Real-time voice communication with noise suppression, echo cancellation, and voice activity detection.

Key Features

High-Quality Audio

  • Crystal-Clear Voice: Professional-grade audio quality with minimal latency
  • Noise Suppression: Advanced algorithms filter out background noise
  • Echo Cancellation: Eliminates echo and feedback automatically
  • Automatic Gain Control: Maintains consistent volume levels

Voice Controls

  • Mute/Unmute: Server-synchronized microphone control with visual feedback
  • Deafen: Mutes incoming audio; state is synced to the server
  • Push-to-Talk: Hold a configurable key to transmit (global hotkey in desktop app, keyboard in browser)
  • Voice Activity: Automatic transmission based on voice level (default mode)

Real-time Feedback

  • Voice Activity Detection: Visual indicators showing who's speaking
  • Audio Level Meters: Real-time visualization of audio input/output
  • Connection Status: Clear indicators for connection states
  • Speaking Indicators: Visual feedback for all participants

Voice Controls

Mute / Deafen

Toggling mute or deafen updates the local audio pipeline immediately and emits a voice:state:update event to the signaling server:

socket.emit("voice:state:update", { isMuted, isDeafened, isAFK });

Other participants see the updated state in the member list within milliseconds.
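The toggle flow above could be sketched roughly as follows. Only the event name and the three payload fields come from the snippet above; `VoiceState`, `Emitter`, and `toggleVoiceFlag` are hypothetical names, and the socket is reduced to a minimal interface so the example stands alone.

```typescript
// Payload fields mirror the emit call shown above.
interface VoiceState {
  isMuted: boolean;
  isDeafened: boolean;
  isAFK: boolean;
}

// Minimal stand-in for a Socket.IO client socket (assumption for this sketch).
interface Emitter {
  emit(eventName: string, payload: VoiceState): void;
}

// Flip one flag, then publish the full state so the server can relay it.
function toggleVoiceFlag(
  socket: Emitter,
  current: VoiceState,
  flag: "isMuted" | "isDeafened",
): VoiceState {
  const next: VoiceState = { ...current };
  next[flag] = !current[flag];
  socket.emit("voice:state:update", next);
  return next;
}
```

Sending the whole state object rather than a single changed flag keeps the server's copy from drifting if one update is lost, which is one plausible reason for the payload shape shown above.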

Push-to-Talk

Push-to-talk is available in both the browser (keyboard key) and the desktop app (global hotkey via uiohook-napi). The PTT key is configurable in Settings → Audio. While PTT mode is active, the mic is unmuted only while the key is held. Admins can also enforce PTT for a specific channel via the channel setting requirePushToTalk.
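A minimal sketch of the hold-to-transmit logic, assuming a hypothetical `PushToTalk` class and a `setMicEnabled` callback that stands in for whatever actually mutes the audio pipeline. Key identifiers are plain strings here; in a browser they would typically be `KeyboardEvent.code` values.

```typescript
// Hypothetical controller: the mic is live only while the configured key is held.
class PushToTalk {
  private held = false;
  private readonly pttKey: string;
  private readonly setMicEnabled: (on: boolean) => void;

  constructor(pttKey: string, setMicEnabled: (on: boolean) => void) {
    this.pttKey = pttKey;
    this.setMicEnabled = setMicEnabled;
    this.setMicEnabled(false); // PTT mode starts with the mic muted
  }

  keyDown(code: string): void {
    // Ignore other keys, and ignore key-repeat events while already held.
    if (code !== this.pttKey || this.held) return;
    this.held = true;
    this.setMicEnabled(true);
  }

  keyUp(code: string): void {
    if (code !== this.pttKey || !this.held) return;
    this.held = false;
    this.setMicEnabled(false);
  }
}
```

In the browser the two methods would be driven by `keydown` and `keyup` listeners; in the desktop app the same methods could be fed from the global hotkey hook.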

Server Restart Resilience

Gryt automatically handles signaling server restarts without losing your voice session.

How It Works

The voice architecture separates the signaling server (session management, member lists) from the SFU (audio forwarding). When the signaling server restarts:

  1. Audio continues while signaling is down: the SFU operates independently, so voice chat keeps working during the restart
  2. Automatic reconnection: once the signaling server is back online, clients automatically reconnect and restore their session
  3. Voice channel rejoin: the client performs a full SFU disconnect and reconnect to cleanly re-establish speaking indicators, stream mapping, and member list status; audio drops briefly during this step
  4. Identity preservation: session restoration uses the access token from the initial connection, so nicknames and user identity are preserved
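The sequence above might be orchestrated along these lines. This is a sketch under stated assumptions: `ReconnectHooks`, its three methods, and `onSignalingReconnect` are hypothetical names, since the real client functions are not shown on this page.

```typescript
// Hypothetical hooks into the client; injected so the ordering is easy to see.
interface ReconnectHooks {
  restoreSession(accessToken: string): Promise<void>; // re-authenticate with signaling
  leaveSfu(): Promise<void>;                          // full SFU disconnect
  joinSfu(channelId: string): Promise<void>;          // fresh SFU connection
}

// Intended to run once the signaling socket has reconnected (step 2).
async function onSignalingReconnect(
  hooks: ReconnectHooks,
  accessToken: string,
  activeChannelId: string | null,
): Promise<void> {
  // Steps 2 and 4: restore the session with the token from the initial
  // connection, so identity and nickname carry over.
  await hooks.restoreSession(accessToken);

  // Step 3: only rejoin voice if the user was in a channel before the restart.
  if (activeChannelId === null) return;
  await hooks.leaveSfu();
  await hooks.joinSfu(activeChannelId);
}
```

Restoring the session before touching the SFU means the rejoin happens under the same identity, which is what lets the member list show the correct voice status afterwards.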

Timing

| Phase | Duration | What happens |
|---|---|---|
| Server down | Varies | Audio continues via SFU; signaling unavailable |
| Reconnect | ~1 s | Socket.IO auto-reconnects; session restored from JWT |
| Voice rejoin | ~2.5 s | Full SFU disconnect + reconnect; speaking indicators restored |

What Users See

  • A brief "Reconnecting..." toast notification
  • Voice temporarily disconnects and reconnects (connect sound plays)
  • Member list updates to show correct voice status within seconds

Audio Quality Features

Screen Share Audio Isolation

The desktop app uses OS-native per-process audio capture to exclude Gryt's own audio from screen share streams. On Windows this uses WASAPI process loopback exclusion; on macOS it uses ScreenCaptureKit's excludesCurrentProcessAudio. Other participants hear only your game or application audio, not the voices of people already in the call. See Audio Processing — Native Capture for technical details.

Screen Share Quality and Gaming Mode

Gaming mode (on by default) optimizes screen sharing for fast-moving content like games and video. It hides the mouse cursor from the stream, tells the encoder to prioritize framerate over resolution, and applies a 1.5x bitrate boost. Toggle it in the screen share dialog before starting a share.
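As a rough illustration of what the toggle changes, the sketch below derives share settings from an estimated bitrate. The 1.5x factor, the hidden cursor, and the framerate priority come from the description above; the `ShareTuning` shape, the function name, and the behaviour when gaming mode is off are assumptions.

```typescript
interface ShareTuning {
  hideCursor: boolean;
  prioritize: "framerate" | "resolution";
  targetBitrateBps: number;
}

const GAMING_BITRATE_BOOST = 1.5; // boost factor described above

function tuneScreenShare(estimatedBitrateBps: number, gamingMode: boolean): ShareTuning {
  if (gamingMode) {
    return {
      hideCursor: true,
      prioritize: "framerate",
      targetBitrateBps: Math.round(estimatedBitrateBps * GAMING_BITRATE_BOOST),
    };
  }
  // Assumption: with gaming mode off, the estimate is used unchanged.
  return {
    hideCursor: false,
    prioritize: "resolution",
    targetBitrateBps: estimatedBitrateBps,
  };
}
```

For example, a 4 Mbps estimate becomes a 6 Mbps target with gaming mode on.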

The Advanced panel in the screen share dialog gives you control over:

  • Codec — Auto (H.264), H.264, VP9, or AV1. Auto prefers H.264 for universal hardware encoding via NVENC (NVIDIA), Quick Sync (Intel), and AMF (AMD). AV1 offers better compression but requires newer GPUs (RTX 40+, Intel Arc, AMD RX 7000+) for hardware encode.
  • Max bitrate — Override the auto-estimated bitrate with a fixed value from 1 to 50 Mbps.
  • SVC layers — Scalable Video Coding temporal layers. L1T3 (default) enables 3 framerate tiers (7.5/15/30fps) so the SFU can send lower framerate to bandwidth-constrained receivers instead of degrading quality for everyone. L1T2 gives 2 tiers, and L1T1 disables SVC.
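The framerate tiers follow from how temporal layers stack: each added layer doubles the framerate of the one below it. The sketch below reproduces the 7.5/15/30 fps split for L1T3 and clamps a bitrate override to the documented 1 to 50 Mbps range. Function names are hypothetical, and the 15/30 split for L1T2 is an assumption based on the same doubling rule.

```typescript
type SvcMode = "L1T1" | "L1T2" | "L1T3";

// Framerate available at each temporal tier, lowest first.
function temporalTierFps(mode: SvcMode, fullFps: number): number[] {
  const layerCount = Number(mode.charAt(3)); // "L1T3" -> 3
  const tiers: number[] = [];
  for (let i = layerCount - 1; i >= 0; i--) {
    tiers.push(fullFps / Math.pow(2, i)); // each lower tier halves the rate
  }
  return tiers;
}

// Keep a user-supplied max bitrate inside the documented 1 to 50 Mbps range.
function clampMaxBitrateMbps(mbps: number): number {
  return Math.min(50, Math.max(1, mbps));
}
```

The mode strings match the identifiers used by the WebRTC `scalabilityMode` encoding parameter, so the same value can usually be passed through unchanged.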

For detailed tuning information, see Performance Tuning — Screen share video quality.

RNNoise Noise Reduction

AI-powered noise suppression via an AudioWorkletNode running RNNoise compiled to WebAssembly. It processes 480-sample frames (10 ms at 48 kHz) off the main thread and adds ~20 ms of latency. Toggle it in Settings → Audio → Noise Reduction (RNNoise).
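An AudioWorklet hands its processor audio in 128-sample blocks, which do not divide evenly into 480-sample frames, so some buffering is needed before each denoise call. The sketch below shows one way to do that regrouping. `FrameAccumulator` is a hypothetical name, the RNNoise call itself is left out, and this is not necessarily how Gryt's worklet is written.

```typescript
const RNNOISE_FRAME_SIZE = 480; // samples per RNNoise frame (10 ms at 48 kHz)

class FrameAccumulator {
  private readonly pending = new Float32Array(RNNOISE_FRAME_SIZE);
  private filled = 0;

  // Feed one block of samples; returns every full frame that became ready.
  push(block: Float32Array): Float32Array[] {
    const ready: Float32Array[] = [];
    let offset = 0;
    while (offset < block.length) {
      const take = Math.min(RNNOISE_FRAME_SIZE - this.filled, block.length - offset);
      this.pending.set(block.subarray(offset, offset + take), this.filled);
      this.filled += take;
      offset += take;
      if (this.filled === RNNOISE_FRAME_SIZE) {
        ready.push(this.pending.slice()); // copy so the buffer can be reused
        this.filled = 0;
      }
    }
    return ready;
  }
}
```

Each returned frame would then be passed to the denoiser. Fifteen 128-sample blocks (1920 samples) yield exactly four frames.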

Auto Gain Control

True RMS-based AGC that normalizes your voice to a configurable target level (-30 to -5 dB, default -20 dB). Quiet speech gets boosted, loud speech gets reduced. Enabled by default.
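The core arithmetic of an RMS-based AGC can be sketched as: measure the block's RMS level in dB, then apply the gain that closes the gap to the target. The target range and default come from the description above. The function names and the ±30 dB gain limit are assumptions, and a real AGC would also smooth gain changes over time, which this sketch omits.

```typescript
// RMS level of a block in dBFS (full scale = 1.0).
function rmsDbfs(samples: ArrayLike<number>): number {
  if (samples.length === 0) return -180;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / samples.length);
  return 20 * Math.log10(Math.max(rms, 1e-9)); // floor avoids -Infinity on silence
}

// Gain in dB that moves the measured level to the target.
// maxGainDb is an assumed safety limit, not a documented setting.
function agcGainDb(measuredDb: number, targetDb: number = -20, maxGainDb: number = 30): number {
  const clampedTarget = Math.min(-5, Math.max(-30, targetDb)); // documented range
  const wanted = clampedTarget - measuredDb;
  return Math.min(maxGainDb, Math.max(-maxGainDb, wanted));
}

// Convert a dB gain to the linear factor applied to each sample.
function dbToLinear(db: number): number {
  return Math.pow(10, db / 20);
}
```

With the default target, speech measured at -32 dB gets +12 dB of gain, while speech at -10 dB gets -10 dB.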

Compressor

Optional dynamic range compressor that tames peaks after AGC. Adjustable from gentle leveling (0 %) to heavy squash (100 %). Enabled by default at 50 %.

Noise Gate

Configurable threshold gate that silences the mic when you're not speaking. Uses the raw (pre-RNNoise) signal level for accurate detection.
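A threshold gate of this kind can be sketched as below. The decision is driven by the raw input level, measured before noise suppression, as described above. The `NoiseGate` class and the hold period (which keeps the gate open briefly after the level drops, so word endings are not clipped) are assumptions for illustration.

```typescript
class NoiseGate {
  private holdLeft = 0;
  private readonly thresholdDb: number;
  private readonly holdBlocks: number;

  constructor(thresholdDb: number, holdBlocks: number) {
    this.thresholdDb = thresholdDb;
    this.holdBlocks = holdBlocks;
  }

  // rawLevelDb: level of the unprocessed mic signal for the current block.
  // Returns true when audio should pass, false when it should be silenced.
  isOpen(rawLevelDb: number): boolean {
    if (rawLevelDb >= this.thresholdDb) {
      this.holdLeft = this.holdBlocks; // re-arm the hold on every loud block
      return true;
    }
    if (this.holdLeft > 0) {
      this.holdLeft--;
      return true;
    }
    return false;
  }
}
```

Lowering `thresholdDb` makes the gate open for quieter speech, at the cost of letting more background noise through.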

For full pipeline details and settings reference, see Audio Processing.

Voice Activity Detection

Visual Indicators

Real-time visual feedback for all participants:

  • Speaking Animation: Pulsing border around user avatars when voice is detected
  • Audio Level Bars: Real-time visualization of voice levels in settings
  • Connection Status: Clear indicators for connection states
  • Mute Indicators: Visual feedback for mute/deafen states

Troubleshooting

Common Audio Issues

Microphone not working?

  • Check browser permissions for microphone access
  • Verify HTTPS in production (required for getUserMedia)
  • Try selecting a different input device in Audio Settings

Echo or feedback?

  • Use headphones instead of speakers
  • Echo cancellation is enabled by default via browser constraints
  • Reduce output volume if echo persists
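For reference, requesting the microphone with echo cancellation enabled looks roughly like this. The `echoCancellation` and `deviceId` fields are standard media track constraints; the local `MicConstraints` type and `buildMicConstraints` helper are hypothetical, and whether Gryt sets additional constraints is not stated on this page.

```typescript
// Local mirror of the constraint fields used here, so the sketch stands alone.
interface MicConstraints {
  echoCancellation: boolean;
  deviceId?: { exact: string };
}

function buildMicConstraints(deviceId?: string): { audio: MicConstraints } {
  const audio: MicConstraints = { echoCancellation: true };
  if (deviceId !== undefined) {
    audio.deviceId = { exact: deviceId }; // pin a specific input device
  }
  return { audio };
}
```

In a browser, the result would be passed to `navigator.mediaDevices.getUserMedia(...)`, which is only available in a secure context (HTTPS or localhost).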

Voice cutting in and out?

  • Lower the noise gate threshold in Audio Settings
  • RNNoise may be suppressing legitimate speech; if it is enabled, try disabling it

Audio too quiet or too loud?

  • Adjust the AGC target level slider
  • Combine with the microphone volume slider for fine control

Audio Metrics

  • Sample Rate: 48 kHz (Opus)
  • Bitrate: Configurable per channel (32–510 kbps presets)
  • Pipeline latency: ~20 ms with RNNoise enabled; negligible without it
