Voice Communication
High-quality audio with noise suppression and echo cancellation
Real-time voice communication with noise suppression, echo cancellation, and voice activity detection.
Key Features
High-Quality Audio
- Crystal-Clear Voice: 48 kHz Opus audio with low latency
- Noise Suppression: RNNoise-based filtering removes background noise
- Echo Cancellation: Enabled by default to reduce echo and feedback
- Automatic Gain Control: Maintains consistent volume levels
Voice Controls
- Mute/Unmute: Server-synchronized microphone control with visual feedback
- Deafen: Mutes incoming audio; state is synced to the server
- Push-to-Talk: Hold a configurable key to transmit (global hotkey in desktop app, keyboard in browser)
- Voice Activity: Automatic transmission based on voice level (default mode)
Real-time Feedback
- Voice Activity Detection: Visual indicators showing who's speaking
- Audio Level Meters: Real-time visualization of audio input/output
- Connection Status: Clear indicators for connection states
- Speaking Indicators: Visual feedback for all participants
Voice Controls
Mute / Deafen
Toggling mute or deafen updates the local audio pipeline immediately and emits a voice:state:update event to the signaling server:
socket.emit("voice:state:update", { isMuted, isDeafened, isAFK });
Other participants see the updated state in the member list within milliseconds.
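As a rough illustration of the "local first, then server" ordering, here is a minimal sketch. The payload shape comes from the event above; `Emitter`, `createVoiceControls`, and the `onLocalChange` callback are hypothetical names used only for this example and are not Gryt's actual code.

```typescript
// Payload shape taken from the voice:state:update event.
interface VoiceState {
  isMuted: boolean;
  isDeafened: boolean;
  isAFK: boolean;
}

// Hypothetical minimal emitter that stands in for the signaling socket.
interface Emitter {
  emit(event: string, payload: VoiceState): void;
}

// Hypothetical helper: applies a toggle locally, then notifies the server.
function createVoiceControls(
  socket: Emitter,
  onLocalChange: (state: VoiceState) => void
) {
  let state: VoiceState = { isMuted: false, isDeafened: false, isAFK: false };

  const commit = (next: VoiceState): VoiceState => {
    state = next;
    onLocalChange(state); // update the local audio pipeline immediately
    socket.emit("voice:state:update", state); // then sync to the signaling server
    return state;
  };

  return {
    toggleMute: () => commit({ ...state, isMuted: !state.isMuted }),
    toggleDeafen: () => commit({ ...state, isDeafened: !state.isDeafened }),
    current: () => state,
  };
}
```

The full state is sent on every change rather than a delta, so a receiver never has to guess the value of a flag it did not see change.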
Push-to-Talk
Push-to-talk is available in both the browser (keyboard key) and the desktop app (global hotkey via uiohook-napi). The PTT key is configurable in Settings > Audio. When PTT is active, the mic is unmuted only while the key is held. Per-channel PTT can also be enforced by admins via the channel setting requirePushToTalk.
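The rules above can be sketched as a small gate object. This is an assumption-laden illustration: `PushToTalkGate`, `PttConfig`, and `userMode` are hypothetical names, and only `requirePushToTalk` comes from the text. Key identifiers are treated as `KeyboardEvent.code` strings such as "KeyV".

```typescript
type TransmitMode = "voice-activity" | "push-to-talk";

// Hypothetical config shape; only requirePushToTalk is named in the docs.
interface PttConfig {
  userMode: TransmitMode; // the user's own choice in Settings > Audio
  pttKey: string; // key identifier, e.g. a KeyboardEvent.code value
  requirePushToTalk: boolean; // per-channel setting enforced by admins
}

class PushToTalkGate {
  private config: PttConfig;
  private held = false;

  constructor(config: PttConfig) {
    this.config = config;
  }

  // PTT applies when the user chose it or the channel enforces it.
  get pttActive(): boolean {
    return this.config.requirePushToTalk || this.config.userMode === "push-to-talk";
  }

  keyDown(code: string): void {
    if (code === this.config.pttKey) this.held = true;
  }

  keyUp(code: string): void {
    if (code === this.config.pttKey) this.held = false;
  }

  // With PTT active the mic is open only while the key is held.
  // Otherwise this gate does not block and voice activity decides.
  micOpen(): boolean {
    return this.pttActive ? this.held : true;
  }
}
```

In a browser the two key methods would typically be wired to `keydown` and `keyup` listeners on `window`; in the desktop app the same calls could be driven by the global hotkey hook.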
Server Restart Resilience
Gryt automatically handles signaling server restarts without losing your voice session.
How It Works
The voice architecture separates the signaling server (session management, member lists) from the SFU (audio forwarding). When the signaling server restarts:
- Audio continues uninterrupted — the SFU operates independently, so voice chat keeps working during the restart
- Automatic reconnection — once the signaling server is back online, clients automatically reconnect and restore their session
- Voice channel rejoin — the client performs a full SFU disconnect and reconnect to cleanly re-establish speaking indicators, stream mapping, and member list status
- Identity preservation — session restoration uses the access token from the initial connection, so nicknames and user identity are preserved
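The recovery sequence described in this list can be modeled as a small state machine. The sketch below is illustrative only: the phase, event, and action names are invented for this example and do not correspond to real Gryt identifiers.

```typescript
type Phase = "connected" | "signaling-down" | "restoring-session" | "rejoining-voice";

type RecoveryEvent =
  | "socket-disconnected"
  | "socket-reconnected"
  | "session-restored"
  | "sfu-rejoined";

type Action =
  | "show-reconnecting-toast"
  | "restore-session-with-token"
  | "sfu-disconnect"
  | "sfu-connect"
  | "refresh-member-list";

interface Step {
  phase: Phase;
  actions: Action[];
}

// Pure transition function: given the current phase and an event,
// return the next phase and the side effects the client should run.
function nextStep(phase: Phase, event: RecoveryEvent): Step {
  if (phase === "connected" && event === "socket-disconnected") {
    // The SFU keeps forwarding audio; only signaling is unavailable.
    return { phase: "signaling-down", actions: ["show-reconnecting-toast"] };
  }
  if (phase === "signaling-down" && event === "socket-reconnected") {
    // Reuse the access token from the initial connection.
    return { phase: "restoring-session", actions: ["restore-session-with-token"] };
  }
  if (phase === "restoring-session" && event === "session-restored") {
    // Full SFU disconnect and reconnect to rebuild stream mapping.
    return { phase: "rejoining-voice", actions: ["sfu-disconnect", "sfu-connect"] };
  }
  if (phase === "rejoining-voice" && event === "sfu-rejoined") {
    return { phase: "connected", actions: ["refresh-member-list"] };
  }
  return { phase, actions: [] }; // ignore events that do not apply in this phase
}
```

Keeping the transitions pure makes the ordering easy to test without a live socket or SFU connection.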
Timing
| Phase | Duration | What happens |
|---|---|---|
| Server down | Varies | Audio continues via SFU; signaling unavailable |
| Reconnect | ~1s | Socket.IO auto-reconnects; session restored from JWT |
| Voice rejoin | ~2.5s | Full SFU disconnect + reconnect; speaking indicators restored |
What Users See
- A brief "Reconnecting..." toast notification
- Voice briefly disconnects and reconnects during the voice rejoin step (the connect sound plays)
- Member list updates to show correct voice status within seconds
Audio Quality Features
Screen Share Audio Isolation
The desktop app uses OS-native per-process audio capture to exclude Gryt's own audio from screen share streams. On Windows this uses WASAPI process loopback exclusion; on macOS it uses ScreenCaptureKit's excludesCurrentProcessAudio. Other participants hear only your game or application audio, not the voices of people already in the call. See Audio Processing — Native Capture for technical details.
Screen Share Quality and Gaming Mode
Gaming mode (on by default) optimizes screen sharing for fast-moving content like games and video. It hides the mouse cursor from the stream, tells the encoder to prioritize framerate over resolution, and applies a 1.5x bitrate boost. Toggle it in the screen share dialog before starting a share.
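One way to picture what gaming mode changes is as a settings bundle. In the sketch below, `ShareSettings` and `resolveShareSettings` are hypothetical, and the mapping onto the standard WebRTC values (`cursor: "never"`, `contentHint: "motion"`, `degradationPreference: "maintain-framerate"`) is an assumption about how such a mode could be implemented. The non-gaming defaults are also assumed; only the 1.5x boost, hidden cursor, and framerate priority come from the text.

```typescript
// Hypothetical settings bundle; values mirror standard WebRTC options.
interface ShareSettings {
  cursor: "never" | "always"; // display-capture cursor constraint
  contentHint: "motion" | "detail"; // hint set on the video track
  degradationPreference: "maintain-framerate" | "maintain-resolution";
  maxBitrateBps: number;
}

const GAMING_BITRATE_BOOST = 1.5; // boost factor stated in the docs

function resolveShareSettings(
  estimatedBitrateBps: number,
  gamingMode: boolean = true // gaming mode is on by default
): ShareSettings {
  if (gamingMode) {
    return {
      cursor: "never",
      contentHint: "motion",
      degradationPreference: "maintain-framerate",
      maxBitrateBps: Math.round(estimatedBitrateBps * GAMING_BITRATE_BOOST),
    };
  }
  // Assumed defaults when gaming mode is off; not confirmed by the docs.
  return {
    cursor: "always",
    contentHint: "detail",
    degradationPreference: "maintain-resolution",
    maxBitrateBps: Math.round(estimatedBitrateBps),
  };
}
```

For example, an estimated 4 Mbps stream would be encoded with a 6 Mbps ceiling while gaming mode is on.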
The Advanced panel in the screen share dialog gives you control over:
- Codec — Auto (H.264), H.264, VP9, or AV1. Auto prefers H.264 for universal hardware encoding via NVENC (NVIDIA), Quick Sync (Intel), and AMF (AMD). AV1 offers better compression but requires newer GPUs (RTX 40+, Intel Arc, AMD RX 7000+) for hardware encode.
- Max bitrate — Override the auto-estimated bitrate with a fixed value from 1 to 50 Mbps.
- SVC layers — Scalable Video Coding temporal layers. L1T3 (default) enables 3 framerate tiers (7.5/15/30fps) so the SFU can send lower framerate to bandwidth-constrained receivers instead of degrading quality for everyone. L1T2 gives 2 tiers, and L1T1 disables SVC.
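The framerate tiers follow from how temporal layers are usually stacked: each extra layer doubles the framerate of the one below it. A minimal sketch of that arithmetic, assuming this dyadic layering and a 30 fps capture (`temporalFramerates` is a hypothetical helper name):

```typescript
type SvcMode = "L1T1" | "L1T2" | "L1T3";

// Returns the framerate of each temporal tier, lowest first.
// Assumes dyadic layering: each layer doubles the rate of the one below.
function temporalFramerates(mode: SvcMode, fullFps: number): number[] {
  const layers = Number(mode.charAt(3)); // "L1T3" -> 3
  const tiers: number[] = [];
  for (let i = layers - 1; i >= 0; i--) {
    tiers.push(fullFps / Math.pow(2, i));
  }
  return tiers;
}
```

With a 30 fps source, `temporalFramerates("L1T3", 30)` gives the three tiers listed above, and `"L1T1"` yields a single 30 fps tier, which is why it effectively disables SVC.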
For detailed tuning information, see Performance Tuning — Screen share video quality.
RNNoise Noise Reduction
AI-powered noise suppression via an AudioWorkletNode running RNNoise compiled to WebAssembly. Processes 480-sample frames off the main thread with ~20 ms added latency. Toggle in Settings → Audio → Noise Reduction (RNNoise).
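Web Audio currently hands worklets audio in 128-sample blocks, so a processor that needs 480-sample frames has to re-chunk its input. The sketch below shows one way to do that buffering. `FrameAccumulator` is a hypothetical helper, not the actual Gryt worklet, and the RNNoise call itself is left to the `onFrame` callback.

```typescript
const RNNOISE_FRAME = 480; // samples per frame (10 ms of audio at 48 kHz)

// Collects arbitrary-size input blocks and emits fixed-size frames.
class FrameAccumulator {
  private buffer: Float32Array;
  private filled = 0;
  private frameSize: number;
  private onFrame: (frame: Float32Array) => void;

  constructor(frameSize: number, onFrame: (frame: Float32Array) => void) {
    this.frameSize = frameSize;
    this.onFrame = onFrame;
    this.buffer = new Float32Array(frameSize);
  }

  push(block: Float32Array): void {
    let offset = 0;
    while (offset < block.length) {
      // Copy as much as fits into the current frame.
      const n = Math.min(this.frameSize - this.filled, block.length - offset);
      this.buffer.set(block.subarray(offset, offset + n), this.filled);
      this.filled += n;
      offset += n;
      if (this.filled === this.frameSize) {
        this.onFrame(this.buffer.slice()); // hand out a copy; the buffer is reused
        this.filled = 0;
      }
    }
  }

  // Samples waiting for the next frame to complete.
  get pending(): number {
    return this.filled;
  }
}
```

Because 480 is not a multiple of 128, some samples always wait for the next block, which is one source of the added latency. A production worklet would likely avoid the per-frame `slice()` allocation by reusing output buffers.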
Auto Gain Control
True RMS-based AGC that normalizes your voice to a configurable target level (-30 to -5 dB, default -20 dB). Quiet speech gets boosted, loud speech gets reduced. Enabled by default.
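To make "RMS-based" concrete, the following sketch measures a frame's RMS level in dB and derives the gain needed to reach the target. It is a simplified illustration: `rmsDb` and `agcGain` are hypothetical names, the `maxBoostDb` cap is an assumption added so silence is not amplified without limit, and a real AGC would smooth the gain over time rather than apply it per frame.

```typescript
// RMS level of a frame in dB relative to full scale.
function rmsDb(samples: Float32Array): number {
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  const rms = Math.sqrt(sumSquares / Math.max(samples.length, 1));
  return 20 * Math.log10(Math.max(rms, 1e-9)); // floor avoids -Infinity on silence
}

// Linear gain that moves the frame's RMS level to the target level.
function agcGain(
  samples: Float32Array,
  targetDb: number = -20, // default target from the docs
  maxBoostDb: number = 30 // assumed safety cap, not a documented setting
): number {
  const target = Math.min(-5, Math.max(-30, targetDb)); // documented range
  const gainDb = Math.min(target - rmsDb(samples), maxBoostDb);
  return Math.pow(10, gainDb / 20);
}
```

A frame at -40 dB RMS gets roughly a 10x boost to reach -20 dB, while a frame at about -6 dB is scaled down to around 0.2x.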
Compressor
Optional dynamic range compressor that tames peaks after AGC. Adjustable from gentle leveling (0 %) to heavy squash (100 %). Enabled by default at 50 %.
Noise Gate
Configurable threshold gate that silences the mic when you're not speaking. Uses the raw (pre-RNNoise) signal level for accurate detection.
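The key idea is that the open/closed decision is made on the raw level while the gate is applied to the processed audio. A minimal sketch of that split follows; `NoiseGate` and its `holdFrames` parameter are hypothetical, and the hold time is an assumption added to avoid chopping word endings.

```typescript
class NoiseGate {
  private thresholdDb: number;
  private holdFrames: number;
  private quietFrames: number;

  constructor(thresholdDb: number, holdFrames: number = 10) {
    this.thresholdDb = thresholdDb;
    this.holdFrames = holdFrames;
    this.quietFrames = holdFrames + 1; // start closed
  }

  // rawLevelDb: level of the unprocessed mic frame (before RNNoise).
  // processed: the same frame after noise reduction.
  process(rawLevelDb: number, processed: Float32Array): Float32Array {
    if (rawLevelDb >= this.thresholdDb) {
      this.quietFrames = 0; // speech detected, (re)open the gate
    } else {
      this.quietFrames++;
    }
    const open = this.quietFrames <= this.holdFrames;
    // When closed, output silence of the same length.
    return open ? processed : new Float32Array(processed.length);
  }
}
```

Measuring before RNNoise matters because the denoiser lowers the level of everything it touches, so a threshold tuned against the processed signal would shift whenever noise reduction is toggled.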
For full pipeline details and settings reference, see Audio Processing.
Voice Activity Detection
Visual Indicators
Real-time visual feedback for all participants:
- Speaking Animation: Pulsing border around user avatars when voice is detected
- Audio Level Bars: Real-time visualization of voice levels in settings
- Connection Status: Clear indicators for connection states
- Mute Indicators: Visual feedback for mute/deafen states
Troubleshooting
Common Audio Issues
Microphone not working?
- Check browser permissions for microphone access
- Verify HTTPS in production (required for getUserMedia)
- Try selecting a different input device in Audio Settings
Echo or feedback?
- Use headphones instead of speakers
- Echo cancellation is enabled by default via browser constraints
- Reduce output volume if echo persists
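For reference, the browser constraint in question is the standard `echoCancellation` audio constraint. The sketch below builds a constraints object with it enabled and an optional device pin. `buildAudioConstraints` and `AudioCaptureConstraints` are hypothetical names for illustration and may not match what the client actually sends.

```typescript
// Subset of the standard audio track constraints used in this example.
interface AudioCaptureConstraints {
  echoCancellation: boolean;
  deviceId?: { exact: string };
}

// Hypothetical helper: echo cancellation on, optional specific input device.
function buildAudioConstraints(deviceId?: string): AudioCaptureConstraints {
  const constraints: AudioCaptureConstraints = { echoCancellation: true };
  if (deviceId) {
    constraints.deviceId = { exact: deviceId }; // pin a specific microphone
  }
  return constraints;
}
```

The result would be passed as the `audio` member of a `getUserMedia` request. Afterwards, `getSettings()` on the returned audio track reports whether the browser actually applied echo cancellation, which is a quick check when echo persists.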
Voice cutting in and out?
- Lower the noise gate threshold in Audio Settings
- If using RNNoise, it may be gating legitimate speech — try disabling it
Audio too quiet or too loud?
- Adjust the AGC target level slider
- Combine with the microphone volume slider for fine control
Audio Metrics
- Sample Rate: 48 kHz (Opus)
- Bitrate: Configurable per channel (32–510 kbps presets)
- Pipeline latency: ~20 ms with RNNoise enabled; negligible without it