Tutorial · 10 min read

Building Real-time Voice Applications with LiveKit

From telehealth to online education to social experiences — real-time voice is everywhere. Here's how LiveKit makes it accessible, scalable, and production-ready.

What is LiveKit?

LiveKit is an open-source, end-to-end WebRTC platform. It abstracts away the notoriously complex WebRTC stack and gives you clean APIs for building real-time audio and video applications. Think of it as the "Stripe of real-time communication" — it handles the hard infrastructure so you can focus on your product.

Latency: <100ms (sub-second real-time)
Scale: 1 to 1M+ (calls to broadcasts)
Platforms: 6+ (Web, iOS, Android...)
License: Apache 2.0 (fully open source)

Why Not Just Use WebRTC Directly?

You could build on raw WebRTC. But you'd spend months dealing with STUN/TURN servers, codec negotiation, bandwidth estimation, connection state machines, and the dozens of browser-specific quirks that make WebRTC notoriously hard to ship.

Real talk: I've seen teams spend 6+ months building a "simple" voice chat on raw WebRTC, only to abandon it for a managed solution. The edge cases in production — NAT traversal failures, codec mismatches, mobile backgrounding — are brutal.

LiveKit gives you production-ready infrastructure out of the box:

SFU Architecture

Selective Forwarding Unit for efficient multi-party calls. Each participant sends their stream once; the server distributes it to everyone else.
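
To make the "sends their stream once" point concrete, here is a small sketch of the arithmetic. It only counts media streams per participant and ignores simulcast layers and muted tracks; the function names are made up for illustration.

```typescript
// Illustrative arithmetic only: counts streams, not bytes.
interface StreamCounts {
  uplinksPerClient: number;   // streams each participant sends
  downlinksPerClient: number; // streams each participant receives
}

// Full mesh (raw peer-to-peer): a separate copy goes to every other participant.
function meshCounts(participants: number): StreamCounts {
  const others = Math.max(participants - 1, 0);
  return { uplinksPerClient: others, downlinksPerClient: others };
}

// SFU: each participant publishes once; the server forwards to everyone else.
function sfuCounts(participants: number): StreamCounts {
  const others = Math.max(participants - 1, 0);
  return { uplinksPerClient: 1, downlinksPerClient: others };
}

console.log(meshCounts(10)); // 9 uplinks and 9 downlinks per client
console.log(sfuCounts(10));  // 1 uplink and 9 downlinks per client
```

With ten people in a room, each client's upload drops from nine copies to one, which is usually what matters on constrained mobile uplinks.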

Adaptive Bitrate

Automatically adjusts audio and video quality based on each participant's network conditions. No manual tuning needed.
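
The general idea behind adaptive bitrate can be sketched as "pick the best quality layer that fits the estimated bandwidth, with some headroom." This is not LiveKit's actual algorithm; the layer names, bitrates, and the 0.8 headroom factor below are assumptions chosen for illustration.

```typescript
interface QualityLayer {
  name: string;
  kbps: number;
}

// Hypothetical layers; real bitrates depend on codec and resolution.
const exampleLayers: QualityLayer[] = [
  { name: 'low', kbps: 150 },
  { name: 'medium', kbps: 500 },
  { name: 'high', kbps: 1700 },
];

// Returns the highest layer that fits within estimatedKbps * headroom,
// falling back to the lowest layer when nothing fits.
function pickLayer(
  estimatedKbps: number,
  layers: QualityLayer[],
  headroom = 0.8,
): QualityLayer {
  if (layers.length === 0) throw new Error('at least one layer is required');
  const sorted = [...layers].sort((a, b) => a.kbps - b.kbps);
  let chosen = sorted[0] as QualityLayer;
  for (const layer of sorted) {
    if (layer.kbps <= estimatedKbps * headroom) chosen = layer;
  }
  return chosen;
}

console.log(pickLayer(1000, exampleLayers).name); // medium
```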

Edge Deployment

Globally distributed servers so participants connect to the nearest node, minimizing latency regardless of geography.

Built-in Recording

Server-side recording without client-side overhead. Record individual tracks or composite rooms for playback or compliance.

Voice-Specific Superpowers

LiveKit isn't just a generic WebRTC wrapper. It has first-class support for voice applications with features that would take months to build yourself.

Noise Cancellation

Krisp.ai integration filters out background noise — keyboards, dogs, construction — leaving crystal-clear voice.

Echo Cancellation

Built-in acoustic echo cancellation prevents that awful feedback loop when someone isn't wearing headphones.

Voice Activity Detection

Intelligently detects when someone is speaking vs. silent. Essential for UI indicators and bandwidth optimization.
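
If you have never looked inside a voice activity detector, the simplest version is an energy threshold with a short "hangover" so the indicator doesn't flicker between words. The sketch below is that textbook approach, not what LiveKit runs internally; the threshold and hangover values are assumptions you would tune.

```typescript
// Root-mean-square level of one audio frame (samples in the range [-1, 1]).
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < frame.length; i++) {
    const sample = frame[i] ?? 0;
    sumSquares += sample * sample;
  }
  return Math.sqrt(sumSquares / frame.length);
}

// Returns a function that takes frames one at a time and reports "speaking".
// Speech starts as soon as a frame is loud enough, and ends only after more
// than `hangoverFrames` consecutive quiet frames.
function createEnergyVad(threshold = 0.02, hangoverFrames = 5) {
  let speaking = false;
  let quietFrames = 0;
  return function process(frame: Float32Array): boolean {
    if (rms(frame) >= threshold) {
      speaking = true;
      quietFrames = 0;
    } else if (speaking) {
      quietFrames += 1;
      if (quietFrames > hangoverFrames) speaking = false;
    }
    return speaking;
  };
}
```

Production detectors are typically model-based and far more robust to background noise, but the start-fast, stop-slow shape is the same.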

Spatial Audio

3D audio positioning for immersive experiences. Place participants in virtual space so conversations feel natural.

Building a Voice Chat Room

Let's walk through the key pieces. A voice application has two sides: a backend that generates secure tokens, and a frontend that connects to the room and handles audio.

  Client App
      │
      ▼
  LiveKit SDK  ──→  Token Server (your backend)
      │
      ▼
  LiveKit SFU  ──→  Routes audio streams
      │
      ▼
  Other Clients

Step 1: Token Generation (Backend)

Every participant needs an access token (a JWT) to join a room. Generate it server-side so your API secret is never exposed to clients.

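token-server.ts (TypeScript)
import { AccessToken } from 'livekit-server-sdk';

// Credentials are read from the environment; never hard-code or ship them to clients.
const apiKey = process.env.LIVEKIT_API_KEY;
const apiSecret = process.env.LIVEKIT_API_SECRET;

async function createToken(roomName: string, participantName: string) {
  const token = new AccessToken(apiKey, apiSecret, {
    identity: participantName,
  });

  token.addGrant({
    room: roomName,
    roomJoin: true,
    canPublish: true,     // Can send audio
    canSubscribe: true,   // Can receive audio
  });

  // toJwt() returns a Promise in livekit-server-sdk v2, so await it
  return await token.toJwt();
}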

Step 2: Room Connection (Frontend)

On the client side, create a Room instance, configure audio settings, and connect. LiveKit handles all the WebRTC negotiation, ICE candidate gathering, and codec selection behind the scenes.

voice-room.ts (TypeScript)
import { Room, RoomEvent } from 'livekit-client';

const room = new Room({
  adaptiveStream: true,
  dynacast: true,
  audioCaptureDefaults: {
    autoGainControl: true,
    echoCancellation: true,
    noiseSuppression: true,
  },
});

// Someone joins
room.on(RoomEvent.ParticipantConnected, (participant) => {
  console.log(`${participant.identity} joined the room`);
});

// Receive their audio
room.on(RoomEvent.TrackSubscribed, (track, pub, participant) => {
  if (track.kind === 'audio') {
    const audioElement = track.attach();
    document.body.appendChild(audioElement);
  }
});

// Connect and enable mic
// livekitUrl is your LiveKit server URL; token comes from the token server in Step 1
await room.connect(livekitUrl, token);
await room.localParticipant.setMicrophoneEnabled(true);

That's it. About 30 lines of code for a working multi-party voice chat. Compare that to the thousands of lines you'd need with raw WebRTC, and you can see why teams reach for LiveKit.

Production Best Practices

Getting a demo working is one thing. Shipping to production is another. Here are the things that matter when real users are on the line.

Handle Network Gracefully

Implement reconnection logic with exponential backoff. Show connection quality indicators so users know if the issue is on their end. Gracefully degrade audio quality rather than dropping the connection entirely.
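
Here is a minimal sketch of exponential backoff with jitter, for cases where you wrap your own operations (say, refetching a token and reconnecting) in retry logic. The helper names and the default delays are assumptions, and the client SDK has its own reconnection behavior that you should check before layering more on top.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 15000): number {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retries an async operation up to maxAttempts times. Between attempts it
// sleeps for a random duration up to the backoff delay ("full jitter"), so
// many clients recovering from the same outage don't retry in lockstep.
async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts - 1) {
        await sleep(Math.random() * backoffDelayMs(attempt));
      }
    }
  }
  throw lastError;
}
```

A call site might look like `await retryWithBackoff(() => joinRoom())`, where `joinRoom` is whatever function of yours fetches a fresh token and connects.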

Optimize Audio Pipeline

Use the Opus codec (LiveKit's default) for the best quality-to-bandwidth ratio. Configure audio constraints properly. Enable browser-level audio processing for echo cancellation and noise suppression.

Get the UX Right

Show visual indicators when someone is speaking. Add keyboard shortcuts for mute/unmute. Display clear connection status. These small details make the difference between a demo and a product people actually want to use.

Lock Down Security

Never expose API keys client-side. Set short token expiration times. Validate all permissions server-side. Use room-level access controls to prevent unauthorized joins.
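
"Validate all permissions server-side" in practice means the client asks to join, and your backend decides what the token is allowed to do. A rough sketch, assuming a hypothetical in-memory membership store and role names (a real app would query its database):

```typescript
type Role = 'host' | 'speaker' | 'listener';

// Same shape as the grant passed to addGrant in Step 1.
interface VoiceGrant {
  room: string;
  roomJoin: boolean;
  canPublish: boolean;
  canSubscribe: boolean;
}

// Hypothetical membership store: room name -> user id -> role.
const memberships = new Map<string, Map<string, Role>>([
  ['standup', new Map<string, Role>([['alice', 'host'], ['bob', 'listener']])],
]);

// Decides permissions from server-side data only. Returns null when the
// user has no business being in the room, so no token gets minted at all.
function grantFor(userId: string, roomName: string): VoiceGrant | null {
  const role = memberships.get(roomName)?.get(userId);
  if (role === undefined) return null;
  return {
    room: roomName,
    roomJoin: true,
    canPublish: role !== 'listener', // listeners receive audio but cannot send
    canSubscribe: true,
  };
}
```

The token endpoint would call `grantFor` with the authenticated user's ID, respond with a 403 on `null`, and otherwise pass the result to `token.addGrant`.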

Beyond Basic Voice Chat

Once you have the fundamentals, LiveKit opens the door to some genuinely exciting use cases.

01. AI Voice Assistants

Pipe audio through Whisper for real-time transcription, process with an LLM, and respond with text-to-speech. LiveKit's low latency makes conversational AI feel natural rather than like talking to a voicemail system.
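
The shape of that pipeline is three stages chained together. The sketch below uses hypothetical interfaces and stub implementations so it runs on its own; real speech-to-text, LLM, and text-to-speech clients would be adapted to these interfaces, and none of the names here come from LiveKit's SDKs.

```typescript
// Hypothetical stage interfaces for illustration.
interface SpeechToText {
  transcribe(audio: Float32Array): Promise<string>;
}
interface LanguageModel {
  reply(prompt: string): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Float32Array>;
}

// One conversational turn: audio in, audio out.
async function handleUtterance(
  audio: Float32Array,
  stt: SpeechToText,
  llm: LanguageModel,
  tts: TextToSpeech,
): Promise<Float32Array> {
  const transcript = await stt.transcribe(audio);
  if (transcript.trim() === '') return new Float32Array(0); // nothing was said
  const answer = await llm.reply(transcript);
  return tts.synthesize(answer);
}

// Stub stages so the sketch is runnable without any external service.
const stubStt: SpeechToText = {
  transcribe: async (audio) => (audio.length > 0 ? 'hello' : ''),
};
const stubLlm: LanguageModel = {
  reply: async (prompt) => `You said: ${prompt}`,
};
const stubTts: TextToSpeech = {
  synthesize: async (text) => new Float32Array(text.length), // one sample per character
};
```

Awaiting each stage in full keeps the sketch simple. To get latency that feels conversational, a real pipeline would more likely stream partial results between stages.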

02. Podcast Recording Platform

Multi-track server-side recording gives each participant their own audio file for post-production. Stream live to an audience while recording. No client-side CPU overhead means guests on older hardware still sound great.

03. Voice-Enabled Gaming

Proximity-based chat where you hear players near you in the game world. Team channels for coordinated play. Spatial audio that makes the game world feel alive. In-game voice commands powered by speech recognition.
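
Proximity chat mostly comes down to mapping distance in the game world to a volume. A minimal sketch with linear falloff, assuming 2D positions and radii in whatever units your game uses (the defaults are arbitrary):

```typescript
interface Vec2 {
  x: number;
  y: number;
}

// Full volume within innerRadius, silent beyond outerRadius,
// and a linear ramp in between. Returns a gain in the range [0, 1].
function proximityGain(
  listener: Vec2,
  speaker: Vec2,
  innerRadius = 2,
  outerRadius = 20,
): number {
  const distance = Math.hypot(speaker.x - listener.x, speaker.y - listener.y);
  if (distance <= innerRadius) return 1;
  if (distance >= outerRadius) return 0;
  return (outerRadius - distance) / (outerRadius - innerRadius);
}

console.log(proximityGain({ x: 0, y: 0 }, { x: 11, y: 0 })); // 0.5
```

The result can be assigned to the `volume` property of the audio element returned by `track.attach()`, recomputed whenever a player moves.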

Performance Numbers That Matter

When building voice applications, these are the benchmarks you should be targeting. Miss any of them and your users will notice.

End-to-end latency: <300ms (above this, conversation feels unnatural)
Per audio stream: 32-64 kbps (Opus codec sweet spot)
CPU overhead: <5% (monitor client-side processing)
Connection uptime: 99.9% (with proper reconnection logic)
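
A quick way to sanity-check these targets is to do the arithmetic for your own room sizes. The sketch below treats every other participant as an audible stream and ignores packet overhead; the per-stage latencies are made-up example values, not measurements.

```typescript
// Downstream audio bandwidth for one listener, in kbps,
// assuming every other participant's stream is forwarded.
function downstreamAudioKbps(participants: number, perStreamKbps: number): number {
  return Math.max(participants - 1, 0) * perStreamKbps;
}

interface LatencyStage {
  name: string;
  ms: number;
}

// Sums per-stage latencies so the total can be compared to the 300ms target.
function totalLatencyMs(stages: LatencyStage[]): number {
  return stages.reduce((sum, stage) => sum + stage.ms, 0);
}

// Hypothetical budget for one direction of a call.
const exampleStages: LatencyStage[] = [
  { name: 'capture', ms: 20 },
  { name: 'encode', ms: 20 },
  { name: 'network', ms: 80 },
  { name: 'jitter buffer', ms: 60 },
  { name: 'playout', ms: 20 },
];

console.log(downstreamAudioKbps(10, 64));       // 576
console.log(totalLatencyMs(exampleStages));     // 200
```

A ten-person room at the top of the Opus range needs roughly 576 kbps of downstream audio per listener, and the example budget leaves about 100ms of slack under the 300ms ceiling.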

Deployment Options

LiveKit gives you flexibility in how you deploy, each with different tradeoffs.

Self-hosted

Full control over your infrastructure. Deploy on your own servers or cloud instances. Best for compliance-heavy industries or teams with strong DevOps capabilities.

LiveKit Cloud

Managed service with pay-as-you-go pricing. Global edge network, automatic scaling, and zero infrastructure management. Best for most teams shipping quickly.

Hybrid

Self-host your primary infrastructure with LiveKit Cloud as failover. Best of both worlds for teams that need control but also want reliability guarantees.

Real-time voice is no longer a hard engineering problem.

LiveKit has turned what used to be months of WebRTC wrestling into a weekend project. The infrastructure is solved — now it's about what you build on top of it.