Integrating AI Avatars with LiveKit: A Developer Guide to the Spatius LiveKit Plugin

If you’re building a voice AI application with LiveKit Agents, adding a real-time avatar face is one of the highest-leverage improvements you can make to perceived engagement. The technical barrier is lower than it looks — if your voice pipeline already runs on LiveKit, the Spatius LiveKit Plugin drops in as a single dependency with minimal configuration.

This guide covers how the integration works, how to set it up, and what to watch for in production.

Why LiveKit + AI Avatar Is a Natural Pairing

LiveKit is a popular open-source WebRTC framework used by teams building real-time voice and video applications. LiveKit Agents — the Python framework for building AI voice agents — handles the ASR, LLM, and TTS orchestration that powers conversational AI. Spatius handles what LiveKit Agents doesn’t: driving an avatar’s face in real time from the audio stream that the voice agent produces.

The integration works cleanly because of how Spatius is architected. Spatius doesn’t replace or intercept your voice pipeline; it observes the audio your TTS outputs and keeps the avatar’s Motion data in sync. Your LLM, your voice models, your session logic: all of that stays unchanged. Spatius adds a face.

Why not just stream video from the cloud? For the same reason you’d choose LiveKit over a hosted video conferencing product: control, latency, and cost. Cloud-rendered avatar video requires 1–2 MB/s per session. Spatius’s on-device rendering approach uses 10–20 KB/s, approximately 99% less bandwidth, because only Motion data is transmitted, not rendered video frames. The avatar renders in the browser using WebGL or WebGPU.
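The 99% figure follows directly from the two ranges quoted above. A quick back-of-the-envelope check, assuming decimal units (1 MB = 1,000 KB), which the figures here do not specify:

```python
# Back-of-the-envelope comparison using the per-session figures quoted above.
KB = 1_000        # bytes; decimal units are an assumption
MB = 1_000 * KB

video_range = (1 * MB, 2 * MB)      # cloud-rendered video, bytes per second
motion_range = (10 * KB, 20 * KB)   # Motion data, bytes per second


def reduction_percent(video_bps: float, motion_bps: float) -> float:
    """Percentage of bandwidth saved by sending Motion data instead of video."""
    return (1 - motion_bps / video_bps) * 100


def megabytes_per_minute(bytes_per_second: float) -> float:
    """Data transferred in one minute of session time, in MB."""
    return bytes_per_second * 60 / MB


low = reduction_percent(video_range[0], motion_range[0])
high = reduction_percent(video_range[1], motion_range[1])
print(f"saving: {low:.0f}% to {high:.0f}%")
print(f"video:  {megabytes_per_minute(video_range[0]):.0f} to "
      f"{megabytes_per_minute(video_range[1]):.0f} MB per minute")
print(f"motion: {megabytes_per_minute(motion_range[0]):.1f} to "
      f"{megabytes_per_minute(motion_range[1]):.1f} MB per minute")
```

Per minute of conversation, that is the difference between tens of megabytes and roughly one megabyte, which matters most on mobile data or congested networks.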

Architecture Overview

User Speech
    ↓
[LiveKit Room] → ASR → LLM → TTS (your stack)
                                    ↓
                            Spatius Plugin
                                    ↓
                    [Cloud: Motion Server]
                                    ↓
                          Motion data (~10-20 KB/s)
                                    ↓
                        [Browser: 3DGS avatar renders locally]

The Spatius LiveKit Plugin hooks into the TTS audio output in your agent and transmits it to Spatius Motion Server, which generates Motion data. Motion data is streamed to the client SDK in the browser, which renders the 3DGS avatar model in real time using WebGL or WebGPU. The avatar’s lip sync and expression are driven by Motion data, not by video streaming.

Important scope note: the LiveKit Plugin currently supports Web only. For iOS and Android deployments, use Basic Mode or Custom Mode with the platform-native SDKs (AvatarKit.xcframework for iOS, Gradle ai.spatialwalk:avatarkit for Android).

Prerequisites

Before starting:

A LiveKit Cloud account or self-hosted LiveKit server
An existing LiveKit Agents voice pipeline (Python)
A Spatius account — the free tier includes 500 credits/month (~50 minutes), which is enough for development and testing
Node.js environment for the browser client (the Spatius client SDK is distributed as @spatius/avatarkit-rtc on npm)

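Spatius credentials you’ll need: App ID, API Key, Avatar ID, and a Session Token. These are available in the Spatius dashboard at app.spatius.ai. For production, generate the Session Token on your server rather than shipping it with the client (see Handling the Avatar Session Token).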

Server Side: Python Setup

Install the Spatius LiveKit plugin for Python:

pip install livekit-plugins-spatius

In your LiveKit Agents pipeline, import and configure the Spatius plugin alongside your TTS:

from livekit.agents.pipeline import VoicePipelineAgent  # import path in livekit-agents 0.x
from livekit.plugins import spatius

# In your agent setup
spatius_plugin = spatius.AvatarPlugin(
    app_id="your_app_id",
    api_key="your_api_key",
    avatar_id="your_avatar_id",
)

# Register it in your pipeline
agent = VoicePipelineAgent(
    vad=...,
    stt=...,
    llm=...,
    tts=...,
    avatar=spatius_plugin,  # add this
)

The plugin receives the TTS audio stream before it reaches the LiveKit room and forwards it to Spatius Motion Server, which streams synchronized Motion data to the browser client.
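The literal strings in the snippet above are placeholders. In a real deployment you would probably read the credentials from the environment instead of hard-coding them. A minimal sketch, assuming environment variable names of our own choosing (SPATIUS_APP_ID, SPATIUS_API_KEY, and SPATIUS_AVATAR_ID are hypothetical, not names the plugin is known to read):

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SpatiusCredentials:
    app_id: str
    api_key: str
    avatar_id: str


def load_spatius_credentials(env=os.environ) -> SpatiusCredentials:
    """Read credentials from the environment and fail fast if any are missing."""
    # Hypothetical variable names; use whatever your deployment defines.
    names = {
        "app_id": "SPATIUS_APP_ID",
        "api_key": "SPATIUS_API_KEY",
        "avatar_id": "SPATIUS_AVATAR_ID",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return SpatiusCredentials(**{field: env[var] for field, var in names.items()})
```

You would then pass creds.app_id, creds.api_key, and creds.avatar_id to the plugin constructor in place of the string literals. Failing at startup when a variable is missing is easier to debug than a rejected connection mid-session.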

One critical version note: the browser client requires livekit-client version 2.16.1 exactly. Other versions may cause connection failures or audio sync issues. Pin this dependency in your package.json:

{
  "dependencies": {
    "livekit-client": "2.16.1",
    "@spatius/avatarkit-rtc": "latest"
  }
}

Browser Client Setup

Install the client SDK:

npm install @spatius/avatarkit-rtc livekit-client@2.16.1

Initialize the avatar in your browser application:

import { AvatarKitRTC } from '@spatius/avatarkit-rtc';

const avatarKit = new AvatarKitRTC({
  appId: 'your_app_id',
  sessionToken: 'your_session_token', // generated server-side
  avatarId: 'your_avatar_id',
  container: document.getElementById('avatar-container'), // DOM element for rendering
});

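// Connect to the LiveKit room and start the avatar session.
// livekitRoomUrl and livekitToken come from your own backend; top-level await requires an ES module.
await avatarKit.connect(livekitRoomUrl, livekitToken);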

The SDK handles WebGL/WebGPU initialization, 3DGS model loading (5–10 MB, downloaded on first use), and the Motion data rendering loop. The avatar canvas renders inside the container element you provide.

Demo Repository

A full working demo — including server-side Python agent and browser client — is available at:

Voice Agent Demo (all platforms): github.com/spatialwalk/avatarkit-voice-agent-demo
Web client specifically: sdk-mode/clients/web

The demo uses Basic Mode by default. The LiveKit Plugin setup is documented in the demo’s README under the livekit-plugin branch.

Full API reference: docs.spatius.ai

Latency: LiveKit Plugin vs Basic Mode

The LiveKit Plugin exists specifically to reduce end-to-end avatar latency for web deployments.

Basic Mode connects the client directly to Spatius’s inference API. The total additional latency from the Spatius layer — the time between the TTS audio being ready and the avatar face beginning to move — is typically under 300ms.

The LiveKit Plugin fits into the LiveKit-based flow your agent is already managing, which reduces integration overhead for web deployments that already use LiveKit Agents. In practice, the difference is most noticeable in high-concurrency scenarios or when the user is geographically distant from Spatius’s inference regions (ap-northeast and us-west).

If you’re already using LiveKit Agents and targeting web, the LiveKit Plugin is the recommended path. If you need mobile support (iOS/Android) or prefer a simpler integration without LiveKit, Basic Mode is the right choice.

Rendering in the Browser: What’s Happening

The browser SDK renders the avatar using WebGL or WebGPU (the SDK prefers WebGPU when available and falls back to WebGL). The 3DGS model is a compact 5–10 MB file that represents the avatar as a collection of 3D Gaussian splats, a volumetric representation that produces high-fidelity, naturally lit appearances from any angle without the uncanny-valley issues of traditional polygon meshes.

This is why the bandwidth stays at 10–20 KB/s. Rather than streaming rendered video frames from the cloud, the SDK streams only Motion data. The rendering itself — the computationally intensive part — runs on the user’s GPU via WebGL/WebGPU.

For browser deployments targeting shared or public kiosks, this architecture also has a privacy benefit. The avatar renders locally, and the avatar layer receives the AI-generated audio used for facial driving, while transcripts, prompts, and other sensitive user data can remain in your own voice stack. The 3DGS model is cached after first load.

Handling the Avatar Session Token

The Session Token is short-lived and must be generated server-side. Don’t embed your API Key in client-side code.

Your server generates a Session Token by calling the Spatius API with your App ID and API Key, and returns only the token to the browser. The browser uses the token with the AvatarKit SDK. This is a standard pattern; the demo repository includes server-side token generation examples in Python and Go.
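The shape of that server-side step can be sketched without committing to a particular Spatius API call. In the sketch below, issue_token is a hypothetical stand-in for whatever actually requests the token (for example, a wrapper around the Python SDK), and the sessionToken field name is an assumption chosen to mirror the browser constructor option shown earlier:

```python
import json
from typing import Callable

# Hypothetical signature: (app_id, api_key) -> short-lived session token.
TokenIssuer = Callable[[str, str], str]


def session_token_response(app_id: str, api_key: str, issue_token: TokenIssuer) -> str:
    """Build the JSON body your backend returns to the browser.

    The API Key is used on the server to obtain the token and is never serialized.
    """
    token = issue_token(app_id, api_key)
    if not token:
        raise ValueError("token issuer returned an empty token")
    # Only the token goes into the response body.
    return json.dumps({"sessionToken": token})


def fake_issuer(app_id: str, api_key: str) -> str:
    """Stand-in for the real Spatius call, for local testing only."""
    return "demo-token"


print(session_token_response("your_app_id", "your_api_key", fake_issuer))
```

Keeping the issuer as an injected callable makes the handler easy to test and keeps the API Key out of every code path that touches the response.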

For Go server SDK: github.com/spatius-ai/spatius-sdk-go
For Python SDK: pip install spatius — github.com/spatius-ai/spatius-sdk-python

Common Issues

Avatar renders but lip sync is off: this is almost always a TTS audio format mismatch. Spatius requires mono 16-bit PCM (s16le). Common sample rates (8 kHz, 16 kHz, 22.05 kHz, 24 kHz, 32 kHz, 44.1 kHz, and 48 kHz) are all supported. The SDK doesn’t auto-resample, so confirm your TTS output matches the format you’re passing. The Python SDK additionally supports Ogg Opus input.
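If you can dump a sample of your TTS output to a WAV file, a few lines of Python's standard wave module will confirm the format. This is a debugging aid rather than part of the Spatius SDK, and it only applies to WAV containers; a raw s16le stream has no header to inspect, so in that case compare against your TTS configuration instead.

```python
import io
import wave

# Sample rates listed in this guide, in Hz.
SUPPORTED_RATES = {8000, 16000, 22050, 24000, 32000, 44100, 48000}


def check_wav_format(wav_file) -> list[str]:
    """Return a list of problems; an empty list means mono 16-bit PCM at a listed rate.

    wav_file may be a path or a binary file-like object.
    """
    problems = []
    with wave.open(wav_file, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() not in SUPPORTED_RATES:
            problems.append(f"unlisted sample rate: {wav.getframerate()} Hz")
    return problems


# Demo: build a short in-memory stereo file and check it.
buf = io.BytesIO()
with wave.open(buf, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(b"\x00\x00" * 2 * 100)  # 100 silent stereo frames
buf.seek(0)
print(check_wav_format(buf))
```

A stereo or 24-bit result usually points at a TTS provider default that needs to be overridden in your agent configuration.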

WebSocket connection fails: check that the endpoint region you are connecting to matches your Spatius account configuration. The credentials themselves (App ID, API Key, Avatar ID, Session Token) are cross-region and work with both ap-northeast and us-west endpoints.

Avatar doesn’t appear on iOS or Android: the LiveKit Plugin is Web-only. For mobile, switch to Basic Mode with the platform-native SDKs. Note that iOS rendering requires a real device; the iOS Simulator does not support the Metal rendering the SDK relies on.

livekit-client version mismatch: if you see connection errors or audio issues after installing dependencies, run npm list livekit-client to confirm the version is exactly 2.16.1. Package managers sometimes resolve to a different version when multiple dependencies specify the package.
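If npm list output is hard to read in a large project, the lockfile can be inspected directly. A small sketch, assuming an npm package-lock.json with lockfileVersion 2 or 3 (the "packages" layout); Yarn and pnpm lockfiles use different formats and would need a different parser:

```python
import json

REQUIRED = "2.16.1"


def livekit_client_versions(lock: dict) -> dict[str, str]:
    """Map each installed copy of livekit-client to its resolved version."""
    found = {}
    for path, meta in lock.get("packages", {}).items():
        # Matches the top-level copy and any copy nested under another dependency.
        if path == "node_modules/livekit-client" or path.endswith("/node_modules/livekit-client"):
            found[path] = meta.get("version", "unknown")
    return found


def mismatches(lock: dict, required: str = REQUIRED) -> dict[str, str]:
    """Return only the copies whose resolved version differs from the required one."""
    return {p: v for p, v in livekit_client_versions(lock).items() if v != required}


# Usage, run from the project root:
#   with open("package-lock.json") as f:
#       print(mismatches(json.load(f)))
```

An empty result means every installed copy resolves to 2.16.1. A nested entry in the output shows which dependency is pulling in a second version.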

Try It Before You Integrate

The fastest path to evaluating Spatius before integrating the SDK is the Playground — a live session with a Spatius avatar running in your browser, no signup required: spatius.ai/playground

For a broader overview of how to evaluate any real-time avatar platform before committing to an integration: Avatar SDK Demo: How to Test a Real-Time AI Avatar Before You Commit to a Platform