Most AI avatar demos are built and tested in ideal conditions: developer workstation, dedicated office WiFi, single session at a time. The experience looks great. Then you try to deploy to a retail kiosk, a field technician’s tablet, or a hotel lobby screen — and the problems start.
This guide is specifically about edge deployments: real-world environments where the network isn’t dedicated, the hardware isn’t premium, and the avatar needs to run reliably whether or not conditions are optimal.
Why Cloud Streaming Doesn’t Fit Edge Deployments
The standard AI avatar architecture (render video on a cloud server, stream it to the display device via WebRTC) has one hard dependency: sustained video bandwidth. A session at standard quality requires 1–2 MB/s (8–16 Mbps), continuously. That floor is set by the bitrate real-time video encoding needs at that quality. Below it, quality degrades or the stream stalls.
That constraint creates structural problems in edge environments:
Shared WiFi is not a dedicated connection. A retail store’s network carries POS terminals, inventory management systems, customer devices, and back-office traffic simultaneously. During peak hours, available bandwidth per device is unpredictable. 1–2 MB/s per kiosk session — multiplied across multiple units in the same location — competes directly with business-critical network traffic.
Cellular connectivity is variable. Field tablets in warehouses, factories, or outdoor environments encounter weak and fluctuating 4G signal. A connection that delivers 4 Mbps (about 0.5 MB/s) on a clear day might deliver 500 Kbps (about 60 KB/s) inside a steel-framed building. Cloud-streamed video needs roughly 1 MB/s to function, and it has no meaningful degraded mode.
Multi-unit bandwidth adds up fast. Ten simultaneous kiosk sessions at 1–2 MB/s each require 10–20 MB/s of committed bandwidth at the location — before any other network use. Across a 50-location retail deployment, the infrastructure cost of providing that dedicated bandwidth is non-trivial.
Cloud GPU cost scales with sessions. At the industry average of approximately $0.15/minute for cloud-rendered avatar sessions, a high-traffic kiosk running 8 hours/day at 50% occupancy is in session about 240 minutes a day. That is roughly $36 per day, or about $1,080 per unit over a 30-day month.
On-Device Rendering: The Architecture That Fits
Spatius takes a different approach. Rather than rendering video in the cloud and streaming it, Spatius’s cloud GPU runs a lightweight driving model that converts TTS audio to FLAME expression parameters — a compact mathematical description of how the avatar’s face should move. These parameters stream to the client device at 10–20 KB/s. AvatarKit, Spatius’s rendering SDK, uses those parameters to render the 3DGS avatar locally.
The official description of the approach: “replacing heavy cloud rendering with a lightweight stream and rendering on edge.”
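To get a feel for why a parameter stream is so much smaller than video, here is a back-of-envelope estimate. The value count, the float32 encoding, and the update rate are illustrative assumptions, not figures from Spatius documentation; the actual wire format is not described in this guide.

```python
# Back-of-envelope estimate of a per-frame parameter stream.
# ASSUMPTIONS (not from Spatius documentation): the number of values per
# frame, the float32 encoding, and the update rate are illustrative only.

def stream_rate_kb_per_s(values_per_frame: int, bytes_per_value: int, frames_per_s: int) -> float:
    """Return the raw payload rate in KB/s, taking 1 KB = 1000 bytes."""
    return values_per_frame * bytes_per_value * frames_per_s / 1000

# Hypothetical frame: about 100 expression and pose coefficients,
# stored as 4-byte floats, sent 30 times per second.
params_rate = stream_rate_kb_per_s(values_per_frame=100, bytes_per_value=4, frames_per_s=30)

print(f"parameter stream: {params_rate:.1f} KB/s")
```

Under these assumptions the raw payload is on the order of ten kilobytes per second, the same order of magnitude as the 10–20 KB/s figure quoted above. A face description of a hundred or so numbers per frame is tiny next to an encoded video frame.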
The bandwidth comparison: 10–20 KB/s versus 1–2 MB/s is a ~99% reduction. Ten simultaneous kiosk sessions need roughly 100–200 KB/s total — a rounding error on any business internet connection. The avatar rendering itself is local, so network quality affects responsiveness (how quickly FLAME parameters arrive) but never affects rendering frame rate or visual quality.
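The arithmetic behind that comparison can be checked in a few lines. This is a sketch using only the per-session rates quoted in this guide; the function and constant names are made up for the example.

```python
# Per-session rates come from the article; everything else is plain arithmetic.
CLOUD_VIDEO_KB_S = (1000, 2000)   # 1-2 MB/s, taking 1 MB = 1000 KB
PARAM_STREAM_KB_S = (10, 20)      # 10-20 KB/s

def site_total(sessions: int, per_session: tuple) -> tuple:
    """Low and high aggregate bandwidth in KB/s for concurrent sessions."""
    low, high = per_session
    return (sessions * low, sessions * high)

def reduction_pct(param_kb_s: int, video_kb_s: int) -> float:
    """Percentage of bandwidth saved relative to the video stream."""
    return (video_kb_s - param_kb_s) * 100 / video_kb_s

cloud_total = site_total(10, CLOUD_VIDEO_KB_S)
param_total = site_total(10, PARAM_STREAM_KB_S)

# Compare opposite ends of the two ranges to bound the saving.
worst_case = reduction_pct(PARAM_STREAM_KB_S[1], CLOUD_VIDEO_KB_S[0])
best_case = reduction_pct(PARAM_STREAM_KB_S[0], CLOUD_VIDEO_KB_S[1])

print(f"cloud video, 10 kiosks:      {cloud_total[0]}-{cloud_total[1]} KB/s")
print(f"parameter stream, 10 kiosks: {param_total[0]}-{param_total[1]} KB/s")
print(f"reduction: {worst_case}% to {best_case}%")
```

Depending on which ends of the two ranges are compared, the saving falls between 98% and 99.5%, which is where the roughly 99% figure sits.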
Important: Spatius is a rendering SDK, not a complete AI platform. You build and own your own voice AI stack — ASR (speech-to-text), LLM (language model), and TTS (text-to-speech). Spatius connects to your TTS audio output and handles the avatar animation and rendering from that point.
Hardware Requirements
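Because AvatarKit does only rendering and audio alignment, with no ML inference on-device, the GPU workload is light. Spatius’s official headline figure is 60 fps on entry-level chipsets; the breakdown below lists 25 fps as the stable rate for that class under sustained load. The hardware range this covers: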
Validated chipsets: G88, S565, 8189, RK3576 — all without requiring a dedicated GPU. These represent the class of chips found in commercial Android kiosk hardware and budget mobile devices.
Entry-level SoCs: Stable at 25 fps under sustained load.
Mid-range hardware: 30–60 fps.
Web-based deployments: Any device capable of running a modern browser with WebGL/WebGPU support. This includes commercial display hardware running Chrome OS or a standard browser on Linux or Windows IoT.
The 3DGS avatar model is approximately 5–10 MB and lives on the device after initial download. Subsequent sessions load from the local cache.
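According to the text, AvatarKit manages this cache itself. The sketch below only illustrates the download-once pattern, which can be useful when reasoning about first-session load time or when pre-provisioning kiosks. `load_avatar_model`, `fake_download`, and the file layout are hypothetical and are not part of any Spatius SDK.

```python
import tempfile
from pathlib import Path
from typing import Callable

def load_avatar_model(avatar_id: str, cache_dir: Path, download: Callable[[str], bytes]) -> bytes:
    """Return model bytes, calling `download` only when no cached copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / f"{avatar_id}.bin"
    if cached.exists():
        return cached.read_bytes()
    data = download(avatar_id)
    # Write to a temporary file and rename it, so an interrupted write
    # never leaves a truncated model in the cache.
    partial = cached.with_suffix(".part")
    partial.write_bytes(data)
    partial.replace(cached)
    return data

# Stand-in for a real network fetch; records how often it is called.
download_calls = []

def fake_download(avatar_id: str) -> bytes:
    download_calls.append(avatar_id)
    return b"\x00" * 1024

with tempfile.TemporaryDirectory() as tmp:
    first = load_avatar_model("demo-avatar", Path(tmp), fake_download)
    second = load_avatar_model("demo-avatar", Path(tmp), fake_download)

print(f"downloads performed: {len(download_calls)}")
```

The second call is served from disk, so only the first session on a device pays the 5–10 MB download.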
Deployment Patterns for Common Edge Scenarios
Retail Kiosk
Environment: Fixed-location Android kiosk or commercial display with embedded compute. Shared store WiFi that fluctuates with customer traffic.
What the architecture solves: Each session consumes 10–20 KB/s, so 10 simultaneous kiosks need roughly 100–200 KB/s total, a load small enough to be largely unaffected by other traffic on the shared network. AvatarKit runs on the kiosk’s embedded hardware through the Web SDK (WebGL/WebGPU) with no additional GPU required.
Fallback behavior: If the WebSocket connection to Spatius’s cloud driving model cannot be established within 15 seconds, AvatarKit automatically switches to audio-only mode: TTS audio continues and animation pauses. In a walk-up public installation this graceful degradation matters. A hard failure leaves a broken screen, while audio-only mode continues to serve the user.
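AvatarKit performs this fallback internally. The sketch below only shows the general timeout-then-degrade pattern, in case you want the same behavior around your own transport code. `choose_mode` and the three simulated connect functions are hypothetical and are not AvatarKit APIs.

```python
import asyncio

ANIMATED = "animated"
AUDIO_ONLY = "audio_only"

async def choose_mode(connect, timeout_s: float = 15.0) -> str:
    """Try to open the animation stream; degrade to audio-only on timeout or error."""
    try:
        await asyncio.wait_for(connect(), timeout=timeout_s)
        return ANIMATED
    except (asyncio.TimeoutError, OSError):
        # A timeout or a refused/reset connection both end in audio-only mode.
        return AUDIO_ONLY

# Simulated connection attempts (stand-ins for a real WebSocket handshake).
async def fast_connect():
    await asyncio.sleep(0)

async def hanging_connect():
    await asyncio.sleep(1.0)

async def refused_connect():
    raise ConnectionRefusedError("simulated refusal")

# A short timeout keeps the demo quick; the article's figure is 15 seconds.
results = [
    asyncio.run(choose_mode(fast_connect, timeout_s=0.05)),
    asyncio.run(choose_mode(hanging_connect, timeout_s=0.05)),
    asyncio.run(choose_mode(refused_connect, timeout_s=0.05)),
]
print(results)
```

Treating a hung handshake and an outright refusal the same way keeps the kiosk logic simple: either the animation stream is up within the deadline, or the session proceeds with audio alone.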
Cost comparison: The Spatius Scale plan is $0.007/minute ($0.42/hour), versus an industry average of approximately $0.15/minute (~$9/hour), roughly a 21-fold difference in per-minute cost. Across a 50-unit deployment at moderate utilization, that difference is material to unit economics.
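To put numbers on that, here is a rough usage-cost estimate. It reuses the 8 hours/day at 50% occupancy scenario from earlier in this guide and assumes a 30-day month. It covers per-minute usage charges only: the $299 monthly subscription and any included credits are left out, because this guide does not say how they offset usage.

```python
from decimal import Decimal

def monthly_usage_cost(rate_per_min: Decimal, hours_per_day: int,
                       occupancy: Decimal, days: int = 30) -> Decimal:
    """Usage charge in USD for one kiosk over `days` days."""
    active_minutes = Decimal(hours_per_day * 60) * occupancy * days
    return rate_per_min * active_minutes

CLOUD_RATE = Decimal("0.15")    # industry average quoted in the article
SCALE_RATE = Decimal("0.007")   # Spatius Scale plan per-minute rate
UNITS = 50                      # fleet size used in the article

cloud_per_unit = monthly_usage_cost(CLOUD_RATE, 8, Decimal("0.5"))
scale_per_unit = monthly_usage_cost(SCALE_RATE, 8, Decimal("0.5"))

print(f"cloud streaming: ${cloud_per_unit:.2f} per kiosk, ${cloud_per_unit * UNITS:.2f} per fleet")
print(f"Scale plan:      ${scale_per_unit:.2f} per kiosk, ${scale_per_unit * UNITS:.2f} per fleet")
```

Under these assumptions a kiosk costs about $1,080 per month in usage at the industry-average rate and about $50.40 at the Scale rate, before subscription fees.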
Field Technician / Mobile Deployment
Environment: Android tablet. 4G cellular with variable signal — warehouses, factories, outdoor sites, RF-shielded facilities.
What the architecture solves: 10–20 KB/s is viable on weak or variable 4G. The avatar rendering is unaffected by momentary signal drops — only the next FLAME parameter update is delayed, producing a brief pause in the avatar’s speech rather than a frozen or broken stream. AvatarKit is available as a native Android SDK (Gradle: ai.spatialwalk:avatarkit, Vulkan renderer).
Connectivity consideration: Spatius’s cloud driving model requires network connectivity to generate FLAME parameters. For environments with intermittent connectivity, the 15-second fallback to audio-only mode provides a safety net.
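A quick feasibility check makes the cellular case concrete. Link speeds are usually quoted in bits per second, while the stream rates in this guide are in bytes per second, so the sketch converts first. The 50% headroom factor is an assumption standing in for protocol overhead and other traffic on the link, not a measured value.

```python
def kbps_to_kb_per_s(kilobits_per_s: float) -> float:
    """Convert a link speed in kilobits/s to kilobytes/s."""
    return kilobits_per_s / 8

def can_sustain(link_kbps: float, required_kb_s: float, headroom: float = 0.5) -> bool:
    """True if the stream fits within the usable share of the link.

    `headroom` is the fraction of raw capacity assumed usable (an assumption).
    """
    return required_kb_s <= kbps_to_kb_per_s(link_kbps) * headroom

WEAK_4G_KBPS = 500      # the steel-framed building case from the article
CLEAR_4G_KBPS = 4000    # the clear-day case from the article
PARAM_STREAM_KB_S = 20  # upper end of the 10-20 KB/s range
CLOUD_VIDEO_KB_S = 1000 # lower end of the 1-2 MB/s range

for label, link in [("weak 4G", WEAK_4G_KBPS), ("clear 4G", CLEAR_4G_KBPS)]:
    print(f"{label}: parameters ok={can_sustain(link, PARAM_STREAM_KB_S)}, "
          f"video ok={can_sustain(link, CLOUD_VIDEO_KB_S)}")
```

With these inputs the parameter stream fits on both links, while 1 MB/s video fits on neither, even on the clear-day connection.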
Hospitality / Commercial Display
Environment: Large-format commercial display with embedded compute or connected mini-PC. Shared guest WiFi.
What the architecture solves: On shared guest networks, per-device bandwidth is unpredictable. 10–20 KB/s is viable on nearly any WiFi condition. The Web SDK (WebGL/WebGPU) runs directly in the browser on commercial display hardware without a dedicated GPU.
Healthcare Information Point
Environment: Tablet or touchscreen on a stand. Hospital WiFi is often network-segmented with strict per-device bandwidth limits. Conversations may include sensitive information.
What the architecture solves: The 10–20 KB/s bandwidth requirement fits within typical network segmentation limits. On the privacy side: AvatarKit only receives AI-generated TTS audio data. Rendering runs on the user’s local device. Sensitive user conversation content does not pass through Spatius’s rendering infrastructure.
In-vehicle / AI hardware: Spatius explicitly lists In-Vehicle & Kiosks and AI Hardware as supported deployment categories. AvatarKit’s low bandwidth and on-device rendering make it compatible with embedded automotive displays and standalone AI hardware form factors where cloud video streaming is impractical.
Integration
AvatarKit ships for three platforms:
| Platform | Distribution | Rendering Engine |
|---|---|---|
| Web | npm: @spatialwalk/avatarkit | WebGL / WebGPU |
| iOS | AvatarKit.xcframework | Metal |
| Android | Gradle: ai.spatialwalk:avatarkit | Vulkan |
Three integration modes are available:
- Basic Mode — Lowest setup effort, Web/iOS/Android, moderate latency
- LiveKit Plugin — Ultra-low latency, Web only (for teams with existing LiveKit Agents infrastructure)
- Custom Mode — Ultra-low latency, full transport control, Web/iOS/Android
Server-side SDKs are available in Python (pip install spatius) and Go (go get github.com/spatius-ai/spatius-sdk-go).
Full working examples for Web, iOS, Android, and Flutter are in the Voice Agent Demo repository.
Pricing
| Plan | Monthly | Per minute | Per hour |
|---|---|---|---|
| Free | $0 | — | — (~50 min included) |
| Starter | $19/mo | $0.009/min | $0.54/hr |
| Scale | $299/mo | $0.007/min | $0.42/hr |
| Enterprise | Custom | Custom | Custom |
Subscription credits reset each billing period and do not roll over. The free tier is permanent and requires no credit card.
Getting Started
The playground runs AvatarKit in your browser — useful for a quick hardware validation on a specific device before committing to SDK integration. For edge deployment scoping or enterprise fleet discussions, contact the Spatius team.
Related Reading
Hardware requirements in detail → AI Avatar on Entry-Level Chipsets: How On-Device Rendering Works on Budget Hardware
Architecture comparison → On-Device AI Avatar vs Cloud Streaming: Architecture, Bandwidth, and Cost
Test the SDK → Avatar SDK Demo: How to Test a Real-Time AI Avatar Before You Commit to a Platform
The full landscape → Interactive Avatar: The Complete Guide to Real-Time AI Avatars in 2026