AI Avatar on Entry-Level Chipsets: How On-Device Rendering Works on Budget Hardware

Spatius Team
Jun 6, 2026 · 8 min read

The assumption baked into most AI avatar platforms is that you have a reliable, fast connection and a cloud billing account to absorb the cost. The server renders the avatar as a video stream, you receive it, the user sees it. Clean in theory, fragile in practice.

The problem starts the moment you leave the ideal scenario: shared WiFi at a retail location, cellular signal in a warehouse, a kiosk device that costs $80 wholesale. Traditional cloud-rendered avatar video requires 1–2 MB/s of sustained bandwidth per session. On constrained networks and budget hardware, that’s where the experience breaks down.

Spatius takes a different approach. Rather than rendering video on the cloud and streaming it, Spatius’s cloud GPU runs a lightweight driving model that generates FLAME expression parameters — a compact description of how the avatar’s face should move. These parameters stream to the client at just 10–20 KB/s. The client device renders the 3DGS avatar locally using its own GPU. No video stream. No heavy decoding pipeline.

This architecture shift — from streaming rendered video to streaming expression parameters — is what makes Spatius run stably on entry-level chipsets.
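To see why a parameter stream can fit in 10–20 KB/s, here is a rough back-of-the-envelope sketch. The parameter count (100 expression coefficients plus 6 pose values), the float32 encoding, and the 16-byte packet header are all illustrative assumptions; the article does not describe Spatius's actual wire format.

```python
# Rough bitrate of a per-frame expression-parameter stream.
# Every constant here is an illustrative assumption, not Spatius's wire format.

BYTES_PER_FLOAT32 = 4


def param_stream_kb_per_s(num_params: int, fps: int, header_bytes: int = 0) -> float:
    """Return the stream rate in KB/s (1 KB = 1000 bytes) for uncompressed float32 params."""
    bytes_per_frame = num_params * BYTES_PER_FLOAT32 + header_bytes
    return bytes_per_frame * fps / 1000


# Assumed: 100 expression coefficients + 6 pose values, 16-byte header per packet.
low = param_stream_kb_per_s(num_params=106, fps=25, header_bytes=16)
high = param_stream_kb_per_s(num_params=106, fps=30, header_bytes=16)
print(f"{low:.1f} KB/s at 25 fps, {high:.1f} KB/s at 30 fps")
```

Under these assumptions the stream lands at roughly 11–13 KB/s, inside the quoted range, even before any compression.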


Why Entry-Level Hardware Can Handle This

The GPU workload of rendering a real-time AI avatar in Spatius is fundamentally different from what people assume. There’s no heavy ML inference happening on-device. The driving model (which converts audio to facial expressions) runs in Spatius’s cloud. The client SDK — called AvatarKit — receives the pre-computed FLAME parameters and uses them to animate a 3DGS (3D Gaussian Splatting) avatar model that lives on the device.

Rendering a 3DGS avatar at 25–30 frames per second is the GPU task. The model itself is compact: approximately 5–10 MB. This is a manageable workload for any modern mobile or embedded GPU — far lighter than a mobile game running at 60fps.

Spatius officially supports entry-level chipsets. Validated hardware includes chips like the G88, S565, 8189, and RK3576, none of which requires a dedicated GPU. For the tightest entry-level SoCs, a stable 25 fps is the floor; mid-range hardware and above delivers 30–60 fps.
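The frame rates quoted above translate directly into per-frame time budgets. The short sketch below is plain arithmetic, not a measurement of AvatarKit; the function name is hypothetical.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available to render one frame, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (25, 30, 60):
    print(f"{fps} fps -> {frame_budget_ms(fps):.1f} ms per frame")
```

At 25 fps the renderer has 40 ms per frame, more than twice the 16.7 ms available to a 60 fps workload, which is part of why the floor is reachable on constrained SoCs.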


What “Entry-Level” Means in This Context

The reference hardware Spatius targets includes:

Embedded / kiosk SoCs: Chips like the RK3576 are common in commercial Android kiosk hardware. They handle AvatarKit’s rendering pipeline stably at 25 fps without thermal throttling under sustained sessions.

Budget Android and IoT devices: Chipsets like the G88 and S565 represent the class of hardware found in the $100–$200 Android device range globally. These run AvatarKit through the Android SDK or, in a browser, the Web SDK.

Web deployments on aging hardware: Because the Web SDK renders with WebGL, or WebGPU where the browser supports it, even older desktop and laptop hardware running Chrome or Safari can render the avatar locally without a dedicated GPU.

The key enabler across all of these: zero on-device inference. AvatarKit only does rendering and audio alignment. The computationally expensive step, generating the expression parameters from audio, is handled by the driving model on Spatius’s cloud GPU, and that model is lightweight by design (it produces small FLAME parameter packets, not video).


The Bandwidth Equation

Traditional cloud-rendered avatar platforms stream video. A session at standard quality needs 1–2 MB/s, continuously. Across 20 simultaneous kiosks, that’s 20–40 MB/s of committed bandwidth for avatar video alone.

Spatius streams FLAME expression driving data: 10–20 KB/s per session. Twenty kiosks need roughly 200–400 KB/s total. That’s negligible on any business internet connection, stable on 4G cellular, and viable even on degraded network conditions.

This isn’t a configuration trade-off — it’s an architectural property. The compression ratio comes from streaming compact expression parameters instead of encoded video frames.
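The fleet-level arithmetic above can be reproduced in a few lines. This sketch uses only the per-session figures from the text and treats 1 MB as 1000 KB; the function name is hypothetical.

```python
# Aggregate bandwidth for a fleet of kiosks, using the per-session figures from the text.
# Units: KB/s, with 1 MB = 1000 KB.

VIDEO_KB_S = (1000, 2000)  # cloud-rendered video: 1-2 MB/s per session
PARAMS_KB_S = (10, 20)     # expression parameters: 10-20 KB/s per session


def fleet_kb_s(sessions, per_session_range):
    """Return (low, high) total bandwidth in KB/s for the given session count."""
    low, high = per_session_range
    return sessions * low, sessions * high


video = fleet_kb_s(20, VIDEO_KB_S)
params = fleet_kb_s(20, PARAMS_KB_S)

print(f"video:  {video[0] / 1000:.0f}-{video[1] / 1000:.0f} MB/s")
print(f"params: {params[0]}-{params[1]} KB/s")
print(f"ratio:  {video[0] // params[0]}x")
```

For 20 sessions this gives 20–40 MB/s for video against 200–400 KB/s for parameters, a difference of about two orders of magnitude.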


How the Architecture Fits Together

Spatius follows a three-layer separation:

  1. AI Agent (customer-built) — You build and own the full voice AI stack: ASR (speech-to-text), LLM (language model), and TTS (text-to-speech). Spatius does not provide these components.

  2. Avatar (Spatius) — The 3DGS digital human model, built from a single photo in as little as 3 hours, approximately 5–10 MB in size.

  3. AvatarKit SDK (Spatius core product) — The rendering engine that lives on the client device, receives FLAME parameters from Spatius’s cloud driving model, and renders the avatar in sync with the audio.

The data flow:

[Your TTS audio] → Spatius cloud driving model → FLAME expression params (10–20 KB/s)
                                                          ↓
                                                 AvatarKit (client SDK)
                                                 renders 3DGS avatar locally

The cloud GPU in this pipeline runs lightweight driving inference, not video rendering and encoding. Cloud GPU cost is dramatically reduced compared to full cloud rendering, though not eliminated entirely. The client does the visual rendering and runs no inference of its own.
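On the client side, the remaining work is matching incoming parameter frames to the audio clock. The following is a minimal sketch of one way to do that, assuming each frame carries a timestamp relative to the start of the audio. `ParamFrame` and `frame_for_audio_time` are hypothetical names for illustration, not part of the AvatarKit API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class ParamFrame:
    """One packet of expression parameters (hypothetical structure)."""
    timestamp_ms: int           # presentation time relative to audio start
    coeffs: Tuple[float, ...]   # expression parameters for this frame


def frame_for_audio_time(frames: List[ParamFrame], audio_ms: int) -> Optional[ParamFrame]:
    """Return the latest frame with timestamp <= audio_ms, or None if none applies yet.

    Assumes `frames` is sorted by timestamp_ms.
    """
    timestamps = [f.timestamp_ms for f in frames]
    idx = bisect_right(timestamps, audio_ms)
    return frames[idx - 1] if idx > 0 else None


# Three frames at 25 fps (40 ms apart); coefficients shortened for readability.
frames = [ParamFrame(0, (0.0,)), ParamFrame(40, (0.1,)), ParamFrame(80, (0.2,))]
current = frame_for_audio_time(frames, audio_ms=95)
print(current.timestamp_ms if current else None)
```

Driving the lookup from the audio clock rather than from packet arrival time means network jitter delays the animation slightly instead of desynchronizing it from speech.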


SDK Integration

AvatarKit ships for three platforms:

Platform   Distribution                       Rendering Engine
Web        npm: @spatialwalk/avatarkit        WebGL / WebGPU
iOS        AvatarKit.xcframework              Metal
Android    Gradle: ai.spatialwalk:avatarkit   Vulkan

Three integration modes are available depending on your latency and infrastructure requirements:

  • Basic Mode — Lowest development effort, suitable for Web/iOS/Android, moderate latency
  • LiveKit Plugin — Ultra-low latency, Web only, for teams already using LiveKit Agents
  • Custom Mode — Ultra-low latency, full transport control, Web/iOS/Android
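The choice among the three modes can be written down as a small decision helper. This sketch only encodes the constraints listed above (platform support and latency); the function and its parameters are hypothetical and not part of any Spatius SDK.

```python
SUPPORTED_PLATFORMS = {"web", "ios", "android"}


def candidate_modes(platform: str, uses_livekit: bool, needs_ultra_low_latency: bool) -> list:
    """List the integration modes that fit the given constraints, simplest first."""
    platform = platform.lower()
    if platform not in SUPPORTED_PLATFORMS:
        raise ValueError(f"unsupported platform: {platform}")

    modes = []
    if not needs_ultra_low_latency:
        modes.append("Basic Mode")          # moderate latency, all platforms
    if platform == "web" and uses_livekit:
        modes.append("LiveKit Plugin")      # ultra-low latency, Web only
    modes.append("Custom Mode")             # ultra-low latency, all platforms
    return modes


print(candidate_modes("web", uses_livekit=True, needs_ultra_low_latency=True))
print(candidate_modes("ios", uses_livekit=False, needs_ultra_low_latency=False))
```

Custom Mode is always a candidate because it covers every platform; the helper simply surfaces the lower-effort options first when they qualify.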

Note: iOS testing requires a physical device — the iOS simulator does not support Metal rendering.

Full code samples for Web, iOS, Android, and Flutter are available in the Voice Agent Demo repository.


Fallback Behavior

If the WebSocket connection to Spatius’s cloud driving model does not succeed within 15 seconds, AvatarKit automatically switches to audio-only fallback mode. The TTS audio continues uninterrupted; only the avatar animation pauses. The user experience therefore degrades gracefully on flaky networks rather than breaking entirely.
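The fallback pattern can be sketched as a timeout wrapped around the connection attempt. This is an illustration of the general technique, assuming the 15 seconds bounds the connection attempt; it is not AvatarKit's implementation, and `connect_or_fallback` and the two stand-in connect functions are hypothetical.

```python
import asyncio


async def connect_or_fallback(connect, timeout_s: float = 15.0) -> str:
    """Return "avatar" if connect() completes in time, otherwise "audio_only"."""
    try:
        await asyncio.wait_for(connect(), timeout=timeout_s)
        return "avatar"
    except (asyncio.TimeoutError, OSError):
        # Timed out or the connection was refused/reset: keep audio, skip animation.
        return "audio_only"


async def fast_connect():
    """Stand-in for a connection that succeeds immediately."""
    await asyncio.sleep(0)


async def slow_connect():
    """Stand-in for a connection that stalls on a bad network."""
    await asyncio.sleep(0.2)


# Short timeouts keep the demo quick; the article's figure is 15 seconds.
print(asyncio.run(connect_or_fallback(fast_connect, timeout_s=0.05)))
print(asyncio.run(connect_or_fallback(slow_connect, timeout_s=0.05)))
```

The key design point is that audio playback does not wait on this result: the avatar channel is optional, so a failure only changes which mode the session runs in.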


Getting Started

The Spatius playground runs AvatarKit in your browser. The avatar rendering happens on your device using WebGL/WebGPU — no video stream. You can verify this in your browser’s network tab: you’ll see small parameter packets, not a video stream at 1–2 MB/s.

For device-specific testing, the demo repositories include iOS, Android, and Flutter clients ready to run on physical hardware.


From the Spatius team:
“replacing heavy cloud rendering with a lightweight stream and rendering on edge”


Go deeper on architecture → On-Device AI Avatar vs Cloud Streaming: Architecture, Bandwidth, and Cost

See it in a real deployment context → AI Avatars for Edge Deployments: Kiosks, Retail, and Low-Bandwidth Environments

Test before you commit → Avatar SDK Demo: How to Test a Real-Time AI Avatar Before You Commit to a Platform

The full landscape → Interactive Avatar: The Complete Guide to Real-Time AI Avatars in 2026
