CASE_ID: MAIA

The MAIA Experience

An intimate, 10-minute encounter with MAIA, a real-time AI character that sees, listens, and converses with visitors inside "Prof. Dupin's study" as onboarding to a larger story world. Think of a Disney pre-show experience driven by AI and personalized for each guest.

ROLE: MS in Emerging Tech, AI & Design at NYU. Lead Designer & Engineer. End-to-end experience design, LLM prompt engineering, real-time voice interaction system architecture, and front-end development for the interactive installation.
DOMAIN: Generative Voice & Audio
STACK: GenAI, Voice UI, Python, LLM
DEPLOYMENT: the-maia-experience.framer.ai
The Challenge

"GenAI breaks immersion when latency is high and trust is low. The challenge was designing a "Latency Masking" system and "privacy-first" approach that kept users engaged during the 3-second compute window of early LLMs. Making the AI feel human required more than just dialogue design — it demanded a holistic system that considered pacing, lighting, trust and physical space."

System Architecture

How it Works

A local-first stack designed to process voice-to-voice interaction in under 200 ms, using a quantized Llama model running on a local MacBook Pro M3 with Metal GPU acceleration. The system uses a multithreaded architecture to handle audio input (STT), LLM inference, and audio output (TTS) in parallel, minimizing wait times. A custom Python backend orchestrates the flow, while a React frontend provides the operator's control tools. Privacy is ensured by processing all data locally, with no cloud dependencies.
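
To illustrate the parallelism, here is a minimal sketch of the STT → LLM → TTS flow running on separate threads connected by queues. The model functions (transcribe_chunk, generate_reply, synthesize_speech) are illustrative stubs standing in for the local models, not the installation's actual code.

```python
import queue
import threading

def transcribe_chunk(frame: bytes) -> str:
    """Stub: the real system runs local speech-to-text on the audio frame."""
    return frame.decode(errors="ignore").strip()

def generate_reply(utterance: str) -> str:
    """Stub: the real system calls the quantized Llama model here."""
    return f"MAIA replies to: {utterance!r}"

def synthesize_speech(text: str) -> None:
    """Stub: the real system plays locally synthesized audio here."""
    print(f"[TTS] {text}")

utterances: "queue.Queue[str]" = queue.Queue()  # STT output -> LLM input
replies: "queue.Queue[str]" = queue.Queue()     # LLM output -> TTS input

def llm_worker() -> None:
    # Inference runs on its own thread so transcription never waits on it.
    while True:
        utterance = utterances.get()
        replies.put(generate_reply(utterance))
        utterances.task_done()

def tts_worker() -> None:
    # Replies are spoken as soon as they arrive, overlapping with new input.
    while True:
        synthesize_speech(replies.get())
        replies.task_done()

for worker in (llm_worker, tts_worker):
    threading.Thread(target=worker, daemon=True).start()

# Main thread keeps feeding audio frames into STT (fake frames for the sketch).
for frame in (b"Hello MAIA", b"Tell me about Prof. Dupin"):
    text = transcribe_chunk(frame)
    if text:
        utterances.put(text)

utterances.join()  # wait for pending replies; the installation loops continuously
replies.join()
```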

Key Components
  • 1. Immersive production design to establish the world, story, and believability.
  • 2. Dialogue states driven by audio and visual triggers (camera → STT → LLM → TTS → LED lighting); see the state sketch after this list.
  • 3. Multithreaded local LLM running on an edge device for low latency and privacy.
  • 4. Participant privacy-and-agency UI preventing responses from being automatically saved to the cloud.
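
As a sketch of component 2, the dialogue flow can be modeled as a small state machine whose transitions fire the lighting cues. The state names, event strings, and LED_CUES mapping below are illustrative assumptions, not the installation's code.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()       # no visitor detected by the camera yet
    LISTENING = auto()  # STT is capturing the visitor's speech
    THINKING = auto()   # local LLM is generating the character's reply
    SPEAKING = auto()   # TTS is playing the reply back

# Each state maps to an ambient lighting cue instead of an on-screen indicator.
LED_CUES = {
    State.IDLE: "dim warm glow",
    State.LISTENING: "steady blue",
    State.THINKING: "slow pulse",
    State.SPEAKING: "flicker synced to the voice",
}

# (current state, event) -> next state, mirroring camera -> STT -> LLM -> TTS -> LED.
TRANSITIONS = {
    (State.IDLE, "visitor_detected"): State.LISTENING,
    (State.LISTENING, "utterance_complete"): State.THINKING,
    (State.THINKING, "reply_ready"): State.SPEAKING,
    (State.SPEAKING, "playback_finished"): State.LISTENING,
}

def advance(state: State, event: str) -> State:
    next_state = TRANSITIONS.get((state, event), state)
    if next_state is not state:
        print(f"LED cue: {LED_CUES[next_state]}")  # stand-in for the real lighting call
    return next_state

# Example: one full conversational turn.
state = State.IDLE
for event in ("visitor_detected", "utterance_complete", "reply_ready", "playback_finished"):
    state = advance(state, event)
```
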
UX & Interaction Layer

Designing the Interface

There were really two interfaces: 1. the operator console for the audio, visual, and AI systems, and 2. the live voice conversation with the AI. To mitigate the 'Uncanny Valley,' I designed the conversational interface as an invisible layer. Instead of a chat window, the UX relied on LED cues (lighting changes and subtle sound design) to signal the AI's 'listening' and 'thinking' states. This reduced the cognitive load of a standard conversational UI, allowing visitors to maintain eye contact with the physical avatar. The operator workspace (shown right) is the console I custom-built to give operational control of lights, computer vision inputs and outputs, inference, and audio (DAW). The left image is the prototype wireframe; the right is the final product UI.
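
The 'thinking' state also doubles as latency masking: the cue begins the instant the visitor stops speaking, so the LLM's compute window never reads as dead air. Below is a hedged sketch of that pattern; play_thinking_cue, generate_reply, and speak are illustrative stand-ins, and the 3-second delay approximates the compute window described earlier.

```python
import threading
import time

def generate_reply(prompt: str) -> str:
    time.sleep(3.0)  # stand-in for the ~3-second LLM compute window
    return "A reply from the character."

def play_thinking_cue() -> None:
    print("LED: slow pulse / audio: soft ambient hum")  # stand-in for light + sound cues

def speak(text: str) -> None:
    print(f"TTS: {text}")  # stand-in for local text-to-speech playback

def respond(prompt: str) -> None:
    result: dict = {}
    worker = threading.Thread(
        target=lambda: result.update(reply=generate_reply(prompt)),
        daemon=True,
    )
    worker.start()       # inference begins immediately in the background
    play_thinking_cue()  # the cue starts at once, so the wait never reads as silence
    worker.join()        # block only until the reply is ready
    speak(result["reply"])

respond("Tell me about Prof. Dupin's study.")
```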

Outcomes & Impact
  • 01

    Engineered a local-LLM multithreaded architecture that reduced latency from 10 s to under 200 ms.

  • 02

    Designed 'thinking state' animations that maintained narrative immersion during processing.

  • 03

    Personalized interaction pacing and conversation content based on each visitor's dialogue with the AI.

  • 04

    Proved viability of 'Privacy-First' AI by processing all voice data locally (no cloud).

  • 05

    Each encounter was unique, shaped by visitor input, and every guest left with a physical memento generated by the AI character.