Richman-Tan/DementiaGuideAI

DementiaGuide AI

An iOS mobile application that serves as a digital library of dementia care information. Users ask questions by text or voice and receive answers from a real-time 3D avatar whose lip sync is driven by ElevenLabs character-level alignment.


Application Workflow

[Figure: DementiaGuide AI workflow diagram]

RAG Pipeline Workflow

[Figure: RAG pipeline diagram]

Video Walkthrough

[Video: screen recording of the app, 2026-04-29]

Overview

DementiaGuide AI is designed for caregivers, family members, and healthcare professionals. The app provides evidence-based dementia care guidance through a calm, accessible, and emotionally supportive interface. The AI avatar, Aria, is a VRM model rendered in real time with natural speech, multi-shape lip sync, and expressive idle animations.


Tech Stack

| Layer | Technology |
| --- | --- |
| Framework | React Native (Expo SDK 54) |
| Navigation | React Navigation 7 (Bottom Tabs + Native Stack) |
| AI / RAG | OpenAI gpt-4o-mini + text-embedding-3-small |
| STT | OpenAI Whisper (whisper-1) via expo-av audio recording |
| TTS | ElevenLabs eleven_turbo_v2_5 (primary) · OpenAI tts-1 (fallback) |
| Lip Sync | ElevenLabs character-level alignment → viseme timeline → 5 VRM blend shapes |
| Avatar | VRM 3D model via Three.js r180 + @pixiv/three-vrm in a WebView |
| Animations | React Native Animated API |
| Gradients | expo-linear-gradient |
| Audio | expo-av · Web Audio API (WebView) |
| Haptics | expo-haptics |
| Safe Area | react-native-safe-area-context |
| Storage | @react-native-async-storage/async-storage · expo-secure-store |

Screens

| Screen | Description |
| --- | --- |
| Home | Avatar hero card, quick question chips, text/voice entry, navigation grid |
| Chat | iMessage-style conversation, typing indicator, clickable source links |
| Library | Searchable knowledge base across 6 dementia-care categories with article detail view |
| Voice | Full-screen voice UI: records via Whisper STT, streams LLM response, plays avatar speech sentence-by-sentence with lip sync |
| Settings | Accessibility controls: text size, contrast, audio, subtitles, haptics, privacy |

Project Structure

DementiaGuideAI/
├── App.js
├── babel.config.js
├── app.json                          # Expo config
├── scripts/
│   └── test-responses.mjs            # CLI tool to test RAG output against sample questions
└── src/
    ├── navigation/
    │   └── AppNavigator.js           # Bottom tab + stack navigator
    ├── screens/
    │   ├── HomeScreen.js
    │   ├── ChatScreen.js             # GiftedChat UI, calls openaiService, shows sources
    │   ├── LibraryScreen.js
    │   ├── ArticleDetailScreen.js    # Full article view from Library
    │   ├── VoiceScreen.js            # Voice conversation UI (Whisper → LLM → TTS → avatar)
    │   └── ProfileScreen.js          # AI configuration (API keys, privacy controls)
    ├── components/
    │   ├── AvatarVRM.js              # VRM avatar in WebView (Three.js + viseme lip sync)
    │   ├── Avatar.js                 # Legacy animated avatar (idle/listening/speaking)
    │   ├── MessageCard.js            # Chat bubble with sources and actions
    │   ├── CategoryCard.js           # Library category row
    │   └── VoiceWaveform.js          # 9-bar animated waveform
    ├── hooks/
    │   └── useAvatarConversation.js  # Voice pipeline orchestration (STT → LLM stream → TTS queue → playback)
    ├── lib/
    │   ├── tts/
    │   │   ├── ttsService.js         # TTS provider selection (ElevenLabs primary, OpenAI fallback)
    │   │   └── elevenLabsService.js  # ElevenLabs API wrapper (audio + character alignment)
    │   └── lipsync/
    │       ├── createVisemeTimeline.js  # Converts ElevenLabs alignment → viseme frame sequence
    │       └── phonemeMap.js            # Character → VRM viseme mapping
    ├── constants/
    │   ├── colors.js
    │   ├── typography.js
    │   └── data.js                   # Categories, resources, sample messages
    ├── data/
    │   └── knowledgeBase.js          # 42 dementia care knowledge chunks (7 per category)
    └── services/
        ├── openaiService.js          # Full RAG pipeline (embeddings, semantic search, streaming chat, Whisper STT)
        ├── aceService.js             # NVIDIA ACE stub (used by VoiceScreen mock)
        └── knowledgeService.js       # Knowledge base search (used by LibraryScreen)

Getting Started

Prerequisites

  • Node.js 20+
  • Expo CLI
  • Xcode (for iOS Simulator) or Expo Go on a physical device
  • An OpenAI API key
  • An ElevenLabs API key (optional — enables vowel-accurate lip sync; falls back to amplitude-based sync without it)

Install

git clone <repo-url>
cd DementiaGuideAI
npm install

Run

# iOS Simulator
npx expo start --ios

# Android
npx expo start --android

# Clear Metro cache if needed
npx expo start --ios --clear

API Key Setup

Enter your API keys in the app under Settings → AI Configuration:

  • OpenAI key — required for chat, STT (Whisper), and fallback TTS
  • ElevenLabs key — optional; enables the full viseme lip sync pipeline

Both keys are stored on the device via expo-secure-store and are used only to authenticate the requests the app makes directly to OpenAI and ElevenLabs.


Voice Conversation Pipeline

The Voice screen runs a fully pipelined conversation flow managed by useAvatarConversation.js:

[Microphone] → expo-av recording
     ↓
[Whisper STT] → transcribed text
     ↓
[OpenAI gpt-4o-mini stream] → tokens stream in and are grouped into sentences
     ↓
[ElevenLabs TTS] ← fires immediately per sentence, in parallel
     ↓
[Viseme timeline] ← character alignment → mouth shape keyframes
     ↓
[AvatarVRM WebView] → plays audio + drives 5 blend shapes in real time

Each sentence is sent to TTS as soon as it completes in the LLM stream, so the avatar begins speaking the first sentence while later sentences are still being generated.
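
The sentence-level hand-off could be sketched roughly as follows. This is a minimal illustration, not the code in useAvatarConversation.js: the `createSentenceBuffer` helper and its boundary regex are assumptions, and the naive boundary rule will also split on abbreviations such as "e.g. ".

```javascript
// Hypothetical helper: buffers streamed text deltas and emits each sentence
// as soon as it is complete, so TTS can start before the LLM has finished.
function createSentenceBuffer(onSentence) {
  let buffer = '';
  // Assumed boundary rule: ., !, or ? followed by whitespace.
  const boundary = /[.!?]+\s+/;

  return {
    // Call with every text delta received from the LLM stream.
    push(delta) {
      buffer += delta;
      let match;
      while ((match = boundary.exec(buffer)) !== null) {
        const end = match.index + match[0].length;
        const sentence = buffer.slice(0, end).trim();
        buffer = buffer.slice(end);
        if (sentence) onSentence(sentence);
      }
    },
    // Call once when the stream ends to emit any trailing text.
    flush() {
      const rest = buffer.trim();
      buffer = '';
      if (rest) onSentence(rest);
    },
  };
}
```

In a pipeline like the one above, `onSentence` would be the place to start the TTS request for that sentence while `push` keeps receiving later tokens.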


Avatar (AvatarVRM)

The avatar is a .vrm model rendered inside a React Native WebView using Three.js and @pixiv/three-vrm. All animation runs in the embedded browser context and communicates back to React Native via postMessage.

State machine: idle → listening → thinking → speaking

Each state drives:

  • Body bob and sway amplitude
  • Head look-around frequency and range
  • Thinking gaze bias (up-right)
  • Breathing depth on spine/chest bones
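
One way to picture the state machine is as a lookup table keyed by state, plus a resolver that turns the three boolean props into a single state. The sketch below is illustrative only: the parameter names, the numeric values, and the priority order (speaking over thinking over listening) are assumptions, not values taken from AvatarVRM.js.

```javascript
// Hypothetical per-state animation parameters (illustrative values only).
const AVATAR_STATE_PARAMS = {
  idle:      { bobAmplitude: 1.0, lookAroundHz: 0.20, gazeBias: null,       breathingDepth: 1.0 },
  listening: { bobAmplitude: 0.6, lookAroundHz: 0.05, gazeBias: null,       breathingDepth: 0.8 },
  thinking:  { bobAmplitude: 0.4, lookAroundHz: 0.10, gazeBias: 'up-right', breathingDepth: 0.9 },
  speaking:  { bobAmplitude: 1.2, lookAroundHz: 0.15, gazeBias: null,       breathingDepth: 1.1 },
};

// Assumed priority when several flags are set at once:
// speaking > thinking > listening > idle.
function resolveAvatarState({ isListening = false, isThinking = false, isSpeaking = false } = {}) {
  if (isSpeaking) return 'speaking';
  if (isThinking) return 'thinking';
  if (isListening) return 'listening';
  return 'idle';
}
```

A render loop could then read `AVATAR_STATE_PARAMS[resolveAvatarState(props)]` each frame and ease toward those targets rather than switching abruptly.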

Lip sync — ElevenLabs viseme path (primary)

ElevenLabs returns character-level timestamps alongside the audio. These are converted into a viseme frame sequence by createVisemeTimeline.js, mapping characters to one of five VRM blend shapes: aa (open), ih (smile-open), ou (round), ee (wide), oh (rounded-open). During playback, the WebView tracks AudioContext.currentTime each frame, binary-searches the viseme timeline, and cross-fades between the active and next frame over the final 20% of each frame's duration.
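
A minimal sketch of the two steps described above, under stated assumptions: the alignment object uses the `characters`, `character_start_times_seconds`, and `character_end_times_seconds` arrays returned by the ElevenLabs with-timestamps endpoint; the vowel-only `CHAR_TO_VISEME` table is a stand-in for phonemeMap.js; and the function names are hypothetical, not those in createVisemeTimeline.js.

```javascript
// Hypothetical character → viseme table; anything unmapped closes the mouth.
const CHAR_TO_VISEME = { a: 'aa', i: 'ih', u: 'ou', e: 'ee', o: 'oh' };

// Convert character-level alignment into merged viseme frames.
function buildVisemeTimeline(alignment) {
  const {
    characters,
    character_start_times_seconds: starts,
    character_end_times_seconds: ends,
  } = alignment;
  const frames = [];
  for (let i = 0; i < characters.length; i++) {
    const viseme = CHAR_TO_VISEME[characters[i].toLowerCase()] ?? null;
    const last = frames[frames.length - 1];
    if (last && last.viseme === viseme) {
      last.end = ends[i]; // extend the previous frame instead of adding a duplicate
    } else {
      frames.push({ viseme, start: starts[i], end: ends[i] });
    }
  }
  return frames;
}

// Binary search for the frame that contains time t (seconds); -1 if none.
function findFrameIndex(frames, t) {
  let lo = 0;
  let hi = frames.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (t < frames[mid].start) hi = mid - 1;
    else if (t >= frames[mid].end) lo = mid + 1;
    else return mid;
  }
  return -1;
}

// Blend-shape weights at time t, cross-fading into the next frame
// over the final 20% of the current frame's duration.
function sampleVisemeWeights(frames, t) {
  const weights = { aa: 0, ih: 0, ou: 0, ee: 0, oh: 0 };
  const i = findFrameIndex(frames, t);
  if (i === -1) return weights;
  const frame = frames[i];
  const next = frames[i + 1];
  const duration = frame.end - frame.start;
  const fadeStart = frame.start + duration * 0.8;
  let blend = 0;
  if (next && duration > 0 && t > fadeStart) {
    blend = (t - fadeStart) / (frame.end - fadeStart);
  }
  if (frame.viseme) weights[frame.viseme] += 1 - blend;
  if (next && next.viseme) weights[next.viseme] += blend;
  return weights;
}
```

In the WebView, `t` would be derived from `AudioContext.currentTime` relative to when playback started, and the returned weights would be applied to the VRM expressions once per animation frame.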

Lip sync — RMS fallback path (OpenAI TTS or no ElevenLabs key)

When no alignment data is available, a Web Audio AnalyserNode measures RMS amplitude per frame and maps it to the aa blend shape, producing open/close jaw movement that tracks the audio loudness.
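
The amplitude mapping can be sketched as a pure function over one frame of time-domain samples. In the WebView those samples would typically come from `AnalyserNode.getFloatTimeDomainData()`; the function name and the `gain` value here are assumptions chosen for illustration, not the app's tuning.

```javascript
// Hypothetical mapping from one frame of audio samples (range -1..1)
// to a weight in 0..1 for the `aa` blend shape.
function rmsToMouthOpen(samples, gain = 4) {
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  // Root mean square of the frame; 0 for an empty buffer.
  const rms = samples.length ? Math.sqrt(sumSquares / samples.length) : 0;
  // Speech RMS is usually well below 1, so scale it up and clamp.
  return Math.min(1, rms * gain);
}
```

Smoothing the result across frames (for example, easing toward the new value) would likely be needed in practice to avoid a jittery jaw.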

Recovery: If the WebGL context is lost (iOS background eviction, Android process kill), the WebView automatically remounts.

Custom VRM model: Pass a modelUrl prop to AvatarVRM to use any publicly hosted .vrm file.

<AvatarVRM
  ref={avatarRef}
  modelUrl="https://example.com/your-model.vrm"
  isListening={listening}
  isSpeaking={speaking}
  isThinking={thinking}
  width={300}
  height={420}
/>

// Play TTS audio with viseme lip sync (ElevenLabs path)
await avatarRef.current.playAudio({ audio: base64DataUri, visemeTimeline });

// Play TTS audio with RMS fallback
await avatarRef.current.playAudio(base64DataUri);

// Stop early
avatarRef.current.stopAudio();

RAG Pipeline

The chat is powered by a fully client-side RAG pipeline in src/services/openaiService.js.

| Setting | Value |
| --- | --- |
| Embedding model | text-embedding-3-small (1536 dims) |
| Chat model | gpt-4o-mini |
| Context window | Last 6 messages |
| Retrieval | Top-5 chunks, min similarity 0.25 |
| Embedding cache | AsyncStorage key kb_embeddings_v2 |
| Message history | AsyncStorage key chat_messages_v1 (max 100) |

The knowledge base (src/data/knowledgeBase.js) contains 42 curated dementia care chunks across 6 categories: caregiving, clinical, behavioral best practices, home safety, wellbeing, and communication.
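
The retrieval step with the settings above (top 5, minimum similarity 0.25) might look roughly like this. It is a sketch assuming cosine similarity and a chunk shape of `{ id, text, embedding }`; the function names and that shape are hypothetical rather than copied from openaiService.js.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every chunk against the query embedding, drop weak matches,
// and return the best `topK` sorted by descending similarity.
function retrieveChunks(queryEmbedding, chunks, { topK = 5, minSimilarity = 0.25 } = {}) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(queryEmbedding, chunk.embedding) }))
    .filter((result) => result.score >= minSimilarity)
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With only 42 chunks, a linear scan like this is cheap enough to run on-device for every query, which is consistent with a fully client-side pipeline.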

Testing RAG output

OPENAI_API_KEY=sk-... node scripts/test-responses.mjs

Runs a set of sample questions through the full pipeline and prints each response alongside the retrieved knowledge base chunks and their similarity scores. Edit the QUESTIONS array in the script to test specific queries.


Design System

| Token | Value | Use |
| --- | --- | --- |
| Primary | #4A7C8E | Buttons, links, user bubbles |
| Secondary | #7FB5A0 | Accents, success states |
| Accent | #E8956D | Warnings, speaking state |
| Background | #F7F5F2 | App background |
| Surface | #FFFFFF | Cards, nav bar |
| Text Primary | #1E2D3D | Body and headings |

Accessibility:

  • Minimum 44×44pt tap targets
  • accessibilityLabel and accessibilityRole on all interactive elements
  • Configurable text size (small / medium / large)
  • High contrast mode toggle
  • Subtitle and audio toggles for avatar responses
  • Haptic feedback toggle

Disclaimer

DementiaGuide AI provides information for general guidance only. It is not a substitute for professional medical advice, diagnosis, or treatment. Always consult a qualified healthcare provider for dementia-related concerns.


License

Private — all rights reserved.
