
SANSARA Technical Specification

v0.7 · March 2026 · Pre-release

01

System Overview

SANSARA is an offline-first AI wellness companion built on React Native 0.84. All inference, storage, and personalization run on-device. The application ships with no network permissions, no API keys, and no analytics SDK; the binary is self-contained.

┌─────────────────────────────────────────────────────────────────┐
│                        USER INPUT                               │
│   Text   │   Voice (Whisper)   │   Face (MLKit)   │  Biometrics │
└────┬─────┴──────────┬──────────┴────────┬──────────┴────────┬───┘
     └────────────────┴─────────┬─────────┴───────────────────┘
                                │
                    ┌───────────▼───────────┐
                    │    SENSORY FUSION     │
                    │  Unified context vec  │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼───────────┐
                    │    MATRIX ROUTER      │
                    │   <100M param clf     │
                    │   Selects 2-4 agents  │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼───────────┐
                    │     BLACKBOARD        │
                    │  Shared mutable JSON  │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼───────────┐
                    │   CONSENSUS LOOP      │
                    │  Propose → Critique   │
                    │  → Refine → Converge  │
                    └───────────┬───────────┘
                                │
                    ┌───────────▼───────────┐
                    │      RESPONSE         │
                    └───────────────────────┘
02

Model Tiers

All tiers run the identical 9-agent architecture, memory system, and personalization pipeline. The only variable is the base language model, determined by device hardware capability.

Tier     Model          Params  Quant        RAM      Hardware
Base     Llama 3.2 1B   1B      4-bit QLoRA  ~1.5 GB  iPhone 14+, Pixel 7+, 4GB+
Pro      Llama 3.2 3B   3B      4-bit        ~3 GB    iPad, Galaxy Tab, S25 Ultra, 6GB+
Desktop  Llama 3.2 11B  11B     4-bit        ~6 GB    M-series Mac, iPad Pro M4, 10GB+
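The tier decision above can be sketched as a simple RAM gate. This is an illustrative helper (`selectTier` is not a name from the source); thresholds follow the hardware column of the table.

```typescript
// Hypothetical sketch: map device RAM to a model tier.
// Thresholds mirror the table above (4 GB+, 6 GB+, 10 GB+).
type Tier = "Base" | "Pro" | "Desktop";

function selectTier(deviceRamGb: number): Tier | null {
  if (deviceRamGb >= 10) return "Desktop"; // Llama 3.2 11B, ~6 GB resident
  if (deviceRamGb >= 6) return "Pro";      // Llama 3.2 3B, ~3 GB resident
  if (deviceRamGb >= 4) return "Base";     // Llama 3.2 1B, ~1.5 GB resident
  return null;                             // below minimum spec
}
```
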
03

Inference Pipeline

Compilation

PyTorch model
  → torch.export()          # Capture computational graph
  → torchao quantization    # 4-bit via SmoothQuant / SpinQuant
  → ExecuTorch lowering     # Hardware backend delegation
  → .pte binary             # Ahead-of-time compiled artifact

Hardware Backends

Platform  Primary Backend                        Fallback
iOS       CoreML → Apple Neural Engine           XNNPACK (CPU)
Android   QNN → Qualcomm AI Engine / Adreno GPU  XNNPACK (CPU)
Desktop   CoreML (Apple Silicon) / CPU           XNNPACK

Speculative Decoding

A lightweight draft model predicts multiple future tokens in parallel; the primary model verifies them in a single batch. Measured speedup: 2.2-3.6x. The processor runs at peak for a fraction of a second, then returns to low-power sleep.
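The accept/reject step can be illustrated as follows. This is a greedy-decoding sketch, not the source's implementation: the draft proposes k tokens, the primary model scores the same positions in one batch, and the longest agreeing prefix is kept.

```typescript
// Illustrative accept step in speculative decoding (greedy case).
// draftTokens: k tokens proposed by the draft model.
// targetTokens: the primary model's tokens for the same positions.
function acceptPrefix(draftTokens: number[], targetTokens: number[]): number[] {
  const accepted: number[] = [];
  for (let i = 0; i < draftTokens.length; i++) {
    if (draftTokens[i] !== targetTokens[i]) {
      // First disagreement: keep the primary model's token and stop.
      accepted.push(targetTokens[i]);
      break;
    }
    accepted.push(draftTokens[i]);
  }
  return accepted;
}
```

When all k draft tokens agree, one verification pass yields k tokens, which is where the 2.2-3.6x speedup comes from.
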

Runtime Footprint

ExecuTorch base runtime:    50 KB
react-native-executorch:   JSI bridge, declarative API
Model artifact (.pte):     ~500 MB (1B), ~1.5 GB (3B), ~4 GB (11B)
KV cache overhead:         ~200-500 MB (context-dependent)
04

Multi-Agent Architecture

Blackboard Pattern

Agents do not communicate directly. They read from and write to a shared mutable state object (the Blackboard). This decoupled architecture eliminates the sequential context degradation of linear agent chains.

Blackboard = {
  raw_input:        string,
  prosody_vector:   float[],
  emotion_vector:   float[192],
  biometric_state:  { steps, light, hrv, sleep },
  history_embeds:   float[][],     // HNSW nearest neighbors
  agent_proposals:  Proposal[],
  agent_critiques:  Critique[],
  consensus_state:  "pending" | "converged",
  final_response:   string | null
}
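The JSON sketch above maps directly onto a TypeScript shape. A minimal version, with `Proposal` and `Critique` reduced to placeholder records and an initializer added for illustration:

```typescript
// Minimal TypeScript shape for the Blackboard sketch above.
interface Proposal { agentId: string; text: string }
interface Critique { agentId: string; targetAgentId: string; text: string }

interface Blackboard {
  raw_input: string;
  prosody_vector: number[];
  emotion_vector: number[];          // 192-dim
  biometric_state: { steps: number; light: number; hrv: number; sleep: number };
  history_embeds: number[][];        // HNSW nearest neighbors
  agent_proposals: Proposal[];
  agent_critiques: Critique[];
  consensus_state: "pending" | "converged";
  final_response: string | null;
}

// Illustrative initializer: every session starts pending, with an
// empty 192-dim emotion vector until the fusion layer writes one.
function createBlackboard(rawInput: string): Blackboard {
  return {
    raw_input: rawInput,
    prosody_vector: [],
    emotion_vector: new Array(192).fill(0),
    biometric_state: { steps: 0, light: 0, hrv: 0, sleep: 0 },
    history_embeds: [],
    agent_proposals: [],
    agent_critiques: [],
    consensus_state: "pending",
    final_response: null,
  };
}
```
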

Matrix Router

A sub-100M-parameter classification model analyzes the fused input vector and selects 2-4 agents. Each inference pass uses a distinct system prompt against the shared base model.
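The selection step can be sketched as a top-k over per-agent relevance scores, clamped to the 2-4 range. The scoring model itself is the classifier; `routeAgents` and the score map are hypothetical names:

```typescript
// Hypothetical routing step: the classifier emits one relevance score
// per agent; the router activates the top-k agents, clamped to 2-4.
function routeAgents(scores: Record<string, number>, k: number): string[] {
  const clamped = Math.min(4, Math.max(2, k));
  return Object.entries(scores)
    .sort(([, a], [, b]) => b - a)   // highest score first
    .slice(0, clamped)
    .map(([id]) => id);
}
```
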

Consensus Loop

loop:
  1. PROPOSE   — Each active agent writes a response proposal
  2. CRITIQUE  — Each agent reads peer proposals, posts critiques
  3. REFINE    — Each agent integrates feedback, rewrites
  4. MEASURE   — Control Shell computes semantic similarity
  5. IF similarity < threshold → GOTO 1
  6. CONVERGE  — Emit unified response
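Steps 4-5 of the loop can be sketched as a pairwise similarity check over proposal embeddings. The encoder is abstracted away here; cosine similarity and the 0.9 threshold are illustrative assumptions, not values from the source.

```typescript
// Sketch of MEASURE/CONVERGE: embed each refined proposal with an
// on-device encoder (not shown), then declare convergence when every
// pairwise cosine similarity clears an assumed threshold.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function hasConverged(embeds: number[][], threshold = 0.9): boolean {
  for (let i = 0; i < embeds.length; i++) {
    for (let j = i + 1; j < embeds.length; j++) {
      if (cosine(embeds[i], embeds[j]) < threshold) return false; // GOTO 1
    }
  }
  return true; // emit unified response
}
```
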
05

The Nine Agents

ID               Layer                 Input Signal                       Behavior
ECHO_EMPATH      Emotional Mirroring   Text sentiment + vocal prosody     Validates immediate emotional state
PATTERN_ECHO     Emotional Mirroring   Current input + historical memory  Surfaces recurring emotional themes
BIO_MIRROR       Emotional Mirroring   Steps, light, HRV, sleep           Reflects body-mood connection
INSIGHT_WEAVER   Cognitive Reframing   Immediate text input               Restructures negatives into reflective prompts
TREND_SAGE       Cognitive Reframing   Historical vector embeddings       Reveals macro-trends in micro-failures
ENV_THINKER      Cognitive Reframing   Ambient light, location, time      Context-aware environmental reframing
ACTION_NURTURER  Compassionate Action  Acute stress level                 Immediate therapeutic micro-interventions
GROWTH_GUIDE     Compassionate Action  Historical success database        Recommends proven personal strategies
HABIT_ELEVATOR   Compassionate Action  Biometric telemetry                Links behavioral change to physical metrics

All nine agents share a single quantized base model; the Matrix Router time-slices inference passes with distinct system prompts. Nine separate models are never loaded.

06

Multimodal Fusion Layer

Modality    Implementation                         RAM         Latency      Output
Voice       Whisper ASR (ggml-tiny.en)             ~40 MB      <2s / chunk  Text + temporal prosody array
Face        ML Kit Face Mesh + MobileNetV3 TFLite  ~5 MB       Real-time    192-dim emotion vector
Biometrics  HealthKit / SensorManager              Negligible  Passive      JSON: { steps, light, hrv, sleep }

All modalities are opt-in per session. Raw pixel data is purged from RAM immediately after embedding extraction; only the output vector persists on-device.

07

On-Device Storage

Database:        ObjectBox 4.0
Core:            C++ with memory-mapped files (FlatBuffers)
Vector index:    HNSW (Hierarchical Navigable Small World)
Binary size:     ~3 MB
Encryption:      AES-256 at rest
Search latency:  <5ms over millions of embeddings
Memory model:    5-year tiered consolidation
                 ├─ Hot:    last 30 days (full embeddings)
                 ├─ Warm:   30-365 days (compressed)
                 └─ Cold:   1-5 years (summary vectors)
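The consolidation rule implied by the tiers above can be written as a pure function of record age. The day boundaries come from the tier listing; the `expired` case beyond five years is an assumption of this sketch.

```typescript
// Sketch of the 5-year tiered consolidation rule. ageDays is the age
// of a memory record at consolidation time.
type MemoryTier = "hot" | "warm" | "cold" | "expired";

function memoryTier(ageDays: number): MemoryTier {
  if (ageDays <= 30) return "hot";        // full embeddings
  if (ageDays <= 365) return "warm";      // compressed
  if (ageDays <= 5 * 365) return "cold";  // summary vectors
  return "expired";                       // beyond the 5-year window (assumed)
}
```
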
08

On-Device Learning

LoRA Fine-Tuning

Method:          Low-Rank Adaptation
Target layers:   q_proj, v_proj (attention)
Trainable params: ~2-5M (vs 1B+ frozen base)
Adapter size:    10-50 MB
Trigger:         Device charging + idle (overnight)
iOS engine:      MLX Swift (unified memory, zero CPU↔GPU transfer)
Android engine:  ExecuTorch native training loop
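A back-of-envelope check on the ~2-5M trainable-parameter figure: each adapted matrix contributes two low-rank factors, r·d_in + r·d_out. The dimensions below assume the public Llama 3.2 1B config (hidden size 2048, 16 layers, grouped-query attention giving v_proj a 512-dim output); rank 32 is an assumed setting, not stated in the source.

```typescript
// Back-of-envelope LoRA parameter count under assumed dimensions.
function loraParams(rank: number, layers: number, dims: [number, number][]): number {
  // Each adapted matrix adds two factors: rank*d_in + rank*d_out.
  const perLayer = dims.reduce((sum, [dIn, dOut]) => sum + rank * (dIn + dOut), 0);
  return perLayer * layers;
}

const total = loraParams(32, 16, [
  [2048, 2048], // q_proj
  [2048, 512],  // v_proj (grouped-query attention)
]);
// total = 3,407,872 ≈ 3.4M, inside the ~2-5M range quoted above
```
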

Feedback Sessions

Weekly or monthly (user-configured). SANSARA initiates a structured dialogue referencing specific past responses. User corrections are converted to labeled training pairs for the next LoRA cycle. Passive learning draws on response-quality signals: entry length, re-engagement rate, and conversation flow depth.
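One plausible shape for those labeled pairs is a preference record, with the user's correction as the preferred output. The chosen/rejected framing and all names below are assumptions of this sketch, not the source's format.

```typescript
// Hypothetical conversion of a feedback-session correction into a
// supervised pair for the next LoRA cycle.
interface TrainingPair { prompt: string; chosen: string; rejected: string }

function toTrainingPair(
  userEntry: string,
  originalResponse: string,
  userCorrection: string
): TrainingPair {
  return {
    prompt: userEntry,          // the journal entry that triggered the response
    chosen: userCorrection,     // what the user said they wanted instead
    rejected: originalResponse, // the response being corrected
  };
}
```
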

Predictive Foresight

Algorithm:       ARIMA (Autoregressive Integrated Moving Average)
Input:           Historical emotional vectors from ObjectBox
Output:          Mood trajectory forecast (72h window)
Detection:       Cyclical dips, seasonal patterns, weekly stressors
Trigger:         Preemptive intervention written to Blackboard
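As a toy illustration of the autoregressive core, here is a one-dimensional AR(1) forecast: the AR part of ARIMA without differencing or moving-average terms. The real pipeline runs over emotional vectors from ObjectBox; this sketch and its names are not from the source.

```typescript
// Toy AR(1): fit x[t] = phi * x[t-1] by least squares, then roll the
// model forward to produce a multi-step forecast.
function ar1Coefficient(series: number[]): number {
  let num = 0, den = 0;
  for (let t = 1; t < series.length; t++) {
    num += series[t] * series[t - 1];
    den += series[t - 1] * series[t - 1];
  }
  return num / den;
}

function forecast(series: number[], steps: number): number[] {
  const phi = ar1Coefficient(series);
  const out: number[] = [];
  let last = series[series.length - 1];
  for (let i = 0; i < steps; i++) {
    last = phi * last; // each step decays/grows by phi
    out.push(last);
  }
  return out;
}
```

A dip forecast inside the 72h window is what triggers the preemptive intervention written to the Blackboard.
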
09

Privacy Architecture

network_permission:     NONE
api_keys_in_binary:     NONE
analytics_sdk:          NONE
cloud_endpoints:        NONE
telemetry_hooks:        NONE

data_location:          DEVICE ONLY
encryption:             AES-256
bytes_transmitted:      0
accounts_required:      0

# Structurally incapable of transmitting data.
# Not by policy. By architecture.
10

Accessibility

Feature           Implementation
Interface modes   8 modes: Minimal → Maximum visual intensity
Typography        OpenDyslexic font, adjustable kerning / line-height
Sensory controls  Independent toggles: haptics, motion, soundscapes
Input             Voice-first journaling with prosody analysis
Focus             ADHD Focus Mode — structured attention
Progress          Stellar Alignments (cumulative) — no streak counters
Compliance        WCAG AAA on all text over shifting backgrounds
Glassmorphism     Constrained blur (4-6px), 1px text shadows
11

Tablet Mode (Pro / Desktop)

Capability        Implementation
Handwriting       On-device OCR → searchable text + preserved strokes
Drawing           Canvas capture → visual pattern analysis
Stylus            Pressure sensitivity, palm rejection, tilt detection
Devices           Apple Pencil (iPad), Samsung S Pen (Galaxy Tab / S25 Ultra)
Vision (Desktop)  11B model processes drawings as visual input
12

Stack Summary

Application:     React Native 0.84 (bare, not Expo managed)
UI framework:    React Native Skia (SkSL procedural shaders)
Animation:       Reanimated + react-native-svg
Inference:       ExecuTorch (Meta) + react-native-executorch
Models:          Llama 3.2 (1B / 3B / 11B), 4-bit quantized .pte
Voice:           whisper.rn (ggml-tiny.en)
Face:            Google ML Kit + MobileNetV3 TFLite
Storage:         ObjectBox 4.0 (HNSW vector index)
Encryption:      AES-256 on-device
Learning:        LoRA via MLX Swift (iOS) / ExecuTorch (Android)
Forecasting:     ARIMA on local emotional vectors
State:           Zustand + Blackboard JSON
Haptics:         Platform LRA APIs (affective patterns)
Build:           Fastlane (iOS + Android)
Testing:         Jest (TDD for Blackboard + inference)
Payment:         Stripe / Paddle (Web2App, external checkout)
Web:             Next.js 16 on Vercel

Questions: info@sansara.app