BasiliskOfSisyphus/speculative-intent-system


Speculative Intent System (SIS)

Patent Pending — USPTO Provisional Application on file. See PATENT_PENDING.md.

SIS is a two-tier architecture that fires a speculative intent on partial input, before the user finishes speaking or typing, and reconciles it against a slower, authoritative model running in parallel. The result: median time-to-first-action on text input is 239× lower (2,128 ms baseline vs. 8.924 ms with SIS), with zero false-positive fires on ambiguous (hard-band) inputs across four consecutive validation runs.

This repository contains the open-source test harness used to validate the architecture across three input modalities (text-paste, text-typed, audio) and three hardware tiers (CPU, RTX 3070 CUDA, and Jetson Orin Nano, whose battery is still in progress).


Architecture

```mermaid
flowchart LR
    subgraph Input["Input Stream (partial)"]
        A[Typed / Spoken / Pasted]
    end

    subgraph Tier1["Tier-1 — Fast Classifier"]
        B[Sentence-transformer\nMiniLM-L6-v2\nMedian: 8.9ms]
    end

    subgraph Tier2["Tier-2 — Authoritative Model"]
        C[LLM\nLlama 3.1 8B / Qwen\nMedian: ~2s]
    end

    subgraph Reconciler["Reconciler — 8 Internal Outcomes"]
        D{Match?}
    end

    subgraph UIState["4 User-Visible States"]
        E[MATCH — action stands]
        F[REFINEMENT — action updated]
        G[MISMATCH — action rolled back]
        H[CONFUSION — surfaces to user]
    end

    A -->|partial stream| B
    A -->|complete input| C
    B -->|speculative fire| D
    C -->|authoritative prediction| D
    D --> E
    D --> F
    D --> G
    D --> H
```

How It Works

  1. Tier-1 watches the partial input stream. When it recognizes an intent with sufficient confidence, it fires a speculative action immediately — in ~9ms on text, ~2.6s on audio with CUDA ASR.
  2. Tier-2 runs concurrently on the same input. By the time it finishes, the user may have already seen their action begin executing.
  3. The Reconciler classifies the relationship between the Tier-1 and Tier-2 predictions into one of eight internal outcomes and surfaces one of four UI states. This is the inventive core: the UI states present LLM latency as character expression rather than as a loading spinner.
  4. Rollback is available for mismatched speculative actions. The architecture requires that Tier-1 only fires on rollback-feasible action classes in the current world state — preserving safety without sacrificing speed.
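
The four steps above can be sketched with `asyncio`. This is a minimal illustration, not the harness's orchestrator: the keyword-matching stand-ins for both tiers, the `CONFIDENCE_THRESHOLD` value, and the `ROLLBACK_FEASIBLE` set are all hypothetical, and the sketch assumes a replayed trial in which the complete input is known up front.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

# Hypothetical values; the harness's real thresholds and action classes may differ.
CONFIDENCE_THRESHOLD = 0.8
ROLLBACK_FEASIBLE = {"lights_on", "play_music"}


@dataclass
class Prediction:
    intent: str
    confidence: float


async def tier1_classify(partial: str) -> Optional[Prediction]:
    """Stand-in fast classifier: a keyword match instead of embeddings."""
    await asyncio.sleep(0.001)
    if "light" in partial:
        return Prediction("lights_on", 0.95)
    return None


async def tier2_classify(full_text: str) -> Prediction:
    """Stand-in for the slow authoritative model."""
    await asyncio.sleep(0.05)
    intent = "lights_on" if "light" in full_text else "unknown"
    return Prediction(intent, 1.0)


async def run_trial(partials: list[str]) -> tuple[Optional[Prediction], Prediction]:
    """Return (speculative fire or None, authoritative prediction)."""
    # Tier-2 starts right away and runs concurrently with the Tier-1 loop.
    tier2_task = asyncio.create_task(tier2_classify(partials[-1]))
    fired: Optional[Prediction] = None
    for partial in partials:
        guess = await tier1_classify(partial)
        # Fire only when confident AND the action class can be rolled back.
        if (guess is not None
                and guess.confidence >= CONFIDENCE_THRESHOLD
                and guess.intent in ROLLBACK_FEASIBLE):
            fired = guess
            break
    authoritative = await tier2_task
    return fired, authoritative
```

Calling `asyncio.run(run_trial(["turn", "turn on the light", "turn on the light please"]))` fires on the second partial while the Tier-2 task is still pending; the pair it returns is what a reconciler would then compare.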

The architecture is modality-agnostic: the same orchestrator code drove text-typed, text-paste, audio-CPU, and audio-CUDA batteries with zero behavioral branching. Only the input streamer changes.
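
One way to picture "only the input streamer changes" is a small streamer interface consumed by a single loop. The class and function names below are hypothetical, and the substring trigger stands in for the Tier-1 classifier; the harness's actual streamer API may look different.

```python
from typing import Iterator, Optional, Protocol


class Streamer(Protocol):
    def stream(self, text: str) -> Iterator[str]: ...


class PasteStreamer:
    """Emits the whole input as a single partial."""

    def stream(self, text: str) -> Iterator[str]:
        yield text


class TypedStreamer:
    """Emits a growing prefix, one character at a time."""

    def stream(self, text: str) -> Iterator[str]:
        for end in range(1, len(text) + 1):
            yield text[:end]


def first_fire_index(streamer: Streamer, text: str, trigger: str) -> Optional[int]:
    """Same loop for every modality: index of the first partial that contains trigger."""
    for index, partial in enumerate(streamer.stream(text)):
        if trigger in partial:
            return index
    return None
```

With the input `"lights on"` and trigger `"light"`, the paste streamer fires on partial 0 and the typed streamer fires on partial 4, yet `first_fire_index` contains no modality-specific branch.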


Validation Results

All thresholds were locked before any data was collected (see sis_test_harness/config/thresholds.json, timestamped 2026-05-21). This is the anti-bias provision: thresholds cannot be loosened after seeing data.

Text-Paste Battery — Primary T1 Latency Claim

Run ID 96a2827e | 600 trials | Hardware: RTX 3070 (CPU-only Tier-2)

| Criterion | Threshold | Result | Status |
| --- | --- | --- | --- |
| Easy-band Tier-1 accuracy | ≥ 85% | 100.0% (120/120) | Pass |
| Medium-band Tier-1 accuracy | ≥ 70% | 100.0% (90/90) | Pass |
| Hard-band Tier-1 accuracy | ≥ 50% | 66.7% (58/87) | Pass |
| Hard-band false-positive rate | < 25% | 0.00% (0/29) | Pass |
| Tier-1 median latency | ≤ 200 ms | 8.924 ms | Pass |
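
The pass/fail check against locked thresholds can be sketched as below. The thresholds and counts are copied from the table; the dictionary layout and the `verdict` function are hypothetical, and `thresholds.json` may be organized differently.

```python
import operator

# Comparison operators used by the locked criteria.
OPS = {">=": operator.ge, "<": operator.lt, "<=": operator.le}

# Thresholds from the text-paste table (layout is an assumption).
THRESHOLDS = {
    "easy_accuracy": (">=", 0.85),
    "medium_accuracy": (">=", 0.70),
    "hard_accuracy": (">=", 0.50),
    "hard_false_positive_rate": ("<", 0.25),
    "tier1_median_latency_ms": ("<=", 200.0),
}

# Results from run 96a2827e, expressed as the raw counts shown in the table.
RESULTS = {
    "easy_accuracy": 120 / 120,
    "medium_accuracy": 90 / 90,
    "hard_accuracy": 58 / 87,
    "hard_false_positive_rate": 0 / 29,
    "tier1_median_latency_ms": 8.924,
}


def verdict(results: dict, thresholds: dict) -> dict:
    """Map each criterion name to True (pass) or False (fail)."""
    return {
        name: OPS[op](results[name], limit)
        for name, (op, limit) in thresholds.items()
    }
```

Because the thresholds are fixed constants evaluated after the fact, a failing criterion can only be turned into a pass by changing the data, not the bar.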

Headline: Median time-to-first-action drops from 2,128ms (baseline) to 8.924ms (SIS) — 239× faster, with zero false fires on ambiguous input.

Text-paste mode is where the T1 latency claim lives: input emits as a single partial, so t1_ms reflects pure classifier compute with no streamer-pacing artifact.
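
A median over per-call wall-clock timings is one way to obtain a figure like `t1_ms`. The helper below is a sketch with a hypothetical name and signature; it times any callable classifier and is not the harness's measurement code.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(classify: Callable[[str], object], inputs: list[str]) -> float:
    """Median wall-clock time of classify(text) over inputs, in milliseconds."""
    samples = []
    for text in inputs:
        start = time.perf_counter()
        classify(text)  # only the classifier call is inside the timed region
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

Feeding each input as one complete string mirrors paste mode: no streamer pacing falls inside the timed region, so the median reflects classifier compute alone.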

Audio Battery — CUDA Hardware (Consumer GPU)

Run ID e0916c62 | 600 trials | Hardware: RTX 3070 (faster-whisper small.en CUDA + CPU Tier-2)

| Criterion | Threshold | Result | Status |
| --- | --- | --- | --- |
| Medium-band Tier-1 accuracy | ≥ 70% | 100.0% (90/90) | Pass |
| Hard-band Tier-1 accuracy (behavioral) | ≥ 50% | 66.7% (58/87) | Pass |
| Hard-band false-positive rate | < 25% | 0.00% (0/29) | Pass |
| Tier-1 latency | vs CPU baseline | 16–21× speedup | |

Behavioral accuracy is at parity with the GREEN text-paste run. The 200 ms Tier-1 latency threshold does not apply to audio mode, where wall-clock latency is bounded by utterance duration rather than classifier compute. Full analysis is in patent_anchor_data/README.md.

Safety Claim — Four Consecutive Runs

Zero hard-band false-positive fires across:

  • text-typed CPU (948c4160)
  • text-paste CPU (96a2827e)
  • audio CPU (624194eb)
  • audio CUDA (e0916c62)

The safety invariant holds across three input modalities (text-typed, text-paste, audio) and two hardware configurations (CPU and CUDA).


Repository Structure

```
sis-public-repo/
├── README.md                    ← you are here
├── PATENT_PENDING.md
├── sis_test_harness/            ← full benchmark harness (see below)
│   ├── src/                     ← orchestrator, reconciler, Tier-1, Tier-2, streamers
│   ├── config/                  ← locked thresholds + corpus (pre-data-collection)
│   ├── audio/                   ← corpus WAV generation
│   ├── analysis/                ← stats, verdict, plotting scripts
│   ├── tests/                   ← pytest suite
│   ├── docs/                    ← architecture decisions, reconciler state diagram
│   └── results/                 ← CSV output (gitignored; patent_anchor_data is the archive)
└── patent_anchor_data/          ← locked, immutable validation datasets
    ├── README.md                ← full per-dataset analysis
    ├── 2026-05-22_text-paste_battery_96a2827e.csv
    ├── 2026-05-22_text-paste_stats_96a2827e.json
    ├── 2026-05-22_text-typed_battery_948c4160.csv
    ├── 2026-05-22_text-typed_stats_948c4160.json
    ├── 2026-05-23_audio_battery_624194eb.csv
    ├── 2026-05-23_audio_stats_624194eb.json
    ├── 2026-05-23_audio-rtx3070-cuda_battery_e0916c62.csv
    └── 2026-05-23_audio-rtx3070-cuda_stats_e0916c62.json
```

Quick Start

Requirements: Python 3.11+, a GGUF-capable LLM backend (llama-cpp-python), sentence-transformers. Full setup in sis_test_harness/README.md.

```bash
# Clone
git clone https://github.com/entelyx/speculative-intent-system
cd speculative-intent-system/sis_test_harness

# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Smoke test (no models needed — uses mocks)
python -m src.runner smoke --mock
python -m src.runner smoke-exercise

# Full battery (requires Llama 3.1 8B GGUF at models/)
python -m src.runner battery --seed 42 --variant embedding --modality text-paste
```

What's Proven vs What's Next

Proven (locked datasets)

  • Tier-1 median latency: 8.924ms on CPU (text-paste, consumer hardware)
  • Mechanism correctness across three input modalities (text-typed, text-paste, audio)
  • Safety invariant — 0% hard-band false-positive rate across four consecutive runs
  • Consumer GPU (RTX 3070) reduces audio T1 latency by 16–21× vs CPU baseline
  • Modality-agnostic architecture — same orchestrator code, different streamers

In Progress

  • Jetson Orin Nano battery — validates the same harness on constrained edge hardware (8GB unified memory). Pending model swap (Llama 3.1 8B → Llama 3.2 3B) to resolve OOM at the Tier-2 layer.
  • RTX 3070 CUDA full battery — Tier-1 GPU acceleration validation. Expected to further reduce text-mode latency below the CPU baseline.
  • Tier-1 LoRA distillation — replacing the embedding similarity classifier with a fine-tuned small model (SmolLM2-360M or Qwen2.5-0.5B base). First training run on DGX Spark GB10.
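
For context on what the distillation work would replace, an embedding similarity classifier with abstention can be sketched as follows. The 2-D vectors stand in for MiniLM sentence embeddings, and the `threshold` and `margin` values, along with the margin rule itself, are assumptions for illustration rather than the harness's settings.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    embedding: Sequence[float],
    prototypes: dict,
    threshold: float = 0.8,   # assumed minimum similarity to fire
    margin: float = 0.1,      # assumed minimum lead over the runner-up
) -> Optional[str]:
    """Return the nearest intent, or None to abstain on ambiguous input."""
    scored = sorted(
        ((cosine(embedding, vec), name) for name, vec in prototypes.items()),
        reverse=True,
    )
    best_score, best_name = scored[0]
    runner_up = scored[1][0] if len(scored) > 1 else -1.0
    if best_score < threshold or best_score - runner_up < margin:
        return None  # abstaining keeps ambiguous inputs from causing a false fire
    return best_name
```

With prototypes `{"lights_on": [1.0, 0.0], "play_music": [0.0, 1.0]}`, the input `[0.9, 0.1]` resolves to `lights_on`, while `[1.0, 1.0]` is equidistant from both and falls below the threshold, so the classifier abstains.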

Roadmap

  • SIS SDK — a licensable API surface for third-party edge AI developers. Spec in progress.
  • CascGang embodiment — SIS as the command intent layer for robotic swarm orchestration.
  • HoloAssist embodiment — SIS driving a local-LLM holographic assistant interface.

Hardware Validation Plan

| Tier | Hardware | ASR | Tier-2 | Status |
| --- | --- | --- | --- | --- |
| 0 | RTX 3070 (CPU-only) | pywhispercpp CPU | Llama 3.1 8B CPU | ✅ Locked |
| 1 | RTX 3070 (CUDA) | faster-whisper CUDA | Llama 3.1 8B CPU | ✅ Locked |
| 2 | Jetson Orin Nano 8GB | faster-whisper CPU | Llama 3.2 3B (pending OOM fix) | 🔄 In progress |
| 3 | DGX Spark GB10 | faster-whisper CUDA | Llama 3.1 70B or LoRA | ⚪ Planned |

Reconciler State Machine

The reconciler classifies every trial into one of eight internal outcomes:

| Outcome | Tier-1 fires? | Tier-2 agrees? | UI State | Action |
| --- | --- | --- | --- | --- |
| Match | ✅ | same intent + slots | MATCH | Stand |
| Refinement | ✅ | same intent, adds/removes slots | REFINEMENT | Update in place |
| Contraction | ✅ | same intent, removes slots | REFINEMENT | Update in place |
| Slot Conflict | ✅ | same intent, changes slot value | REFINEMENT | Update to T2 value |
| Mismatch | ❌ | different intent | MISMATCH | Roll back, execute T2 |
| Tier-2 Abstain | — | abstained | MISMATCH | Roll back |
| Confusion | | malformed output | CONFUSION | Surface to user |
| Timeout | — | timed out | MISMATCH | Treat as abstain |
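
The table can be read as a classification function over the two predictions. The sketch below assumes Tier-1 has already fired and that predictions are dictionaries with `intent` and `slots` keys; both the data shape and the tie-breaking order are assumptions. In particular, a Tier-2 result that only removes slots is treated as Contraction, and one that adds slots (with or without removals) as Refinement.

```python
from enum import Enum
from typing import Optional


class Outcome(Enum):
    MATCH = "Match"
    REFINEMENT = "Refinement"
    CONTRACTION = "Contraction"
    SLOT_CONFLICT = "Slot Conflict"
    MISMATCH = "Mismatch"
    TIER2_ABSTAIN = "Tier-2 Abstain"
    CONFUSION = "Confusion"
    TIMEOUT = "Timeout"


# Eight internal outcomes collapse to four user-visible states, as in the table.
UI_STATE = {
    Outcome.MATCH: "MATCH",
    Outcome.REFINEMENT: "REFINEMENT",
    Outcome.CONTRACTION: "REFINEMENT",
    Outcome.SLOT_CONFLICT: "REFINEMENT",
    Outcome.MISMATCH: "MISMATCH",
    Outcome.TIER2_ABSTAIN: "MISMATCH",
    Outcome.CONFUSION: "CONFUSION",
    Outcome.TIMEOUT: "MISMATCH",
}


def reconcile(t1: dict, t2: Optional[dict], timed_out: bool = False) -> Outcome:
    """Classify a fired Tier-1 prediction against the Tier-2 result (None = abstain)."""
    if timed_out:
        return Outcome.TIMEOUT
    if t2 is None:
        return Outcome.TIER2_ABSTAIN
    if not isinstance(t2.get("intent"), str) or not isinstance(t2.get("slots"), dict):
        return Outcome.CONFUSION  # malformed Tier-2 output
    if t2["intent"] != t1["intent"]:
        return Outcome.MISMATCH
    s1, s2 = t1["slots"], t2["slots"]
    # A shared slot with a different value takes priority over added/removed slots.
    if any(s1[key] != s2[key] for key in s1.keys() & s2.keys()):
        return Outcome.SLOT_CONFLICT
    if s1 == s2:
        return Outcome.MATCH
    if s2.keys() < s1.keys():  # Tier-2 kept a strict subset of Tier-1's slots
        return Outcome.CONTRACTION
    return Outcome.REFINEMENT
```

For example, if Tier-1 fired `{"intent": "lights_on", "slots": {"room": "kitchen"}}` and Tier-2 returns the same intent with an extra `level` slot, `reconcile` yields `Outcome.REFINEMENT`, which `UI_STATE` maps to `REFINEMENT`.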

Citation

If you use this harness or reference this architecture in your work:

Speculative Intent System — Validation Harness
Entelyx LLC, 2026
USPTO Provisional Patent Application on file
https://github.com/entelyx/speculative-intent-system

License

The test harness source code is released under the MIT License.

The patent anchor datasets (patent_anchor_data/) are released as evidence of reduction-to-practice and are provided for reference and reproducibility verification. They may not be used to challenge the novelty or priority of the associated provisional patent application.


Built by Entelyx LLC — physical AI, edge inference, and embedded intent systems.
