Speculative Intent System (SIS)

Patent Pending — USPTO Provisional Application on file. See PATENT_PENDING.md.

SIS is a two-tier architecture that fires a speculative intent on partial input, before the user finishes speaking or typing, and reconciles it against a slower, authoritative model running in parallel. The result: median time-to-first-action on text input is 239× faster than baseline, with zero false-positive fires on ambiguous inputs across four consecutive validation runs.

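This repository contains the open-source test harness used to validate the architecture across three input modalities (text-paste, text-typed, audio). It targets three hardware tiers (CPU, RTX 3070 CUDA, Jetson Orin Nano); the CPU and RTX 3070 CUDA batteries are locked, and the Jetson Orin Nano battery is still in progress.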

Architecture

flowchart LR
    subgraph Input["Input Stream (partial)"]
        A[Typed / Spoken / Pasted]
    end

    subgraph Tier1["Tier-1 — Fast Classifier"]
        B[Sentence-transformer\nMiniLM-L6-v2\nMedian: 8.9ms]
    end

    subgraph Tier2["Tier-2 — Authoritative Model"]
        C[LLM\nLlama 3.1 8B / Qwen\nMedian: ~2s]
    end

    subgraph Reconciler["Reconciler — 8 Internal Outcomes"]
        D{Match?}
    end

    subgraph UIState["4 User-Visible States"]
        E[MATCH — action stands]
        F[REFINEMENT — action updated]
        G[MISMATCH — action rolled back]
        H[CONFUSION — surfaces to user]
    end

    A -->|partial stream| B
    A -->|complete input| C
    B -->|speculative fire| D
    C -->|authoritative prediction| D
    D --> E
    D --> F
    D --> G
    D --> H

How It Works

Tier-1 watches the partial input stream. When it recognizes an intent with sufficient confidence, it fires a speculative action immediately — in ~9ms on text, ~2.6s on audio with CUDA ASR.
Tier-2 runs concurrently on the same input. By the time it finishes, the user may have already seen their action begin executing.
The Reconciler classifies the relationship between Tier-1 and Tier-2's predictions across eight internal outcomes, and surfaces one of four UI states. This is the inventive core: UI states mask LLM latency as character expression rather than a loading spinner.
Rollback is available for mismatched speculative actions. The architecture requires that Tier-1 only fires on rollback-feasible action classes in the current world state — preserving safety without sacrificing speed.
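
The four steps above can be sketched with asyncio. This is a minimal illustration under stated assumptions, not the harness's orchestrator: `tier1_classify`, `tier2_classify`, `rollback_feasible`, `CONFIDENCE_FLOOR`, and the shape of `world_state` are all hypothetical names, and the reconciliation is reduced to match or mismatch (the full outcome table is under Reconciler State Machine).

```python
import asyncio
from dataclasses import dataclass

CONFIDENCE_FLOOR = 0.8  # hypothetical gate value, not taken from the harness


@dataclass
class Prediction:
    intent: str
    confidence: float


async def tier1_classify(partial: str) -> Prediction:
    """Stand-in for the fast classifier; the sleep simulates its compute time."""
    await asyncio.sleep(0.001)
    return Prediction("lights_on", 0.93)


async def tier2_classify(full_text: str) -> Prediction:
    """Stand-in for the slower authoritative model."""
    await asyncio.sleep(0.05)
    return Prediction("lights_on", 0.99)


def rollback_feasible(intent: str, world_state: dict) -> bool:
    """Assumed rule: an intent may fire only if it can be undone right now."""
    return intent in world_state.get("reversible_intents", set())


async def handle(partial: str, full_text: str, world_state: dict) -> dict:
    # Tier-2 is scheduled first so it runs while Tier-1 decides.
    tier2_task = asyncio.create_task(tier2_classify(full_text))

    t1 = await tier1_classify(partial)
    fired = (
        t1.confidence >= CONFIDENCE_FLOOR
        and rollback_feasible(t1.intent, world_state)
    )
    # A real system would start executing the speculative action here if fired.

    t2 = await tier2_task
    if not fired:
        outcome = "no_speculation"
    elif t1.intent == t2.intent:
        outcome = "match"      # speculative action stands
    else:
        outcome = "mismatch"   # roll back, then execute the Tier-2 intent
    return {"fired": fired, "t1": t1.intent, "t2": t2.intent, "outcome": outcome}
```

Calling `asyncio.run(handle("turn on the li", "turn on the lights", {"reversible_intents": {"lights_on"}}))` exercises the speculative path; passing an empty `world_state` shows the gate suppressing the fire while Tier-2 still completes.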

The architecture is modality-agnostic: the same orchestrator code drove text-typed, text-paste, audio-CPU, and audio-CUDA batteries with zero behavioral branching. Only the input streamer changes.
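
One way to picture "only the input streamer changes" is a small streamer interface that the orchestrator consumes without knowing the modality. The sketch below is an assumption about shape, not the harness's API: `InputStreamer`, `PasteStreamer`, `TypedStreamer`, `first_fire`, and `toy_classify` are hypothetical names, and keystroke pacing is omitted.

```python
from typing import Callable, Iterator, Optional, Protocol


class InputStreamer(Protocol):
    def partials(self) -> Iterator[str]:
        """Yield successively longer views of the input."""
        ...


class PasteStreamer:
    """Emits the whole input as a single partial."""

    def __init__(self, text: str) -> None:
        self.text = text

    def partials(self) -> Iterator[str]:
        yield self.text


class TypedStreamer:
    """Emits one growing prefix per character (timing not modeled)."""

    def __init__(self, text: str) -> None:
        self.text = text

    def partials(self) -> Iterator[str]:
        for end in range(1, len(self.text) + 1):
            yield self.text[:end]


def first_fire(
    streamer: InputStreamer,
    classify: Callable[[str], Optional[str]],
) -> Optional[tuple[str, str]]:
    """Return (partial, intent) for the first partial the classifier accepts."""
    for partial in streamer.partials():
        intent = classify(partial)
        if intent is not None:
            return partial, intent
    return None


def toy_classify(partial: str) -> Optional[str]:
    # Hypothetical rule standing in for Tier-1.
    return "lights_on" if partial.startswith("turn on the l") else None
```

`first_fire` is identical for both streamers; with the toy rule, the typed streamer fires as soon as the prefix reaches "turn on the l", while the paste streamer fires on the full string.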

Validation Results

All thresholds were locked before any data was collected (see sis_test_harness/config/thresholds.json, timestamped 2026-05-21). This is the anti-bias provision: thresholds cannot be loosened after seeing data.

Text-Paste Battery — Primary T1 Latency Claim

Run ID 96a2827e | 600 trials | Hardware: RTX 3070 (CPU-only Tier-2)

Criterion	Threshold	Result	Status
Easy-band Tier-1 accuracy	≥ 85%	100.0% (120/120)	✅
Medium-band Tier-1 accuracy	≥ 70%	100.0% (90/90)	✅
Hard-band Tier-1 accuracy	≥ 50%	66.7% (58/87)	✅
Hard-band false-positive rate	< 25%	0.00% (0/29)	✅
Tier-1 median latency	≤ 200ms	8.924 ms	✅
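
The status column above can be reproduced with a few lines of Python. This is a sketch only: the criterion names and the `Criterion` structure are hypothetical and do not reflect the actual schema of thresholds.json. The thresholds and counts are the ones reported in the table for run 96a2827e.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)  # frozen: a criterion cannot be edited after creation
class Criterion:
    name: str
    compare: Callable[[float, float], bool]
    threshold: float


# Thresholds as stated in the table; names are illustrative.
CRITERIA = (
    Criterion("easy_accuracy", operator.ge, 0.85),
    Criterion("medium_accuracy", operator.ge, 0.70),
    Criterion("hard_accuracy", operator.ge, 0.50),
    Criterion("hard_false_positive_rate", operator.lt, 0.25),
    Criterion("t1_median_ms", operator.le, 200.0),
)

# Observed values from the text-paste battery (run 96a2827e).
OBSERVED = {
    "easy_accuracy": 120 / 120,
    "medium_accuracy": 90 / 90,
    "hard_accuracy": 58 / 87,
    "hard_false_positive_rate": 0 / 29,
    "t1_median_ms": 8.924,
}


def verdict(observed: dict[str, float], criteria=CRITERIA) -> dict[str, bool]:
    """Map each criterion name to True (pass) or False (fail)."""
    return {c.name: c.compare(observed[c.name], c.threshold) for c in criteria}
```

`all(verdict(OBSERVED).values())` is true for this run. Note that the false-positive criterion is a strict inequality, so a rate of exactly 25% would fail.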

Headline: Median time-to-first-action drops from 2,128ms (baseline) to 8.924ms (SIS) — 239× faster, with zero false fires on ambiguous input.
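
As a quick arithmetic check on the headline, the ratio can be recomputed from the two medians quoted above. The variable names are illustrative. With the rounded figures the ratio comes out at roughly 238.5, which agrees with the 239× headline to within the rounding of the reported medians.

```python
baseline_ms = 2128.0  # median time-to-first-action, baseline (as reported)
sis_ms = 8.924        # median time-to-first-action, SIS (as reported)

speedup = baseline_ms / sis_ms     # ratio of the two medians
saved_ms = baseline_ms - sis_ms    # absolute time saved per action

print(f"speedup: {speedup:.1f}x, saved: {saved_ms:.1f} ms")
```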

Text-paste mode is where the T1 latency claim lives: input emits as a single partial, so t1_ms reflects pure classifier compute with no streamer-pacing artifact.

Audio Battery — CUDA Hardware (Consumer GPU)

Run ID e0916c62 | 600 trials | Hardware: RTX 3070 (faster-whisper small.en CUDA + CPU Tier-2)

Criterion	Threshold	Result	Status
Medium-band Tier-1 accuracy	≥ 70%	100.0% (90/90)	✅
Hard-band Tier-1 accuracy (behavioral)	≥ 50%	66.7% (58/87)	✅
Hard-band false-positive rate	< 25%	0.00% (0/29)	✅
Tier-1 latency vs CPU baseline	—	16–21× speedup	✅

Behavioral accuracy is at parity with the text-paste GREEN run. The 200ms T1 latency threshold does not apply to audio mode, because wall-clock latency there is bounded by utterance duration rather than classifier compute. Full analysis in patent_anchor_data/README.md.

Safety Claim — Four Consecutive Runs

Zero hard-band false-positive fires across:

text-typed CPU (948c4160)
text-paste CPU (96a2827e)
audio CPU (624194eb)
audio CUDA (e0916c62)

The safety invariant holds across three input modalities (text-typed, text-paste, audio) and two hardware configurations (CPU, RTX 3070 CUDA).

Repository Structure

sis-public-repo/
├── README.md                    ← you are here
├── PATENT_PENDING.md
├── sis_test_harness/            ← full benchmark harness (see below)
│   ├── src/                     ← orchestrator, reconciler, Tier-1, Tier-2, streamers
│   ├── config/                  ← locked thresholds + corpus (pre-data-collection)
│   ├── audio/                   ← corpus WAV generation
│   ├── analysis/                ← stats, verdict, plotting scripts
│   ├── tests/                   ← pytest suite
│   ├── docs/                    ← architecture decisions, reconciler state diagram
│   └── results/                 ← CSV output (gitignored; patent_anchor_data is the archive)
└── patent_anchor_data/          ← locked, immutable validation datasets
    ├── README.md                ← full per-dataset analysis
    ├── 2026-05-22_text-paste_battery_96a2827e.csv
    ├── 2026-05-22_text-paste_stats_96a2827e.json
    ├── 2026-05-22_text-typed_battery_948c4160.csv
    ├── 2026-05-22_text-typed_stats_948c4160.json
    ├── 2026-05-23_audio_battery_624194eb.csv
    ├── 2026-05-23_audio_stats_624194eb.json
    ├── 2026-05-23_audio-rtx3070-cuda_battery_e0916c62.csv
    └── 2026-05-23_audio-rtx3070-cuda_stats_e0916c62.json
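
The dataset filenames above follow a visible pattern of date, battery label, file kind, and run ID. A small parser for that pattern might look like the following. The regular expression is inferred from the eight filenames listed here and is an assumption, as are the names `AnchorFile` and `parse_anchor_filename`.

```python
import re
from typing import NamedTuple


class AnchorFile(NamedTuple):
    date: str
    label: str
    kind: str
    run_id: str
    extension: str


# Inferred from the listed filenames; not a documented naming contract.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})"
    r"_(?P<label>[a-z0-9-]+)"
    r"_(?P<kind>battery|stats)"
    r"_(?P<run_id>[0-9a-f]{8})"
    r"\.(?P<extension>csv|json)$"
)


def parse_anchor_filename(name: str) -> AnchorFile:
    """Split a patent_anchor_data filename into its components."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not an anchor dataset filename: {name!r}")
    return AnchorFile(**match.groupdict())
```

For example, `parse_anchor_filename("2026-05-23_audio-rtx3070-cuda_stats_e0916c62.json")` yields the label `audio-rtx3070-cuda` and the run ID `e0916c62`, which makes it easy to pair each CSV with its stats file.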

Quick Start

Requirements: Python 3.11+, a GGUF-capable LLM backend (llama-cpp-python), sentence-transformers. Full setup in sis_test_harness/README.md.

# Clone
git clone https://github.com/entelyx/speculative-intent-system
cd speculative-intent-system/sis_test_harness

# Install
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Smoke test (no models needed — uses mocks)
python -m src.runner smoke --mock
python -m src.runner smoke-exercise

# Full battery (requires Llama 3.1 8B GGUF at models/)
python -m src.runner battery --seed 42 --variant embedding --modality text-paste

What's Proven vs What's Next

Proven (locked datasets)

Tier-1 median latency: 8.924ms on CPU (text-paste, consumer hardware)
Mechanism correctness across three input modalities (text-typed, text-paste, audio)
Safety invariant — 0% hard-band false-positive rate across four consecutive runs
Consumer GPU (RTX 3070) reduces audio T1 latency by 16–21× vs CPU baseline
Modality-agnostic architecture — same orchestrator code, different streamers

In Progress

Jetson Orin Nano battery — validates the same harness on constrained edge hardware (8GB unified memory). Pending model swap (Llama 3.1 8B → Llama 3.2 3B) to resolve OOM at the Tier-2 layer.
RTX 3070 CUDA full battery — Tier-1 GPU acceleration validation. Expected to further reduce text-mode latency below the CPU baseline.
Tier-1 LoRA distillation — replacing the embedding similarity classifier with a fine-tuned small model (SmolLM2-360M or Qwen2.5-0.5B base). First training run on DGX Spark GB10.
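
For context on what the distillation work would replace, an embedding similarity classifier can be sketched as a nearest-prototype lookup with an abstain threshold. This is a generic illustration, not the harness's Tier-1: the prototype vectors, the `min_similarity` value, and the function names are assumptions, and in practice the embeddings would come from the sentence-transformer model rather than being passed in by hand.

```python
import math
from typing import Optional, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(
    embedding: Sequence[float],
    prototypes: dict[str, Sequence[float]],
    min_similarity: float = 0.75,  # hypothetical abstain threshold
) -> Optional[str]:
    """Return the closest intent, or None to abstain (no speculative fire)."""
    best_intent: Optional[str] = None
    best_score = -1.0
    for intent, prototype in prototypes.items():
        score = cosine(embedding, prototype)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent if best_score >= min_similarity else None
```

Abstaining below the threshold is what keeps an ambiguous input, one that sits between two prototypes, from producing a speculative fire.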

Roadmap

SIS SDK — a licensable API surface for third-party edge AI developers. Spec in progress.
CascGang embodiment — SIS as the command intent layer for robotic swarm orchestration.
HoloAssist embodiment — SIS driving a local-LLM holographic assistant interface.

Hardware Validation Plan

Tier	Hardware	ASR	Tier-2	Status
0	RTX 3070 (CPU-only)	pywhispercpp CPU	Llama 3.1 8B CPU	✅ Locked
1	RTX 3070 (CUDA)	faster-whisper CUDA	Llama 3.1 8B CPU	✅ Locked
2	Jetson Orin Nano 8GB	faster-whisper CPU	Llama 3.2 3B (pending OOM fix)	🔄 In progress
3	DGX Spark GB10	faster-whisper CUDA	Llama 3.1 70B or LoRA	⚪ Planned

Reconciler State Machine

The reconciler classifies every trial into one of eight internal outcomes:

Outcome	Tier-1 fires?	Tier-2 agrees?	UI State	Action
Match	✅	✅ same intent + slots	MATCH	Stand
Refinement	✅	✅ same intent, adds slots	REFINEMENT	Update in place
Contraction	✅	✅ same intent, removes slots	REFINEMENT	Update in place
Slot Conflict	✅	✅ same intent, changes slot value	REFINEMENT	Update to T2 value
Mismatch	✅	❌ different intent	MISMATCH	Roll back, execute T2
Tier-2 Abstain	✅	— abstained	MISMATCH	Roll back
Confusion	✅	malformed output	CONFUSION	Surface to user
Timeout	✅	— timed out	MISMATCH	Treat as abstain
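
The table above can be read as a decision procedure. The sketch below is one possible encoding, not the harness's reconciler: the `Outcome` enum, the dict-based prediction format (`intent` plus `slots`), and the precedence between slot conflict, refinement, and contraction when several apply at once are all assumptions.

```python
from enum import Enum
from typing import Any, Optional


class Outcome(Enum):
    MATCH = "match"
    REFINEMENT = "refinement"
    CONTRACTION = "contraction"
    SLOT_CONFLICT = "slot_conflict"
    MISMATCH = "mismatch"
    TIER2_ABSTAIN = "tier2_abstain"
    CONFUSION = "confusion"
    TIMEOUT = "timeout"


# Eight internal outcomes collapse into four user-visible states.
UI_STATE = {
    Outcome.MATCH: "MATCH",
    Outcome.REFINEMENT: "REFINEMENT",
    Outcome.CONTRACTION: "REFINEMENT",
    Outcome.SLOT_CONFLICT: "REFINEMENT",
    Outcome.MISMATCH: "MISMATCH",
    Outcome.TIER2_ABSTAIN: "MISMATCH",
    Outcome.TIMEOUT: "MISMATCH",
    Outcome.CONFUSION: "CONFUSION",
}


def reconcile(t1: dict, t2: Optional[Any], timed_out: bool = False) -> Outcome:
    """Classify a fired Tier-1 prediction against the Tier-2 result."""
    if timed_out:
        return Outcome.TIMEOUT
    if t2 is None:  # Tier-2 declined to predict
        return Outcome.TIER2_ABSTAIN
    if (
        not isinstance(t2, dict)
        or not isinstance(t2.get("intent"), str)
        or not isinstance(t2.get("slots"), dict)
    ):
        return Outcome.CONFUSION  # malformed Tier-2 output
    if t2["intent"] != t1["intent"]:
        return Outcome.MISMATCH

    s1, s2 = t1["slots"], t2["slots"]
    # Assumed precedence: changed value, then added slots, then removed slots.
    if any(s1[key] != s2[key] for key in s1.keys() & s2.keys()):
        return Outcome.SLOT_CONFLICT
    if s2.keys() - s1.keys():
        return Outcome.REFINEMENT
    if s1.keys() - s2.keys():
        return Outcome.CONTRACTION
    return Outcome.MATCH
```

For example, a Tier-1 prediction of `{"intent": "lights_on", "slots": {}}` against a Tier-2 prediction of `{"intent": "lights_on", "slots": {"room": "kitchen"}}` classifies as a refinement, and `UI_STATE` maps it to the REFINEMENT state.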

Citation

If you use this harness or reference this architecture in your work:

Speculative Intent System — Validation Harness
Entelyx LLC, 2026
USPTO Provisional Patent Application on file
https://github.com/entelyx/speculative-intent-system

License

The test harness source code is released under the MIT License.

The patent anchor datasets (patent_anchor_data/) are released as evidence of reduction-to-practice and are provided for reference and reproducibility verification. They may not be used to challenge the novelty or priority of the associated provisional patent application.

Built by Entelyx LLC — physical AI, edge inference, and embedded intent systems.
