🧠 Tribe - Neural Content Analysis

Run Meta's TRIBE v2 brain encoding model locally - text, audio, and video.

Tribe predicts how the human brain responds to content, using the same neuroscience approach as Meta's research. It runs locally via a Rust inference engine with Metal GPU acceleration or CPU fallback.

Demo

TRIBE v2 - brain network activation analysis


CLI Output

# Analyze text
$ tribe analyze article.txt
ℹ  This content tends toward a Fear tone. Low manipulation signal.
   Manipulation score: 1.2/10
   Primary emotion targeted: Fear
   Neural analysis: salience network activates 0.6x above executive control network
   Backend: tribe_v2_rust | Time: ~28s

# Analyze audio (podcast, speech, etc.)
$ tribe analyze podcast_segment.wav
⚠  This content is designed to trigger a Fear response.
   Manipulation score: 5.8/10
   Backend: tribe_v2_rust | Time: ~45s

# Analyze video (news clips, ads, etc.)
$ tribe analyze news_clip.mp4
⚠  This content is designed to trigger an Outrage response.
   Manipulation score: 6.2/10
   Backend: tribe_v2_rust | Time: ~90s

Why This Exists

TRIBE v2 (released March 2026) is a tri-modal foundation model that predicts fMRI brain responses to text, audio, and video. It was trained on 451 hours of fMRI data from subjects watching movies and listening to podcasts - genuine cutting-edge computational neuroscience.

The problem: Meta's official release requires license approval for their LLaMA-based weights. We use eugenehp/tribev2 (public weights, same model) + a Rust inference engine that supports all three modalities.

TRIBE v2 runs on any machine - Metal GPU for speed, CPU as fallback. Text, audio, and video.

Quick Start

# Install
pip install -e .

# Analyze text
tribe analyze article.txt

# Start the web demo (opens in browser)
tribe serve

Hardware Requirements

| Component | Requirement |
| --- | --- |
| GPU (optional) | MacBook M1/M2/M3 for Metal acceleration |
| RAM | 16GB recommended |
| Storage | 3GB for models |

No MacBook? TRIBE v2 runs on CPU too - just build the Rust binary without Metal features. Same model, same results; text analysis takes ~2-5 min instead of ~25s.

Features

Tri-Modal Brain Encoding

| Modality | Input | Processing Pipeline |
| --- | --- | --- |
| Text | Articles, posts, transcripts | LLaMA 3.2 -> fusion transformer -> fMRI prediction |
| Audio | Podcasts, speeches, news | Wav2Vec-BERT -> fusion transformer -> fMRI prediction |
| Video | News clips, ads, films | V-JEPA2 + audio -> fusion transformer -> fMRI prediction |

| Hardware | Speed (text) | Speed (audio) | Speed (video) |
| --- | --- | --- | --- |
| Metal GPU | ~25s | ~45s | ~90s |
| CPU | ~2-5 min | ~5-8 min | ~10-15 min |

Predicts brain activation across 20,484 cortical vertices, mapped to Yeo's 7 functional networks. The model was trained on 451 hours of fMRI data - audio and video produce the strongest brain encoding signal.
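The vertex-to-network mapping can be sketched as a simple label lookup. A minimal illustration with synthetic data - the array shapes, the label encoding, and the network order below are assumptions for the sketch, not the repo's actual file layout:

```python
import numpy as np

# Stand-ins: TRIBE v2 predicts one activation value per cortical vertex,
# and the Yeo 2011 atlas assigns each vertex one of 7 network labels.
n_vertices = 20484
rng = np.random.default_rng(0)
activations = rng.normal(size=n_vertices)      # synthetic fMRI prediction
labels = rng.integers(1, 8, size=n_vertices)   # synthetic Yeo labels (1..7)

# Network names as this README uses them (Yeo's canonical names differ:
# "Ventral Attention" ~ Salience, "Frontoparietal" ~ Executive Control).
networks = ["Visual", "Somatomotor", "Dorsal Attention", "Salience",
            "Limbic", "Executive Control", "Default Mode"]

# Mean predicted activation per network
per_network = {name: float(activations[labels == i + 1].mean())
               for i, name in enumerate(networks)}
```

With real predictions, `per_network` is what a 7-network interpretation layer would consume.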

Demo Server

tribe serve --port 8000

Opens a beautiful browser interface with:

  • One-click example buttons (Fear appeal, Neutral, Outrage)
  • Real-time analysis with brain network visualization (Yeo 2011 7 networks)
  • Keyboard shortcut: Ctrl+Enter to analyze

CLI Commands

tribe analyze article.txt         # Analyze text
tribe analyze podcast.wav         # Analyze audio
tribe analyze news_clip.mp4       # Analyze video
tribe analyze https://example.com # Analyze URL content
tribe analyze --verbose           # Full neural breakdown
tribe analyze --json              # JSON output
tribe serve                       # Start demo server
tribe backends                    # Show hardware + backend status
tribe bench run                   # Run benchmark suite
tribe version                     # Version info

Benchmarks

Brain encoding detects manipulation when you read the predictions through the right neuroscience lens.

Results (25 controlled pairs, text via LLaMA)

| Interpretation Layer | Win Rate | p-value | Mean Diff |
| --- | --- | --- | --- |
| Yeo 7-network emotional/rational ratio (v1) | 40% | 0.41 | 0.11 |
| Region-level persuasion analysis (v2) | 84% | 0.0004 | 0.86 |

The v1 approach used a naive "emotional vs rational" network ratio. It failed because the neuroscience was wrong - Default Mode Network is self-referential (not emotional), and Salience detects importance (not manipulation).

The v2 approach uses region-level analysis based on Falk et al. (2010, 2024):

  • vmPFC (value adoption) - is the person internalizing the message?
  • dlPFC (critical evaluation) - is the person counterarguing?
  • TPJ (motive analysis) - is the person questioning the source?

High vmPFC + low dlPFC + low TPJ = persuasion signal.
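That decision rule can be written down directly. A toy sketch - the weights and the score scale are illustrative, not the repo's tuned formula:

```python
def persuasion_score(vmPFC: float, dlPFC: float, tpj: float) -> float:
    """Higher when the message is being internalized (vmPFC up) while
    critical evaluation (dlPFC) and source questioning (TPJ) stay low.
    Weights are illustrative, not the repo's actual values."""
    return vmPFC - 0.5 * (dlPFC + tpj)

# Persuasion pattern: high value adoption, suppressed counterarguing
engineered = persuasion_score(vmPFC=0.9, dlPFC=0.1, tpj=0.1)
# Resistance pattern: the reader is counterarguing and questioning motives
resisted = persuasion_score(vmPFC=0.2, dlPFC=0.8, tpj=0.7)
```

The point is the sign structure, not the magnitudes: vmPFC enters positively, the two "resistance" regions negatively.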

All Benchmark Results

| # | Dataset | Type | Interpretation | Result | Status |
| --- | --- | --- | --- | --- | --- |
| 001 | 25 Controlled Pairs | Text | Yeo 7-network ratio (v1) | 40% win, p=0.41 | Failed |
| 002 | 25 Controlled Pairs | TTS Audio | Yeo 7-network ratio (v1) | 44% win, p=0.41 | Failed |
| 003 | 25 Controlled Pairs | Text | Region persuasion (v2) | 84% win, p=0.0004 | Success |
| 004 | 25 Controlled Pairs | TTS Audio | Region persuasion (v2) | 28% win, p=0.01 | Inverted |
| 005 | SpeechMentalManip | Real Audio | Region persuasion (v2) | Inverted (10 files) | Inverted |
| 006 | Pure Tones (16 freqs) | Audio | Region persuasion (v2) | 3 unique patterns | Text-dominated |
| 007 | MentalManip (n=2,915) | Text | Region persuasion (v2) | AUC 0.469 | Wrong dataset |
| 008 | SemEval-2020 (n=371) | Text | Region persuasion (v2) | Running... | TBD |

Key Findings

  1. Region-level persuasion analysis works on engineered media manipulation (84%, p=0.0004 on controlled pairs)
  2. MentalManip (movie dialogues) fails - interpersonal manipulation is a different cognitive process than media manipulation
  3. Audio pipeline is text-dominated - Whisper transcription -> LLaMA features override Wav2Vec-BERT acoustic features
  4. Pure tones produce only 3 brain patterns across 16 frequencies - confirming text dominance in fusion

Datasets

| Dataset | Type | Size | Content Type |
| --- | --- | --- | --- |
| Controlled Pairs | Text | 25 pairs | Engineered media manipulation |
| SemEval-2020 | Text | 371 articles | Propaganda news (18 techniques) |
| MentalManip | Text | 2,915 dialogues | Movie dialogue manipulation |
| SpeechMentalManip | Audio | 699 files | TTS-rendered dialogues |
| NELA-GT-2022 | Text | 1.78M articles | News reliability (MBFC labels) |

# Reproduce benchmarks
pip install -e ".[bench]"
tribe bench run

See BENCHMARKS.md for full methodology. See results/benchmarks/ for all recorded results.

The Story

We started by asking a simple question: can a brain encoding model detect content manipulation?

Meta's TRIBE v2 predicts how the human brain responds to content - 20,484 cortical vertices of fMRI activation, mapped to 7 functional networks. The hypothesis: manipulative content should activate emotional brain networks (Salience, Limbic, Default Mode) more than rational ones (Executive Control, Dorsal Attention).

First attempt: text-only. We ran manipulative vs. neutral articles through TRIBE v2 using LLaMA text embeddings. The results were barely better than a coin flip - 52% win rate, p=0.70. Disappointing.

The breakthrough came from a simple observation: TRIBE v2 is a tri-modal model trained on 451 hours of fMRI from people watching movies and listening to podcasts. Text was the weakest input path. When we routed the same text through TTS -> audio -> Wav2Vec-BERT (the audio encoder the model was actually trained with), the win rate jumped to 80%.

The lesson: brain encoding isn't magic - it's a measurement instrument. Like any instrument, you get the best readings when you measure in the modality the instrument was designed for. TRIBE v2 was designed for audio and video. That's where the signal lives.

Then we questioned our own architecture. The full 25-pair audio test came back at 44% - barely above random, just like text. The 5-pair "80%" was a small-sample fluke. Both pipelines failed. The problem wasn't the modality. It was the interpretation layer.

We dug into the neuroscience. Two deep research dives into the persuasion fMRI literature (Falk et al. 2010, 2024 PNAS) revealed three fatal flaws in our approach:

  1. The Default Mode Network is NOT emotional - it's self-referential. People who resist persuasion show more DMN activation (they're counterarguing).
  2. The Salience Network detects importance, not manipulation. Breaking news and propaganda both light it up.
  3. vmPFC and dlPFC are both in "Executive Control" but do opposite things - vmPFC goes UP during persuasion (value adoption), dlPFC goes DOWN (critical evaluation suppressed). Averaging them together cancels the signal.
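The cancellation in point 3 is plain arithmetic. With made-up illustrative activations:

```python
# Illustrative activations during successful persuasion (made-up values):
vmPFC = 0.8    # value adoption: goes UP
dlPFC = -0.8   # critical evaluation: suppressed, goes DOWN

# Averaging both into one "Executive Control" score cancels the signal...
network_average = (vmPFC + dlPFC) / 2   # looks like nothing happened

# ...while tracking the regions separately preserves it.
region_contrast = vmPFC - dlPFC         # strong persuasion signature
```

Any network-level average that pools regions moving in opposite directions will hide exactly the effect being measured.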

The fix: Replace the 7-network ratio with region-level persuasion analysis. Track the specific brain regions the literature says matter:

  • vmPFC (medialorbitofrontal) - is the person adopting the message?
  • dlPFC (middle frontal) - is the person still thinking critically?
  • TPJ (supramarginal) - is the person questioning the source's motives?

The result: 84% win rate, p=0.0004. The same TRIBE v2 predictions, read through the right neuroscience lens, produce a statistically significant separation between manipulative and non-manipulative content.

The fMRI predictions were always fine. We were just reading them wrong.

Then we tested at scale. We ran 2,915 movie dialogues from MentalManip (ACL 2024) through the new v2 interpretation. AUC: 0.469 - worse than random. The model couldn't tell manipulative movie dialogue from normal conversation.

But that's actually the right answer. MentalManip contains interpersonal manipulation - gaslighting, guilt-tripping, subtle social dynamics between characters. TRIBE v2 was trained on fMRI from people consuming media, not navigating social relationships. It detects engineered media manipulation (propaganda, fear appeals, loaded language in articles) but not conversational manipulation. These are different cognitive processes.

We also tested pure audio. 16 pure sine tones (20Hz to 12,000Hz) through the audio pipeline. Result: only 3 unique brain activation patterns across all frequencies. The audio pipeline is text-dominated - Whisper transcribes the audio, LLaMA encodes the transcription, and those text features overwhelm the Wav2Vec-BERT acoustic features in the fusion transformer. The raw acoustic signal doesn't drive the prediction.

Where we are now: SemEval-2020 (371 real propaganda news articles) is running. This is the right dataset - engineered media manipulation with expert-labeled propaganda techniques. If the region-level persuasion analysis separates high-propaganda from low-propaganda news articles, we have a validated system. Results coming soon.

What we've learned:

  1. Brain encoding predictions are rich - 20,484 vertices of information
  2. The interpretation layer matters more than the model
  3. Neuroscience-informed region analysis (vmPFC/dlPFC/TPJ) beats naive network ratios
  4. The model detects engineered media manipulation, not interpersonal dialogue manipulation
  5. The audio pipeline is text-mediated - acoustic features are secondary to transcription

Then we tested Path C - a learned classifier. Instead of hand-tuning region weights, we trained a logistic regression on the raw 20,484-vertex activations using PCA. Three experiments:

| Experiment | Result | What It Means |
| --- | --- | --- |
| Train on paired, test on paired (5-fold CV) | AUC 0.912 | Signal exists in paired data |
| Train on paired, test on SemEval | AUC 0.41 (inverted) | Signal doesn't transfer across content types |
| Train AND test within SemEval (quartile split) | AUC 0.670, rho=0.35, p<1e-6 | Signal exists in SemEval too - but it's different |

The finding: The brain encoding model's raw activations DO contain propaganda-discriminating information. Both our paired texts AND SemEval articles have learnable signal. But the signals are content-specific - a classifier trained on extreme fear appeals doesn't detect subtle news propaganda.
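The Path C setup - PCA over raw vertex activations feeding a logistic classifier - can be sketched end to end with synthetic data. Dimensions are shrunk for speed, and this is a minimal stand-in, not the repo's training script:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for TRIBE v2 outputs: one row of vertex activations
# per item (20,484 in the real model; 400 here), with binary labels.
n, d, k = 80, 400, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[y == 1, :15] += 1.0                  # plant a weak class signal

# PCA via SVD on centered data, keep k components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# Logistic regression by plain gradient descent
w, b = np.zeros(k), 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad = p - y
    w -= 0.2 * (Z.T @ grad) / n
    b -= 0.2 * grad.mean()

# Training AUC from the rank statistic (no CV here, so this is optimistic)
scores = Z @ w + b
ranks = scores.argsort().argsort() + 1
n1 = int((y == 1).sum())
auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (n - n1))
```

Training AUC without cross-validation overstates performance, which is why the table above reports 5-fold CV and held-out transfer numbers.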

This reframes the whole project. TRIBE isn't "a manipulation detector" - it's a brain-response-based feature extractor. The 20,484-vertex activations are rich features that separate content types when a classifier is trained on the right target domain.

The path forward: Train classifiers on large-scale target-specific datasets (NELA-GT: 1.78M articles with reliability labels). The hand-tuned formula was a dead end. The learned approach has legs.

Installation

1. Install Tribe

git clone https://github.com/iota31/tribe.git
cd tribe
pip install -e .

2. Build the Rust Binary

Required for analysis. Choose GPU or CPU build:

# Install Rust (one-time)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

# Build tribev2-infer with Metal GPU support
git clone https://github.com/eugenehp/tribev2-rs /tmp/tribev2-rs
cd /tmp/tribev2-rs
cargo build --release --bin tribev2-infer --features "default,llama-metal"

# Or for CPU-only (any machine, no Metal GPU required):
cargo build --release --bin tribev2-infer --features default

3. Download LLaMA 3.2 3B (if using Ollama)

ollama pull llama3.2

That's it. Run tribe backends to verify:

Tribe - Backend Status
────────────────────────────────────────

Hardware:
  GPU: Apple Silicon (MPS) ✓

TRIBE v2 Rust:
  Status: ✓ available

How It Works

Tribe pipeline architecture

Brain Network Analysis

The TRIBE v2 Rust backend maps predicted brain activation to Yeo's 7 functional networks. Emotional networks (Salience, Default Mode, Limbic) indicate manipulation signals. Rational networks (Executive Control, Dorsal Attention) indicate analytical processing.

The manipulation ratio = emotional / rational activation. A ratio > 1.0 means emotional networks dominate; a ratio < 1.0 means rational networks dominate.
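As a sketch, using the network names this README uses (the activation values below are hypothetical):

```python
EMOTIONAL = ("Salience", "Default Mode", "Limbic")
RATIONAL = ("Executive Control", "Dorsal Attention")

def manipulation_ratio(activation: dict[str, float]) -> float:
    """Emotional / rational mean activation; > 1.0 means emotional dominates."""
    emotional = sum(activation[n] for n in EMOTIONAL) / len(EMOTIONAL)
    rational = sum(activation[n] for n in RATIONAL) / len(RATIONAL)
    return emotional / rational

# Hypothetical per-network activations for a fear-appeal article
ratio = manipulation_ratio({
    "Salience": 0.9, "Default Mode": 0.6, "Limbic": 0.6,
    "Executive Control": 0.5, "Dorsal Attention": 0.5,
})
```

Note that this is the v1 display heuristic; the Benchmarks section explains why the region-level v2 analysis supersedes it for manipulation detection.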

Yeo 2011 brain network activation

Architecture

See the full pipeline diagram above. For the complete design philosophy, see DESIGN.md. Key files:

| File | Purpose |
| --- | --- |
| tribe/cli.py | Click CLI (analyze, serve, backends) |
| tribe/server.py | FastAPI demo server |
| tribe/analyze.py | Main orchestrator |
| tribe/backends/router.py | Hardware detection + backend selection |
| tribe/backends/tribe_v2_rust.py | TRIBE v2 via tribev2-rs (Metal GPU / CPU) |
| tribe/interpretation/neural.py | Yeo 7-network mapping |

Models Used

| Model | Modality | Purpose | Size |
| --- | --- | --- | --- |
| eugenehp/tribev2 | All | Fusion transformer -> fMRI predictor | 676MB |
| LLaMA 3.2 3B GGUF | Text | Text feature extraction | 1.9GB |
| Wav2Vec-BERT 2.0 | Audio | Audio feature extraction | Built into tribev2-infer |
| V-JEPA2 | Video | Video feature extraction | Built into tribev2-infer |
| Yeo 2011 7-Network | - | Brain atlas parcellation | 164KB |

License

Tribe package: GPL-3.0 - Copyright 2026 Tushar

TRIBE v2 components: CC-BY-NC-4.0 - by Meta AI, non-commercial research use only

Tribe is open-source. TRIBE v2 model weights are non-commercial. See LICENSE-TRIBE-V2 for details.

Roadmap

  • Run TRIBE v2 on MacBook M-series (Metal GPU)
  • CPU fallback - same model, any machine
  • Tri-modal support - text, audio, video
  • Rust inference engine via tribev2-rs
  • Web demo server
  • Benchmark suite against ACL/SemEval datasets
  • GitHub Actions CI + community health files
  • LLM-powered explanation of brain activation patterns
  • Local content history / media diet tracker
  • RSS feed batch analysis
  • Video benchmark with curated YouTube clips

Contributing

See CONTRIBUTING.md for development setup, code style, and how to submit changes.

Security

To report a vulnerability, see SECURITY.md. Please do not open public issues for security bugs.

Acknowledgments
