🧠 Tribe - Neural Content Analysis

Run Meta's TRIBE v2 brain encoding model locally - text, audio, and video.

Tribe predicts how the human brain responds to content, using the same neuroscience approach as Meta's research. It runs locally via a Rust inference engine with Metal GPU acceleration or CPU fallback.

Demo

TRIBE v2 - brain network activation analysis


CLI Output

# Analyze text
$ tribe analyze article.txt
ℹ  This content tends toward a Fear tone. Low manipulation signal.
   Manipulation score: 1.2/10
   Primary emotion targeted: Fear
   Neural analysis: salience network activates 0.6x above executive control network
   Backend: tribe_v2_rust | Time: ~28s

# Analyze audio (podcast, speech, etc.)
$ tribe analyze podcast_segment.wav
⚠  This content is designed to trigger a Fear response.
   Manipulation score: 5.8/10
   Backend: tribe_v2_rust | Time: ~45s

# Analyze video (news clips, ads, etc.)
$ tribe analyze news_clip.mp4
⚠  This content is designed to trigger an Outrage response.
   Manipulation score: 6.2/10
   Backend: tribe_v2_rust | Time: ~90s

Why This Exists

TRIBE v2 (released March 2026) is a tri-modal foundation model that predicts fMRI brain responses to text, audio, and video. It was trained on 451 hours of fMRI data from subjects watching movies and listening to podcasts - genuine cutting-edge computational neuroscience.

The problem: Meta's official release requires license approval for their LLaMA-based weights. We use eugenehp/tribev2 (public weights, same model) + a Rust inference engine that supports all three modalities.

TRIBE v2 runs on any machine - Metal GPU for speed, CPU as fallback. Text, audio, and video.

Quick Start

# Install
pip install -e .

# Analyze text
tribe analyze article.txt

# Start the web demo (opens in browser)
tribe serve

Hardware Requirements

| Component | Requirement |
| --- | --- |
| GPU (optional) | MacBook M1/M2/M3 for Metal acceleration |
| RAM | 16GB recommended |
| Storage | 3GB for models |

No MacBook? TRIBE v2 runs on CPU too - just build the Rust binary without Metal features. Same model, same results; text analysis takes ~2-5 min instead of ~25s.

Features

Tri-Modal Brain Encoding

| Modality | Input | Processing Pipeline |
| --- | --- | --- |
| Text | Articles, posts, transcripts | LLaMA 3.2 -> fusion transformer -> fMRI prediction |
| Audio | Podcasts, speeches, news | Wav2Vec-BERT -> fusion transformer -> fMRI prediction |
| Video | News clips, ads, films | V-JEPA2 + audio -> fusion transformer -> fMRI prediction |

| Hardware | Speed (text) | Speed (audio) | Speed (video) |
| --- | --- | --- | --- |
| Metal GPU | ~25s | ~45s | ~90s |
| CPU | ~2-5 min | ~5-8 min | ~10-15 min |

Predicts brain activation across 20,484 cortical vertices, mapped to Yeo's 7 functional networks. The model was trained on 451 hours of fMRI data - audio and video produce the strongest brain encoding signal.
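The vertex-to-network mapping can be sketched as a simple label lookup. A minimal illustration with synthetic data - the array shapes, the label encoding, and the network order below are assumptions for the sketch, not the repo's actual file layout:

```python
import numpy as np

# Stand-ins: TRIBE v2 predicts one activation value per cortical vertex,
# and the Yeo 2011 atlas assigns each vertex one of 7 network labels.
n_vertices = 20484
rng = np.random.default_rng(0)
activations = rng.normal(size=n_vertices)      # synthetic fMRI prediction
labels = rng.integers(1, 8, size=n_vertices)   # synthetic Yeo labels (1..7)

# Network names as this README uses them (Yeo's canonical names differ:
# "Ventral Attention" ~ Salience, "Frontoparietal" ~ Executive Control).
networks = ["Visual", "Somatomotor", "Dorsal Attention", "Salience",
            "Limbic", "Executive Control", "Default Mode"]

# Mean predicted activation per network
per_network = {name: float(activations[labels == i + 1].mean())
               for i, name in enumerate(networks)}
```

With real predictions, `per_network` is what a 7-network interpretation layer would consume.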

Demo Server

tribe serve --port 8000

Opens a beautiful browser interface with:

  • One-click example buttons (Fear appeal, Neutral, Outrage)
  • Real-time analysis with brain network visualization (Yeo 2011 7 networks)
  • Keyboard shortcut: Ctrl+Enter to analyze

CLI Commands

tribe analyze article.txt         # Analyze text
tribe analyze podcast.wav         # Analyze audio
tribe analyze news_clip.mp4       # Analyze video
tribe analyze https://example.com # Analyze URL content
tribe analyze --verbose           # Full neural breakdown
tribe analyze --json              # JSON output
tribe serve                       # Start demo server
tribe backends                    # Show hardware + backend status
tribe bench run                   # Run benchmark suite
tribe version                     # Version info

Benchmarks

Brain encoding detects manipulation when you read the predictions through the right neuroscience lens.

Results (25 controlled pairs, text via LLaMA)

| Interpretation Layer | Win Rate | p-value | Mean Diff |
| --- | --- | --- | --- |
| Yeo 7-network emotional/rational ratio (v1) | 40% | 0.41 | 0.11 |
| Region-level persuasion analysis (v2) | 84% | 0.0004 | 0.86 |

The v1 approach used a naive "emotional vs rational" network ratio. It failed because the neuroscience was wrong - Default Mode Network is self-referential (not emotional), and Salience detects importance (not manipulation).

The v2 approach uses region-level analysis based on Falk et al. (2010, 2024):

  • vmPFC (value adoption) - is the person internalizing the message?
  • dlPFC (critical evaluation) - is the person counterarguing?
  • TPJ (motive analysis) - is the person questioning the source?

High vmPFC + low dlPFC + low TPJ = persuasion signal.
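That decision rule can be written down directly. A toy sketch - the weights and the score scale are illustrative, not the repo's tuned formula:

```python
def persuasion_score(vmPFC: float, dlPFC: float, tpj: float) -> float:
    """Higher when the message is being internalized (vmPFC up) while
    critical evaluation (dlPFC) and source questioning (TPJ) stay low.
    Weights are illustrative, not the repo's actual values."""
    return vmPFC - 0.5 * (dlPFC + tpj)

# Persuasion pattern: high value adoption, suppressed counterarguing
engineered = persuasion_score(vmPFC=0.9, dlPFC=0.1, tpj=0.1)
# Resistance pattern: the reader is counterarguing and questioning motives
resisted = persuasion_score(vmPFC=0.2, dlPFC=0.8, tpj=0.7)
```

The point is the sign structure, not the magnitudes: vmPFC enters positively, the two "resistance" regions negatively.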

All Benchmark Results

| # | Dataset | Type | Interpretation | Result | Status |
| --- | --- | --- | --- | --- | --- |
| 001 | 25 Controlled Pairs | Text | Yeo 7-network ratio (v1) | 40% win, p=0.41 | Failed |
| 002 | 25 Controlled Pairs | TTS Audio | Yeo 7-network ratio (v1) | 44% win, p=0.41 | Failed |
| 003 | 25 Controlled Pairs | Text | Region persuasion (v2) | 84% win, p=0.0004 | Success |
| 004 | 25 Controlled Pairs | TTS Audio | Region persuasion (v2) | 28% win, p=0.01 | Inverted |
| 005 | SpeechMentalManip | Real Audio | Region persuasion (v2) | Inverted (10 files) | Inverted |
| 006 | Pure Tones (16 freqs) | Audio | Region persuasion (v2) | 3 unique patterns | Text-dominated |
| 007 | MentalManip (n=2,915) | Text | Region persuasion (v2) | AUC 0.469 | Wrong dataset |
| 008 | SemEval-2020 (n=371) | Text | Region persuasion (v2) | Running... | TBD |

Key Findings

  1. Region-level persuasion analysis works on engineered media manipulation (84%, p=0.0004 on controlled pairs)
  2. MentalManip (movie dialogues) fails - interpersonal manipulation is a different cognitive process than media manipulation
  3. Audio pipeline is text-dominated - Whisper transcription -> LLaMA features override Wav2Vec-BERT acoustic features
  4. Pure tones produce only 3 brain patterns across 16 frequencies - confirming text dominance in fusion

Datasets

| Dataset | Type | Size | Content Type |
| --- | --- | --- | --- |
| Controlled Pairs | Text | 25 pairs | Engineered media manipulation |
| SemEval-2020 | Text | 371 articles | Propaganda news (18 techniques) |
| MentalManip | Text | 2,915 dialogues | Movie dialogue manipulation |
| SpeechMentalManip | Audio | 699 files | TTS-rendered dialogues |
| NELA-GT-2022 | Text | 1.78M articles | News reliability (MBFC labels) |

# Reproduce benchmarks
pip install -e ".[bench]"
tribe bench run

See BENCHMARKS.md for full methodology. See results/benchmarks/ for all recorded results.

The Story

We started by asking a simple question: can a brain encoding model detect content manipulation?

Meta's TRIBE v2 predicts how the human brain responds to content - 20,484 cortical vertices of fMRI activation, mapped to 7 functional networks. The hypothesis: manipulative content should activate emotional brain networks (Salience, Limbic, Default Mode) more than rational ones (Executive Control, Dorsal Attention).

First attempt: text-only. We ran manipulative vs. neutral articles through TRIBE v2 using LLaMA text embeddings. The results were barely better than a coin flip - 52% win rate, p=0.70. Disappointing.

The breakthrough came from a simple observation: TRIBE v2 is a tri-modal model trained on 451 hours of fMRI from people watching movies and listening to podcasts. Text was the weakest input path. When we routed the same text through TTS -> audio -> Wav2Vec-BERT (the audio encoder the model was actually trained with), the win rate jumped to 80%.

The lesson: brain encoding isn't magic - it's a measurement instrument. Like any instrument, you get the best readings when you measure in the modality the instrument was designed for. TRIBE v2 was designed for audio and video. That's where the signal lives.

Then we questioned our own architecture. The full 25-pair audio test came back at 44% - barely above random, just like text. The 5-pair "80%" was a small-sample fluke. Both pipelines failed. The problem wasn't the modality. It was the interpretation layer.

We dug into the neuroscience. Two deep research dives into the persuasion fMRI literature (Falk et al. 2010, 2024 PNAS) revealed three fatal flaws in our approach:

  1. The Default Mode Network is NOT emotional - it's self-referential. People who resist persuasion show more DMN activation (they're counterarguing).
  2. The Salience Network detects importance, not manipulation. Breaking news and propaganda both light it up.
  3. vmPFC and dlPFC are both in "Executive Control" but do opposite things - vmPFC goes UP during persuasion (value adoption), dlPFC goes DOWN (critical evaluation suppressed). Averaging them together cancels the signal.
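The cancellation in point 3 is plain arithmetic. With made-up illustrative activations:

```python
# Illustrative activations during successful persuasion (made-up values):
vmPFC = 0.8    # value adoption: goes UP
dlPFC = -0.8   # critical evaluation: suppressed, goes DOWN

# Averaging both into one "Executive Control" score cancels the signal...
network_average = (vmPFC + dlPFC) / 2   # looks like nothing happened

# ...while tracking the regions separately preserves it.
region_contrast = vmPFC - dlPFC         # strong persuasion signature
```

Any network-level average that pools regions moving in opposite directions will hide exactly the effect being measured.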

The fix: Replace the 7-network ratio with region-level persuasion analysis. Track the specific brain regions the literature says matter:

  • vmPFC (medialorbitofrontal) - is the person adopting the message?
  • dlPFC (middle frontal) - is the person still thinking critically?
  • TPJ (supramarginal) - is the person questioning the source's motives?

The result: 84% win rate, p=0.0004. The same TRIBE v2 predictions, read through the right neuroscience lens, produce a statistically significant separation between manipulative and non-manipulative content.

The fMRI predictions were always fine. We were just reading them wrong.

Then we tested at scale. We ran 2,915 movie dialogues from MentalManip (ACL 2024) through the new v2 interpretation. AUC: 0.469 - worse than random. The model couldn't tell manipulative movie dialogue from normal conversation.

But that's actually the right answer. MentalManip contains interpersonal manipulation - gaslighting, guilt-tripping, subtle social dynamics between characters. TRIBE v2 was trained on fMRI from people consuming media, not navigating social relationships. It detects engineered media manipulation (propaganda, fear appeals, loaded language in articles) but not conversational manipulation. These are different cognitive processes.

We also tested pure audio. 16 pure sine tones (20Hz to 12,000Hz) through the audio pipeline. Result: only 3 unique brain activation patterns across all frequencies. The audio pipeline is text-dominated - Whisper transcribes the audio, LLaMA encodes the transcription, and those text features overwhelm the Wav2Vec-BERT acoustic features in the fusion transformer. The raw acoustic signal doesn't drive the prediction.

Where we are now: SemEval-2020 (371 real propaganda news articles) is running. This is the right dataset - engineered media manipulation with expert-labeled propaganda techniques. If the region-level persuasion analysis separates high-propaganda from low-propaganda news articles, we have a validated system. Results coming soon.

What we've learned:

  1. Brain encoding predictions are rich - 20,484 vertices of information
  2. The interpretation layer matters more than the model
  3. Neuroscience-informed region analysis (vmPFC/dlPFC/TPJ) beats naive network ratios
  4. The model detects engineered media manipulation, not interpersonal dialogue manipulation
  5. The audio pipeline is text-mediated - acoustic features are secondary to transcription

Then we tested Path C - a learned classifier. Instead of hand-tuning region weights, we trained a logistic regression on the raw 20,484-vertex activations using PCA. Three experiments:

| Experiment | Result | What It Means |
| --- | --- | --- |
| Train on paired, test on paired (5-fold CV) | AUC 0.912 | Signal exists in paired data |
| Train on paired, test on SemEval | AUC 0.41 (inverted) | Signal doesn't transfer across content types |
| Train AND test within SemEval (quartile split) | AUC 0.670, rho=0.35, p<1e-6 | Signal exists in SemEval too - but it's different |

The finding: The brain encoding model's raw activations DO contain propaganda-discriminating information. Both our paired texts AND SemEval articles have learnable signal. But the signals are content-specific - a classifier trained on extreme fear appeals doesn't detect subtle news propaganda.
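The Path C setup - PCA over raw vertex activations feeding a logistic classifier - can be sketched end to end with synthetic data. Dimensions are shrunk for speed, and this is a minimal stand-in, not the repo's training script:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for TRIBE v2 outputs: one row of vertex activations
# per item (20,484 in the real model; 400 here), with binary labels.
n, d, k = 80, 400, 20
y = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, d))
X[y == 1, :15] += 1.0                  # plant a weak class signal

# PCA via SVD on centered data, keep k components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:k].T

# Logistic regression by plain gradient descent
w, b = np.zeros(k), 0.0
for _ in range(3000):
    p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
    grad = p - y
    w -= 0.2 * (Z.T @ grad) / n
    b -= 0.2 * grad.mean()

# Training AUC from the rank statistic (no CV here, so this is optimistic)
scores = Z @ w + b
ranks = scores.argsort().argsort() + 1
n1 = int((y == 1).sum())
auc = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * (n - n1))
```

Training AUC without cross-validation overstates performance, which is why the table above reports 5-fold CV and held-out transfer numbers.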

This reframes the whole project. TRIBE isn't "a manipulation detector" - it's a brain-response-based feature extractor. The 20,484-vertex activations are rich features that separate content types when a classifier is trained on the right target domain.

The path forward: Train classifiers on large-scale target-specific datasets (NELA-GT: 1.78M articles with reliability labels). The hand-tuned formula was a dead end. The learned approach has legs.

Installation

1. Install Tribe

git clone https://github.com/iota31/tribe.git
cd tribe
pip install -e .

2. Build the Rust Binary

Required for analysis. Choose GPU or CPU build:

# Install Rust (one-time)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y

# Build tribev2-infer with Metal GPU support
git clone https://github.com/eugenehp/tribev2-rs /tmp/tribev2-rs
cd /tmp/tribev2-rs
cargo build --release --bin tribev2-infer --features "default,llama-metal"

# Or for CPU-only (any machine, no Metal GPU required):
cargo build --release --bin tribev2-infer --features default

3. Download LLaMA 3.2 3B (if using Ollama)

ollama pull llama3.2

That's it. Run tribe backends to verify:

Tribe - Backend Status
────────────────────────────────────────

Hardware:
  GPU: Apple Silicon (MPS) ✓

TRIBE v2 Rust:
  Status: ✓ available

How It Works

Tribe pipeline architecture

Brain Network Analysis

The TRIBE v2 Rust backend maps predicted brain activation to Yeo's 7 functional networks. Emotional networks (Salience, Default Mode, Limbic) indicate manipulation signals. Rational networks (Executive Control, Dorsal Attention) indicate analytical processing.

The manipulation ratio = emotional / rational activation. A ratio > 1.0 means emotional networks dominate; a ratio < 1.0 means rational networks dominate.
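As a sketch, using the network names this README uses (the activation values below are hypothetical):

```python
EMOTIONAL = ("Salience", "Default Mode", "Limbic")
RATIONAL = ("Executive Control", "Dorsal Attention")

def manipulation_ratio(activation: dict[str, float]) -> float:
    """Emotional / rational mean activation; > 1.0 means emotional dominates."""
    emotional = sum(activation[n] for n in EMOTIONAL) / len(EMOTIONAL)
    rational = sum(activation[n] for n in RATIONAL) / len(RATIONAL)
    return emotional / rational

# Hypothetical per-network activations for a fear-appeal article
ratio = manipulation_ratio({
    "Salience": 0.9, "Default Mode": 0.6, "Limbic": 0.6,
    "Executive Control": 0.5, "Dorsal Attention": 0.5,
})
```

Note that this is the v1 display heuristic; the Benchmarks section explains why the region-level v2 analysis supersedes it for manipulation detection.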

Yeo 2011 brain network activation

Architecture

See the full pipeline diagram above. For the complete design philosophy, see DESIGN.md. Key files:

| File | Purpose |
| --- | --- |
| tribe/cli.py | Click CLI (analyze, serve, backends) |
| tribe/server.py | FastAPI demo server |
| tribe/analyze.py | Main orchestrator |
| tribe/backends/router.py | Hardware detection + backend selection |
| tribe/backends/tribe_v2_rust.py | TRIBE v2 via tribev2-rs (Metal GPU / CPU) |
| tribe/interpretation/neural.py | Yeo 7-network mapping |

Models Used

| Model | Modality | Purpose | Size |
| --- | --- | --- | --- |
| eugenehp/tribev2 | All | Fusion transformer -> fMRI predictor | 676MB |
| LLaMA 3.2 3B GGUF | Text | Text feature extraction | 1.9GB |
| Wav2Vec-BERT 2.0 | Audio | Audio feature extraction | Built into tribev2-infer |
| V-JEPA2 | Video | Video feature extraction | Built into tribev2-infer |
| Yeo 2011 7-Network | - | Brain atlas parcellation | 164KB |

License

Tribe package: GPL-3.0 - Copyright 2026 Tushar

TRIBE v2 components: CC-BY-NC-4.0 - by Meta AI, non-commercial research use only

Tribe is open-source. TRIBE v2 model weights are non-commercial. See LICENSE-TRIBE-V2 for details.

Roadmap

  • Run TRIBE v2 on MacBook M-series (Metal GPU)
  • CPU fallback - same model, any machine
  • Tri-modal support - text, audio, video
  • Rust inference engine via tribev2-rs
  • Web demo server
  • Benchmark suite against ACL/SemEval datasets
  • GitHub Actions CI + community health files
  • LLM-powered explanation of brain activation patterns
  • Local content history / media diet tracker
  • RSS feed batch analysis
  • Video benchmark with curated YouTube clips

Contributing

See CONTRIBUTING.md for development setup, code style, and how to submit changes.

Security

To report a vulnerability, see SECURITY.md. Please do not open public issues for security bugs.

Acknowledgments
