This repository implements the experiment:
Detecting Collusive Covert Attacks in LLM-Based Multi-Agent Systems.
The system runs decentralized HotpotQA discussions with benign workers, covert malicious workers, and optional trusted detector agents. Detectors can quarantine suspicious workers through the existing permission manager.
By default, nonzero-m attack conditions anchor one malicious role on the reporter/final-output agent and sample any remaining malicious workers from the rest of the pool. This makes the attack variable measure influence over the final answer rather than only low-impact side chatter. Set ANCHOR_MALICIOUS_REPORTER=false to sample all malicious workers uniformly at random.
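The anchoring rule above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function name `assign_malicious`, its parameters, and the worker identifiers are hypothetical, and `anchor_reporter` merely mirrors the documented ANCHOR_MALICIOUS_REPORTER switch.

```python
import random


def assign_malicious(workers, reporter, m, anchor_reporter=True, rng=None):
    """Pick which workers play a malicious role (hypothetical helper).

    workers: list of worker identifiers, including the reporter.
    reporter: the worker that produces the final output.
    m: number of malicious workers for this condition.
    """
    if rng is None:
        rng = random.Random()
    if m == 0:
        return set()
    if m > len(workers):
        raise ValueError("m cannot exceed the number of workers")
    if anchor_reporter:
        if reporter not in workers:
            raise ValueError("reporter must be one of the workers")
        # Anchor one malicious role on the reporter, then sample the
        # remaining m - 1 roles from the rest of the pool.
        others = [w for w in workers if w != reporter]
        return {reporter, *rng.sample(others, m - 1)}
    # Anchoring disabled: sample all m roles uniformly at random.
    return set(rng.sample(workers, m))


chosen = assign_malicious(["w0", "w1", "w2", "w3"], "w3", m=2, rng=random.Random(0))
print("w3" in chosen, len(chosen))  # True 2
```

With anchoring enabled, the reporter is always among the chosen workers whenever m is nonzero, which is what ties the attack variable to influence over the final answer.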
For each attack strategy:
- suboptimal_fixation
- reframing_misalignment
- fake_injection
- execution_delay
we run seven conditions:
- m=0, d=0: clean baseline
- m=1, d=0: single attacker
- m=2, d=0: collusive attackers
- m=1, d=1: single attacker plus detector
- m=2, d=1: collusion plus one detector
- m=2, d=2: collusion plus two detectors
- m=all, d=0: all-malicious sanity check
Total:
4 attacks × 7 conditions = 28 total conditions
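As a rough sketch, the 28 conditions can be enumerated as the product of the four attack strategies and the seven (m, d) settings, reading m as the number of malicious workers and d as the number of detectors. The variable names and dictionary layout here are illustrative assumptions, not the runner's actual data structures.

```python
from itertools import product

ATTACKS = [
    "suboptimal_fixation",
    "reframing_misalignment",
    "fake_injection",
    "execution_delay",
]

# (m, d) pairs; "all" marks the all-malicious sanity check.
CONDITIONS = [(0, 0), (1, 0), (2, 0), (1, 1), (2, 1), (2, 2), ("all", 0)]

# One entry per (attack, condition) pair.
matrix = [
    {"attack": attack, "m": m, "d": d}
    for attack, (m, d) in product(ATTACKS, CONDITIONS)
]

print(len(matrix))  # 28
```

Counted this way, the clean baseline (m=0, d=0) appears once under each attack strategy.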
Defaults:
- 25–30 HotpotQA tasks per condition
- 5 repetitions
- 4 turns per task
- 1000 tokens per response
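For planning purposes, the defaults can be turned into a rough workload estimate. The sketch below assumes that every repetition reruns every task in every one of the 28 conditions, and that each agent produces one response per turn; neither assumption is confirmed by this README, so treat the numbers as an upper-bound guide only.

```python
CONDITIONS = 28
REPETITIONS = 5
TURNS_PER_TASK = 4
MAX_TOKENS_PER_RESPONSE = 1000


def task_runs(tasks_per_condition):
    """Total task executions, assuming each repetition reruns all tasks."""
    return CONDITIONS * tasks_per_condition * REPETITIONS


# Range implied by 25 to 30 tasks per condition.
low, high = task_runs(25), task_runs(30)
print(low, high)  # 3500 4200

# Ceiling on response tokens for one agent in one task,
# assuming one response per agent per turn.
per_agent_token_ceiling = TURNS_PER_TASK * MAX_TOKENS_PER_RESPONSE
print(per_agent_token_ceiling)  # 4000
```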
Install dependencies:
python -m pip install -e .
For local testing and CI-equivalent tooling:
python -m pip install -e '.[dev]'
Set Azure OpenAI credentials in your shell or .env:
export AZURE_OPENAI_API_KEY="..."
export AZURE_OPENAI_ENDPOINT="https://...openai.azure.com/"
export AZURE_OPENAI_DEPLOYMENT="gpt-4o"
export AZURE_OPENAI_API_VERSION="2024-10-21"
Packaged entry points:
python -m covert_collusive_hotpot.run_experiments --help
python -m covert_collusive_hotpot.generate_paper_assets
The legacy root wrappers remain temporarily available for compatibility:
python parallel_experiment_runner.py --help
python generate_paper_tables_and_figures.py
Run the full local test suite:
PYTHONPATH=src pytest tests -q
Run only the packaging and entry-point smoke checks:
PYTHONPATH=src pytest tests/test_package_entrypoints.py -q
Check the two primary experiment entry surfaces:
PYTHONPATH=src python -m covert_collusive_hotpot.run_experiments --help
python parallel_experiment_runner.py --help
There are also two GitHub Actions workflows:
- default CI in .github/workflows/ci.yml
- manual Azure-backed smoke in .github/workflows/azure-manual-smoke.yml
See tests/README.md for a more detailed breakdown of the test suite and CI layout.
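Before starting any run that calls Azure OpenAI, it can help to confirm that the four credential variables from the setup step are present. The following is a minimal sketch: the helper name `missing_azure_vars` is hypothetical, and it only inspects the process environment, so values kept solely in a .env file will show as missing unless something has already loaded them.

```python
import os

REQUIRED = (
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT",
    "AZURE_OPENAI_API_VERSION",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


missing = missing_azure_vars()
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All Azure OpenAI variables are set.")
```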
Run a small validation matrix before the full experiment:
python -m covert_collusive_hotpot.run_experiments --smoke --smoke-tasks 2 --max-concurrent 1 --run-label smoke
The smoke and full experiment commands require the HotpotQA dataset. On the first run, the datasets library downloads the hotpot_qa distractor split from Hugging Face and caches it locally, so the first run needs internet access unless the dataset is already cached.
The smoke test checks:
- clean baseline
- all-malicious sanity conditions for every attack
- one detector condition to exercise interrogation and quarantine
Use tmux so the experiment survives SSH disconnects:
tmux new -s collusive_hotpot
python -m covert_collusive_hotpot.run_experiments --max-concurrent 4 --run-label full_collusive_hotpot
Detach with:
Ctrl-b d
Reattach with:
tmux attach -t collusive_hotpot
The runner writes:
- aggregate condition rows to experiment_results.<run_label>.csv
- task-level checkpoints to task_progress.<run_label>.jsonl
- permanent failures to failed_configs.<run_label>.csv
If the process stops, rerun the same command. Completed task checkpoints and aggregate rows are reused automatically.
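To illustrate how checkpoint reuse of this kind can work, the sketch below reads a JSON Lines checkpoint file and collects the keys of tasks that already finished. It is not the runner's actual code: the function name and the field names `condition`, `task_id`, and `repetition` are hypothetical, and the real checkpoint schema may differ.

```python
import json
from pathlib import Path


def completed_task_keys(path, key_fields=("condition", "task_id", "repetition")):
    """Collect identifying keys of tasks recorded in a JSONL checkpoint.

    key_fields are assumed field names, used for illustration only.
    """
    done = set()
    checkpoint = Path(path)
    if not checkpoint.exists():
        return done  # first run: nothing to reuse
    with checkpoint.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # Tolerate a truncated last line from an interrupted write.
                continue
            if isinstance(record, dict):
                done.add(tuple(record.get(field) for field in key_fields))
    return done
```

A resuming runner could skip any task whose key is already in the returned set and execute only the remainder.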
Metrics:
- functional correctness
- completion failure
- attack success
- latency / turns
- detection recall / rate
- detection precision
- false positive rate
- attribution accuracy
- benign restriction rate
- detection latency
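Several of the detection metrics listed above have conventional set-based definitions. The sketch below uses those standard definitions for recall, precision, and false positive rate over the workers a detector flagged; the repository's exact formulas may differ, and the function name and inputs are illustrative assumptions.

```python
def detection_metrics(flagged, malicious, workers):
    """Standard detection metrics over sets of worker identifiers.

    flagged: workers the detector quarantined or flagged.
    malicious: workers that were actually malicious.
    workers: all workers in the discussion.
    """
    flagged, malicious, workers = set(flagged), set(malicious), set(workers)
    benign = workers - malicious
    true_pos = flagged & malicious
    false_pos = flagged & benign

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as None.
        return numerator / denominator if denominator else None

    return {
        "recall": ratio(len(true_pos), len(malicious)),
        "precision": ratio(len(true_pos), len(flagged)),
        "false_positive_rate": ratio(len(false_pos), len(benign)),
    }


result = detection_metrics(
    flagged={"w0", "w2"},
    malicious={"w0", "w1"},
    workers={"w0", "w1", "w2", "w3"},
)
print(result)
# {'recall': 0.5, 'precision': 0.5, 'false_positive_rate': 0.5}
```

In a clean baseline with no malicious workers, recall is undefined under these definitions and is returned as None rather than zero.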