COCOP1l0t/CodeAuditor

CodeAuditor

A multi-stage, agentic code auditing pipeline that can run on the Claude Code SDK or the Codex App Server Python SDK. Given a target source tree, CodeAuditor researches project context, decomposes the codebase into analysis units, hunts for bugs, evaluates them as security vulnerabilities, reproduces them with a working PoC, and finally prepares a disclosure-ready report package.

CodeAuditor has discovered several CVEs in widely used open-source projects; see Vulnerabilities Found below.

TUI Dashboard

How it works

The audit runs as seven sequential stages. Each stage is driven by a prompt template in prompts/ and executed by one or more backend agents. Outputs are validated, and on validation failure a repair prompt is sent (up to max_retries). Intermediate artifacts are written under the output directory; a .markers/ folder tracks completed sub-tasks so runs can be resumed.
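The validate-then-repair cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_with_repair`, `ValidationResult`, and the wording of the repair prompt are all hypothetical names chosen for the example, and the agent is modeled as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ValidationResult:
    """Outcome of checking one agent output (hypothetical shape)."""
    ok: bool
    errors: list[str]


def run_with_repair(
    run_agent: Callable[[str], str],
    validate: Callable[[str], ValidationResult],
    prompt: str,
    max_retries: int = 2,
) -> str:
    """Run an agent once, then send up to max_retries repair prompts."""
    output = run_agent(prompt)
    for _ in range(max_retries):
        result = validate(output)
        if result.ok:
            return output
        # Feed the validation errors back so the agent can fix its output.
        repair_prompt = (
            "Your previous output failed validation:\n"
            + "\n".join(f"- {error}" for error in result.errors)
            + "\nPlease correct the output."
        )
        output = run_agent(repair_prompt)
    # Check the output produced by the last repair attempt.
    result = validate(output)
    if not result.ok:
        raise ValueError(
            f"output still invalid after {max_retries} repairs: {result.errors}"
        )
    return output
```

Passing the agent and validator in as callables keeps the loop testable without a real backend, which matches the way the test suite avoids real agent calls.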

| Stage | What it does | Parallelism |
|-------|--------------|-------------|
| 0 | Git pull + create output directories | None |
| 1 | Security context research (git history, web, SECURITY.md) | Single agent |
| 2 | Decompose the project into analysis units (AUs) | Single agent |
| 3 | Bug discovery per analysis unit | 1 agent per AU |
| 4 | Evaluate findings: real vulnerability? severity? | 1 agent per finding |
| 5 | PoC reproduction: build, exploit, capture evidence | 1 agent per vulnerability |
| 6 | Disclosure: technical report, email, minimal PoC, zipped package | 1 agent per vulnerability |
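For stages 3 to 6, the "1 agent per item" parallelism combined with a concurrency cap can be pictured as a bounded fan-out. The sketch below assumes an asyncio-based runner. `run_bounded` and its parameters are hypothetical, and the real helper in utils.py may be structured differently.

```python
import asyncio
from collections.abc import Awaitable, Callable, Sequence


async def run_bounded(
    items: Sequence[str],
    worker: Callable[[str], Awaitable[str]],
    max_parallel: int = 1,
) -> list[str]:
    """Run one worker per item, with at most max_parallel running at once."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def guarded(item: str) -> str:
        # Each worker waits here until a concurrency slot is free.
        async with semaphore:
            return await worker(item)

    # gather() returns results in the same order as the input items.
    return await asyncio.gather(*(guarded(item) for item in items))
```

With `max_parallel=1` this degrades to strictly sequential execution, which corresponds to the documented default of `--max-parallel`.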

Stage 1 produces two directives, an auditing focus and vulnerability criteria, that are injected into later stages so the whole pipeline stays aligned with the project's actual threat model.
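Since the prompt loader uses `__KEY__` substitution, directive injection might look something like the sketch below. The placeholder names `__AUDITING_FOCUS__` and `__VULN_CRITERIA__` and the function `render_prompt` are assumptions made for illustration; the actual keys used in the prompts/ templates are not stated here.

```python
import re

# Matches __KEY__ where KEY is upper-case words joined by single underscores.
PLACEHOLDER = re.compile(r"__([A-Z0-9]+(?:_[A-Z0-9]+)*)__")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Replace every __KEY__ placeholder; fail loudly if a value is missing."""

    def substitute(match: re.Match[str]) -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder __{key}__")
        return values[key]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical Stage 3 template fragment and Stage 1 directives.
template = "Focus on:\n__AUDITING_FOCUS__\n\nCount as a vulnerability:\n__VULN_CRITERIA__\n"
directives = {
    "AUDITING_FOCUS": "parsers that handle untrusted input",
    "VULN_CRITERIA": "memory corruption reachable without authentication",
}
prompt = render_prompt(template, directives)
```

Raising on an unknown key, rather than leaving the placeholder in place, avoids sending an agent a prompt that still contains a literal `__KEY__`.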

System Design

┌─────────────┐
│ Target Repo │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌─────────────────────────────┐
│  Stage 0    │     │      DIRECTIVE INJECTION    │
│    Init     │────►│  ┌─────────┐  ┌─────────┐   │
└─────────────┘     │  │Auditing │  │Vuln     │   │
       │            │  │ Focus   │  │Criteria │   │
       ▼            │  └────┬────┘  └────┬────┘   │
┌─────────────┐     │       │            │        │
│  Stage 1    │────►│       └──────┬─────┘        │
│   Context   │     └──────────────┼──────────────┘
└─────────────┘                    │
       │                           │
       ▼                           ▼
┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
│  Stage 2    │──►│  Stage 3    │──►│  Stage 4    │──►│  Stage 5    │──►│  Stage 6    │
│  Decompose  │   │   Discover  │   │   Evaluate  │   │     PoC     │   │   Disclose  │
└─────────────┘   └─────────────┘   └─────────────┘   └─────────────┘   └──────┬──────┘
                                                                               │
                                                                               ▼
                                                                        ┌─────────────┐
                                                                        │  Disclosure │
                                                                        │   Package   │
                                                                        └─────────────┘

Requirements

  • Python 3.12+
  • A working Claude Code install for --backend claude (the SDK reuses its authentication)
  • A working Codex CLI at /usr/local/bin/codex with codex app-server support and local Codex auth/session for --backend codex
  • Git, plus whatever build tools the target project needs for Stage 5 reproduction

Installation

git clone https://github.com/COCOP1l0t/CodeAuditor.git
cd CodeAuditor
pip install -e .

This exposes the code-auditor CLI entry point.

Usage

code-auditor --target /path/to/project [options]

Common options

| Flag | Description |
|------|-------------|
| **--target** | Required. Root directory of the project to audit. |
| --output-dir | Output directory (default: {target}/audit-output-YYYYMMDD, using the current local date). |
| --discovered | Reproduced bugs HTML file used by Stage 6 (default: {target}/reproduced-bugs.html). Pass a path to override where this cross-run record is read and updated. |
| --wiki | LLM wiki knowledge base directory. CodeAuditor treats it as read-only and gives agents stage-specific wiki search guidance. |
| --max-parallel | Max concurrent agents (default: 1). |
| --backend | Agent backend: claude or codex (default: claude). |
| --model | Backend model override. Claude defaults to claude-sonnet-4-6; Codex uses the local Codex config default unless specified. |
| --target-au-count | Target number of analysis units for Stage 2 (default: 10). |
| --log-level | DEBUG, INFO, WARNING, or ERROR (default: INFO). |
| --tui | Launch the interactive TUI dashboard instead of plain log output. |

Bold options are required.

Agent runs use a 20-minute semantic timeout cycle by default. If an agent is still running after 20 minutes, CodeAuditor starts a status-checking subagent to inspect that agent's agent.log; when the checker determines the analysis is already finished, CodeAuditor kills the original backend process. Otherwise, it waits another 20 minutes and repeats the check.
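The timeout cycle can be approximated by the loop below. It is only a sketch under simplifying assumptions: the status-checking subagent is reduced to a callable named `analysis_finished`, and `supervise` is a hypothetical name. How the real checker reads agent.log is not shown.

```python
import subprocess
from typing import Callable


def supervise(
    process: subprocess.Popen,
    analysis_finished: Callable[[], bool],
    cycle_seconds: float = 20 * 60,
) -> str:
    """Wait for an agent process, checking its status once per timeout cycle."""
    while True:
        try:
            process.wait(timeout=cycle_seconds)
            return "exited"  # the agent finished on its own
        except subprocess.TimeoutExpired:
            # Stand-in for the status-checking subagent that inspects agent.log.
            if analysis_finished():
                process.kill()
                process.wait()  # reap the killed process
                return "killed"
            # Otherwise fall through and wait for another full cycle.
```

The point of the design is that a long-running agent is never killed on elapsed time alone; it is stopped only once the checker judges the analysis complete.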

By default, Stage 6 creates or updates {target}/reproduced-bugs.html. Before generating disclosures, Stage 6 reads this file and skips reproduced bugs with matching dedupe metadata. After Stage 6 successfully writes disclosure output for a new reproduced bug, it appends a new HTML entry to the same file. Use --discovered /path/to/reproduced-bugs.html to read and update a different HTML file.

The HTML record uses one collapsible section per reproduced bug. Each section carries a visible review status tag and matching machine-readable status fields for unreviewed, reported, confirmed, rejected, or duplicated.
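The skip-if-already-recorded behavior and the review status values can be modeled as in the sketch below. The shape of the dedupe metadata is not specified above, so the single `dedupe_key` string, the `ReproducedBug` class, and `select_new_bugs` are all assumptions made for illustration. Only the five status values are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(str, Enum):
    """Review states carried by each entry in the HTML record."""
    UNREVIEWED = "unreviewed"
    REPORTED = "reported"
    CONFIRMED = "confirmed"
    REJECTED = "rejected"
    DUPLICATED = "duplicated"


@dataclass(frozen=True)
class ReproducedBug:
    dedupe_key: str  # hypothetical stand-in for the real dedupe metadata
    title: str
    status: ReviewStatus = ReviewStatus.UNREVIEWED


def select_new_bugs(
    recorded: list[ReproducedBug], candidates: list[ReproducedBug]
) -> list[ReproducedBug]:
    """Return candidates whose dedupe key is not already in the record."""
    known = {bug.dedupe_key for bug in recorded}
    fresh = []
    for bug in candidates:
        if bug.dedupe_key not in known:
            fresh.append(bug)
            known.add(bug.dedupe_key)  # also dedupe within this run
    return fresh
```

In this model, new entries start as unreviewed, and a human reviewer would later move them to one of the other states.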

Runs resume from checkpoint markers automatically. To start a fresh audit, delete the output directory (or its .markers/ subdirectory).
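Marker-based resumption can be illustrated with a few lines of pathlib. This is a sketch, not the contents of checkpoint.py: the `Checkpoint` class, the task ID format, and the `.done` file suffix are assumptions.

```python
from pathlib import Path


class Checkpoint:
    """Track completed sub-tasks as empty marker files under .markers/."""

    def __init__(self, output_dir: Path) -> None:
        self.marker_dir = output_dir / ".markers"

    def _marker(self, task_id: str) -> Path:
        # The ".done" suffix is an assumption made for this sketch.
        return self.marker_dir / f"{task_id}.done"

    def is_done(self, task_id: str) -> bool:
        return self._marker(task_id).exists()

    def mark_done(self, task_id: str) -> None:
        self.marker_dir.mkdir(parents=True, exist_ok=True)
        self._marker(task_id).touch()

    def pending(self, task_ids: list[str]) -> list[str]:
        """Return the tasks that still need to run, in their original order."""
        return [task_id for task_id in task_ids if not self.is_done(task_id)]
```

Under this model, deleting the .markers/ directory makes every task pending again, which is why removing it forces a fresh audit.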

Wiki knowledge base

--wiki /path/to/wiki lets CodeAuditor use an existing LLM wiki knowledge base during the audit. CodeAuditor treats the wiki as read-only and instructs agents not to create, edit, or update wiki files. Enforce filesystem permissions externally if write prevention is required.

Recommended structure:

wiki/
|-- index.md
|-- overview.md
|-- attack-surface.md
|-- auditing-guide.md
|-- exploit-patterns.md
|-- reproduction-workflow.md
|-- vulnerability-timeline.md
|-- entities/
|   `-- <component>.md
|-- concepts/
|   `-- <vulnerability-class>.md
`-- sources/
    `-- <cve-or-case-study>.md

index.md is recommended as the navigation entry point. Partial wikis are supported; stages skip absent files and use the pages that exist.
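Partial-wiki handling amounts to filtering a list of wanted pages down to the ones that exist. In the sketch below, the mapping from stage to pages is entirely hypothetical (the README does not say which stage consults which page), and `available_pages` is an illustrative name.

```python
from pathlib import Path

# Hypothetical mapping of stages to the wiki pages they would like to consult.
STAGE_PAGES: dict[str, list[str]] = {
    "stage3": ["index.md", "attack-surface.md", "auditing-guide.md"],
    "stage5": ["index.md", "exploit-patterns.md", "reproduction-workflow.md"],
}


def available_pages(wiki_dir: Path, stage: str) -> list[Path]:
    """Return the wanted pages for a stage that actually exist in the wiki."""
    wanted = STAGE_PAGES.get(stage, [])
    return [wiki_dir / name for name in wanted if (wiki_dir / name).is_file()]
```

A wiki that contains only index.md and auditing-guide.md would therefore still be usable: the missing pages are silently skipped rather than treated as errors.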

A real-world example is the QEMU-Security-Wiki, a community-maintained knowledge base for auditing QEMU.

Example

code-auditor \
  --target ~/projects/libfoo \
  --output-dir ~/audits/libfoo \
  --wiki ~/knowledge/libfoo-wiki \
  --max-parallel 4 \
  --tui \
  --log-level DEBUG

Output layout

{output-dir}/
├── stage1-security-context/  # context research + auditing focus + vuln criteria
├── stage2-analysis-units/    # codebase decomposition
├── stage3-findings/          # per-AU bug findings
├── stage4-vulnerabilities/   # evaluated, confirmed vulnerabilities
├── stage5-pocs/              # PoCs + evidence
├── stage6-disclosures/       # disclosure reports, emails, zipped PoCs
└── .markers/                 # checkpoint markers used to resume runs

Stage 6 also creates or updates {target}/reproduced-bugs.html by default. This target-root file is outside {output-dir} unless you point --discovered somewhere else.

Project layout

code_auditor/
├── __main__.py          # CLI entry point
├── config.py            # AuditConfig and dataclasses
├── orchestrator.py      # Sequential stage runner
├── agent.py             # Backend wrappers + validation retry loop
├── prompts.py           # Prompt loader with __KEY__ substitution
├── checkpoint.py        # Marker-based checkpoint/resume
├── logger.py            # Logging helper
├── utils.py             # Parallelism + file helpers
├── stages/              # stage0 – stage6
├── parsing/             # Structured extraction from agent output
├── validation/          # Per-stage output validators
└── tests/
prompts/                 # stage1.md – stage6.md prompt templates

Development

pytest                       # run all tests
pytest code_auditor/tests    # same thing
pytest -k stage2             # filter by name

Tests cover parsers and validators; they do not make real agent calls.

Vulnerabilities Found

Vulnerabilities CodeAuditor has helped discover and disclose:

| CVE ID | Project | Year | Reference |
|--------|---------|------|-----------|
| CVE-2026-28780 | httpd | 2026 | GitHub |
| CVE-2026-34032 | httpd | 2026 | GitHub |
| CVE-2026-40312 | ImageMagick | 2026 | GitHub |
| CVE-2026-40385 | libexif | 2026 | GitHub |
| CVE-2026-40386 | libexif | 2026 | GitHub |
| CVE-2026-7180 | QEMU | 2026 | GitLab |
| CVE-2026-8341 | QEMU | 2026 | GitLab |
| CVE-2026-8343 | QEMU | 2026 | GitLab |
| CVE-2026-8348 | QEMU | 2026 | GitLab |
| CVE-2026-9238 | QEMU | 2026 | GitLab |
| CVE-2026-48914 | QEMU | 2026 | GitLab |
| CVE-2026-48915 | QEMU | 2026 | GitLab |
| Embargoed | GStreamer | 2026 | #5035 |
| Embargoed | GStreamer | 2026 | #5036 |
| Embargoed | GStreamer | 2026 | #5038 |
| Embargoed | GStreamer | 2026 | #5039 |

Responsible use

CodeAuditor is intended for auditing code you own or have explicit permission to test, and for coordinated disclosure to upstream maintainers. Do not use it to target systems or projects without authorization.

Important: Before sending any vulnerability report to project maintainers, manually review the generated disclosure materials. Verify that the vulnerability is real, the severity assessment is accurate, and the proof-of-concept actually reproduces the issue. Automated findings may contain false positives or inaccuracies that could waste maintainers' time or damage your credibility.

License

Apache License 2.0. See LICENSE for details.

This software is provided for educational, research, and experimental purposes only. See the disclaimer at the top of the LICENSE file.
