Gitsweeper

Mine and analyse GitHub data — sweep a repo, surface patterns. The v1 focus is pull-request process: how long PRs take to merge, how that latency is distributed, how quickly maintainers first engage, and which closed-without-merge PRs were silently retracted by the submitter versus closed by a maintainer.

Specs live under openspec/. The architecture, stack, and non-trivial design choices are documented in openspec/project.md.

Quick start

# Install and lock dependencies (uv-managed).
uv sync

# Strongly recommended: an authenticated 5000/h budget instead of
# the 60/h unauthenticated one.
export GITHUB_TOKEN=$(gh auth token)   # or any PAT with read:repo

# Fetch the full PR history into the local cache.
uv run gitsweeper fetch nextcloud/app-certificate-requests

# Aggregate analyses against the cache.
uv run gitsweeper throughput nextcloud/app-certificate-requests
uv run gitsweeper first-response nextcloud/app-certificate-requests
uv run gitsweeper classify nextcloud/app-certificate-requests
uv run gitsweeper patterns nextcloud/app-certificate-requests

# One-shot composed markdown report (writes the file to docs/examples/
# by convention; the directory is gitignored so reports never get
# committed by accident).
uv run gitsweeper report nextcloud/app-certificate-requests \
    --refresh \
    --out docs/examples/nextcloud-csr-report-$(date -u +%Y-%m-%d).md

# Per-author slice (case-insensitive on the GitHub login).
uv run gitsweeper report nextcloud/app-certificate-requests \
    --author MWest2020 \
    --out docs/examples/nextcloud-csr-mwest2020-$(date -u +%Y-%m-%d).md

Forges

Gitsweeper acquires data through a forge-provider seam (lib/forge). GitHub (the default), Forgejo/Gitea/Codeberg, and GitLab are supported. A bare owner/repo resolves to GitHub exactly as it always has, so no existing command changes.

Select a forge in one of three ways:

explicitly: --forge forgejo, --forge gitlab (or --forge github);
by host: a codeberg.org URL is detected as Forgejo, a gitlab.com URL as GitLab, automatically;
self-hosted: set GITSWEEPER_FORGEJO_URL / GITSWEEPER_GITLAB_URL to your instance's base URL (e.g. https://git.example.org) and a URL on that host is detected accordingly.

export GITHUB_TOKEN=$(gh auth token)        # GitHub: read:repo PAT
export FORGEJO_TOKEN=...                     # Forgejo/Codeberg/Gitea token
export GITSWEEPER_FORGEJO_URL=https://...    # self-hosted base URL (defaults to Codeberg)
export GITLAB_TOKEN=...                      # GitLab personal access token
export GITSWEEPER_GITLAB_URL=https://...     # self-hosted base URL (defaults to gitlab.com)

gitsweeper fetch --forge forgejo forgejo/forgejo
gitsweeper fetch --forge gitlab gitlab-org/gitlab-runner

Forgejo and GitLab reads can run unauthenticated against public repositories on instances that permit it, with a warn-once notice; set FORGEJO_TOKEN / GITLAB_TOKEN for the full rate-limit budget (GitLab also requires a token for some sub-resources, such as a merge request's notes and state events, even on public projects).

Where things live on disk

Path	Status	Contents
`cache/*.sqlite`	gitignored	Per-repo SQLite cache. Pass `--db-path cache/<repo>.sqlite` to keep them tidy. The default if `--db-path` is omitted is `$XDG_STATE_HOME/gitsweeper/gitsweeper.sqlite`.
`docs/examples/*.md`	gitignored	Generated reports. Convention is `<owner>-<repo>-[<author>-]<date>.md`.
`openspec/specs/`	tracked	Capability specs. Authoritative for behaviour.
`openspec/changes/archive/`	tracked	Historical change proposals (audit trail).

Both cache/ and docs/examples/ are deliberately ignored — they are run-specific artefacts that drift the moment a new snapshot is taken. Keep them locally for sharing; do not commit them.

Commands

Command	What it does	API cost
`fetch <repo>`	Download all pull requests (paginated) into the cache.	~`ceil(n / 100)` calls.
`throughput <repo> [--since] [--author] [--json]`	Time-to-merge percentiles over the cache.	0 (cache only).
`first-response <repo> [--since] [--author] [--json]`	First-non-author-comment percentiles; lazily fetches comments for any uncached PR.	~1 per uncached PR.
`classify <repo> [--author] [--json]`	Self-pulled vs maintainer-closed for closed-without-merge PRs; uses the issue-events endpoint to fill the data the pulls endpoint omits.	~1 per uncached closed-unmerged PR.
`patterns <repo> [--since] [--author] [--json]`	Day-of-week and hour-of-day distributions for submissions and responses.	0 (cache only).
`dora <repo> [--since] [--period week\|month] [--json]`	The four DORA metrics (team-level, proxy-based) over the cache. No `--author`.	0 (cache only).
`retro <repo> [--since] [--stale-days N] [--json]`	Team-level retro signals (stale open PRs, long threads, friction language, tech-debt markers, smooth merges) over the cache + a comments cache. No `--author`.	~1 per uncached PR (comment fetch), then 0.
`report <repo> [--author] [--since] [--refresh] [--out PATH]`	Compose every section above into a single markdown document. `--refresh` runs fetch + first-response + classify before composing.	Sum of the above when `--refresh`; 0 otherwise.
`deliver <repo> [--forge] [--since] [--period week\|month] [--format slack\|markdown] [--out PATH] [--post]`	Compose DORA + retro into one team-level message (Slack Block Kit by default, or markdown). Writes to stdout/`--out`; with `--post` it POSTs the Block Kit to `SLACK_WEBHOOK_URL`. No `--author`.	~1 per uncached PR (comment fetch), then 0. No network unless `--post`.

Manager-MCP

A small MCP server exposes Gitsweeper's PR analyses and a Billbird manager-view (hours, plan-vs-actual) over stdio. Point Claude Desktop (or any MCP-aware AI client) at gitsweeper mcp and a manager can ask plan-vs-actual or PR-throughput questions in natural language. See docs/mcp.md for the configuration snippet, the nine tools the server advertises, the read-only contract, and the required environment.

# Run the server directly (typically Claude Desktop spawns this for you).
uv run gitsweeper mcp

Capabilities

pr-throughput-analysis — fetch + persist, time-to-merge percentiles, --since and --author filtering, opt-in time-to-first-response, day/hour patterns.
pr-classification — close-event-actor enrichment, self-pulled vs maintainer-closed classification.
pr-process-report — orchestrates the others into a single shareable markdown report.
report-rendering — pluggable renderer interface; CLI table, JSON, and markdown renderers.
dora-metrics — the four DORA metrics computed team-level from the PR cache, with documented proxies and DORA performance bands.
retro-signals — team-level retro cues over the PR cache and a new comment-body cache, with documented deterministic keyword sets.

DORA — team-level, proxy-based

gitsweeper dora <repo> computes the four DORA metrics from the cached pull requests only (no forge calls, no --author — DORA measures a team's delivery system, not individuals). Because the cache holds no releases, per-PR commits, or issue history, v1 uses documented proxies:

deployment frequency — merged PRs per --period bucket (week/month); a merge is treated as a deploy.
lead time for changes — median/p75/p90 of created_at → merged_at (PR cycle time as lead time).
change failure rate — corrective merged PRs ÷ all merged PRs, where "corrective" is a deterministic title heuristic (leading revert/hotfix/rollback, or a conventional-commit fix: / fix(scope): prefix).
time to restore — median created_at → merged_at over the corrective merged PRs.

Each metric is annotated with its Elite/High/Medium/Low DORA band and the sample count it came from.

Retro signals — team-level, deterministic

gitsweeper retro <repo> surfaces deterministic cues for a team retrospective over the cached PRs plus a comment cache: stale open PRs (open longer than --stale-days, default 14), long threads (more than 10 cached comments), friction language, tech-debt markers, and smooth merges (merged within 3 days with fewer than 2 comments). Every signal references a PR by number only — never an author (no --author); comment authors are stored in the cache but never surfaced.

The first run fetches each in-scope PR's comments once (one comment-listing call per uncached PR) into a local pr_comments cache; later runs read the cache. Friction (Dutch + English) and tech-debt keyword sets are documented, case-insensitive constants — no LLM, no scoring model — carried verbatim from Road_to_el_DORA-do/.github/prompts/sprint-retro.md so this command stays continuous with the workflow it supersedes.

Delivery — opt-in egress, you do the scheduling

gitsweeper deliver <repo> blends the DORA and retro halves into one team-level message and renders it as a Slack Block Kit payload (default, --format slack) or markdown (--format markdown). By default it writes to stdout or --out FILE and makes no network call.

Egress is opt-in and explicit. --post POSTs the Block Kit payload to a single Slack incoming webhook read from the SLACK_WEBHOOK_URL environment variable; --post without that variable is a named error and sends nothing. The webhook is read from the environment only — never printed, written to --out, or logged. One webhook = one channel.

Scheduling is your job. deliver runs once and exits; it embeds no scheduler. Run it from your own cron entry or scheduling routine, e.g.:

# Mondays at 09:00, post last month's report to Slack.
0 9 * * 1  SLACK_WEBHOOK_URL=... gitsweeper deliver octocat/hello --post

This supersedes the weekly Slack Block Kit GitHub Action in Road_to_el_DORA-do: a portable command composes with any scheduler instead of coupling to one forge's CI.

Development

uv sync
uv run pytest                          # 60 tests; should be green
uv run ruff check .
openspec validate --all --strict

License

EUPL-1.2 — see pyproject.toml.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.claude		.claude
docs		docs
openspec		openspec
scripts		scripts
src/gitsweeper		src/gitsweeper
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
README.md		README.md
pyproject.toml		pyproject.toml
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Gitsweeper

Quick start

Forges

Where things live on disk

Commands

Manager-MCP

Capabilities

DORA — team-level, proxy-based

Retro signals — team-level, deterministic

Delivery — opt-in egress, you do the scheduling

Development

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Gitsweeper

Quick start

Forges

Where things live on disk

Commands

Manager-MCP

Capabilities

DORA — team-level, proxy-based

Retro signals — team-level, deterministic

Delivery — opt-in egress, you do the scheduling

Development

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages