Mine and analyse GitHub data — sweep a repo, surface patterns. The v1 focus is pull-request process: how long PRs take to merge, how that latency is distributed, how quickly maintainers first engage, and which closed-without-merge PRs were silently retracted by the submitter versus closed by a maintainer.
Specs live under openspec/. The architecture, stack,
and non-trivial design choices are documented in
openspec/project.md.
# Install and lock dependencies (uv-managed).
uv sync
# Strongly recommended: an authenticated 5000/h budget instead of
# the 60/h unauthenticated one.
export GITHUB_TOKEN=$(gh auth token) # or any PAT with read:repo
# Fetch the full PR history into the local cache.
uv run gitsweeper fetch nextcloud/app-certificate-requests
# Aggregate analyses against the cache.
uv run gitsweeper throughput nextcloud/app-certificate-requests
uv run gitsweeper first-response nextcloud/app-certificate-requests
uv run gitsweeper classify nextcloud/app-certificate-requests
uv run gitsweeper patterns nextcloud/app-certificate-requests
# One-shot composed markdown report (writes the file to docs/examples/
# by convention; the directory is gitignored so reports never get
# committed by accident).
uv run gitsweeper report nextcloud/app-certificate-requests \
--refresh \
--out docs/examples/nextcloud-csr-report-$(date -u +%Y-%m-%d).md
# Per-author slice (case-insensitive on the GitHub login).
uv run gitsweeper report nextcloud/app-certificate-requests \
--author MWest2020 \
--out docs/examples/nextcloud-csr-mwest2020-$(date -u +%Y-%m-%d).mdGitsweeper acquires data through a forge-provider seam (lib/forge).
GitHub (the default), Forgejo/Gitea/Codeberg, and GitLab are
supported. A bare owner/repo resolves to GitHub exactly as it always has,
so no existing command changes.
Select a forge in one of three ways:
- explicitly:
--forge forgejo,--forge gitlab(or--forge github); - by host: a
codeberg.orgURL is detected as Forgejo, agitlab.comURL as GitLab, automatically; - self-hosted: set
GITSWEEPER_FORGEJO_URL/GITSWEEPER_GITLAB_URLto your instance's base URL (e.g.https://git.example.org) and a URL on that host is detected accordingly.
export GITHUB_TOKEN=$(gh auth token) # GitHub: read:repo PAT
export FORGEJO_TOKEN=... # Forgejo/Codeberg/Gitea token
export GITSWEEPER_FORGEJO_URL=https://... # self-hosted base URL (defaults to Codeberg)
export GITLAB_TOKEN=... # GitLab personal access token
export GITSWEEPER_GITLAB_URL=https://... # self-hosted base URL (defaults to gitlab.com)
gitsweeper fetch --forge forgejo forgejo/forgejo
gitsweeper fetch --forge gitlab gitlab-org/gitlab-runnerForgejo and GitLab reads can run unauthenticated against public repositories
on instances that permit it, with a warn-once notice; set FORGEJO_TOKEN /
GITLAB_TOKEN for the full rate-limit budget (GitLab also requires a token
for some sub-resources, such as a merge request's notes and state events,
even on public projects).
| Path | Status | Contents |
|---|---|---|
cache/*.sqlite |
gitignored | Per-repo SQLite cache. Pass --db-path cache/<repo>.sqlite to keep them tidy. The default if --db-path is omitted is $XDG_STATE_HOME/gitsweeper/gitsweeper.sqlite. |
docs/examples/*.md |
gitignored | Generated reports. Convention is <owner>-<repo>-[<author>-]<date>.md. |
openspec/specs/ |
tracked | Capability specs. Authoritative for behaviour. |
openspec/changes/archive/ |
tracked | Historical change proposals (audit trail). |
Both cache/ and docs/examples/ are deliberately ignored — they
are run-specific artefacts that drift the moment a new snapshot is
taken. Keep them locally for sharing; do not commit them.
| Command | What it does | API cost |
|---|---|---|
fetch <repo> |
Download all pull requests (paginated) into the cache. | ~ceil(n / 100) calls. |
throughput <repo> [--since] [--author] [--json] |
Time-to-merge percentiles over the cache. | 0 (cache only). |
first-response <repo> [--since] [--author] [--json] |
First-non-author-comment percentiles; lazily fetches comments for any uncached PR. | ~1 per uncached PR. |
classify <repo> [--author] [--json] |
Self-pulled vs maintainer-closed for closed-without-merge PRs; uses the issue-events endpoint to fill the data the pulls endpoint omits. | ~1 per uncached closed-unmerged PR. |
patterns <repo> [--since] [--author] [--json] |
Day-of-week and hour-of-day distributions for submissions and responses. | 0 (cache only). |
dora <repo> [--since] [--period week|month] [--json] |
The four DORA metrics (team-level, proxy-based) over the cache. No --author. |
0 (cache only). |
retro <repo> [--since] [--stale-days N] [--json] |
Team-level retro signals (stale open PRs, long threads, friction language, tech-debt markers, smooth merges) over the cache + a comments cache. No --author. |
~1 per uncached PR (comment fetch), then 0. |
report <repo> [--author] [--since] [--refresh] [--out PATH] |
Compose every section above into a single markdown document. --refresh runs fetch + first-response + classify before composing. |
Sum of the above when --refresh; 0 otherwise. |
deliver <repo> [--forge] [--since] [--period week|month] [--format slack|markdown] [--out PATH] [--post] |
Compose DORA + retro into one team-level message (Slack Block Kit by default, or markdown). Writes to stdout/--out; with --post it POSTs the Block Kit to SLACK_WEBHOOK_URL. No --author. |
~1 per uncached PR (comment fetch), then 0. No network unless --post. |
A small MCP server exposes Gitsweeper's PR analyses and a Billbird
manager-view (hours, plan-vs-actual) over stdio. Point Claude Desktop
(or any MCP-aware AI client) at gitsweeper mcp and a manager can
ask plan-vs-actual or PR-throughput questions in natural language.
See docs/mcp.md for the configuration snippet,
the nine tools the server advertises, the read-only contract, and the
required environment.
# Run the server directly (typically Claude Desktop spawns this for you).
uv run gitsweeper mcppr-throughput-analysis— fetch + persist, time-to-merge percentiles,--sinceand--authorfiltering, opt-in time-to-first-response, day/hour patterns.pr-classification— close-event-actor enrichment, self-pulled vs maintainer-closed classification.pr-process-report— orchestrates the others into a single shareable markdown report.report-rendering— pluggable renderer interface; CLI table, JSON, and markdown renderers.dora-metrics— the four DORA metrics computed team-level from the PR cache, with documented proxies and DORA performance bands.retro-signals— team-level retro cues over the PR cache and a new comment-body cache, with documented deterministic keyword sets.
gitsweeper dora <repo> computes the four DORA metrics from the cached
pull requests only (no forge calls, no --author — DORA measures a
team's delivery system, not individuals). Because the cache holds no
releases, per-PR commits, or issue history, v1 uses documented proxies:
- deployment frequency — merged PRs per
--periodbucket (week/month); a merge is treated as a deploy. - lead time for changes — median/p75/p90 of
created_at→merged_at(PR cycle time as lead time). - change failure rate — corrective merged PRs ÷ all merged PRs,
where "corrective" is a deterministic title heuristic (leading
revert/hotfix/rollback, or a conventional-commitfix:/fix(scope):prefix). - time to restore — median
created_at→merged_atover the corrective merged PRs.
Each metric is annotated with its Elite/High/Medium/Low DORA band and the sample count it came from.
gitsweeper retro <repo> surfaces deterministic cues for a team
retrospective over the cached PRs plus a comment cache: stale open
PRs (open longer than --stale-days, default 14), long threads
(more than 10 cached comments), friction language, tech-debt
markers, and smooth merges (merged within 3 days with fewer than
2 comments). Every signal references a PR by number only — never an
author (no --author); comment authors are stored in the cache but
never surfaced.
The first run fetches each in-scope PR's comments once (one
comment-listing call per uncached PR) into a local pr_comments cache;
later runs read the cache. Friction (Dutch + English) and tech-debt
keyword sets are documented, case-insensitive constants — no LLM, no
scoring model — carried verbatim from
Road_to_el_DORA-do/.github/prompts/sprint-retro.md
so this command stays continuous with the workflow it supersedes.
gitsweeper deliver <repo> blends the DORA and retro halves into one
team-level message and renders it as a Slack Block Kit payload (default,
--format slack) or markdown (--format markdown). By default it writes
to stdout or --out FILE and makes no network call.
Egress is opt-in and explicit. --post POSTs the Block Kit payload to a
single Slack incoming webhook read from the SLACK_WEBHOOK_URL
environment variable; --post without that variable is a named error and
sends nothing. The webhook is read from the environment only — never
printed, written to --out, or logged. One webhook = one channel.
Scheduling is your job. deliver runs once and exits; it embeds no
scheduler. Run it from your own cron entry or scheduling routine, e.g.:
# Mondays at 09:00, post last month's report to Slack.
0 9 * * 1 SLACK_WEBHOOK_URL=... gitsweeper deliver octocat/hello --postThis supersedes the weekly Slack Block Kit GitHub Action in
Road_to_el_DORA-do:
a portable command composes with any scheduler instead of coupling to one
forge's CI.
uv sync
uv run pytest # 60 tests; should be green
uv run ruff check .
openspec validate --all --strictEUPL-1.2 — see pyproject.toml.