Gitsweeper

Mine and analyse GitHub data — sweep a repo, surface patterns. The v1 focus is pull-request process: how long PRs take to merge, how that latency is distributed, how quickly maintainers first engage, and which closed-without-merge PRs were silently retracted by the submitter versus closed by a maintainer.

Specs live under openspec/. The architecture, stack, and non-trivial design choices are documented in openspec/project.md.

Quick start

# Install and lock dependencies (uv-managed).
uv sync

# Strongly recommended: an authenticated 5000/h budget instead of
# the 60/h unauthenticated one.
export GITHUB_TOKEN=$(gh auth token)   # or any PAT with read:repo

# Fetch the full PR history into the local cache.
uv run gitsweeper fetch nextcloud/app-certificate-requests

# Aggregate analyses against the cache.
uv run gitsweeper throughput nextcloud/app-certificate-requests
uv run gitsweeper first-response nextcloud/app-certificate-requests
uv run gitsweeper classify nextcloud/app-certificate-requests
uv run gitsweeper patterns nextcloud/app-certificate-requests

# One-shot composed markdown report (writes the file to docs/examples/
# by convention; the directory is gitignored so reports never get
# committed by accident).
uv run gitsweeper report nextcloud/app-certificate-requests \
    --refresh \
    --out docs/examples/nextcloud-csr-report-$(date -u +%Y-%m-%d).md

# Per-author slice (case-insensitive on the GitHub login).
uv run gitsweeper report nextcloud/app-certificate-requests \
    --author MWest2020 \
    --out docs/examples/nextcloud-csr-mwest2020-$(date -u +%Y-%m-%d).md

Forges

Gitsweeper acquires data through a forge-provider seam (lib/forge). GitHub (the default), Forgejo/Gitea/Codeberg, and GitLab are supported. A bare owner/repo resolves to GitHub exactly as it always has, so no existing command changes.

Select a forge in one of three ways:

  • explicitly: --forge forgejo, --forge gitlab (or --forge github);
  • by host: a codeberg.org URL is detected as Forgejo, a gitlab.com URL as GitLab, automatically;
  • self-hosted: set GITSWEEPER_FORGEJO_URL / GITSWEEPER_GITLAB_URL to your instance's base URL (e.g. https://git.example.org) and a URL on that host is detected accordingly.
export GITHUB_TOKEN=$(gh auth token)        # GitHub: read:repo PAT
export FORGEJO_TOKEN=...                     # Forgejo/Codeberg/Gitea token
export GITSWEEPER_FORGEJO_URL=https://...    # self-hosted base URL (defaults to Codeberg)
export GITLAB_TOKEN=...                      # GitLab personal access token
export GITSWEEPER_GITLAB_URL=https://...     # self-hosted base URL (defaults to gitlab.com)

gitsweeper fetch --forge forgejo forgejo/forgejo
gitsweeper fetch --forge gitlab gitlab-org/gitlab-runner

Forgejo and GitLab reads can run unauthenticated against public repositories on instances that permit it, with a warn-once notice; set FORGEJO_TOKEN / GITLAB_TOKEN for the full rate-limit budget (GitLab also requires a token for some sub-resources, such as a merge request's notes and state events, even on public projects).
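
The selection rules above can be sketched in a few lines. This is only an illustration of the precedence described in this section (explicit flag, then self-hosted base URL, then well-known host); `detect_forge` is a hypothetical name, the real logic lives in lib/forge and may differ, and the fallback to GitHub for an unrecognised host is an assumption.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse

# Well-known public hosts named in this README.
KNOWN_HOSTS = {"github.com": "github", "codeberg.org": "forgejo", "gitlab.com": "gitlab"}

# Environment variables that point at a self-hosted instance.
SELF_HOSTED = (("GITSWEEPER_FORGEJO_URL", "forgejo"), ("GITSWEEPER_GITLAB_URL", "gitlab"))


def detect_forge(target: str, explicit: Optional[str] = None,
                 env: Optional[Mapping[str, str]] = None) -> str:
    """Hypothetical helper: pick a forge for a repo reference."""
    env = os.environ if env is None else env
    if explicit:                      # --forge always wins
        return explicit
    if "://" not in target:           # bare owner/repo resolves to GitHub
        return "github"
    host = (urlparse(target).hostname or "").lower()
    for var, forge in SELF_HOSTED:    # match the host of a configured base URL
        base = env.get(var)
        if base and (urlparse(base).hostname or "").lower() == host:
            return forge
    # Assumption: unknown hosts fall back to GitHub.
    return KNOWN_HOSTS.get(host, "github")
```

For example, `detect_forge("https://codeberg.org/forgejo/forgejo")` yields `"forgejo"` without any flag or environment variable.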

Where things live on disk

| Path | Status | Contents |
| --- | --- | --- |
| `cache/*.sqlite` | gitignored | Per-repo SQLite cache. Pass `--db-path cache/<repo>.sqlite` to keep them tidy. The default if `--db-path` is omitted is `$XDG_STATE_HOME/gitsweeper/gitsweeper.sqlite`. |
| `docs/examples/*.md` | gitignored | Generated reports. Convention is `<owner>-<repo>-[<author>-]<date>.md`. |
| `openspec/specs/` | tracked | Capability specs. Authoritative for behaviour. |
| `openspec/changes/archive/` | tracked | Historical change proposals (audit trail). |

Both cache/ and docs/examples/ are deliberately ignored — they are run-specific artefacts that drift the moment a new snapshot is taken. Keep them locally for sharing; do not commit them.
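
As a rough illustration of the two path conventions above, the sketch below resolves the default cache location and builds a report filename. Both function names are hypothetical, and the fallback to ~/.local/state when XDG_STATE_HOME is unset follows the XDG Base Directory convention; whether gitsweeper applies the same fallback is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def default_db_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Hypothetical helper: where the cache lands when --db-path is omitted."""
    env = os.environ if env is None else env
    # Assumption: fall back to ~/.local/state, as the XDG convention suggests.
    state_home = env.get("XDG_STATE_HOME") or str(Path.home() / ".local" / "state")
    return Path(state_home) / "gitsweeper" / "gitsweeper.sqlite"


def report_name(owner: str, repo: str, date: str, author: Optional[str] = None) -> str:
    """Hypothetical helper: <owner>-<repo>-[<author>-]<date>.md"""
    parts = [owner, repo]
    if author:
        parts.append(author)
    parts.append(date)
    return "-".join(parts) + ".md"
```

The author segment is simply omitted for a whole-repo report, which keeps per-author slices and full reports sortable side by side in docs/examples/.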

Commands

| Command | What it does | API cost |
| --- | --- | --- |
| `fetch <repo>` | Download all pull requests (paginated) into the cache. | ~ceil(n / 100) calls. |
| `throughput <repo> [--since] [--author] [--json]` | Time-to-merge percentiles over the cache. | 0 (cache only). |
| `first-response <repo> [--since] [--author] [--json]` | First-non-author-comment percentiles; lazily fetches comments for any uncached PR. | ~1 per uncached PR. |
| `classify <repo> [--author] [--json]` | Self-pulled vs maintainer-closed for closed-without-merge PRs; uses the issue-events endpoint to fill the data the pulls endpoint omits. | ~1 per uncached closed-unmerged PR. |
| `patterns <repo> [--since] [--author] [--json]` | Day-of-week and hour-of-day distributions for submissions and responses. | 0 (cache only). |
| `dora <repo> [--since] [--period week\|month] [--json]` | The four DORA metrics (team-level, proxy-based) over the cache. No `--author`. | 0 (cache only). |
| `retro <repo> [--since] [--stale-days N] [--json]` | Team-level retro signals (stale open PRs, long threads, friction language, tech-debt markers, smooth merges) over the cache + a comments cache. No `--author`. | ~1 per uncached PR (comment fetch), then 0. |
| `report <repo> [--author] [--since] [--refresh] [--out PATH]` | Compose every section above into a single markdown document. `--refresh` runs fetch + first-response + classify before composing. | Sum of the above when `--refresh`; 0 otherwise. |
| `deliver <repo> [--forge] [--since] [--period week\|month] [--format slack\|markdown] [--out PATH] [--post]` | Compose DORA + retro into one team-level message (Slack Block Kit by default, or markdown). Writes to stdout/`--out`; with `--post` it POSTs the Block Kit to `SLACK_WEBHOOK_URL`. No `--author`. | ~1 per uncached PR (comment fetch), then 0. No network unless `--post`. |
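
To get a feel for the API-cost column before running a refresh on a large repository, the arithmetic can be sketched as below. The function names are hypothetical and the numbers in the usage note are made up; the 100-per-page figure and the 60/h and 5000/h budgets are the ones quoted in this README.

```python
import math


def fetch_calls(total_prs: int, per_page: int = 100) -> int:
    """Paginated listing: roughly ceil(n / 100) calls."""
    return math.ceil(total_prs / per_page)


def refresh_calls(total_prs: int, uncached_comment_prs: int,
                  uncached_closed_unmerged: int) -> int:
    """fetch + first-response + classify, as run by `report --refresh`."""
    return fetch_calls(total_prs) + uncached_comment_prs + uncached_closed_unmerged


def hours_of_budget(calls: int, budget_per_hour: int) -> int:
    """Rough number of hourly rate-limit windows the calls would consume."""
    return math.ceil(calls / budget_per_hour)
```

With 250 PRs, 40 of them lacking cached comments and 10 closed-unmerged without a cached close event, `refresh_calls(250, 40, 10)` gives 53 calls: one window even at the unauthenticated 60/h budget, while ten times that volume would need nine windows unauthenticated and still only one with a token.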

Manager-MCP

A small MCP server exposes Gitsweeper's PR analyses and a Billbird manager-view (hours, plan-vs-actual) over stdio. Point Claude Desktop (or any MCP-aware AI client) at gitsweeper mcp and a manager can ask plan-vs-actual or PR-throughput questions in natural language. See docs/mcp.md for the configuration snippet, the nine tools the server advertises, the read-only contract, and the required environment.

# Run the server directly (typically Claude Desktop spawns this for you).
uv run gitsweeper mcp

Capabilities

  • pr-throughput-analysis — fetch + persist, time-to-merge percentiles, --since and --author filtering, opt-in time-to-first-response, day/hour patterns.
  • pr-classification — close-event-actor enrichment, self-pulled vs maintainer-closed classification.
  • pr-process-report — orchestrates the others into a single shareable markdown report.
  • report-rendering — pluggable renderer interface; CLI table, JSON, and markdown renderers.
  • dora-metrics — the four DORA metrics computed team-level from the PR cache, with documented proxies and DORA performance bands.
  • retro-signals — team-level retro cues over the PR cache and a new comment-body cache, with documented deterministic keyword sets.
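
The pr-classification idea can be illustrated with a minimal sketch: once the close-event actor is known, a closed-without-merge PR is "self-pulled" when the submitter closed it and "maintainer-closed" otherwise. The function and its parameters are hypothetical, and the case-insensitive login comparison is an assumption borrowed from the `--author` filter; the authoritative behaviour is in openspec/specs/.

```python
from typing import Optional


def classify_closed_pr(author: str, closed_by: Optional[str], merged: bool) -> str:
    """Hypothetical classifier for a closed pull request."""
    if merged:
        return "merged"                # not a closed-without-merge PR
    if closed_by is None:
        return "unknown"               # close event not cached yet
    # Assumption: logins are compared case-insensitively.
    if closed_by.lower() == author.lower():
        return "self-pulled"
    return "maintainer-closed"
```

The "unknown" branch reflects why classify costs about one API call per uncached closed-unmerged PR: the pulls listing alone does not say who closed it.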

DORA — team-level, proxy-based

gitsweeper dora <repo> computes the four DORA metrics from the cached pull requests only (no forge calls, no --author — DORA measures a team's delivery system, not individuals). Because the cache holds no releases, per-PR commits, or issue history, v1 uses documented proxies:

  • deployment frequency — merged PRs per --period bucket (week/month); a merge is treated as a deploy.
  • lead time for changes — median/p75/p90 of created_at → merged_at (PR cycle time as lead time).
  • change failure rate — corrective merged PRs ÷ all merged PRs, where "corrective" is a deterministic title heuristic (leading revert/hotfix/rollback, or a conventional-commit fix: / fix(scope): prefix).
  • time to restore — median created_at → merged_at over the corrective merged PRs.

Each metric is annotated with its Elite/High/Medium/Low DORA band and the sample count it came from.
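
A minimal sketch of three of the proxies above, assuming each cached PR is a dict with `title`, `created_at`, and `merged_at` keys (hypothetical field names). The regular expression is one plausible reading of the title heuristic, not the project's actual pattern, and only the median is computed here; p75/p90 and the deployment-frequency buckets are left out.

```python
import re
from datetime import datetime
from statistics import median

# Assumption: "leading revert/hotfix/rollback" or a "fix:" / "fix(scope):" prefix.
CORRECTIVE = re.compile(r"^\s*(revert|hotfix|rollback)\b|^\s*fix(\([^)]*\))?:",
                        re.IGNORECASE)
FMT = "%Y-%m-%dT%H:%M:%SZ"


def is_corrective(title: str) -> bool:
    return bool(CORRECTIVE.search(title))


def hours(created_at: str, merged_at: str) -> float:
    delta = datetime.strptime(merged_at, FMT) - datetime.strptime(created_at, FMT)
    return delta.total_seconds() / 3600


def dora_proxies(prs: list) -> dict:
    merged = [p for p in prs if p.get("merged_at")]
    corrective = [p for p in merged if is_corrective(p["title"])]
    lead = [hours(p["created_at"], p["merged_at"]) for p in merged]
    restore = [hours(p["created_at"], p["merged_at"]) for p in corrective]
    return {
        "change_failure_rate": len(corrective) / len(merged) if merged else None,
        "lead_time_median_h": median(lead) if lead else None,
        "time_to_restore_median_h": median(restore) if restore else None,
    }
```

Note that under this reading a title such as "Fix typo in docs" is not corrective, because it lacks the conventional-commit colon.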

Retro signals — team-level, deterministic

gitsweeper retro <repo> surfaces deterministic cues for a team retrospective over the cached PRs plus a comment cache: stale open PRs (open longer than --stale-days, default 14), long threads (more than 10 cached comments), friction language, tech-debt markers, and smooth merges (merged within 3 days with fewer than 2 comments). Every signal references a PR by number only — never an author (no --author); comment authors are stored in the cache but never surfaced.
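
The three threshold-based signals can be sketched as follows. The dict keys (`state`, `created_at`, `merged_at`, `comments`) are hypothetical, and treating "within 3 days" as an inclusive bound is an assumption; friction and tech-debt keyword matching is omitted because it depends on the documented keyword sets.

```python
from datetime import datetime, timedelta, timezone


def retro_signals(pr: dict, now: datetime, stale_days: int = 14) -> list:
    """Hypothetical sketch: return the threshold-based signals for one PR."""
    signals = []
    created = pr["created_at"]
    merged = pr.get("merged_at")
    # Stale: still open and older than --stale-days.
    if pr["state"] == "open" and now - created > timedelta(days=stale_days):
        signals.append("stale")
    # Long thread: more than 10 cached comments.
    if pr["comments"] > 10:
        signals.append("long-thread")
    # Smooth merge: merged within 3 days with fewer than 2 comments.
    if merged and merged - created <= timedelta(days=3) and pr["comments"] < 2:
        signals.append("smooth-merge")
    return signals


def report_line(number: int, signals: list) -> str:
    # Only the PR number is surfaced, never an author.
    return f"#{number}: {', '.join(signals)}"
```

Keeping the author out of the function's output, not just out of the CLI flags, is what makes the "PR by number only" rule hard to violate by accident.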

The first run fetches each in-scope PR's comments once (one comment-listing call per uncached PR) into a local pr_comments cache; later runs read the cache. Friction (Dutch + English) and tech-debt keyword sets are documented, case-insensitive constants — no LLM, no scoring model — carried verbatim from Road_to_el_DORA-do/.github/prompts/sprint-retro.md so this command stays continuous with the workflow it supersedes.

Delivery — opt-in egress, you do the scheduling

gitsweeper deliver <repo> blends the DORA and retro halves into one team-level message and renders it as a Slack Block Kit payload (default, --format slack) or markdown (--format markdown). By default it writes to stdout or --out FILE and makes no network call.

Egress is opt-in and explicit. --post POSTs the Block Kit payload to a single Slack incoming webhook read from the SLACK_WEBHOOK_URL environment variable; --post without that variable is a named error and sends nothing. The webhook is read from the environment only — never printed, written to --out, or logged. One webhook = one channel.
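
The egress contract above (environment-only webhook, named error when it is missing, nothing sent otherwise) might look roughly like this with the standard library. `MissingWebhookError`, `build_request`, and `post_to_slack` are hypothetical names for illustration, not gitsweeper's actual internals.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional


class MissingWebhookError(RuntimeError):
    """Hypothetical named error: --post without SLACK_WEBHOOK_URL."""


def build_request(payload: dict,
                  env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    env = os.environ if env is None else env
    url = env.get("SLACK_WEBHOOK_URL")
    if not url:
        # The message names the variable but never echoes a URL.
        raise MissingWebhookError("SLACK_WEBHOOK_URL is not set; nothing was sent")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )


def post_to_slack(payload: dict, env: Optional[Mapping[str, str]] = None,
                  timeout: float = 10.0) -> int:
    """Send the Block Kit payload and return the HTTP status code."""
    request = build_request(payload, env)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending keeps the "no variable, no network" path testable without ever touching a socket.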

Scheduling is your job. deliver runs once and exits; it embeds no scheduler. Run it from your own cron entry or scheduling routine, e.g.:

# Mondays at 09:00, post last month's report to Slack.
0 9 * * 1  SLACK_WEBHOOK_URL=... gitsweeper deliver octocat/hello --post

This supersedes the weekly Slack Block Kit GitHub Action in Road_to_el_DORA-do: a portable command composes with any scheduler instead of coupling to one forge's CI.

Development

uv sync
uv run pytest                          # 60 tests; should be green
uv run ruff check .
openspec validate --all --strict

License

EUPL-1.2 — see pyproject.toml.
