Skip to content

feat(controller): remove stale org members and repo direct collaborators (#146, #147)#151

Open
onuryilmaz wants to merge 12 commits into
mainfrom
feat/remove-stale-members-and-collaborators
Open

feat(controller): remove stale org members and repo direct collaborators (#146, #147)#151
onuryilmaz wants to merge 12 commits into
mainfrom
feat/remove-stale-members-and-collaborators

Conversation

@onuryilmaz

Copy link
Copy Markdown
Contributor

Summary

Implements two new opt-in cleanup features for GithubOrganization:

Both features share:

  • spec.protectedMembers []string — logins exempt from removal (bots, app installation user, emergency accounts)
  • Case-insensitive login matching
  • Deduplication (pending/failed op for same user suppresses new duplicate)
  • Rate-limit handling with requeue-after
  • Full TTL integration via existing failedTTL/completedTTL labels
  • Metrics (orgmembers, repocollaborators scopes) and dashboard panels

Safety

  • Opt-in only: both labels default to "false" — no existing org is affected without explicit opt-in.
  • Safety rail: if all team-member fetches fail (teamObservationsCount == 0), zero remove operations are generated for #147 — prevents mass-removal during a GitHub API outage.
  • Protected members: spec.protectedMembers provides an allowlist for accounts that must never be removed.

Changes

File Change
api/v1/githuborganization_types.go New ProtectedMembers spec field, OrganizationMemberOperations + RepositoryCollaboratorOperations status fields, GithubRepoUserOperation type, two calculator methods, updated PendingOperationsFound / FailedOperationsFound / Clean*Operations
api/v1/githuborganization_types_test.go 21 new test cases for both calculators
api/v1/zz_generated.deepcopy.go Regenerated deepcopy for new types
internal/controller/githuborganization_controller.go PART 4 (#147) and PART 5 (#146) calculator blocks + execution loops
internal/controller/ttl.go applyRepoUserOpsTTL for new op type
internal/controller/ttl_test.go TestApplyRepoUserOpsTTL (4 cases)
internal/controller/githuborganization_ttl_test.go Fixed pre-existing race condition + Gomega type mismatch
internal/metrics/metrics.go Two new operation scopes
charts/repo-guard/templates/githuborganization.yaml Two new opt-in labels + protectedMembers spec section
charts/repo-guard/values.yaml Documentation for new fields
config/crd/bases/... + charts/repo-guard/crds/... CRDs regenerated via make manifests generate
dashboards/repoguard-status-ops-{perses,plutono}.json Two new stat panels

Test plan

  • make manifests generate — CRDs and deepcopy regenerated cleanly
  • make fmt vet — no issues
  • api/v1 unit tests pass (including 21 new calculator cases)
  • internal/metrics unit tests pass
  • internal/controller integration tests pass — all 29 specs + new TestApplyRepoUserOpsTTL (4 cases)
  • Pre-existing flaky TTL test fixed (race condition + type mismatch)

…, #147)

Introduce two opt-in controllers for cleaning up stale GitHub access:

**#147 — remove org members not in any GitHub team**
- New label `repo-guard.cloudoperators.dev/removeOrganizationMember: "true"` enables
  the feature (default: false/off).
- `OrganizationMemberChangeCalculator` diffs live org-member list against the union of
  all GitHub-team members and emits `GithubUserOperation{type: remove}` for members in
  neither any team nor the org-owner list.
- Safety rail: if `teamObservationsCount == 0` (all team-member fetches failed), no
  remove operations are generated — prevents mass-removal during an API outage.
- New `OrganizationMemberOperations []GithubUserOperation` field in
  `GithubOrganizationStatusOperations` tracks pending/complete/failed/skipped removals.

**#146 — remove direct repo collaborators not covered by any team**
- New label `repo-guard.cloudoperators.dev/removeRepositoryDirectCollaborator: "true"`
  enables the feature (default: false/off).
- `RepositoryDirectCollaboratorChangeCalculator` diffs per-repo direct collaborators
  against the union of team members for teams with access to that repo and emits
  `GithubRepoUserOperation{type: remove}` for collaborators not covered.
- New `GithubRepoUserOperation` type and `RepositoryCollaboratorOperations` slice carry
  the operations through the reconcile → pending → execute cycle.

**Shared foundations**
- `spec.protectedMembers []string` on `GithubOrganization` — logins exempt from both
  features (bots, GitHub App installation user, emergency accounts).
- Case-insensitive matching throughout (GitHub logins are case-insensitive).
- Per-reconcile deduplication: existing pending or failed ops suppress new duplicate ops.
- Rate-limit handling with requeue-after in both calculator blocks and execution loops.
- TTL support: `applyRepoUserOpsTTL` integrates `RepositoryCollaboratorOperations`
  into the existing `failedTTL`/`completedTTL` label-driven cleanup path.
- `PendingOperationsFound`, `FailedOperationsFound`, `CleanCompletedOperations`,
  `CleanFailedOperations` all updated to include both new operation slices.
- Metrics: new `orgmembers` and `repocollaborators` scopes in
  `SetGithubOrganizationMetrics`.
- Dashboards: two new stat panels in both Perses and Plutono dashboards.
- Helm: new labels and `protectedMembers` spec section documented in values.yaml;
  labels default to `"false"` for safe opt-in behaviour.
- CRDs regenerated via `make manifests generate`.
- Fixed pre-existing test race in `githuborganization_ttl_test.go`: added `Eventually`
  barrier before seeding ops so the initial reconcile settles first; fixed Gomega type
  mismatch (`Equal` → `BeEquivalentTo` for untyped string constant).
- Unit tests: `TestOrganizationMemberChangeCalculator` (10 cases),
  `TestRepositoryDirectCollaboratorChangeCalculator` (11 cases),
  `TestApplyRepoUserOpsTTL` (4 cases).

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
Copilot AI review requested due to automatic review settings June 16, 2026 19:56
…irectCollaborator labels

Add the two new opt-in labels introduced in #146/#147 to the GithubOrganization
labels reference table, and add a note about spec.protectedMembers.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds two opt-in cleanup capabilities to the GithubOrganization controller to remove (1) organization members not present in any GitHub team and (2) direct repository collaborators not covered by any team with access, including status/TTL/metrics/chart/dashboard plumbing and unit/integration test updates.

Changes:

  • Introduces spec.protectedMembers plus new status operation slices for org-member and repo-collaborator removals, with calculators and tests.
  • Wires new calculators + execution loops into the GithubOrganization reconciler, including TTL maintenance for the new op type.
  • Extends metrics and dashboards to surface pending/complete/failed/skipped counts for the new operation scopes; updates Helm/CRDs accordingly.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
api/v1/githuborganization_types.go Adds ProtectedMembers, new operation types/slices, and calculators for org members + direct repo collaborators; updates pending/failed/cleanup helpers.
api/v1/githuborganization_types_test.go Adds unit tests covering both new calculators (incl. dedupe + case-insensitive scenarios).
api/v1/zz_generated.deepcopy.go Regenerates deepcopy logic for the new spec/status fields and types.
internal/controller/githuborganization_controller.go Integrates calculators into reconcile flow, executes new removal ops, and applies TTL cleanup for new op slices.
internal/controller/ttl.go Adds TTL cleanup helper for GithubRepoUserOperation.
internal/controller/ttl_test.go Adds unit tests for applyRepoUserOpsTTL.
internal/controller/githuborganization_ttl_test.go Stabilizes TTL integration tests by waiting for initial controller reconciliation to settle.
internal/metrics/metrics.go Adds orgmembers and repocollaborators operation scopes to organization metrics emission.
charts/repo-guard/templates/githuborganization.yaml Adds opt-in labels and renders spec.protectedMembers in the Helm template.
charts/repo-guard/values.yaml Documents the new opt-in labels and protectedMembers values.
config/crd/bases/repo-guard.cloudoperators.dev_githuborganizations.yaml Updates CRD schema for protectedMembers and new status operation fields.
charts/repo-guard/crds/githuborganization-crd.yaml Mirrors regenerated CRD changes for Helm distribution.
dashboards/repoguard-status-ops-perses.json Adds new stat panels and updates charts to include the new operation scopes.
dashboards/repoguard-status-ops-plutono.json Adds new stat panels and updates charts to include the new operation scopes.
Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go Outdated
Comment thread internal/controller/githuborganization_controller.go Outdated
…ures

Three fixes based on code review:

1. Reuse existing teamsList for org-member calculator (PART 4)
   teamsProvider.List() was being called a second time even though teamsList
   was already fetched and validated earlier in the same reconcile. Replace
   the duplicate List() + error branch with a direct iteration over teamsList.

2. Safety rail for repo-collab calculator when all team-member fetches fail
   If a repo has teams but every getTeamMembers() call returns nil (transient
   errors), membersForRepo would be empty and all collaborators would look
   removable. Track observedTeams; if teamsWithAccess > 0 && observedTeams == 0,
   skip the repo and delete it from repoCollaborators to prevent false-positive
   removal operations.

3. Treat 404 as idempotent success in OrganizationMemberOperations execution
   If a user has already left the org between calculation and execution, GitHub
   returns 404. Mirror the existing repo-collaborator handling: check for "404"
   in the error string and mark the operation Complete rather than Failed,
   preventing the org from getting stuck in a failed state for a no-op removal.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread README.md Outdated
Comment thread internal/controller/githuborganization_controller.go Outdated
Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
Four fixes from the second Copilot review:

1. README: correct safety rail description for removeOrganizationMember
   "all team-member lists fetchable" was misleading; the safety rail fires
   only when teamObservationsCount == 0 (no teams at all observed). Corrects
   the description to "at least one team-member list must be successfully
   fetched".

2. Remove redundant l.Error log in PART 4 org-member fetch error path
   The error was logged twice for non-rate-limit errors (once unconditionally
   before the rate-limit check, once in the fallthrough). Remove the first
   (unconditional) log; keep the specific "skipping due to error" message.

3. Guard OrganizationMemberOperations execution on op.Operation == remove
   The loop called RemoveFromOrg unconditionally on pending ops. Add an
   explicit op.Operation != GithubUserOperationTypeRemove guard that marks
   unexpected operation types as Skipped, preventing unintended mutations
   from malformed status entries.

4. Guard RepositoryCollaboratorOperations execution on op.Operation == remove
   Same defensive pattern for the repo-collaborator removal loop: check
   op.Operation != GithubRepoUserOperationTypeRemove and skip/log unexpected
   types before calling RepositoryCollobaratorRemove.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread api/v1/githuborganization_types.go
Comment thread api/v1/githuborganization_types.go
Comment thread internal/controller/githuborganization_controller.go Outdated
Comment thread internal/controller/githuborganization_controller.go
… in team member loops

- OrganizationMemberChangeCalculator and RepositoryDirectCollaboratorChangeCalculator
  now clear OrganizationStatusError when generating new pending operations, consistent
  with the existing RepoChangeCalculator behaviour.
- Add rate-limit detection (parseGitHubRateLimitReset) in the PART 4 team-member fetch
  loop so a GitHub rate limit during org-member safety checks triggers a RequeueAfter
  instead of silently skipping teams.
- Add rate-limit detection in the PART 5 getTeamMembers closure via a captured variable;
  after the repo loop the reconcile returns RequeueAfter if any team-member fetch was
  rate-limited.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread api/v1/githuborganization_types.go
Comment thread api/v1/githuborganization_types.go
Comment thread internal/controller/githuborganization_controller.go Outdated
Comment thread internal/controller/githuborganization_controller.go Outdated
…fety rails

- OrganizationMemberChangeCalculator: update pendingSet after appending each new op
  to prevent duplicate pending operations when orgMembers contains the same login
  more than once.
- RepositoryDirectCollaboratorChangeCalculator: same fix for the (repo, user) pendingSet.
- PART 4 team-member loop: treat any non-rate-limit fetch error as a hard stop
  (reset teamObservationsCount=0 and break) rather than skipping the team; an
  incomplete union can cause false-positive org-member removals.
- PART 5 repo-collab safety rail: tighten condition from observedTeams==0 to
  observedTeams!=teamsWithAccess so any partial team-member fetch failure skips
  the repo, not just total failure.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread README.md Outdated
Comment thread internal/controller/githuborganization_controller.go
Comment thread dashboards/repoguard-status-ops-perses.json Outdated
Comment thread dashboards/repoguard-status-ops-plutono.json Outdated
…ilure tightening

- README: update removeOrganizationMember label description to reflect that all
  team-member fetches must succeed (not "at least one"); any single non-rate-limit
  error aborts the safety check and suppresses removals.
- Controller: update inline comment above the repo-collab safety-rail check to
  match the observedTeams != teamsWithAccess condition (any failure, not just
  total failure).
- Dashboards: rename "Team & Repo Operations" panel title to "Org Operations
  (Teams, Repos, Members & Collaborators)" in both Perses and Plutono dashboards
  to reflect the added orgmembers and repocollaborators scopes.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 2 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
…al loops

The org-member and repo-collaborator execution loops did not re-check
spec.protectedMembers before issuing the removal API call. If a login
was added to protectedMembers after the operation was queued (e.g. spec
updated between calculation and execution), the controller would still
remove that user.

Add a case-insensitive protectedSet guard in both loops immediately
before the API call. Operations matching a protected login are marked
Skipped rather than executed.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread README.md Outdated
Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
…ited getTeamMembers

Three improvements based on review feedback:

- Build orgMemberProtectedSet and repoCollabProtectedSet once before
  their respective execution loops instead of rebuilding on every
  pending operation (O(n*m) -> O(n+m)).

- Short-circuit getTeamMembers immediately when repoCollabRateLimitResult
  is already set, avoiding unnecessary GitHub API calls after a rate
  limit has been detected. Also cache nil for the team that triggered
  the rate limit.

- Update README to document that the removeOrganizationMember safety
  rail also suppresses removals when the org has no teams at all.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 1 comment.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread internal/controller/githuborganization_controller.go Outdated
…or backoff retry

Previously a non-rate-limit error from organizationsProvider.Members()
was only logged; the reconcile continued and silently skipped the
org-member cleanup. In a stable cluster with no further events this left
the cleanup permanently inactive until something else triggered a
reconcile.

Return the error so controller-runtime retries with exponential backoff,
consistent with how other transient API errors are handled in the loop.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread internal/controller/githuborganization_controller.go Outdated
Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
…t status in PART 4/5

Three fixes:

- Replace the cross-type cast GithubRepoUserOperationState(userState) in
  applyBucket with an explicit switch mapping failed/complete to their
  GithubRepoUserOperationState counterparts. Unknown states now return
  early so the compiler enforces correctness rather than relying on
  coincidental string equality.

- Persist OrganizationStatus=ratelimited + error message before returning
  RequeueAfter when teamsProvider.Members hits a rate limit in the PART 4
  org-member safety check loop. Status/metrics now reflect the condition
  and the reset timestamp survives controller restarts.

- Same persistence before returning RequeueAfter when repoCollabRateLimitResult
  is set after the PART 5 repo-collab team-member fetch loop.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.

Files not reviewed (1)
  • api/v1/zz_generated.deepcopy.go: Generated file

Comment thread internal/controller/githuborganization_controller.go
Comment thread internal/controller/githuborganization_controller.go
Comment thread api/v1/githuborganization_types.go
…s for PART 4/5

The fixed-string OrganizationStatusError set in the PART 4 and PART 5
rate-limit paths didn't include the "until ..." timestamp from the
original GitHub error, so parseGitHubRateLimitReset couldn't extract
the reset time from status on controller restart.

Capture the original error string (teamMembersRateLimitErr /
repoCollabRateLimitErr) when the rate-limit result is recorded and
append it to OrganizationStatusError, matching the pattern used by
the other rate-limit branches.

Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants