feat(controller): remove stale org members and repo direct collaborators (#146, #147)#151
Open
onuryilmaz wants to merge 12 commits into
Open
feat(controller): remove stale org members and repo direct collaborators (#146, #147)#151onuryilmaz wants to merge 12 commits into
onuryilmaz wants to merge 12 commits into
Conversation
…, #147) Introduce two opt-in controllers for cleaning up stale GitHub access: **#147 — remove org members not in any GitHub team** - New label `repo-guard.cloudoperators.dev/removeOrganizationMember: "true"` enables the feature (default: false/off). - `OrganizationMemberChangeCalculator` diffs live org-member list against the union of all GitHub-team members and emits `GithubUserOperation{type: remove}` for members in neither any team nor the org-owner list. - Safety rail: if `teamObservationsCount == 0` (all team-member fetches failed), no remove operations are generated — prevents mass-removal during an API outage. - New `OrganizationMemberOperations []GithubUserOperation` field in `GithubOrganizationStatusOperations` tracks pending/complete/failed/skipped removals. **#146 — remove direct repo collaborators not covered by any team** - New label `repo-guard.cloudoperators.dev/removeRepositoryDirectCollaborator: "true"` enables the feature (default: false/off). - `RepositoryDirectCollaboratorChangeCalculator` diffs per-repo direct collaborators against the union of team members for teams with access to that repo and emits `GithubRepoUserOperation{type: remove}` for collaborators not covered. - New `GithubRepoUserOperation` type and `RepositoryCollaboratorOperations` slice carry the operations through the reconcile → pending → execute cycle. **Shared foundations** - `spec.protectedMembers []string` on `GithubOrganization` — logins exempt from both features (bots, GitHub App installation user, emergency accounts). - Case-insensitive matching throughout (GitHub logins are case-insensitive). - Per-reconcile deduplication: existing pending or failed ops suppress new duplicate ops. - Rate-limit handling with requeue-after in both calculator blocks and execution loops. - TTL support: `applyRepoUserOpsTTL` integrates `RepositoryCollaboratorOperations` into the existing `failedTTL`/`completedTTL` label-driven cleanup path. - `PendingOperationsFound`, `FailedOperationsFound`, `CleanCompletedOperations`, `CleanFailedOperations` all updated to include both new operation slices. - Metrics: new `orgmembers` and `repocollaborators` scopes in `SetGithubOrganizationMetrics`. - Dashboards: two new stat panels in both Perses and Plutono dashboards. - Helm: new labels and `protectedMembers` spec section documented in values.yaml; labels default to `"false"` for safe opt-in behaviour. - CRDs regenerated via `make manifests generate`. - Fixed pre-existing test race in `githuborganization_ttl_test.go`: added `Eventually` barrier before seeding ops so the initial reconcile settles first; fixed Gomega type mismatch (`Equal` → `BeEquivalentTo` for untyped string constant). - Unit tests: `TestOrganizationMemberChangeCalculator` (10 cases), `TestRepositoryDirectCollaboratorChangeCalculator` (11 cases), `TestApplyRepoUserOpsTTL` (4 cases). Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Adds two opt-in cleanup capabilities to the GithubOrganization controller to remove (1) organization members not present in any GitHub team and (2) direct repository collaborators not covered by any team with access, including status/TTL/metrics/chart/dashboard plumbing and unit/integration test updates.
Changes:
- Introduces
spec.protectedMembersplus new status operation slices for org-member and repo-collaborator removals, with calculators and tests. - Wires new calculators + execution loops into the
GithubOrganizationreconciler, including TTL maintenance for the new op type. - Extends metrics and dashboards to surface pending/complete/failed/skipped counts for the new operation scopes; updates Helm/CRDs accordingly.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
api/v1/githuborganization_types.go |
Adds ProtectedMembers, new operation types/slices, and calculators for org members + direct repo collaborators; updates pending/failed/cleanup helpers. |
api/v1/githuborganization_types_test.go |
Adds unit tests covering both new calculators (incl. dedupe + case-insensitive scenarios). |
api/v1/zz_generated.deepcopy.go |
Regenerates deepcopy logic for the new spec/status fields and types. |
internal/controller/githuborganization_controller.go |
Integrates calculators into reconcile flow, executes new removal ops, and applies TTL cleanup for new op slices. |
internal/controller/ttl.go |
Adds TTL cleanup helper for GithubRepoUserOperation. |
internal/controller/ttl_test.go |
Adds unit tests for applyRepoUserOpsTTL. |
internal/controller/githuborganization_ttl_test.go |
Stabilizes TTL integration tests by waiting for initial controller reconciliation to settle. |
internal/metrics/metrics.go |
Adds orgmembers and repocollaborators operation scopes to organization metrics emission. |
charts/repo-guard/templates/githuborganization.yaml |
Adds opt-in labels and renders spec.protectedMembers in the Helm template. |
charts/repo-guard/values.yaml |
Documents the new opt-in labels and protectedMembers values. |
config/crd/bases/repo-guard.cloudoperators.dev_githuborganizations.yaml |
Updates CRD schema for protectedMembers and new status operation fields. |
charts/repo-guard/crds/githuborganization-crd.yaml |
Mirrors regenerated CRD changes for Helm distribution. |
dashboards/repoguard-status-ops-perses.json |
Adds new stat panels and updates charts to include the new operation scopes. |
dashboards/repoguard-status-ops-plutono.json |
Adds new stat panels and updates charts to include the new operation scopes. |
Files not reviewed (1)
- api/v1/zz_generated.deepcopy.go: Generated file
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ures Three fixes based on code review: 1. Reuse existing teamsList for org-member calculator (PART 4) teamsProvider.List() was being called a second time even though teamsList was already fetched and validated earlier in the same reconcile. Replace the duplicate List() + error branch with a direct iteration over teamsList. 2. Safety rail for repo-collab calculator when all team-member fetches fail If a repo has teams but every getTeamMembers() call returns nil (transient errors), membersForRepo would be empty and all collaborators would look removable. Track observedTeams; if teamsWithAccess > 0 && observedTeams == 0, skip the repo and delete it from repoCollaborators to prevent false-positive removal operations. 3. Treat 404 as idempotent success in OrganizationMemberOperations execution If a user has already left the org between calculation and execution, GitHub returns 404. Mirror the existing repo-collaborator handling: check for "404" in the error string and mark the operation Complete rather than Failed, preventing the org from getting stuck in a failed state for a no-op removal. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
Four fixes from the second Copilot review: 1. README: correct safety rail description for removeOrganizationMember "all team-member lists fetchable" was misleading; the safety rail fires only when teamObservationsCount == 0 (no teams at all observed). Corrects the description to "at least one team-member list must be successfully fetched". 2. Remove redundant l.Error log in PART 4 org-member fetch error path The error was logged twice for non-rate-limit errors (once unconditionally before the rate-limit check, once in the fallthrough). Remove the first (unconditional) log; keep the specific "skipping due to error" message. 3. Guard OrganizationMemberOperations execution on op.Operation == remove The loop called RemoveFromOrg unconditionally on pending ops. Add an explicit op.Operation != GithubUserOperationTypeRemove guard that marks unexpected operation types as Skipped, preventing unintended mutations from malformed status entries. 4. Guard RepositoryCollaboratorOperations execution on op.Operation == remove Same defensive pattern for the repo-collaborator removal loop: check op.Operation != GithubRepoUserOperationTypeRemove and skip/log unexpected types before calling RepositoryCollobaratorRemove. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
… in team member loops - OrganizationMemberChangeCalculator and RepositoryDirectCollaboratorChangeCalculator now clear OrganizationStatusError when generating new pending operations, consistent with the existing RepoChangeCalculator behaviour. - Add rate-limit detection (parseGitHubRateLimitReset) in the PART 4 team-member fetch loop so a GitHub rate limit during org-member safety checks triggers a RequeueAfter instead of silently skipping teams. - Add rate-limit detection in the PART 5 getTeamMembers closure via a captured variable; after the repo loop the reconcile returns RequeueAfter if any team-member fetch was rate-limited. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…fety rails - OrganizationMemberChangeCalculator: update pendingSet after appending each new op to prevent duplicate pending operations when orgMembers contains the same login more than once. - RepositoryDirectCollaboratorChangeCalculator: same fix for the (repo, user) pendingSet. - PART 4 team-member loop: treat any non-rate-limit fetch error as a hard stop (reset teamObservationsCount=0 and break) rather than skipping the team; an incomplete union can cause false-positive org-member removals. - PART 5 repo-collab safety rail: tighten condition from observedTeams==0 to observedTeams!=teamsWithAccess so any partial team-member fetch failure skips the repo, not just total failure. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…ilure tightening - README: update removeOrganizationMember label description to reflect that all team-member fetches must succeed (not "at least one"); any single non-rate-limit error aborts the safety check and suppresses removals. - Controller: update inline comment above the repo-collab safety-rail check to match the observedTeams != teamsWithAccess condition (any failure, not just total failure). - Dashboards: rename "Team & Repo Operations" panel title to "Org Operations (Teams, Repos, Members & Collaborators)" in both Perses and Plutono dashboards to reflect the added orgmembers and repocollaborators scopes. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…al loops The org-member and repo-collaborator execution loops did not re-check spec.protectedMembers before issuing the removal API call. If a login was added to protectedMembers after the operation was queued (e.g. spec updated between calculation and execution), the controller would still remove that user. Add a case-insensitive protectedSet guard in both loops immediately before the API call. Operations matching a protected login are marked Skipped rather than executed. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…ited getTeamMembers Three improvements based on review feedback: - Build orgMemberProtectedSet and repoCollabProtectedSet once before their respective execution loops instead of rebuilding on every pending operation (O(n*m) -> O(n+m)). - Short-circuit getTeamMembers immediately when repoCollabRateLimitResult is already set, avoiding unnecessary GitHub API calls after a rate limit has been detected. Also cache nil for the team that triggered the rate limit. - Update README to document that the removeOrganizationMember safety rail also suppresses removals when the org has no teams at all. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…or backoff retry Previously a non-rate-limit error from organizationsProvider.Members() was only logged; the reconcile continued and silently skipped the org-member cleanup. In a stable cluster with no further events this left the cleanup permanently inactive until something else triggered a reconcile. Return the error so controller-runtime retries with exponential backoff, consistent with how other transient API errors are handled in the loop. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…t status in PART 4/5 Three fixes: - Replace the cross-type cast GithubRepoUserOperationState(userState) in applyBucket with an explicit switch mapping failed/complete to their GithubRepoUserOperationState counterparts. Unknown states now return early so the compiler enforces correctness rather than relying on coincidental string equality. - Persist OrganizationStatus=ratelimited + error message before returning RequeueAfter when teamsProvider.Members hits a rate limit in the PART 4 org-member safety check loop. Status/metrics now reflect the condition and the reset timestamp survives controller restarts. - Same persistence before returning RequeueAfter when repoCollabRateLimitResult is set after the PART 5 repo-collab team-member fetch loop. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
…s for PART 4/5 The fixed-string OrganizationStatusError set in the PART 4 and PART 5 rate-limit paths didn't include the "until ..." timestamp from the original GitHub error, so parseGitHubRateLimitReset couldn't extract the reset time from status on controller restart. Capture the original error string (teamMembersRateLimitErr / repoCollabRateLimitErr) when the rate-limit result is recorded and append it to OrganizationStatusError, matching the pattern used by the other rate-limit branches. Signed-off-by: Onur Yilmaz <onur.yilmaz@sap.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements two new opt-in cleanup features for
GithubOrganization:repo-guard.cloudoperators.dev/removeOrganizationMember: "true"label (default off).repo-guard.cloudoperators.dev/removeRepositoryDirectCollaborator: "true"label (default off).Both features share:
spec.protectedMembers []string— logins exempt from removal (bots, app installation user, emergency accounts)failedTTL/completedTTLlabelsorgmembers,repocollaboratorsscopes) and dashboard panelsSafety
"false"— no existing org is affected without explicit opt-in.teamObservationsCount == 0), zero remove operations are generated for#147— prevents mass-removal during a GitHub API outage.spec.protectedMembersprovides an allowlist for accounts that must never be removed.Changes
api/v1/githuborganization_types.goProtectedMembersspec field,OrganizationMemberOperations+RepositoryCollaboratorOperationsstatus fields,GithubRepoUserOperationtype, two calculator methods, updatedPendingOperationsFound/FailedOperationsFound/Clean*Operationsapi/v1/githuborganization_types_test.goapi/v1/zz_generated.deepcopy.gointernal/controller/githuborganization_controller.gointernal/controller/ttl.goapplyRepoUserOpsTTLfor new op typeinternal/controller/ttl_test.goTestApplyRepoUserOpsTTL(4 cases)internal/controller/githuborganization_ttl_test.gointernal/metrics/metrics.gocharts/repo-guard/templates/githuborganization.yamlprotectedMembersspec sectioncharts/repo-guard/values.yamlconfig/crd/bases/...+charts/repo-guard/crds/...make manifests generatedashboards/repoguard-status-ops-{perses,plutono}.jsonTest plan
make manifests generate— CRDs and deepcopy regenerated cleanlymake fmt vet— no issuesapi/v1unit tests pass (including 21 new calculator cases)internal/metricsunit tests passinternal/controllerintegration tests pass — all 29 specs + newTestApplyRepoUserOpsTTL(4 cases)