feat: add N-phase support to phased plugin#5046
Conversation
Add annotation-based phase system (phased.kubevirt.io/phase) that gates job execution on previous phase completion. The plugin now handles status events to detect when all jobs in a phase succeed and triggers the next phase via /test comments. Phase 0 jobs (basic: build, generate, unit-test) run automatically via always_run: true. Phase 1+ jobs are triggered by the plugin when the previous phase completes. The existing merge-phase behavior (lgtm+approved labels) is preserved unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Daniel Hiller <dhiller@redhat.com>
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Code Review
This pull request implements phased presubmit job triggering by responding to GitHub status events, allowing jobs to be executed in sequence based on phase annotations. The review feedback highlights critical performance concerns, specifically the inefficient retrieval of all open pull requests and the repeated parsing of Prow configurations on every status event. Suggestions were also made to prevent redundant job triggers by checking existing statuses, improve error logging for configuration loading, and remove hardcoded repository restrictions to increase the plugin's flexibility.
| } | ||
|
|
||
| func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) { | ||
| allPRs, err := h.ghClient.GetPullRequests(org, repo) |
There was a problem hiding this comment.
Calling GetPullRequests(org, repo) on every status event is highly inefficient as it retrieves all open pull requests for the repository. For a busy repository like kubevirt/kubevirt, this will result in a large number of API calls and significant overhead. Consider using the GitHub Search API to find only the pull requests matching the specific SHA.
There was a problem hiding this comment.
Addressed in f2c3f06: added pre-filtering via cached phased job names so GetPullRequests is only called when the status context matches a known phased job. The prow GitHub client doesn't expose the GET /repos/{owner}/{repo}/commits/{sha}/pulls endpoint, so we keep the current approach which is acceptable at kubevirt/kubevirt scale (~100 open PRs).
| presubmits, err := h.loadPresubmits(pr) | ||
| if err != nil || presubmits == nil { | ||
| return | ||
| } |
There was a problem hiding this comment.
Loading the Prow configuration (loadPresubmits) on every status event is extremely expensive. Status events occur frequently (multiple times per PR per job). Fetching and parsing the configuration repeatedly will likely lead to performance bottlenecks and GitHub API rate limiting. It is strongly recommended to use a cached configuration agent or at least cache the parsed results for a short period.
There was a problem hiding this comment.
Addressed in f2c3f06: added presubmitCache with a 5-minute TTL so config is loaded once and reused across status events. The TTL is needed because the plugin fetches configs from raw.githubusercontent.com at runtime (it is not redeployed on config changes).
| nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap) | ||
| if nextPhase < 0 { | ||
| return | ||
| } | ||
|
|
||
| triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase) |
There was a problem hiding this comment.
The current logic triggers all jobs in the next phase if at least one job is missing a status. This can lead to redundant triggers for jobs that are already running (e.g., if they were manually triggered or are always_run: true). You should pass the statusMap to triggerPhaseJobs and only trigger jobs that do not yet have an entry in the map.
| nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap) | |
| if nextPhase < 0 { | |
| return | |
| } | |
| triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase) | |
| nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap) | |
| if nextPhase < 0 { | |
| return | |
| } | |
| triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase, statusMap) |
There was a problem hiding this comment.
Addressed in f2c3f06: triggerPhaseJobs now accepts statusMap and skips jobs that already have a status entry, preventing redundant triggers.
| func triggerPhaseJobs(log *logrus.Entry, ghClient githubClientInterface, pr github.PullRequest, jobs []config.Presubmit, phase int) { | ||
| org := pr.Base.Repo.Owner.Login | ||
| repo := pr.Base.Repo.Name | ||
|
|
||
| if !(org == "kubevirt" && repo == "kubevirt") { | ||
| return | ||
| } | ||
|
|
||
| var result string | ||
| for _, job := range jobs { | ||
| result += "/test " + job.Name + "\n" | ||
| log.Debugf("Phase %d: triggering presubmit %s", phase, job.Name) | ||
| } |
There was a problem hiding this comment.
Filter out jobs that already have a status to avoid redundant triggers. This ensures that if a phase is partially started, only the remaining jobs are triggered.
func triggerPhaseJobs(log *logrus.Entry, ghClient githubClientInterface, pr github.PullRequest, jobs []config.Presubmit, phase int, statusMap map[string]string) {
org := pr.Base.Repo.Owner.Login
repo := pr.Base.Repo.Name
if !(org == "kubevirt" && repo == "kubevirt") {
return
}
var result string
for _, job := range jobs {
ctx := jobContext(job)
if _, exists := statusMap[ctx]; exists {
continue
}
result += "/test " + job.Name + "\n"
log.Debugf("Phase %d: triggering presubmit %s", phase, job.Name)
}There was a problem hiding this comment.
Addressed in f2c3f06: triggerPhaseJobs checks statusMap for each job context and skips any job that already has a status (pending, success, etc.).
| if !(org == "kubevirt" && repo == "kubevirt") { | ||
| return | ||
| } |
There was a problem hiding this comment.
There was a problem hiding this comment.
Not addressed: the hardcoded kubevirt/kubevirt check is pre-existing (same pattern as the original testRequested function). Making this configurable would be a separate improvement.
| } | ||
|
|
||
| presubmits, err := h.loadPresubmits(pr) | ||
| if err != nil || presubmits == nil { |
There was a problem hiding this comment.
Errors encountered while loading presubmits are silently ignored. It is better to log these errors to assist in debugging when configuration loading fails.
| if err != nil || presubmits == nil { | |
| presubmits, err := h.loadPresubmits(pr) | |
| if err != nil { | |
| log.WithError(err).Error("Could not load presubmits") | |
| return | |
| } | |
| if presubmits == nil { | |
| return | |
| } |
There was a problem hiding this comment.
Addressed in f2c3f06: handlePhaseTransition now logs errors explicitly when loadPresubmits fails.
- Add TTL-based config cache to avoid reloading Prow config on every status event - Pre-filter status events by checking context against cached phased job names before PR lookup - Only trigger untriggered jobs in triggerPhaseJobs, skip jobs that already have a status entry - Add explicit error logging for loadPresubmits failures Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Daniel Hiller <dhiller@redhat.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request extends the phased plugin to support sequential presubmit job triggering based on GitHub status events. It introduces a caching mechanism for presubmit configurations and logic to identify and trigger the next phase of jobs when the current phase succeeds. The review feedback identifies several critical areas for improvement: the current implementation is inefficient due to fetching all open pull requests on every status event, the cache lacks multi-repository support, and there is a risk of duplicate test triggers due to race conditions. Additionally, the feedback suggests optimizing API calls for shared SHAs and correcting the phase numbering logic to handle non-consecutive phase definitions.
| } | ||
|
|
||
| func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) { | ||
| allPRs, err := h.ghClient.GetPullRequests(org, repo) |
There was a problem hiding this comment.
GetPullRequests(org, repo) fetches all open pull requests for the repository. Calling this on every successful status event is extremely inefficient for large repositories like kubevirt/kubevirt. Consider using a more targeted API to find PRs by commit SHA (e.g., GET /repos/{owner}/{repo}/commits/{commit_sha}/pulls) if the GitHub client supports it, or implementing a local SHA-to-PR mapping cache.
| } | ||
| } | ||
|
|
||
| nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap) |
There was a problem hiding this comment.
There is a known race condition between posting a /test comment and Prow creating the corresponding GitHub status. If another status event (e.g., from an unrelated job) occurs during this window, findNextPhaseToTrigger will see the next phase jobs as "untriggered" (missing from statusMap) and triggerPhaseJobs will post duplicate /test comments. Consider implementing a short-term in-memory cache of recently triggered jobs to mitigate this.
There was a problem hiding this comment.
Mitigated by the statusMap check in triggerPhaseJobs (skips jobs with existing status) and Prow's idempotent /test command handling. A short-term in-memory lock would add complexity for marginal benefit given these two safeguards.
| type presubmitCache struct { | ||
| mu sync.Mutex | ||
| presubmits []config.Presubmit | ||
| phasedNames map[string]bool | ||
| loadedAt time.Time | ||
| ttl time.Duration | ||
| } |
There was a problem hiding this comment.
The presubmitCache struct currently stores a single set of presubmits and phased job names. If this plugin is configured to handle multiple repositories (which is common for Prow plugins), the cache will be overwritten whenever a status event from a different repository is processed. Consider keying the cache by repository (e.g., map[string]*repoCache).
There was a problem hiding this comment.
Not addressed: the plugin is currently hardcoded to kubevirt/kubevirt only, so single-repo cache is sufficient. Multi-repo keying would be needed if/when the hardcoded check is removed.
| _, phasedNames, ok := h.cache.get() | ||
| if ok && !phasedNames[event.Context] { | ||
| log.Debugf("Status context %q is not a phased job, skipping", event.Context) | ||
| return | ||
| } |
There was a problem hiding this comment.
When the cache is empty or expired (ok == false), the handler skips the phased job check and proceeds to the expensive findPRsForSHA operation (which lists all open PRs). This means every status event for a non-phased job will trigger a full PR list fetch if the cache is cold. It is better to ensure the cache is populated before deciding whether to proceed with the event handling.
There was a problem hiding this comment.
Addressed in 0328b37: added warmCache() — when the cache is cold, we now populate it eagerly before calling findPRsForSHA, so non-phased status events are filtered out even on first use.
| return | ||
| } | ||
|
|
||
| combinedStatus, err := h.ghClient.GetCombinedStatus(org, repo, sha) |
There was a problem hiding this comment.
There was a problem hiding this comment.
Addressed in 0328b37: GetCombinedStatus is now called once in handleStatusEvent and the resulting statusMap is passed down to handlePhaseTransition, avoiding redundant API calls when multiple PRs share the same HEAD SHA.
| prevPhase := phase - 1 | ||
| result = fmt.Sprintf(PhaseIntroFmt, prevPhase, phase) + result |
There was a problem hiding this comment.
The calculation prevPhase := phase - 1 assumes that phase numbers are consecutive integers. If a user defines non-consecutive phases (e.g., Phase 0 and Phase 10), the triggered comment will incorrectly state "Phase 9 jobs completed". It would be more accurate to pass the actual completed phase number from the sortedPhases slice to this function.
There was a problem hiding this comment.
Addressed in 0328b37: findNextPhaseToTrigger now returns (completedPhase, nextPhase) and triggerPhaseJobs uses the actual completed phase number in the intro message, correctly handling non-consecutive phase definitions.
- Warm cache on cold path before PR lookup to ensure phased name filtering works from the first status event - Fetch combined status once in handleStatusEvent and pass to handlePhaseTransition to avoid redundant API calls for shared SHAs - Fix non-consecutive phase intro message by passing actual completed phase number instead of computing phase - 1 - Remove unused parameters from handlePhaseTransition Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Daniel Hiller <dhiller@redhat.com>
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a phased execution model for presubmit jobs, allowing the plugin to trigger subsequent phases automatically upon the successful completion of all jobs in a preceding phase. Key additions include a status event handler, a TTL-based cache for presubmit configurations, and logic to group jobs by phase annotations. Review feedback identifies several critical improvements: optimizing PR lookups by using commit-specific API calls instead of listing all open PRs, addressing a logic bug where skipped jobs (e.g., due to file filters) would indefinitely block phase transitions, and removing hardcoded branch references in the cache warming logic to support repositories with different default branches.
|
|
||
| type githubClientInterface interface { | ||
| GetPullRequest(string, string, int) (*github.PullRequest, error) | ||
| GetPullRequests(string, string) ([]github.PullRequest, error) |
There was a problem hiding this comment.
Using GetPullRequests to list all open PRs and then filtering by SHA is inefficient, especially for large repositories. It is better to use ListPullRequestsByCommit, which is specifically designed to find PRs associated with a commit.
| GetPullRequests(string, string) ([]github.PullRequest, error) | |
| ListPullRequestsByCommit(org, repo, sha string) ([]github.PullRequest, error) |
| func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) { | ||
| allPRs, err := h.ghClient.GetPullRequests(org, repo) | ||
| if err != nil { | ||
| return nil, fmt.Errorf("listing PRs: %w", err) | ||
| } | ||
| var matched []github.PullRequest | ||
| for _, pr := range allPRs { | ||
| if pr.Head.SHA == sha && pr.State == "open" { | ||
| matched = append(matched, pr) | ||
| } | ||
| } | ||
| return matched, nil | ||
| } |
There was a problem hiding this comment.
Update this function to use ListPullRequestsByCommit for better performance. This avoids fetching all open PRs for the repository.
| func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) { | |
| allPRs, err := h.ghClient.GetPullRequests(org, repo) | |
| if err != nil { | |
| return nil, fmt.Errorf("listing PRs: %w", err) | |
| } | |
| var matched []github.PullRequest | |
| for _, pr := range allPRs { | |
| if pr.Head.SHA == sha && pr.State == "open" { | |
| matched = append(matched, pr) | |
| } | |
| } | |
| return matched, nil | |
| } | |
| func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) { | |
| allPRs, err := h.ghClient.ListPullRequestsByCommit(org, repo, sha) | |
| if err != nil { | |
| return nil, fmt.Errorf("listing PRs for commit: %w", err) | |
| } | |
| var matched []github.PullRequest | |
| for _, pr := range allPRs { | |
| if pr.Head.SHA == sha && pr.State == "open" { | |
| matched = append(matched, pr) | |
| } | |
| } | |
| return matched, nil | |
| } |
| func allJobsSucceeded(jobs []config.Presubmit, statusMap map[string]string) bool { | ||
| for _, job := range jobs { | ||
| ctx := jobContext(job) | ||
| state, exists := statusMap[ctx] | ||
| if !exists || state != github.StatusSuccess { | ||
| return false | ||
| } | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
This logic will fail for Pull Requests where some phased jobs are skipped (e.g., due to run_if_changed or skip_if_only_changed rules). If a job is skipped, it won't appear in the statusMap, and allJobsSucceeded will return false indefinitely, blocking subsequent phases.
You should filter the jobs list to only include those that are actually expected to run for the specific PR before checking their status.
| _, phasedNames, ok := h.cache.get() | ||
| if !ok { | ||
| h.warmCache(log, org, repo) | ||
| _, phasedNames, ok = h.cache.get() | ||
| } | ||
| if ok && !phasedNames[event.Context] { | ||
| log.Debugf("Status context %q is not a phased job, skipping", event.Context) | ||
| return | ||
| } |
There was a problem hiding this comment.
The warmCache call here is problematic because it hardcodes the branch to "main". If the repository uses "master" or another default branch, the cache will fail to warm, leading to redundant configuration loads on every event.
Instead of pre-warming with a hardcoded branch, you can simply skip the early return if the cache is cold. The cache will then be naturally populated with the correct branch information during the first handlePhaseTransition call.
| _, phasedNames, ok := h.cache.get() | |
| if !ok { | |
| h.warmCache(log, org, repo) | |
| _, phasedNames, ok = h.cache.get() | |
| } | |
| if ok && !phasedNames[event.Context] { | |
| log.Debugf("Status context %q is not a phased job, skipping", event.Context) | |
| return | |
| } | |
| _, phasedNames, ok := h.cache.get() | |
| if ok && !phasedNames[event.Context] { | |
| log.Debugf("Status context %q is not a phased job, skipping", event.Context) | |
| return | |
| } |
| func (h *GitHubEventsHandler) warmCache(log *logrus.Entry, org, repo string) { | ||
| fullName := org + "/" + repo | ||
| presubmits, err := h.loadPresubmits(github.PullRequest{ | ||
| Base: github.PullRequestBranch{ | ||
| Ref: "main", | ||
| Repo: github.Repo{FullName: fullName}, | ||
| }, | ||
| }) | ||
| if err != nil { | ||
| log.WithError(err).Error("Could not warm presubmit cache") | ||
| return | ||
| } | ||
| if presubmits != nil { | ||
| h.cache.set(presubmits) | ||
| } | ||
| } |
| func (f *fakeGHClient) GetPullRequests(org, repo string) ([]github.PullRequest, error) { | ||
| var prs []github.PullRequest | ||
| for _, pr := range f.PullRequests { | ||
| if pr != nil { | ||
| prs = append(prs, *pr) | ||
| } | ||
| } | ||
| return prs, nil | ||
| } |
Summary
phased.kubevirt.io/phase) to the phased pluginstatusevents to detect phase completion and trigger the next phase via/testcommentsalways_run: true) are triggered by Prow automatically; phase 1+ jobs are triggered by the plugin when the previous phase succeedsDepends on
statusevent subscription and job annotations (separate PR)Test plan
go build ./...succeeds🤖 Generated with Claude Code