Skip to content

feat: add N-phase support to phased plugin#5046

Draft
dhiller wants to merge 3 commits into
kubevirt:mainfrom
dhiller:feat/phased-n-phase-support
Draft

feat: add N-phase support to phased plugin#5046
dhiller wants to merge 3 commits into
kubevirt:mainfrom
dhiller:feat/phased-n-phase-support

Conversation

@dhiller

@dhiller dhiller commented May 7, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add annotation-based N-phase system (phased.kubevirt.io/phase) to the phased plugin
  • Plugin now handles status events to detect phase completion and trigger the next phase via /test comments
  • Phase 0 jobs (always_run: true) are triggered by Prow automatically; phase 1+ jobs are triggered by the plugin when the previous phase succeeds
  • Existing merge-phase behavior (lgtm+approved labels) is preserved unchanged
  • Includes 4 new test cases for status event handling (12 total, all pass)

Depends on

  • Prow config PR to add status event subscription and job annotations (separate PR)

Test plan

  • All 12 unit tests pass (8 existing + 4 new)
  • go build ./... succeeds
  • Integration test: deploy to staging, verify phase 0 → phase 1 transition
  • Verify existing merge-phase behavior unchanged

🤖 Generated with Claude Code

Add annotation-based phase system (phased.kubevirt.io/phase) that
gates job execution on previous phase completion. The plugin now
handles status events to detect when all jobs in a phase succeed
and triggers the next phase via /test comments.

Phase 0 jobs (basic: build, generate, unit-test) run automatically
via always_run: true. Phase 1+ jobs are triggered by the plugin
when the previous phase completes. The existing merge-phase
behavior (lgtm+approved labels) is preserved unchanged.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Daniel Hiller <dhiller@redhat.com>
@kubevirt-bot

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@kubevirt-bot kubevirt-bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. dco-signoff: yes Indicates the PR's author has DCO signed all their commits. labels May 7, 2026
@kubevirt-bot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rmohr for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements phased presubmit job triggering by responding to GitHub status events, allowing jobs to be executed in sequence based on phase annotations. The review feedback highlights critical performance concerns, specifically the inefficient retrieval of all open pull requests and the repeated parsing of Prow configurations on every status event. Suggestions were also made to prevent redundant job triggers by checking existing statuses, improve error logging for configuration loading, and remove hardcoded repository restrictions to increase the plugin's flexibility.

}

func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) {
allPRs, err := h.ghClient.GetPullRequests(org, repo)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Calling GetPullRequests(org, repo) on every status event is highly inefficient as it retrieves all open pull requests for the repository. For a busy repository like kubevirt/kubevirt, this will result in a large number of API calls and significant overhead. Consider using the GitHub Search API to find only the pull requests matching the specific SHA.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06: added pre-filtering via cached phased job names so GetPullRequests is only called when the status context matches a known phased job. The prow GitHub client doesn't expose the GET /repos/{owner}/{repo}/commits/{sha}/pulls endpoint, so we keep the current approach which is acceptable at kubevirt/kubevirt scale (~100 open PRs).

Comment on lines +245 to +248
presubmits, err := h.loadPresubmits(pr)
if err != nil || presubmits == nil {
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Loading the Prow configuration (loadPresubmits) on every status event is extremely expensive. Status events occur frequently (multiple times per PR per job). Fetching and parsing the configuration repeatedly will likely lead to performance bottlenecks and GitHub API rate limiting. It is strongly recommended to use a cached configuration agent or at least cache the parsed results for a short period.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06: added presubmitCache with a 5-minute TTL so config is loaded once and reused across status events. The TTL is needed because the plugin fetches configs from raw.githubusercontent.com at runtime (it is not redeployed on config changes).

Comment on lines +268 to +273
nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap)
if nextPhase < 0 {
return
}

triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The current logic triggers all jobs in the next phase if at least one job is missing a status. This can lead to redundant triggers for jobs that are already running (e.g., if they were manually triggered or are always_run: true). You should pass the statusMap to triggerPhaseJobs and only trigger jobs that do not yet have an entry in the map.

Suggested change
nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap)
if nextPhase < 0 {
return
}
triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase)
nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap)
if nextPhase < 0 {
return
}
triggerPhaseJobs(log, h.ghClient, pr, phaseJobs[nextPhase], nextPhase, statusMap)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06: triggerPhaseJobs now accepts statusMap and skips jobs that already have a status entry, preventing redundant triggers.

Comment on lines +276 to +288
func triggerPhaseJobs(log *logrus.Entry, ghClient githubClientInterface, pr github.PullRequest, jobs []config.Presubmit, phase int) {
org := pr.Base.Repo.Owner.Login
repo := pr.Base.Repo.Name

if !(org == "kubevirt" && repo == "kubevirt") {
return
}

var result string
for _, job := range jobs {
result += "/test " + job.Name + "\n"
log.Debugf("Phase %d: triggering presubmit %s", phase, job.Name)
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Filter out jobs that already have a status to avoid redundant triggers. This ensures that if a phase is partially started, only the remaining jobs are triggered.

func triggerPhaseJobs(log *logrus.Entry, ghClient githubClientInterface, pr github.PullRequest, jobs []config.Presubmit, phase int, statusMap map[string]string) {
	org := pr.Base.Repo.Owner.Login
	repo := pr.Base.Repo.Name

	if !(org == "kubevirt" && repo == "kubevirt") {
		return
	}

	var result string
	for _, job := range jobs {
		ctx := jobContext(job)
		if _, exists := statusMap[ctx]; exists {
			continue
		}
		result += "/test " + job.Name + "\n"
		log.Debugf("Phase %d: triggering presubmit %s", phase, job.Name)
	}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06: triggerPhaseJobs checks statusMap for each job context and skips any job that already has a status (pending, success, etc.).

Comment on lines +205 to +207
if !(org == "kubevirt" && repo == "kubevirt") {
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The repository check is hardcoded to kubevirt/kubevirt. This prevents the plugin from being used with other repositories or organizations. Consider making this configurable via the plugin options or removing the restriction to allow the plugin to be more generic.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed: the hardcoded kubevirt/kubevirt check is pre-existing (same pattern as the original testRequested function). Making this configurable would be a separate improvement.

}

presubmits, err := h.loadPresubmits(pr)
if err != nil || presubmits == nil {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Errors encountered while loading presubmits are silently ignored. It is better to log these errors to assist in debugging when configuration loading fails.

Suggested change
if err != nil || presubmits == nil {
presubmits, err := h.loadPresubmits(pr)
if err != nil {
log.WithError(err).Error("Could not load presubmits")
return
}
if presubmits == nil {
return
}

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06: handlePhaseTransition now logs errors explicitly when loadPresubmits fails.

- Add TTL-based config cache to avoid reloading Prow config on
  every status event
- Pre-filter status events by checking context against cached
  phased job names before PR lookup
- Only trigger untriggered jobs in triggerPhaseJobs, skip jobs
  that already have a status entry
- Add explicit error logging for loadPresubmits failures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Daniel Hiller <dhiller@redhat.com>
@dhiller

dhiller commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request extends the phased plugin to support sequential presubmit job triggering based on GitHub status events. It introduces a caching mechanism for presubmit configurations and logic to identify and trigger the next phase of jobs when the current phase succeeds. The review feedback identifies several critical areas for improvement: the current implementation is inefficient due to fetching all open pull requests on every status event, the cache lacks multi-repository support, and there is a risk of duplicate test triggers due to race conditions. Additionally, the feedback suggests optimizing API calls for shared SHAs and correcting the phase numbering logic to handle non-consecutive phase definitions.

}

func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) {
allPRs, err := h.ghClient.GetPullRequests(org, repo)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

GetPullRequests(org, repo) fetches all open pull requests for the repository. Calling this on every successful status event is extremely inefficient for large repositories like kubevirt/kubevirt. Consider using a more targeted API to find PRs by commit SHA (e.g., GET /repos/{owner}/{repo}/commits/{commit_sha}/pulls) if the GitHub client supports it, or implementing a local SHA-to-PR mapping cache.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in f2c3f06 (pre-filtering) and 0328b37 (cache warming). The prow GitHub client doesn't expose GET /repos/{owner}/{repo}/commits/{sha}/pulls, so GetPullRequests is kept but only called when the status context matches a known phased job name.

}
}

nextPhase := findNextPhaseToTrigger(phaseJobs, sortedPhases, statusMap)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

There is a known race condition between posting a /test comment and Prow creating the corresponding GitHub status. If another status event (e.g., from an unrelated job) occurs during this window, findNextPhaseToTrigger will see the next phase jobs as "untriggered" (missing from statusMap) and triggerPhaseJobs will post duplicate /test comments. Consider implementing a short-term in-memory cache of recently triggered jobs to mitigate this.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mitigated by the statusMap check in triggerPhaseJobs (skips jobs with existing status) and Prow's idempotent /test command handling. A short-term in-memory lock would add complexity for marginal benefit given these two safeguards.

Comment on lines +58 to +64
type presubmitCache struct {
mu sync.Mutex
presubmits []config.Presubmit
phasedNames map[string]bool
loadedAt time.Time
ttl time.Duration
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The presubmitCache struct currently stores a single set of presubmits and phased job names. If this plugin is configured to handle multiple repositories (which is common for Prow plugins), the cache will be overwritten whenever a status event from a different repository is processed. Consider keying the cache by repository (e.g., map[string]*repoCache).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed: the plugin is currently hardcoded to kubevirt/kubevirt only, so single-repo cache is sufficient. Multi-repo keying would be needed if/when the hardcoded check is removed.

Comment on lines +244 to +248
_, phasedNames, ok := h.cache.get()
if ok && !phasedNames[event.Context] {
log.Debugf("Status context %q is not a phased job, skipping", event.Context)
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

When the cache is empty or expired (ok == false), the handler skips the phased job check and proceeds to the expensive findPRsForSHA operation (which lists all open PRs). This means every status event for a non-phased job will trigger a full PR list fetch if the cache is cold. It is better to ensure the cache is populated before deciding whether to proceed with the event handling.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0328b37: added warmCache() — when the cache is cold, we now populate it eagerly before calling findPRsForSHA, so non-phased status events are filtered out even on first use.

return
}

combinedStatus, err := h.ghClient.GetCombinedStatus(org, repo, sha)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

GetCombinedStatus is called inside handlePhaseTransition, which is executed for every PR matching the SHA. If multiple PRs share the same HEAD SHA, this results in redundant API calls. Consider fetching the combined status once in handleStatusEvent and passing it down to the transition handler.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0328b37: GetCombinedStatus is now called once in handleStatusEvent and the resulting statusMap is passed down to handlePhaseTransition, avoiding redundant API calls when multiple PRs share the same HEAD SHA.

Comment on lines +345 to +346
prevPhase := phase - 1
result = fmt.Sprintf(PhaseIntroFmt, prevPhase, phase) + result

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The calculation prevPhase := phase - 1 assumes that phase numbers are consecutive integers. If a user defines non-consecutive phases (e.g., Phase 0 and Phase 10), the triggered comment will incorrectly state "Phase 9 jobs completed". It would be more accurate to pass the actual completed phase number from the sortedPhases slice to this function.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 0328b37: findNextPhaseToTrigger now returns (completedPhase, nextPhase) and triggerPhaseJobs uses the actual completed phase number in the intro message, correctly handling non-consecutive phase definitions.

- Warm cache on cold path before PR lookup to ensure phased name
  filtering works from the first status event
- Fetch combined status once in handleStatusEvent and pass to
  handlePhaseTransition to avoid redundant API calls for shared SHAs
- Fix non-consecutive phase intro message by passing actual completed
  phase number instead of computing phase - 1
- Remove unused parameters from handlePhaseTransition

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Daniel Hiller <dhiller@redhat.com>
@dhiller

dhiller commented May 8, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a phased execution model for presubmit jobs, allowing the plugin to trigger subsequent phases automatically upon the successful completion of all jobs in a preceding phase. Key additions include a status event handler, a TTL-based cache for presubmit configurations, and logic to group jobs by phase annotations. Review feedback identifies several critical improvements: optimizing PR lookups by using commit-specific API calls instead of listing all open PRs, addressing a logic bug where skipped jobs (e.g., due to file filters) would indefinitely block phase transitions, and removing hardcoded branch references in the cache warming logic to support repositories with different default branches.


type githubClientInterface interface {
GetPullRequest(string, string, int) (*github.PullRequest, error)
GetPullRequests(string, string) ([]github.PullRequest, error)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Using GetPullRequests to list all open PRs and then filtering by SHA is inefficient, especially for large repositories. It is better to use ListPullRequestsByCommit, which is specifically designed to find PRs associated with a commit.

Suggested change
GetPullRequests(string, string) ([]github.PullRequest, error)
ListPullRequestsByCommit(org, repo, sha string) ([]github.PullRequest, error)

Comment on lines +300 to +312
func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) {
allPRs, err := h.ghClient.GetPullRequests(org, repo)
if err != nil {
return nil, fmt.Errorf("listing PRs: %w", err)
}
var matched []github.PullRequest
for _, pr := range allPRs {
if pr.Head.SHA == sha && pr.State == "open" {
matched = append(matched, pr)
}
}
return matched, nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Update this function to use ListPullRequestsByCommit for better performance. This avoids fetching all open PRs for the repository.

Suggested change
func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) {
allPRs, err := h.ghClient.GetPullRequests(org, repo)
if err != nil {
return nil, fmt.Errorf("listing PRs: %w", err)
}
var matched []github.PullRequest
for _, pr := range allPRs {
if pr.Head.SHA == sha && pr.State == "open" {
matched = append(matched, pr)
}
}
return matched, nil
}
func (h *GitHubEventsHandler) findPRsForSHA(org, repo, sha string) ([]github.PullRequest, error) {
allPRs, err := h.ghClient.ListPullRequestsByCommit(org, repo, sha)
if err != nil {
return nil, fmt.Errorf("listing PRs for commit: %w", err)
}
var matched []github.PullRequest
for _, pr := range allPRs {
if pr.Head.SHA == sha && pr.State == "open" {
matched = append(matched, pr)
}
}
return matched, nil
}

Comment on lines +533 to +542
func allJobsSucceeded(jobs []config.Presubmit, statusMap map[string]string) bool {
for _, job := range jobs {
ctx := jobContext(job)
state, exists := statusMap[ctx]
if !exists || state != github.StatusSuccess {
return false
}
}
return true
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This logic will fail for Pull Requests where some phased jobs are skipped (e.g., due to run_if_changed or skip_if_only_changed rules). If a job is skipped, it won't appear in the statusMap, and allJobsSucceeded will return false indefinitely, blocking subsequent phases.

You should filter the jobs list to only include those that are actually expected to run for the specific PR before checking their status.

Comment on lines +244 to +252
_, phasedNames, ok := h.cache.get()
if !ok {
h.warmCache(log, org, repo)
_, phasedNames, ok = h.cache.get()
}
if ok && !phasedNames[event.Context] {
log.Debugf("Status context %q is not a phased job, skipping", event.Context)
return
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The warmCache call here is problematic because it hardcodes the branch to "main". If the repository uses "master" or another default branch, the cache will fail to warm, leading to redundant configuration loads on every event.

Instead of pre-warming with a hardcoded branch, you can simply skip the early return if the cache is cold. The cache will then be naturally populated with the correct branch information during the first handlePhaseTransition call.

Suggested change
_, phasedNames, ok := h.cache.get()
if !ok {
h.warmCache(log, org, repo)
_, phasedNames, ok = h.cache.get()
}
if ok && !phasedNames[event.Context] {
log.Debugf("Status context %q is not a phased job, skipping", event.Context)
return
}
_, phasedNames, ok := h.cache.get()
if ok && !phasedNames[event.Context] {
log.Debugf("Status context %q is not a phased job, skipping", event.Context)
return
}

Comment on lines +283 to +298
func (h *GitHubEventsHandler) warmCache(log *logrus.Entry, org, repo string) {
fullName := org + "/" + repo
presubmits, err := h.loadPresubmits(github.PullRequest{
Base: github.PullRequestBranch{
Ref: "main",
Repo: github.Repo{FullName: fullName},
},
})
if err != nil {
log.WithError(err).Error("Could not warm presubmit cache")
return
}
if presubmits != nil {
h.cache.set(presubmits)
}
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This function can be removed if the lazy-loading approach suggested in handleStatusEvent is adopted. This avoids hardcoding branch names like "main" and reduces unnecessary complexity.

Comment on lines +44 to +52
func (f *fakeGHClient) GetPullRequests(org, repo string) ([]github.PullRequest, error) {
var prs []github.PullRequest
for _, pr := range f.PullRequests {
if pr != nil {
prs = append(prs, *pr)
}
}
return prs, nil
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This manual implementation is no longer needed if the interface is updated to use ListPullRequestsByCommit, as the embedded fakegithub.FakeClient already provides a suitable implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has DCO signed all their commits. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. size/XL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants