Skip to content

Fix false test failures caused by Ginkgo reporter output on stdout#72

Open
anuragthehatter wants to merge 1 commit into
openshift-eng:mainfrom
anuragthehatter:fix-ginkgo-reporter-stdout
Open

Fix false test failures caused by Ginkgo reporter output on stdout#72
anuragthehatter wants to merge 1 commit into
openshift-eng:mainfrom
anuragthehatter:fix-ginkgo-reporter-stdout

Conversation

@anuragthehatter

Copy link
Copy Markdown

Summary

  • When running multiple tests in parallel, SpawnProcessToRunTest re-execs the binary with run-test --output=json and expects pure JSON on stdout
  • Ginkgo's default reporter writes its banner (Running Suite:..., SUCCESS!...) to stdout even though GinkgoWriter is redirected to stderr via ginkgo.NewWriter(os.Stderr) in configureGinkgo()
  • This non-JSON text precedes the JSON result and causes json.Unmarshal to fail with invalid character 'S' looking for beginning of value
  • Passing tests are reported as failures with a Deserialization Error
  • Fix: strip leading non-JSON bytes from stdout before unmarshalling

How to reproduce

Run any OTE extension with concurrency > 1:

./my-tests-ext run-test -n "test1" -n "test2" -c 2

All tests pass in Ginkgo (SUCCESS! -- 1 Passed) but the harness reports them as failed.

Test plan

  • Existing unit tests pass (go test ./pkg/ginkgo/)
  • Build passes (go build ./pkg/ginkgo/)
  • Verify with an OTE extension binary that previously showed false failures

🤖 Generated with Claude Code

When running multiple tests, SpawnProcessToRunTest re-execs the binary
with `run-test --output=json` and expects pure JSON on stdout. However,
Ginkgo's default reporter writes its banner ("Running Suite:...",
"SUCCESS!...") to stdout even though GinkgoWriter is redirected to
stderr. This non-JSON text precedes the JSON result and causes
json.Unmarshal to fail with "invalid character 'S' looking for
beginning of value", making passing tests appear as failures.

Fix by stripping any leading non-JSON bytes from stdout before
attempting to unmarshal. Also fix a minor format string bug where
%v was used instead of %d for an integer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci openshift-ci Bot requested review from smg247 and stbenjam June 15, 2026 20:14
@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: anuragthehatter
Once this PR has been reviewed and has the lgtm label, please assign smg247 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci

openshift-ci Bot commented Jun 15, 2026

Copy link
Copy Markdown

@anuragthehatter: This PR was included in a payload test run from openshift/ovn-kubernetes#3200
trigger 0 job(s) for the /payload-(with-prs|job|aggregate|job-with-prs|aggregate-with-prs) command

@stbenjam

stbenjam commented Jun 15, 2026

Copy link
Copy Markdown
Member

openshift/origin is the orchestrator in CI. Code here is used when OTE is standalone as a convenience or when using OTE outside of openshift-tests. You should confirm OTE is the source of the problem, are you seeing it locally in testing or in CI jobs?

@not-stbenjam

Copy link
Copy Markdown

I tried to reproduce the bug described here but couldn't find a code path where the Ginkgo reporter banner ends up on stdout.

What I checked:

  • RunSpec in vendor/.../ginkgo/v2/internal/suite_patch.go creates reporters.NewDefaultReporter(reporterConfig, writer) where writer is GetWriter() — which returns GinkgoWriter, reassigned to os.Stderr by configureGinkgo().
  • The reporter's _emit() method always writes to r.writer (the stderr-backed writer), never directly to os.Stdout.
  • Ginkgo's init() does initialize GinkgoWriter with os.Stdout, but configureGinkgo() is called both during BuildExtensionTestSpecsFromOpenShiftGinkgoSuite() and again inside the Run closure before RunSpec, so by the time the reporter runs, GinkgoWriter points to stderr.
  • Running ./example-tests run-test --output=json <test> with the main branch code produces clean JSON on stdout and the banner on stderr, consistently across 20+ runs.
  • Running run-test -n <test1> -n <test2> -c 2 (the parallel/SpawnProcessToRunTest path) also produces correct results with no deserialization errors on main.

Question: Could you provide a specific reproducer? The orchestration layers in origin may differ from example-tests — is the issue specific to how origin wires up the Ginkgo suite or spawns subprocesses? If so, noting which binary/suite triggers it would help confirm the fix.

The fix itself is defensive and reasonable, but I'd like to understand the actual trigger before adding integration tests for it.

@not-stbenjam

Copy link
Copy Markdown

Update after scanning openshift/origin:

Origin's pkg/test/extensions/binary.go has a different orchestration layer for external test binaries. Key finding:

  1. runWithTimeout() (line 953) uses c.CombinedOutput() — this mixes stdout and stderr into a single buffer. The Ginkgo reporter banner (which configureGinkgo() redirects to stderr) would contaminate the output when captured this way.

  2. Origin already handles this with its own extractJSON() function (line 349) that strips non-JSON prefix from output before parsing. This works for origin's external binary orchestration (RunTests at line 459).

  3. However, origin's extractJSON only protects the external binary path (RunTests/ListTests/Info). The OTE-internal parallel path (SpawnProcessToRunTest in parallel.go) uses separate stdout/stderr buffers, so the banner goes to stderr and stdout stays clean.

So the question remains: in what scenario does the Ginkgo banner actually end up on stdout in SpawnProcessToRunTest's subprocess? In my testing with example-tests, the subprocess's configureGinkgo() correctly redirects the reporter to stderr before RunSpec runs, and stdout contains only JSON.

Is this perhaps triggered by a specific extension binary that doesn't call BuildExtensionTestSpecsFromOpenShiftGinkgoSuite (and thus doesn't call configureGinkgo), or by a version of Ginkgo that writes to stdout through a different path?

@anuragthehatter

Copy link
Copy Markdown
Author

@stbenjam Thanks for replying. Not an expert in this area but got his from AI alibi.

The issue is NOT the Ginkgo reporter banner — that correctly goes to stderr via configureGinkgo(). Verified this by running the subprocess
  manually with stdout/stderr split:
  
  - stderr contains Running Suite:..., SUCCESS!... (reporter output) — correct
  - stdout starts with STEP: Creating a kubernetes client... and I0615 16:40:49... log lines (~1.3MB of test output), followed by the JSON result
  at the end

  The root cause is in openshift/onsi-ginkgo's suite_patch.go:61:
  suite.outputInterceptor = NoopOutputInterceptor{}
  
  RunSpec uses a no-op output interceptor, so it does NOT capture stdout/stderr from test code. The kube e2e framework, exutil.CLI, and g.By()
  calls write STEP: and log lines directly to os.Stdout during test execution. These go straight to the process's stdout and precede the JSON
  result, causing json.Unmarshal to fail on the 'S' in STEP:.
  suite.outputInterceptor = NoopOutputInterceptor{}

  RunSpec uses a no-op output interceptor, so it does NOT capture stdout/stderr from test code. The kube e2e framework, exutil.CLI, and g.By()
  calls write STEP: and log lines directly to os.Stdout during test execution. These go straight to the process's stdout and precede the JSON
  result, causing json.Unmarshal to fail on the 'S' in STEP:.

  In contrast, ginkgo.RunSpecs (the standard path) uses a real OutputInterceptor that captures stdout/stderr into CapturedStdOutErr. The
  example-tests likely don't have test helpers that write directly to os.Stdout, so their stdout stays clean.

  The fix should either strip non-JSON prefix (as in the PR) or use a real output interceptor in RunSpec.

I am also trying few runs on openshift/ovn-kubernetes#3200 to see if that specific failed cases issue is still a repro.

@stbenjam

stbenjam commented Jun 15, 2026

Copy link
Copy Markdown
Member

ovn is using quite old openshift-test-extension, last commit in January, I would try to revendor it. We also fixed some things in onsi-ginkgo around outptu: openshift/onsi-ginkgo#23

@anuragthehatter

Copy link
Copy Markdown
Author

Thanks @stbenjam Filed openshift/ovn-kubernetes#3250.
Looks like this PR can be closed.

@anuragthehatter

anuragthehatter commented Jun 18, 2026

Copy link
Copy Markdown
Author

@stbenjam Any thought on this PR wrt openshift/ovn-kubernetes#3200 (comment) ?

@openshift-ci

openshift-ci Bot commented Jun 28, 2026

Copy link
Copy Markdown

PR needs rebase.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-ci openshift-ci Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants