Skip to content

Fix agent env not overriding config#2832

Open
jotak wants to merge 2 commits into
netobserv:mainfrom
jotak:agent-env-priority
Open

Fix agent env not overriding config#2832
jotak wants to merge 2 commits into
netobserv:mainfrom
jotak:agent-env-priority

Conversation

@jotak

@jotak jotak commented Jun 26, 2026

Copy link
Copy Markdown
Member

Description

When agent envs are explicitely configured, they should always take priority over automatic configuration. E.g. TARGET_HOST could not be overriden by env.

Fixes #2831

Dependencies

n/a

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Summary by CodeRabbit

  • Bug Fixes
    • Improved how the eBPF agent’s environment configuration is assembled, making defaults and user overrides more consistent and predictable.
    • Expanded agent configuration coverage for connectivity, target/export settings (including gRPC export and reconnect), DNS, interface rules, flow filtering, and metrics/logging behavior.
    • Refined platform-specific defaults and conditional settings to better match the deployment environment.
  • Tests
    • Updated eBPF agent environment configuration tests to reflect the new behavior and expectations.

When agent envs are explicitely configured, they should always take
priority over automatic configuration. E.g. TARGET_HOST could not be
overriden by env.

Fixes netobserv#2831
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign mffiedler for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 26, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d32cd119-aea1-4407-bed2-da0473274d72

📥 Commits

Reviewing files that changed from the base of the PR and between 369675b and 9fa8933.

📒 Files selected for processing (1)
  • internal/controller/ebpf/agent_controller.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/controller/ebpf/agent_controller.go

📝 Walkthrough

Walkthrough

The PR rewrites eBPF agent environment-variable assembly into controller-level envConfig logic with sorted overrides, new default handling for export targets and feature toggles, and OpenShift-dependent attach mode selection. It also removes the shared BuildEnvFromDefaults helper and updates envConfig tests.

Changes

eBPF agent environment configuration

Layer / File(s) Summary
Controller env assembly
internal/controller/ebpf/agent_controller.go
envConfig now starts from advancedConfig.Env, sorts env vars by name, adds eBPF cache/log/interface/exclude/sampling and feature-toggle env vars, computes OpenShift-dependent attach mode, and uses addEnv for overrideable Kafka and gRPC env vars.
Env config test updates
internal/controller/ebpf/agent_controller_test.go
Tests now call AgentController.envConfig through reconcilers.Common, assert nil errors, and extend the expected env var sets for the new export, target, and reconnect values.
Shared helper removal
internal/pkg/helper/env.go, internal/pkg/helper/env_test.go
BuildEnvFromDefaults and its dedicated test are removed.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title matches the main change: letting explicit agent env vars override generated config.
Description check ✅ Passed The description states the problem, intended behavior, linked issue, dependencies, and QE choice, so it is mostly complete.
Linked Issues check ✅ Passed The code now lets explicit TARGET_HOST/TARGET_PORT env values override generated defaults, which addresses issue #2831's DNS suffix workaround.
Out of Scope Changes check ✅ Passed The refactor and test updates stay aligned with the env-override fix and do not introduce clear unrelated changes.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@jotak jotak added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 26, 2026
@github-actions

Copy link
Copy Markdown

New images:

quay.io/netobserv/network-observability-operator:369675b4
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-369675b4
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-369675b4

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:369675b4 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-369675b4

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-369675b4
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/controller/ebpf/agent_controller.go`:
- Around line 779-794: Wrap the watcher failures in the TLS setup path with
contextual error messages instead of returning raw errors. In the reconciliation
logic around c.Watcher.ProcessFileReference and
c.Watcher.ProcessMTLSCertsFromRefs, use the existing error values but add clear
context that identifies which TLS reference/certificate step failed before
returning from the controller.
- Around line 750-765: The gRPC defaults on this branch are being appended
directly, so advanced env overrides can still collide with generated values.
Update the logic in the agent controller path that builds `config` to route both
`envExport` and the host-network `envFlowsTargetHost` through the same
override-aware helper used elsewhere, instead of appending them unconditionally.
Keep the existing `addEnv(...)` pattern and make sure
`helper.GetAdvancedProcessorConfig(&coll.Spec)` and `advancedConfig.Env` are
consulted before adding these defaults.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1bdc5bd2-1f14-49c9-8e09-ddb933e39951

📥 Commits

Reviewing files that changed from the base of the PR and between 3337065 and 369675b.

📒 Files selected for processing (4)
  • internal/controller/ebpf/agent_controller.go
  • internal/controller/ebpf/agent_controller_test.go
  • internal/pkg/helper/env.go
  • internal/pkg/helper/env_test.go
💤 Files with no reviewable changes (2)
  • internal/pkg/helper/env.go
  • internal/pkg/helper/env_test.go

Comment thread internal/controller/ebpf/agent_controller.go
Comment on lines +779 to +794
caDigest, err := c.Watcher.ProcessFileReference(ctx, c.Client, *ca, c.PrivilegedNamespace())
if err != nil {
return nil, err
}
annots[watchers.Annotation("tls-ca")] = caDigest
}
} else {
certPath, keyPath := c.volumes.AddCertificate(clientCert, "client-certs")
config = addEnv(config, envTargetTLSUserCertPath, certPath, advancedConfig.Env)
config = addEnv(config, envTargetTLSUserKeyPath, keyPath, advancedConfig.Env)

if !assumeCertInstalled {
// Annotate pod with certificate reference so that it is reloaded if modified
caDigest, userDigest, err := c.Watcher.ProcessMTLSCertsFromRefs(ctx, c.Client, ca, clientCert, c.PrivilegedNamespace())
if err != nil {
return nil, err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Wrap TLS watcher errors with context.

These raw returns make reconcile failures harder to diagnose.

As per coding guidelines, "Wrap errors with context using proper error handling patterns in the controller reconciliation logic and error handling paths."

Proposed fix
 						caDigest, err := c.Watcher.ProcessFileReference(ctx, c.Client, *ca, c.PrivilegedNamespace())
 						if err != nil {
-							return nil, err
+							return nil, fmt.Errorf("process eBPF target TLS CA reference: %w", err)
 						}
 						annots[watchers.Annotation("tls-ca")] = caDigest
@@
 						caDigest, userDigest, err := c.Watcher.ProcessMTLSCertsFromRefs(ctx, c.Client, ca, clientCert, c.PrivilegedNamespace())
 						if err != nil {
-							return nil, err
+							return nil, fmt.Errorf("process eBPF target mTLS certificate references: %w", err)
 						}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
caDigest, err := c.Watcher.ProcessFileReference(ctx, c.Client, *ca, c.PrivilegedNamespace())
if err != nil {
return nil, err
}
annots[watchers.Annotation("tls-ca")] = caDigest
}
} else {
certPath, keyPath := c.volumes.AddCertificate(clientCert, "client-certs")
config = addEnv(config, envTargetTLSUserCertPath, certPath, advancedConfig.Env)
config = addEnv(config, envTargetTLSUserKeyPath, keyPath, advancedConfig.Env)
if !assumeCertInstalled {
// Annotate pod with certificate reference so that it is reloaded if modified
caDigest, userDigest, err := c.Watcher.ProcessMTLSCertsFromRefs(ctx, c.Client, ca, clientCert, c.PrivilegedNamespace())
if err != nil {
return nil, err
caDigest, err := c.Watcher.ProcessFileReference(ctx, c.Client, *ca, c.PrivilegedNamespace())
if err != nil {
return nil, fmt.Errorf("process eBPF target TLS CA reference: %w", err)
}
annots[watchers.Annotation("tls-ca")] = caDigest
}
} else {
certPath, keyPath := c.volumes.AddCertificate(clientCert, "client-certs")
config = addEnv(config, envTargetTLSUserCertPath, certPath, advancedConfig.Env)
config = addEnv(config, envTargetTLSUserKeyPath, keyPath, advancedConfig.Env)
if !assumeCertInstalled {
// Annotate pod with certificate reference so that it is reloaded if modified
caDigest, userDigest, err := c.Watcher.ProcessMTLSCertsFromRefs(ctx, c.Client, ca, clientCert, c.PrivilegedNamespace())
if err != nil {
return nil, fmt.Errorf("process eBPF target mTLS certificate references: %w", err)
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/controller/ebpf/agent_controller.go` around lines 779 - 794, Wrap
the watcher failures in the TLS setup path with contextual error messages
instead of returning raw errors. In the reconciliation logic around
c.Watcher.ProcessFileReference and c.Watcher.ProcessMTLSCertsFromRefs, use the
existing error values but add clear context that identifies which TLS
reference/certificate step failed before returning from the controller.

Source: Coding guidelines

@github-actions github-actions Bot removed the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Jun 26, 2026
@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Copy link
Copy Markdown

@jotak: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-operator 9fa8933 link false /test e2e-operator

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to run ebpf agents (in the default service mode) on clusters with non-default cluster dns suffix

1 participant