Local example with direct-flp + prom by jpinsonneau · Pull Request #949 · netobserv/netobserv-ebpf-agent

jpinsonneau · 2026-04-17T10:21:03Z

Local run of the eBPF agent using direct-flp mode and a dedicated Prometheus instance.

Generated-by: claude-4.6-opus-high

Summary by CodeRabbit

Documentation
- Added a comprehensive example demonstrating Prometheus metrics collection with direct-flp mode.
- Features optional tracking for RTT, DNS, packet drops, IPsec, and TLS metrics.
- Includes Docker Compose configuration, pre-configured recording rules, example alerts, and quick-start script.

coderabbitai · 2026-04-17T10:21:17Z

📝 Walkthrough

Walkthrough

Adds a complete local Prometheus example for the direct-flp export mode, including documentation, FLP configuration with embedded metrics encoding, Docker Compose setup for Prometheus scraping, recording rules, alert definitions, and a run script to orchestrate the agent and Prometheus stack.

Changes

Cohort / File(s)	Summary
Documentation `README.md`, `docs/config.md`, `examples/direct-flp/README.md`, `examples/direct-flp/prometheus-local/README.md`	Added references to and comprehensive documentation of the new Prometheus local example, detailing RTT/DNS/packet-drop/IPsec/TLS features, embedded metrics on port 9102, and setup instructions.
Configuration Files `examples/direct-flp/prometheus-local/{flp-prometheus.yaml, prometheus.yml, docker-compose.yaml}`	Added FLP Prometheus encoder config defining netobserv gauges with labels, Prometheus scrape and global settings targeting the embedded pipeline, and Docker Compose service for Prometheus (v2.53.3) with TSDB retention and lifecycle endpoint.
Prometheus Rules `examples/direct-flp/prometheus-local/netobserv-rules.yml`	Added recording rules for packet drops, TCP RTT, DNS latency, flow bytes derivatives, IPsec/TLS aggregations, and alert rule `NetobservNoFlowMetrics` triggering on absent metrics.
Run Script `examples/direct-flp/prometheus-local/run-example.sh`	Added executable shell script to launch the agent with direct-flp export, enable tracking features, configure FLP/Prometheus ports, and start Docker services.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Description check	⚠️ Warning	The description is minimal and lacks required template sections (checklist items, QE requirements, dependency/testing information).	Complete the PR description template by filling in the checklist, specifying QE requirements, and documenting any setup needed for testing the new example.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly summarizes the main change: adding a local example demonstrating direct-flp mode with Prometheus monitoring.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

openshift-ci · 2026-04-17T10:21:30Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jpinsonneau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these files:

DOWNSTREAM_OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

examples/direct-flp/prometheus-local/netobserv-rules.yml (1)
24-26: deriv on a counter is misleading; prefer rate.

If netobserv_flow_bytes is a counter (bytes accumulated), deriv treats it as a gauge and will produce negative values on counter resets. Use rate(netobserv_flow_bytes[2m]) for a proper per-second rate. If it is actually a gauge, the current expression is fine — worth confirming and updating the comment accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 24 -
26, The rule netobserv:flow_bytes:deriv_2m currently uses
deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric;
confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...)
with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline
comment to state it's a per-second rate for counters; if netobserv_flow_bytes is
actually a gauge, leave the expression but update the comment to indicate it’s
intended for gauges.

examples/direct-flp/prometheus-local/README.md (1)
60-60: Reconsider deriv() on gauge metrics.

Line 42 states these are gauges reflecting "the last value for each label set," not continuously increasing counters. deriv() fits a linear regression over the samples in the window, so applying it to gauges that reset or represent discrete flow snapshots produces unreliable results. Even when labeled "noisy; exploratory," this could mislead users unfamiliar with the distinction.

Consider removing this recording rule or replacing it with a function suited to gauges, such as delta(). rate() would be appropriate only if the underlying metric were a counter, which it isn't.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/README.md` at line 60, The recording
rule that defines `netobserv:flow_bytes:deriv_2m` using
`deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or
remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the
expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate
function such as `delta(netobserv_flow_bytes[2m])` (or another chosen
aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 31-32: The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.

In `@examples/direct-flp/prometheus-local/run-example.sh`:
- Around line 24-27: Validate the flp-prometheus.yaml before starting the
compose stack and stop masking cat's exit status: check that
"${SCRIPT_DIR}/flp-prometheus.yaml" exists and is non-empty (e.g. [ -s
"${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo "missing"; exit 1; }), then read
it into FLP_CONFIG with a separate assignment that preserves failures
(FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" || exit 1) and then
export FLP_CONFIG; only after that run docker compose up -d; also add a trap on
ERR/EXIT to run docker compose down (or a cleanup function) so the compose stack
is torn down if the agent fails to start.

---

Nitpick comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 24-26: The rule netobserv:flow_bytes:deriv_2m currently uses
deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric;
confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...)
with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline
comment to state it's a per-second rate for counters; if netobserv_flow_bytes is
actually a gauge, leave the expression but update the comment to indicate it’s
intended for gauges.

In `@examples/direct-flp/prometheus-local/README.md`:
- Line 60: The recording rule that defines `netobserv:flow_bytes:deriv_2m` using
`deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or
remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the
expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate
function such as `delta(netobserv_flow_bytes[2m])` (or another chosen
aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d2b2935-1195-4961-90ff-41ee266ac7d7

📥 Commits

Reviewing files that changed from the base of the PR and between 31041df and b3f61a9.

📒 Files selected for processing (9)

README.md
docs/config.md
examples/direct-flp/README.md
examples/direct-flp/prometheus-local/README.md
examples/direct-flp/prometheus-local/docker-compose.yaml
examples/direct-flp/prometheus-local/flp-prometheus.yaml
examples/direct-flp/prometheus-local/netobserv-rules.yml
examples/direct-flp/prometheus-local/prometheus.yml
examples/direct-flp/prometheus-local/run-example.sh

coderabbitai · 2026-04-17T10:27:33Z

+      - record: netobserv:tls_flow_bytes:count_by_version
+        expr: count by (TLSVersion) (netobserv_tls_flow_bytes)


⚠️ Potential issue | 🟡 Minor

count by counts series, not bytes — name mismatch.

count by (TLSVersion) (netobserv_tls_flow_bytes) yields the number of active series per TLS version, not a byte-related quantity. If the intent is byte volume per version, use sum by (TLSVersion) like the IPSec rule above; otherwise rename the record to something like netobserv:tls_flows:count_by_version to avoid confusion.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 31 - 32, The metric name/semantics mismatch: the recording rule "netobserv:tls_flow_bytes:count_by_version" uses the expression "count by (TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing bytes. Fix by either changing the expression to "sum by (TLSVersion) (netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the record to something like "netobserv:tls_flows:count_by_version" to reflect it is a series count; update the recording rule name "netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.

coderabbitai · 2026-04-17T10:27:33Z

+docker compose up -d
+
+export EXPORT=direct-flp
+export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"


⚠️ Potential issue | 🟡 Minor

Compose stack leaks if the agent fails to start, and cat errors are swallowed.

docker compose up -d runs before any validation of flp-prometheus.yaml; if cat/agent fails later, Prometheus keeps running and the user has to tear it down manually.

Line 27 hits shellcheck SC2155: export FLP_CONFIG="$(cat ...)" masks cat's exit status, so a missing/empty config file won't abort the script despite set -e.
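
The masking described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: the path `/nonexistent/flp.yaml` and the variable names `MASKED` and `UNMASKED` are placeholders chosen for illustration. It captures the exit status of each form so the difference is visible even when `set -e` is active.

```shell
#!/usr/bin/env bash
# Illustration of shellcheck SC2155. The path and variable names are placeholders.

# Form 1: export and assign in one statement.
# The statement's status is that of 'export', which succeeds even though cat fails.
masked_status=0
export MASKED="$(cat /nonexistent/flp.yaml 2>/dev/null)" || masked_status=$?

# Form 2: assign first, export afterwards.
# A plain assignment returns the status of the command substitution, so cat's failure is visible.
unmasked_status=0
UNMASKED="$(cat /nonexistent/flp.yaml 2>/dev/null)" || unmasked_status=$?
export UNMASKED

echo "export+assign status: ${masked_status}"
echo "assign-only status:   ${unmasked_status}"
```

With the first form, a script running under `set -e` would carry on with an empty variable; with the second, the failed assignment is what `set -e` reacts to.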

Proposed fix

-cd "${SCRIPT_DIR}"
-docker compose up -d
-
-export EXPORT=direct-flp
-export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+cd "${SCRIPT_DIR}"
+
+FLP_CONFIG_CONTENT="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+export FLP_CONFIG="${FLP_CONFIG_CONTENT}"
+export EXPORT=direct-flp
+
+docker compose up -d
+trap 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' EXIT

🧰 Tools

🪛 Shellcheck (0.11.0)

[warning] 27-27: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@examples/direct-flp/prometheus-local/run-example.sh` around lines 24 - 27, Validate the flp-prometheus.yaml before starting the compose stack and stop masking cat's exit status: check that "${SCRIPT_DIR}/flp-prometheus.yaml" exists and is non-empty (e.g. [ -s "${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo "missing"; exit 1; }), then read it into FLP_CONFIG with a separate assignment that preserves failures (FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" || exit 1) and then export FLP_CONFIG; only after that run docker compose up -d; also add a trap on ERR/EXIT to run docker compose down (or a cleanup function) so the compose stack is torn down if the agent fails to start.
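
The ordering requested in the prompt above (validate, read, export, then start the stack with a teardown trap) could look roughly like the sketch below. It is an assumption-laden illustration rather than the script from the PR: `load_flp_config`, `start_stack`, and `STACK_DIR` are hypothetical names, and the trap is registered just before `docker compose up -d` so that a partial start is also cleaned up.

```shell
#!/usr/bin/env bash
# Sketch only. load_flp_config, start_stack and STACK_DIR are hypothetical names,
# not identifiers from run-example.sh.

# Print the config file's contents; fail if the file is missing or empty.
load_flp_config() {
  if [ ! -s "$1" ]; then
    echo "missing or empty FLP config: $1" >&2
    return 1
  fi
  cat "$1"
}

# Validate and export the config first, then register the teardown trap and
# start Prometheus, so a later failure does not leave the stack running.
start_stack() {
  STACK_DIR="$1"  # global on purpose: the EXIT trap fires after this function has returned
  FLP_CONFIG="$(load_flp_config "${STACK_DIR}/flp-prometheus.yaml")" || return 1
  export FLP_CONFIG
  export EXPORT=direct-flp
  trap 'docker compose -f "${STACK_DIR}/docker-compose.yaml" down' EXIT
  docker compose -f "${STACK_DIR}/docker-compose.yaml" up -d
}
```

A script would call `start_stack "${SCRIPT_DIR}"` and launch the agent only if it succeeds. Because the config check runs first, a missing file stops the script before any container is started.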

openshift-ci · 2026-04-17T18:56:38Z

@jpinsonneau: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/qe-e2e-tests	`b3f61a9`	link	false	`/test qe-e2e-tests`

local example with direct-flp + prom

b3f61a9

Local example with direct-flp + prom #949
jpinsonneau wants to merge 1 commit into netobserv:main from jpinsonneau:local-example
