Skip to content

Local example with direct-flp + prom#949

Open
jpinsonneau wants to merge 1 commit into
netobserv:mainfrom
jpinsonneau:local-example
Open

Local example with direct-flp + prom#949
jpinsonneau wants to merge 1 commit into
netobserv:mainfrom
jpinsonneau:local-example

Conversation

@jpinsonneau

@jpinsonneau jpinsonneau commented Apr 17, 2026

Copy link
Copy Markdown
Member

Local run of eBPF agent using direct-flp mode and a dedicated prometheus instance

Generated-by: claude-4.6-opus-high

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive example demonstrating Prometheus metrics collection with direct-flp mode.
    • Features optional tracking for RTT, DNS, packet drops, IPsec, and TLS metrics.
    • Includes Docker Compose configuration, pre-configured recording rules, example alerts, and quick-start script.

@coderabbitai

coderabbitai Bot commented Apr 17, 2026

Copy link
Copy Markdown
📝 Walkthrough

Walkthrough

Adds a complete local Prometheus example for the direct-flp export mode, including documentation, FLP configuration with embedded metrics encoding, Docker Compose setup for Prometheus scraping, recording rules, alert definitions, and a run script to orchestrate the agent and Prometheus stack.

Changes

Cohort / File(s) Summary
Documentation
README.md, docs/config.md, examples/direct-flp/README.md, examples/direct-flp/prometheus-local/README.md
Added references to and comprehensive documentation of the new Prometheus local example, detailing RTT/DNS/packet-drop/IPsec/TLS features, embedded metrics on port 9102, and setup instructions.
Configuration Files
examples/direct-flp/prometheus-local/{flp-prometheus.yaml, prometheus.yml, docker-compose.yaml}
Added FLP Prometheus encoder config defining netobserv gauges with labels, Prometheus scrape and global settings targeting the embedded pipeline, and Docker Compose service for Prometheus (v2.53.3) with TSDB retention and lifecycle endpoint.
Prometheus Rules
examples/direct-flp/prometheus-local/netobserv-rules.yml
Added recording rules for packet drops, TCP RTT, DNS latency, flow bytes derivatives, IPsec/TLS aggregations, and alert rule NetobservNoFlowMetrics triggering on absent metrics.
Run Script
examples/direct-flp/prometheus-local/run-example.sh
Added executable shell script to launch the agent with direct-flp export, enable tracking features, configure FLP/Prometheus ports, and start Docker services.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description is minimal and lacks required template sections (checklist items, QE requirements, dependency/testing information). Complete the PR description template by filling in the checklist, specifying QE requirements, and documenting any setup needed for testing the new example.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a local example demonstrating direct-flp mode with Prometheus monitoring.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Apr 17, 2026

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign jpinsonneau for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
examples/direct-flp/prometheus-local/netobserv-rules.yml (1)

24-26: deriv on a counter is misleading; prefer rate.

If netobserv_flow_bytes is a counter (bytes accumulated), deriv treats it as a gauge and will produce negative values on counter resets. Use rate(netobserv_flow_bytes[2m]) for a proper per-second rate. If it is actually a gauge, the current expression is fine — worth confirming and updating the comment accordingly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 24 -
26, The rule netobserv:flow_bytes:deriv_2m currently uses
deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric;
confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...)
with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline
comment to state it's a per-second rate for counters; if netobserv_flow_bytes is
actually a gauge, leave the expression but update the comment to indicate it’s
intended for gauges.
examples/direct-flp/prometheus-local/README.md (1)

60-60: Reconsider deriv() on gauge metrics.

Line 42 states these are gauges reflecting "the last value for each label set," not continuously increasing counters. deriv() assumes smoothly changing or monotonic values; applying it to gauges that reset or represent discrete flow snapshots produces unreliable results. Even labeled "noisy; exploratory," this could mislead users unfamiliar with the distinction.

Consider removing this recording rule or replacing it with a function suited to gauges (e.g., delta() or rate() only if the underlying metric were a counter, which it isn't).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/README.md` at line 60, The recording
rule that defines `netobserv:flow_bytes:deriv_2m` using
`deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or
remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the
expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate
function such as `delta(netobserv_flow_bytes[2m])` (or another chosen
aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 31-32: The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.

In `@examples/direct-flp/prometheus-local/run-example.sh`:
- Around line 24-27: Validate the flp-prometheus.yaml before starting the
compose stack and stop masking cat's exit status: check that
"${SCRIPT_DIR}/flp-prometheus.yaml" exists and is non-empty (e.g. [ -s
"${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo "missing"; exit 1; }), then read
it into FLP_CONFIG with a separate assignment that preserves failures
(FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" || exit 1) and then
export FLP_CONFIG; only after that run docker compose up -d; also add a trap on
ERR/EXIT to run docker compose down (or a cleanup function) so the compose stack
is torn down if the agent fails to start.

---

Nitpick comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 24-26: The rule netobserv:flow_bytes:deriv_2m currently uses
deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric;
confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...)
with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline
comment to state it's a per-second rate for counters; if netobserv_flow_bytes is
actually a gauge, leave the expression but update the comment to indicate it’s
intended for gauges.

In `@examples/direct-flp/prometheus-local/README.md`:
- Line 60: The recording rule that defines `netobserv:flow_bytes:deriv_2m` using
`deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or
remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the
expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate
function such as `delta(netobserv_flow_bytes[2m])` (or another chosen
aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6d2b2935-1195-4961-90ff-41ee266ac7d7

📥 Commits

Reviewing files that changed from the base of the PR and between 31041df and b3f61a9.

📒 Files selected for processing (9)
  • README.md
  • docs/config.md
  • examples/direct-flp/README.md
  • examples/direct-flp/prometheus-local/README.md
  • examples/direct-flp/prometheus-local/docker-compose.yaml
  • examples/direct-flp/prometheus-local/flp-prometheus.yaml
  • examples/direct-flp/prometheus-local/netobserv-rules.yml
  • examples/direct-flp/prometheus-local/prometheus.yml
  • examples/direct-flp/prometheus-local/run-example.sh

Comment on lines +31 to +32
- record: netobserv:tls_flow_bytes:count_by_version
expr: count by (TLSVersion) (netobserv_tls_flow_bytes)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

count by counts series, not bytes — name mismatch.

count by (TLSVersion) (netobserv_tls_flow_bytes) yields the number of active series per TLS version, not a byte-related quantity. If the intent is byte volume per version, use sum by (TLSVersion) like the IPSec rule above; otherwise rename the record to something like netobserv:tls_flows:count_by_version to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 31 -
32, The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.

Comment on lines +24 to +27
docker compose up -d

export EXPORT=direct-flp
export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Compose stack leaks if the agent fails to start, and cat errors are swallowed.

  • docker compose up -d runs before any validation of flp-prometheus.yaml; if cat/agent fails later, Prometheus keeps running and the user has to tear it down manually.
  • Line 27 hits shellcheck SC2155: export FLP_CONFIG="$(cat ...)" masks cat's exit status, so a missing/empty config file won't abort the script despite set -e.
Proposed fix
-cd "${SCRIPT_DIR}"
-docker compose up -d
-
-export EXPORT=direct-flp
-export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+cd "${SCRIPT_DIR}"
+
+FLP_CONFIG_CONTENT="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+export FLP_CONFIG="${FLP_CONFIG_CONTENT}"
+export EXPORT=direct-flp
+
+docker compose up -d
+trap 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' EXIT
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 27-27: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/run-example.sh` around lines 24 - 27,
Validate the flp-prometheus.yaml before starting the compose stack and stop
masking cat's exit status: check that "${SCRIPT_DIR}/flp-prometheus.yaml" exists
and is non-empty (e.g. [ -s "${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo
"missing"; exit 1; }), then read it into FLP_CONFIG with a separate assignment
that preserves failures (FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
|| exit 1) and then export FLP_CONFIG; only after that run docker compose up -d;
also add a trap on ERR/EXIT to run docker compose down (or a cleanup function)
so the compose stack is torn down if the agent fails to start.

@openshift-ci

openshift-ci Bot commented Apr 17, 2026

Copy link
Copy Markdown

@jpinsonneau: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/qe-e2e-tests b3f61a9 link false /test qe-e2e-tests

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant