Local example with direct-flp + prom#949
Conversation
📝 WalkthroughWalkthroughAdds a complete local Prometheus example for the direct-flp export mode, including documentation, FLP configuration with embedded metrics encoding, Docker Compose setup for Prometheus scraping, recording rules, alert definitions, and a run script to orchestrate the agent and Prometheus stack. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
examples/direct-flp/prometheus-local/netobserv-rules.yml (1)
24-26:derivon a counter is misleading; preferrate.If
netobserv_flow_bytesis a counter (bytes accumulated),derivtreats it as a gauge and will produce negative values on counter resets. Userate(netobserv_flow_bytes[2m])for a proper per-second rate. If it is actually a gauge, the current expression is fine — worth confirming and updating the comment accordingly.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 24 - 26, The rule netobserv:flow_bytes:deriv_2m currently uses deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric; confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...) with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline comment to state it's a per-second rate for counters; if netobserv_flow_bytes is actually a gauge, leave the expression but update the comment to indicate it’s intended for gauges.examples/direct-flp/prometheus-local/README.md (1)
60-60: Reconsiderderiv()on gauge metrics.Line 42 states these are gauges reflecting "the last value for each label set," not continuously increasing counters.
deriv()assumes smoothly changing or monotonic values; applying it to gauges that reset or represent discrete flow snapshots produces unreliable results. Even labeled "noisy; exploratory," this could mislead users unfamiliar with the distinction.Consider removing this recording rule or replacing it with a function suited to gauges (e.g.,
delta()orrate()only if the underlying metric were a counter, which it isn't).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/direct-flp/prometheus-local/README.md` at line 60, The recording rule that defines `netobserv:flow_bytes:deriv_2m` using `deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate function such as `delta(netobserv_flow_bytes[2m])` (or another chosen aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 31-32: The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.
In `@examples/direct-flp/prometheus-local/run-example.sh`:
- Around line 24-27: Validate the flp-prometheus.yaml before starting the
compose stack and stop masking cat's exit status: check that
"${SCRIPT_DIR}/flp-prometheus.yaml" exists and is non-empty (e.g. [ -s
"${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo "missing"; exit 1; }), then read
it into FLP_CONFIG with a separate assignment that preserves failures
(FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" || exit 1) and then
export FLP_CONFIG; only after that run docker compose up -d; also add a trap on
ERR/EXIT to run docker compose down (or a cleanup function) so the compose stack
is torn down if the agent fails to start.
---
Nitpick comments:
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml`:
- Around line 24-26: The rule netobserv:flow_bytes:deriv_2m currently uses
deriv(netobserv_flow_bytes[2m]) which is misleading for a counter metric;
confirm whether netobserv_flow_bytes is a counter and if so replace deriv(...)
with rate(netobserv_flow_bytes[2m]) in the recording rule and update the inline
comment to state it's a per-second rate for counters; if netobserv_flow_bytes is
actually a gauge, leave the expression but update the comment to indicate it’s
intended for gauges.
In `@examples/direct-flp/prometheus-local/README.md`:
- Line 60: The recording rule that defines `netobserv:flow_bytes:deriv_2m` using
`deriv(netobserv_flow_bytes[2m])` is inappropriate for a gauge; replace or
remove it: either delete the `netobserv:flow_bytes:deriv_2m` rule or change the
expression from `deriv(netobserv_flow_bytes[2m])` to a gauge-appropriate
function such as `delta(netobserv_flow_bytes[2m])` (or another chosen
aggregation) so you don't apply `deriv()` to a non-monotonic gauge metric.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6d2b2935-1195-4961-90ff-41ee266ac7d7
📒 Files selected for processing (9)
README.mddocs/config.mdexamples/direct-flp/README.mdexamples/direct-flp/prometheus-local/README.mdexamples/direct-flp/prometheus-local/docker-compose.yamlexamples/direct-flp/prometheus-local/flp-prometheus.yamlexamples/direct-flp/prometheus-local/netobserv-rules.ymlexamples/direct-flp/prometheus-local/prometheus.ymlexamples/direct-flp/prometheus-local/run-example.sh
| - record: netobserv:tls_flow_bytes:count_by_version | ||
| expr: count by (TLSVersion) (netobserv_tls_flow_bytes) |
There was a problem hiding this comment.
count by counts series, not bytes — name mismatch.
count by (TLSVersion) (netobserv_tls_flow_bytes) yields the number of active series per TLS version, not a byte-related quantity. If the intent is byte volume per version, use sum by (TLSVersion) like the IPSec rule above; otherwise rename the record to something like netobserv:tls_flows:count_by_version to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 31 -
32, The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.
| docker compose up -d | ||
|
|
||
| export EXPORT=direct-flp | ||
| export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" |
There was a problem hiding this comment.
Compose stack leaks if the agent fails to start, and cat errors are swallowed.
docker compose up -druns before any validation offlp-prometheus.yaml; ifcat/agent fails later, Prometheus keeps running and the user has to tear it down manually.- Line 27 hits shellcheck SC2155:
export FLP_CONFIG="$(cat ...)"maskscat's exit status, so a missing/empty config file won't abort the script despiteset -e.
Proposed fix
-cd "${SCRIPT_DIR}"
-docker compose up -d
-
-export EXPORT=direct-flp
-export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+cd "${SCRIPT_DIR}"
+
+FLP_CONFIG_CONTENT="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+export FLP_CONFIG="${FLP_CONFIG_CONTENT}"
+export EXPORT=direct-flp
+
+docker compose up -d
+trap 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' EXIT🧰 Tools
🪛 Shellcheck (0.11.0)
[warning] 27-27: Declare and assign separately to avoid masking return values.
(SC2155)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/direct-flp/prometheus-local/run-example.sh` around lines 24 - 27,
Validate the flp-prometheus.yaml before starting the compose stack and stop
masking cat's exit status: check that "${SCRIPT_DIR}/flp-prometheus.yaml" exists
and is non-empty (e.g. [ -s "${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo
"missing"; exit 1; }), then read it into FLP_CONFIG with a separate assignment
that preserves failures (FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
|| exit 1) and then export FLP_CONFIG; only after that run docker compose up -d;
also add a trap on ERR/EXIT to run docker compose down (or a cleanup function)
so the compose stack is torn down if the agent fails to start.
|
@jpinsonneau: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Local run of eBPF agent using direct-flp mode and a dedicated prometheus instance
Generated-by:
claude-4.6-opus-highSummary by CodeRabbit