Local example with direct-flp + prom #949
Open: jpinsonneau wants to merge 1 commit into netobserv:main from jpinsonneau:local-example
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,112 @@ | ||
| # direct-flp with Prometheus (local lab) | ||
|
|
||
| This example runs the NetObserv eBPF agent on the host with **`EXPORT=direct-flp`**, embeds [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) using **`FLP_CONFIG`** from [`flp-prometheus.yaml`](./flp-prometheus.yaml), and starts a **Prometheus** container that scrapes the embedded pipeline’s `/metrics` endpoint. | ||
|
|
||
| Optional features are turned on in the runner script for demonstration: **RTT** (`ENABLE_RTT`), **DNS tracking** (`ENABLE_DNS_TRACKING`), **packet drop tracking** (`ENABLE_PKT_DROPS`), **IPsec flow metadata** (`ENABLE_IPSEC_TRACKING`), and **TLS metadata per flow** (`ENABLE_TLS_TRACKING`). See [docs/config.md](../../../docs/config.md) and [pkg/config/config.go](../../../pkg/config/config.go) for all agent environment variables. | ||
|
|
||
| The IPsec and TLS hooks add eBPF overhead and only populate fields when matching traffic crosses the traced interfaces; they are enabled here so that the corresponding `netobserv_*` series appear in Prometheus when you exercise those protocols. "TLS" here means **TLS metadata from the agent’s eBPF TLS tracker** (`ENABLE_TLS_TRACKING`), not transport TLS for Kafka or gRPC. Userspace OpenSSL correlation is controlled separately by `ENABLE_OPENSSL_TRACKING` and is not turned on in this script. | ||
|
|
||
| ## Quick start | ||
|
|
||
| From the repository root: | ||
|
|
||
| ```bash | ||
| make compile | ||
| ./examples/direct-flp/prometheus-local/run-example.sh | ||
| ``` | ||
|
|
||
| - **Prometheus UI:** http://127.0.0.1:9091 | ||
| - **Embedded FLP metrics:** http://127.0.0.1:9102/metrics | ||
| - **Agent operational metrics:** http://127.0.0.1:9090/metrics (default `METRICS_SERVER_PORT`) | ||
|
|
||
| Stop Prometheus: `docker compose -f examples/direct-flp/prometheus-local/docker-compose.yaml down` | ||
|
|
||
| ## Why two ports (9090 and 9102) | ||
|
|
||
| The agent exposes its own Prometheus listener on **`METRICS_SERVER_PORT`** (default **9090**). The embedded flowlogs-pipeline also starts a global metrics HTTP server; if its port is left unset, it also defaults to **9090** and would conflict with the agent’s listener. [`flp-prometheus.yaml`](./flp-prometheus.yaml) sets **`metricsSettings.port: 9102`** so that the flow metrics produced by the encode stage are served on a separate port. | ||
|
|
||
| ## Flow metrics exposed by FLP (`flp-prometheus.yaml`) | ||
|
|
||
| All names use the encode prefix **`netobserv_`** (see `prom.prefix` in the file). | ||
|
|
||
| | Prometheus metric | Type | When populated | Main labels | | ||
| |-------------------|------|----------------|-------------| | ||
| | `netobserv_flow_bytes` | gauge | `Bytes` present on the flow | `SrcAddr`, `DstAddr`, `SrcPort`, `DstPort`, `Proto` | | ||
| | `netobserv_flow_packets` | gauge | `Packets` present | same as above | | ||
| | `netobserv_dns_latency_ms` | gauge | DNS tracking enabled, `DnsLatencyMs` present | `SrcAddr`, `DstAddr`, `DstPort`, `DnsFlagsResponseCode` | | ||
| | `netobserv_tcp_flow_rtt_ns` | gauge | RTT enabled, `TimeFlowRttNs` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto` | | ||
| | `netobserv_pkt_drop_packets` | gauge | Packet drop tracking enabled, `PktDropPackets` present | `SrcAddr`, `DstAddr`, `PktDropLatestDropCause` | | ||
| | `netobserv_ipsec_flow_bytes` | gauge | IPsec tracking enabled; `IPSecStatus` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `IPSecStatus`, `IPSecRetCode` | | ||
| | `netobserv_tls_flow_bytes` | gauge | TLS tracking enabled; `TLSVersion` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `TLSVersion`, `TLSCipherSuite` | | ||
|
|
||
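| These are **gauges** reflecting the last value for each label set (with FLP `expiryTime: 5m` cleaning stale series). They are not counters: `rate()` assumes a monotonically increasing counter and treats any decrease as a counter reset, so its output is misleading on these series. Prefer gauge-friendly aggregations such as `max_over_time` or `topk` (or `deriv` for a rough rate of change). | ||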
|
|
||
| **Cardinality:** labels include IP addresses and ports. This is appropriate for a **local lab** only, not for a large production Prometheus. | ||
|
|
||
| **Packet drops:** the agent needs access to kernel tracepoints (typically **`/sys/kernel/debug`** mounted and sufficient privileges). See the main [README.md](../../../README.md) and [docs/config.md](../../../docs/config.md). | ||
|
|
||
| ## Prometheus rules (`netobserv-rules.yml`) | ||
|
|
||
| [`prometheus.yml`](./prometheus.yml) loads [`netobserv-rules.yml`](./netobserv-rules.yml) via `rule_files`. | ||
|
|
||
| ### Recording rules (precomputed series) | ||
|
|
||
| | Recorded metric | Expression (summary) | | ||
| |-----------------|----------------------| | ||
| | `netobserv:pkt_drop_packets:sum_by_cause` | `sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)` | | ||
| | `netobserv:tcp_flow_rtt:milliseconds` | `netobserv_tcp_flow_rtt_ns / 1e6` | | ||
| | `netobserv:dns_latency_ms:max_over_5m` | `max_over_time(netobserv_dns_latency_ms[5m])` | | ||
| | `netobserv:flow_bytes:top10` | `topk(10, netobserv_flow_bytes)` | | ||
| | `netobserv:flow_bytes:deriv_2m` | `deriv(netobserv_flow_bytes[2m])` (noisy; exploratory) | | ||
| | `netobserv:ipsec_flow_bytes:sum_by_status` | `sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)` | | ||
| | `netobserv:tls_flow_bytes:count_by_version` | `count by (TLSVersion) (netobserv_tls_flow_bytes)` | | ||
|
|
||
| In the Prometheus UI (**Graph**), you can query these names directly instead of typing the full expression. | ||
|
|
||
| ### Example alert | ||
|
|
||
| | Alert | Condition | Meaning | | ||
| |-------|-----------|---------| | ||
| | `NetobservNoFlowMetrics` | `absent(netobserv_flow_bytes)` for 3m | No `netobserv_flow_bytes` series (scrape failure, agent stopped, or no flows exported yet). Check the agent, http://127.0.0.1:9102/metrics on the host, and Docker `host.docker.internal` reachability. | | ||
|
|
||
| View rule evaluation status under **Status → Rules** and firing alerts under **Alerts**. | ||
|
|
||
| ## Ad hoc PromQL (raw FLP metrics) | ||
|
|
||
| ```promql | ||
| {__name__=~"netobserv_.*"} | ||
| ``` | ||
|
|
||
| ```promql | ||
| topk(10, netobserv_flow_bytes) | ||
| ``` | ||
|
|
||
| ```promql | ||
| netobserv_tcp_flow_rtt_ns / 1e6 | ||
| ``` | ||
|
|
||
| ```promql | ||
| sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets) | ||
| ``` | ||
|
|
||
| ```promql | ||
| netobserv_ipsec_flow_bytes | ||
| ``` | ||
|
|
||
| ```promql | ||
| topk(5, netobserv_tls_flow_bytes) | ||
| ``` | ||
|
|
||
| ## Files in this directory | ||
|
|
||
| | File | Role | | ||
| |------|------| | ||
| | [`flp-prometheus.yaml`](./flp-prometheus.yaml) | `FLP_CONFIG`: `metricsSettings` + `encode/prom` pipeline after `preset-ingester` | | ||
| | [`prometheus.yml`](./prometheus.yml) | Prometheus scrape config + `rule_files` | | ||
| | [`netobserv-rules.yml`](./netobserv-rules.yml) | Recording rules and example alert | | ||
| | [`docker-compose.yaml`](./docker-compose.yaml) | Prometheus service and config mounts | | ||
| | [`run-example.sh`](./run-example.sh) | Starts Compose, exports env vars, runs the agent with `sudo -E` | | ||
|
|
||
| ## Prometheus without Docker | ||
|
|
||
| Point a local `prometheus.yml` at `127.0.0.1:9102` instead of `host.docker.internal:9102`, and use the same `rule_files` entry with a path to [`netobserv-rules.yml`](./netobserv-rules.yml) on your machine. |
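
The substitution described under "Prometheus without Docker" could be scripted roughly as follows. This is a minimal sketch and not part of the PR: the function name `localize_prometheus_config` and the output file `prometheus.local.yml` are hypothetical, and it assumes the scrape target and rule path appear exactly as in this example's `prometheus.yml` and that the rules directory contains no `|` or `&` characters.

```bash
# Hypothetical helper (not part of the PR): rewrite the Docker-oriented
# prometheus.yml so a Prometheus binary running directly on the host can use it.
# Reads the config on stdin and writes the adjusted config to stdout.
# $1: directory on the host that holds netobserv-rules.yml.
localize_prometheus_config() {
  local rules_dir="$1"
  sed \
    -e 's|host\.docker\.internal:9102|127.0.0.1:9102|' \
    -e "s|/etc/prometheus/netobserv-rules\.yml|${rules_dir}/netobserv-rules.yml|"
}

# Example usage, run from examples/direct-flp/prometheus-local:
#   localize_prometheus_config "$PWD" < prometheus.yml > prometheus.local.yml
#   prometheus --config.file=prometheus.local.yml --web.listen-address=127.0.0.1:9091
```

The explicit `--web.listen-address` matters here because a host-local Prometheus would otherwise try to bind port 9090, which the agent already uses for `METRICS_SERVER_PORT`.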
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| services: | ||
|   prometheus: | ||
|     image: docker.io/prom/prometheus:v2.53.3 | ||
|     ports: | ||
|       # Host UI/API; agent + embedded FLP keep 9090 and 9102 on the host. | ||
|       - "9091:9090" | ||
|     volumes: | ||
|       - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro | ||
|       - ./netobserv-rules.yml:/etc/prometheus/netobserv-rules.yml:ro | ||
|     command: | ||
|       - --config.file=/etc/prometheus/prometheus.yml | ||
|       - --storage.tsdb.retention.time=2h | ||
|       - --web.enable-lifecycle | ||
|     extra_hosts: | ||
|       - "host.docker.internal:host-gateway" |
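
Because the Compose file passes `--web.enable-lifecycle`, Prometheus accepts a reload request over HTTP, which is convenient after editing `netobserv-rules.yml`. A minimal sketch, assuming the `9091:9090` port mapping shown above; the function name `reload_prometheus` and the `PROM_URL` variable are hypothetical.

```bash
# Hypothetical helper: ask the Prometheus container to re-read its config and rules.
# Relies on --web.enable-lifecycle; without that flag Prometheus rejects the request.
reload_prometheus() {
  local base_url="${PROM_URL:-http://127.0.0.1:9091}"
  # -f: fail on HTTP errors; -sS: stay quiet but still print errors.
  curl -fsS -X POST "${base_url}/-/reload"
}

# Example usage:
#   reload_prometheus && echo "reloaded"
```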
examples/direct-flp/prometheus-local/flp-prometheus.yaml (111 changes: 111 additions & 0 deletions)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,111 @@ | ||
| # Flowlogs-pipeline config for EXPORT=direct-flp (ingest is provided by the agent). | ||
| # Exposes Prometheus metrics on :9102/metrics (see metricsSettings). | ||
| # Agent operational metrics stay on METRICS_SERVER_PORT (default 9090). | ||
| log-level: info | ||
| metricsSettings: | ||
|   address: "0.0.0.0" | ||
|   port: 9102 | ||
| pipeline: | ||
|   - name: encode_prom | ||
|     follows: preset-ingester | ||
| parameters: | ||
|   - name: encode_prom | ||
|     encode: | ||
|       type: prom | ||
|       prom: | ||
|         prefix: netobserv_ | ||
|         expiryTime: 5m | ||
|         metrics: | ||
|           - name: flow_bytes | ||
|             type: gauge | ||
|             help: Last observed byte count for a flow (high-cardinality labels; local lab only). | ||
|             filters: | ||
|               - key: Bytes | ||
|                 type: presence | ||
|             valueKey: Bytes | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - SrcPort | ||
|               - DstPort | ||
|               - Proto | ||
|           - name: flow_packets | ||
|             type: gauge | ||
|             help: Last observed packet count for a flow. | ||
|             filters: | ||
|               - key: Packets | ||
|                 type: presence | ||
|             valueKey: Packets | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - SrcPort | ||
|               - DstPort | ||
|               - Proto | ||
|           - name: dns_latency_ms | ||
|             type: gauge | ||
|             help: DNS request/response latency in milliseconds when DNS tracking is enabled. | ||
|             filters: | ||
|               - key: DnsLatencyMs | ||
|                 type: presence | ||
|             valueKey: DnsLatencyMs | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - DstPort | ||
|               - DnsFlagsResponseCode | ||
|           - name: tcp_flow_rtt_ns | ||
|             type: gauge | ||
|             help: TCP smoothed RTT in nanoseconds when RTT tracking is enabled. | ||
|             filters: | ||
|               - key: TimeFlowRttNs | ||
|                 type: presence | ||
|             valueKey: TimeFlowRttNs | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - DstPort | ||
|               - Proto | ||
|           - name: pkt_drop_packets | ||
|             type: gauge | ||
|             help: Kernel packet drops attributed to the flow when packet drop tracking is enabled. | ||
|             filters: | ||
|               - key: PktDropPackets | ||
|                 type: presence | ||
|             valueKey: PktDropPackets | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - PktDropLatestDropCause | ||
|           - name: ipsec_flow_bytes | ||
|             type: gauge | ||
|             help: Last observed bytes for flows where IPsec encryption was detected (ENABLE_IPSEC_TRACKING). | ||
|             filters: | ||
|               - key: Bytes | ||
|                 type: presence | ||
|               - key: IPSecStatus | ||
|                 type: presence | ||
|             valueKey: Bytes | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - DstPort | ||
|               - Proto | ||
|               - IPSecStatus | ||
|               - IPSecRetCode | ||
|           - name: tls_flow_bytes | ||
|             type: gauge | ||
|             help: Last observed bytes for flows where TLS metadata was captured (ENABLE_TLS_TRACKING). | ||
|             filters: | ||
|               - key: Bytes | ||
|                 type: presence | ||
|               - key: TLSVersion | ||
|                 type: presence | ||
|             valueKey: Bytes | ||
|             labels: | ||
|               - SrcAddr | ||
|               - DstAddr | ||
|               - DstPort | ||
|               - Proto | ||
|               - TLSVersion | ||
|               - TLSCipherSuite |
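
To confirm which of the metrics defined above are actually being exported, the `/metrics` output can be reduced to its distinct metric names. A minimal sketch, assuming the standard Prometheus text exposition format; the function name `list_netobserv_metrics` is hypothetical.

```bash
# Hypothetical helper: read Prometheus text exposition on stdin and print the
# distinct metric names carrying the netobserv_ prefix, one per line, sorted.
list_netobserv_metrics() {
  # Drop comment lines (# HELP / # TYPE), then cut each sample at the first
  # '{' or space so that only the metric name remains.
  grep -v '^#' | sed -E 's/[{ ].*$//' | grep '^netobserv_' | sort -u
}

# Example usage against the embedded FLP endpoint:
#   curl -fsS http://127.0.0.1:9102/metrics | list_netobserv_metrics
```

A name missing from the output does not necessarily indicate a misconfiguration: each metric uses a `presence` filter, so a series only appears once a flow carrying that field has been observed.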
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,44 @@ | ||
| # Recording + alert rules for the netobserv direct-flp / prometheus-local example. | ||
| # After reload, use Graph → metric name (e.g. netobserv:tcp_flow_rtt:milliseconds). | ||
|
|
||
| groups: | ||
|   - name: netobserv-recording | ||
|     interval: 15s | ||
|     rules: | ||
|       # sum of drop packets attributed to flows, grouped by last kernel drop cause | ||
|       - record: netobserv:pkt_drop_packets:sum_by_cause | ||
|         expr: sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets) | ||
|
|
||
|       # TCP SRTT in milliseconds (source series is nanoseconds) | ||
|       - record: netobserv:tcp_flow_rtt:milliseconds | ||
|         expr: netobserv_tcp_flow_rtt_ns / 1e6 | ||
|
|
||
|       # per-series max DNS latency over 5m (requires ≥10s scrape; matches global interval) | ||
|       - record: netobserv:dns_latency_ms:max_over_5m | ||
|         expr: max_over_time(netobserv_dns_latency_ms[5m]) | ||
|
|
||
|       # up to 10 flows with largest last-reported byte gauge (high-cardinality labels preserved) | ||
|       - record: netobserv:flow_bytes:top10 | ||
|         expr: topk(10, netobserv_flow_bytes) | ||
|
|
||
|       # optional: rough “change” signal for byte gauges (can be noisy) | ||
|       - record: netobserv:flow_bytes:deriv_2m | ||
|         expr: deriv(netobserv_flow_bytes[2m]) | ||
|
|
||
|       - record: netobserv:ipsec_flow_bytes:sum_by_status | ||
|         expr: sum by (IPSecStatus) (netobserv_ipsec_flow_bytes) | ||
|
|
||
|       - record: netobserv:tls_flow_bytes:count_by_version | ||
|         expr: count by (TLSVersion) (netobserv_tls_flow_bytes) | ||
|
|
||
|   - name: netobserv-alerts | ||
|     rules: | ||
|       # Fires when no netobserv flow byte series exist (scrape down, agent stopped, or no traffic yet). | ||
|       - alert: NetobservNoFlowMetrics | ||
|         expr: absent(netobserv_flow_bytes) | ||
|         for: 3m | ||
|         labels: | ||
|           severity: warning | ||
|         annotations: | ||
|           summary: No netobserv_flow_bytes series scraped | ||
|           description: Prometheus has not recorded netobserv_flow_bytes in 3m. Check agent, :9102/metrics, and host.docker.internal from the container. | ||
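
A rules file like this one can be linted before Prometheus loads it. The sketch below is an assumption-laden illustration and not part of the PR: it assumes the `prom/prometheus` image used in the Compose file ships `promtool` on its `PATH`, and the function name `check_rules` and the in-container path `/tmp/rules.yml` are hypothetical.

```bash
# Hypothetical helper: lint a rules file with promtool from the same image the
# Compose file uses, so no local Prometheus install is needed.
# $1: absolute path to the rules file (absolute because it is bind-mounted).
check_rules() {
  local rules_file="$1"
  docker run --rm \
    --entrypoint promtool \
    -v "${rules_file}:/tmp/rules.yml:ro" \
    docker.io/prom/prometheus:v2.53.3 \
    check rules /tmp/rules.yml
}

# Example usage, run from examples/direct-flp/prometheus-local:
#   check_rules "$PWD/netobserv-rules.yml"
```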
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,13 @@ | ||
| global: | ||
|   scrape_interval: 10s | ||
|   evaluation_interval: 10s | ||
|
|
||
| rule_files: | ||
|   - /etc/prometheus/netobserv-rules.yml | ||
|
|
||
| scrape_configs: | ||
|   - job_name: netobserv-flowlogs-pipeline | ||
|     metrics_path: /metrics | ||
|     static_configs: | ||
|       # From inside the Prometheus container, reach the host where the agent listens (9102). | ||
|       - targets: ["host.docker.internal:9102"] |
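
Whether this scrape job is working can also be checked from the host through the Prometheus HTTP API instead of the UI. A minimal sketch, assuming Prometheus is reachable on the host port 9091 as in this example; the function name `prom_query` and the `PROM_URL` variable are hypothetical.

```bash
# Hypothetical helper: run an instant PromQL query through the Prometheus HTTP API.
# $1: the PromQL expression.
prom_query() {
  local base_url="${PROM_URL:-http://127.0.0.1:9091}"
  # -G sends the URL-encoded data as a GET query string.
  curl -fsS -G "${base_url}/api/v1/query" --data-urlencode "query=$1"
}

# Example usage: a sample value of 1 means the last scrape of the target succeeded.
#   prom_query 'up{job="netobserv-flowlogs-pipeline"}'
```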
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,43 @@ | ||
| #!/usr/bin/env bash | ||
| # Run NetObserv eBPF agent on the host with direct-flp + Prometheus encode, alongside | ||
| # a Prometheus instance in Docker that scrapes the embedded flowlogs-pipeline /metrics. | ||
| # | ||
| # Prerequisites: | ||
| # - Built agent: (cd repo root && make compile) | ||
| # - Docker with compose plugin | ||
| # - Caps: sudo, or CAP_BPF + CAP_PERFMON + CAP_NET_ADMIN (see README) | ||
| # - Packet drops: /sys/kernel/debug mounted read-write (often requires privileged / root) | ||
| # - IPsec/TLS: ENABLE_IPSEC_TRACKING / ENABLE_TLS_TRACKING (extra eBPF hooks; series only if traffic matches) | ||
| # | ||
| set -euo pipefail | ||
|
|
||
| SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" | ||
| REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)" | ||
| AGENT_BIN="${REPO_ROOT}/bin/netobserv-ebpf-agent" | ||
|
|
||
| if [[ ! -x "${AGENT_BIN}" ]]; then | ||
|   echo "Missing ${AGENT_BIN}. Run: cd ${REPO_ROOT} && make compile" >&2 | ||
|   exit 1 | ||
| fi | ||
|
|
||
| cd "${SCRIPT_DIR}" | ||
| docker compose up -d | ||
|
|
||
| export EXPORT=direct-flp | ||
| export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")" | ||
Comment on lines +24 to +27

Compose stack leaks if the agent fails to start, and

Proposed fix:
-cd "${SCRIPT_DIR}"
-docker compose up -d
-
-export EXPORT=direct-flp
-export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+cd "${SCRIPT_DIR}"
+
+FLP_CONFIG_CONTENT="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+export FLP_CONFIG="${FLP_CONFIG_CONTENT}"
+export EXPORT=direct-flp
+
+docker compose up -d
+trap 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' EXIT

Tools: Shellcheck (0.11.0)
[warning] 27-27: Declare and assign separately to avoid masking return values. (SC2155)
|
|
||
| export ENABLE_RTT=true | ||
| export ENABLE_PKT_DROPS=true | ||
| export ENABLE_DNS_TRACKING=true | ||
| export ENABLE_IPSEC_TRACKING=true | ||
| export ENABLE_TLS_TRACKING=true | ||
|
|
||
| # Embedded FLP serves Prometheus on 9102; keep agent metrics on 9090 (default). | ||
| export METRICS_SERVER_PORT=9090 | ||
|
|
||
| echo "Prometheus UI: http://127.0.0.1:9091 (Graph: try netobserv:tcp_flow_rtt:milliseconds; Alerts: NetobservNoFlowMetrics)" | ||
| echo "Embedded FLP metrics: http://127.0.0.1:9102/metrics" | ||
| echo "Agent metrics: http://127.0.0.1:${METRICS_SERVER_PORT}/metrics" | ||
| echo "Starting agent (sudo)…" | ||
|
|
||
| exec sudo -E "${AGENT_BIN}" | ||
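
One interaction is worth noting for the cleanup idea raised in the review comment on lines +24 to +27: the script ends with `exec sudo -E "${AGENT_BIN}"`, and `exec` replaces the shell process, so an `EXIT` trap registered earlier in the same script would not run once the agent has started. A minimal sketch of an alternative that keeps the shell alive as the parent; the function name `run_with_cleanup` is hypothetical and this is not the PR's implementation.

```bash
# Hypothetical helper: run a command as a child process and always run the
# cleanup command afterwards, even when the child fails.
# $1: cleanup command (a string, evaluated when the subshell exits).
# $2...: the command to run.
run_with_cleanup() {
  local cleanup_cmd="$1"
  shift
  (
    # The trap belongs to this subshell and fires when the subshell exits.
    trap "${cleanup_cmd}" EXIT
    # No `exec` here: exec would replace the shell and the trap would never run.
    "$@"
    # Propagate the child's exit status to the caller.
    exit "$?"
  )
}

# Example usage in run-example.sh terms (variables as defined in the script):
#   run_with_cleanup 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' \
#     sudo -E "${AGENT_BIN}"
```

The trade-off is one extra shell process staying resident while the agent runs, in exchange for the Compose stack being torn down when the agent exits.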
`count by` counts series, not bytes: name mismatch.

`count by (TLSVersion) (netobserv_tls_flow_bytes)` yields the number of active series per TLS version, not a byte-related quantity. If the intent is byte volume per version, use `sum by (TLSVersion)` like the IPSec rule above; otherwise rename the record to something like `netobserv:tls_flows:count_by_version` to avoid confusion.
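
The difference between the two aggregations can be seen by computing both from a handful of samples. A minimal sketch that mimics `count by (TLSVersion)` and `sum by (TLSVersion)` over text exposition input; the function name `tls_count_and_sum` is hypothetical, and the `TLSVersion` values in the usage example are made up for illustration.

```bash
# Hypothetical helper: read netobserv_tls_flow_bytes samples on stdin and print,
# per TLSVersion, the number of series (what `count by` yields) and the summed
# sample values (what `sum by` yields).
tls_count_and_sum() {
  awk '
    /^netobserv_tls_flow_bytes[{]/ {
      # Locate the TLSVersion label; skip samples that lack it.
      if (match($0, /TLSVersion="[^"]*"/) == 0) next
      # Strip the leading TLSVersion=" (12 chars) and the trailing quote.
      version = substr($0, RSTART + 12, RLENGTH - 13)
      count[version]++
      sum[version] += $NF   # the sample value is the last field on the line
    }
    END {
      for (v in count) printf "%s count=%d sum=%d\n", v, count[v], sum[v]
    }
  ' | sort
}

# Example usage with made-up samples:
#   printf '%s\n' \
#     'netobserv_tls_flow_bytes{SrcAddr="10.0.0.1",TLSVersion="v1.2"} 1000' \
#     'netobserv_tls_flow_bytes{SrcAddr="10.0.0.2",TLSVersion="v1.2"} 500' \
#     | tls_count_and_sum
```

With those two samples the count is 2 while the sum is 1500, which is the gap between the recorded name (`tls_flow_bytes`) and what `count by` actually measures.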