diff --git a/README.md b/README.md
index f494d63d1..92015b632 100644
--- a/README.md
+++ b/README.md
@@ -77,6 +77,8 @@ sudo -E bin/netobserv-ebpf-agent

 For more information about configuring flowlogs-pipeline, please refer to [its documentation](https://github.com/netobserv/flowlogs-pipeline).

+For **direct-flp** with embedded Prometheus metrics, optional RTT/DNS/packet-drop/IPsec/TLS features, and a local Prometheus (Docker) including recording rules and an example alert, see [examples/direct-flp/prometheus-local/README.md](./examples/direct-flp/prometheus-local/README.md).
+
 To deploy locally, use instructions from [flowlogs-dump (like tcpdump)](./examples/flowlogs-dump/README.md).

 To deploy it as a Pod, you can check the [deployment examples](./deployments).
diff --git a/docs/config.md b/docs/config.md
index f4e17a477..a71112270 100644
--- a/docs/config.md
+++ b/docs/config.md
@@ -84,7 +84,7 @@ The following environment variables are available to configure the NetObserv eBP
 * `PCA_FILTER` (default: `none`). Works only when `ENABLE_PCA` is set. Accepted format . Example `PCA_FILTER=tcp,22`.
 * `PCA_SERVER_PORT` (default: 0). Works only when `ENABLE_PCA` is set. Agent opens PCA Server at this port. A collector can connect to it and recieve filtered packets as pcap stream. The filter is set using `PCA_FILTER`.
-* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g, for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options.
+* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g, for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options. For a documented end-to-end lab (direct-flp, RTT, DNS, packet drops, IPsec and TLS tracking, embedded Prometheus encode on port 9102, Docker Prometheus, recording rules, and an example alert), see [examples/direct-flp/prometheus-local/README.md](../examples/direct-flp/prometheus-local/README.md).
 * `METRICS_ENABLED` (default: `false`). If `true`, the agent will export metrics to the configured `EXPORT` endpoint.
 * `METRICS_SERVER_ADDRESS` Address of the server where the metrics will be exported.
 * `METRICS_SERVER_PORT` (default: 9090). Port of the server where the metrics will be exported.
diff --git a/examples/direct-flp/README.md b/examples/direct-flp/README.md
index ca468f421..566a27ad2 100644
--- a/examples/direct-flp/README.md
+++ b/examples/direct-flp/README.md
@@ -14,6 +14,17 @@ export EXPORT="direct-flp"
 sudo -E bin/netobserv-ebpf-agent
 ```

+## Local direct-flp with RTT, DNS, packet drops, and Prometheus (Docker)
+
+The directory [prometheus-local](./prometheus-local/) runs the agent with `EXPORT=direct-flp`, enables RTT, DNS, packet-drop, IPsec, and TLS tracking in the agent, embeds flowlogs-pipeline with an `encode/prom` stage on **port 9102**, and starts a dedicated Prometheus (UI on **9091**) that scrapes those metrics from the host. Full documentation (FLP metric names, Prometheus recording rules, example alert, PromQL, ports, and files) is in [prometheus-local/README.md](./prometheus-local/README.md).
+
+```bash
+make compile
+./examples/direct-flp/prometheus-local/run-example.sh
+```
+
+Packet drop tracking needs tracepoint access to `/sys/kernel/debug` (typically mount it read-write and run with sufficient privileges). See [docs/config.md](../../docs/config.md) and the main [README.md](../../README.md) for capability and troubleshooting notes.
+
 To start a collector, you can start another (standalone) instance of flowlogs-pipeline, configured with an IPFIX ingester and logging on stdout.

 Example for [FLP repo](https://github.com/netobserv/flowlogs-pipeline):
diff --git a/examples/direct-flp/prometheus-local/README.md b/examples/direct-flp/prometheus-local/README.md
new file mode 100644
index 000000000..ad1d23755
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/README.md
@@ -0,0 +1,112 @@
+# direct-flp with Prometheus (local lab)
+
+This example runs the NetObserv eBPF agent on the host with **`EXPORT=direct-flp`**, embeds [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) using **`FLP_CONFIG`** from [`flp-prometheus.yaml`](./flp-prometheus.yaml), and starts a **Prometheus** container that scrapes the embedded pipeline’s `/metrics` endpoint.
+
+Optional features are turned on in the runner script for demonstration: **RTT** (`ENABLE_RTT`), **DNS tracking** (`ENABLE_DNS_TRACKING`), **packet drop tracking** (`ENABLE_PKT_DROPS`), **IPsec flow metadata** (`ENABLE_IPSEC_TRACKING`), and **TLS metadata per flow** (`ENABLE_TLS_TRACKING`). See [docs/config.md](../../../docs/config.md) and [pkg/config/config.go](../../../pkg/config/config.go) for all agent environment variables.
+
+IPsec and TLS hooks add eBPF work and only populate fields when matching traffic exists on the traced interfaces; they are enabled here so the corresponding `netobserv_*` series appear in Prometheus when you exercise those protocols. This is **TLS metadata from the agent’s eBPF TLS tracker** (`ENABLE_TLS_TRACKING`), not Kafka or gRPC TLS.
Userspace OpenSSL correlation uses `ENABLE_OPENSSL_TRACKING` separately and is not turned on in this script.
+
+## Quick start
+
+From the repository root:
+
+```bash
+make compile
+./examples/direct-flp/prometheus-local/run-example.sh
+```
+
+- **Prometheus UI:** http://127.0.0.1:9091
+- **Embedded FLP metrics:** http://127.0.0.1:9102/metrics
+- **Agent operational metrics:** http://127.0.0.1:9090/metrics (default `METRICS_SERVER_PORT`)
+
+Stop Prometheus: `docker compose -f examples/direct-flp/prometheus-local/docker-compose.yaml down`
+
+## Why two ports (9090 and 9102)
+
+The agent exposes its own Prometheus listener on **`METRICS_SERVER_PORT`** (default **9090**). The embedded flowlogs-pipeline also starts a global metrics HTTP server; if its port is left unset it defaults to **9090** as well and would conflict. [`flp-prometheus.yaml`](./flp-prometheus.yaml) sets **`metricsSettings.port: 9102`** so flow encode metrics are served separately.
+
+## Flow metrics exposed by FLP (`flp-prometheus.yaml`)
+
+All names use the encode prefix **`netobserv_`** (see `prom.prefix` in the file).
+
+| Prometheus metric | Type | When populated | Main labels |
+|-------------------|------|----------------|-------------|
+| `netobserv_flow_bytes` | gauge | `Bytes` present on the flow | `SrcAddr`, `DstAddr`, `SrcPort`, `DstPort`, `Proto` |
+| `netobserv_flow_packets` | gauge | `Packets` present | same as above |
+| `netobserv_dns_latency_ms` | gauge | DNS tracking enabled, `DnsLatencyMs` present | `SrcAddr`, `DstAddr`, `DstPort`, `DnsFlagsResponseCode` |
+| `netobserv_tcp_flow_rtt_ns` | gauge | RTT enabled, `TimeFlowRttNs` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto` |
+| `netobserv_pkt_drop_packets` | gauge | Packet drop tracking enabled, `PktDropPackets` present | `SrcAddr`, `DstAddr`, `PktDropLatestDropCause` |
+| `netobserv_ipsec_flow_bytes` | gauge | IPsec tracking enabled; `IPSecStatus` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `IPSecStatus`, `IPSecRetCode` |
+| `netobserv_tls_flow_bytes` | gauge | TLS tracking enabled; `TLSVersion` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `TLSVersion`, `TLSCipherSuite` |
+
+These are **gauges** reflecting the last value for each label set (with FLP `expiryTime: 5m` cleaning stale series). They are not counters; use aggregations such as `max_over_time` or `topk` rather than `rate()` unless you understand the semantics.
+
+**Cardinality:** labels include IP addresses and ports. This is appropriate for a **local lab** only, not for a large production Prometheus.
+
+**Packet drops:** the agent needs access to kernel tracepoints (typically **`/sys/kernel/debug`** mounted and sufficient privileges). See the main [README.md](../../../README.md) and [docs/config.md](../../../docs/config.md).
+
+## Prometheus rules (`netobserv-rules.yml`)
+
+[`prometheus.yml`](./prometheus.yml) loads [`netobserv-rules.yml`](./netobserv-rules.yml) via `rule_files`.
+
+### Recording rules (precomputed series)
+
+| Recorded metric | Expression (summary) |
+|-----------------|----------------------|
+| `netobserv:pkt_drop_packets:sum_by_cause` | `sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)` |
+| `netobserv:tcp_flow_rtt:milliseconds` | `netobserv_tcp_flow_rtt_ns / 1e6` |
+| `netobserv:dns_latency_ms:max_over_5m` | `max_over_time(netobserv_dns_latency_ms[5m])` |
+| `netobserv:flow_bytes:top10` | `topk(10, netobserv_flow_bytes)` |
+| `netobserv:flow_bytes:deriv_2m` | `deriv(netobserv_flow_bytes[2m])` (noisy; exploratory) |
+| `netobserv:ipsec_flow_bytes:sum_by_status` | `sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)` |
+| `netobserv:tls_flow_bytes:count_by_version` | `count by (TLSVersion) (netobserv_tls_flow_bytes)` |
+
+In the Prometheus UI (**Graph**), you can query these names directly instead of typing the full expression.
+
+### Example alert
+
+| Alert | Condition | Meaning |
+|-------|-----------|---------|
+| `NetobservNoFlowMetrics` | `absent(netobserv_flow_bytes)` for 3m | No `netobserv_flow_bytes` series (scrape failure, agent stopped, or no flows exported yet). Check the agent, http://127.0.0.1:9102/metrics on the host, and Docker `host.docker.internal` reachability. |
+
+View rule evaluation status under **Status → Rules** and firing alerts under **Alerts**.
+
+## Ad hoc PromQL (raw FLP metrics)
+
+```promql
+{__name__=~"netobserv_.*"}
+```
+
+```promql
+topk(10, netobserv_flow_bytes)
+```
+
+```promql
+netobserv_tcp_flow_rtt_ns / 1e6
+```
+
+```promql
+sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)
+```
+
+```promql
+netobserv_ipsec_flow_bytes
+```
+
+```promql
+topk(5, netobserv_tls_flow_bytes)
+```
+
+## Files in this directory
+
+| File | Role |
+|------|------|
+| [`flp-prometheus.yaml`](./flp-prometheus.yaml) | `FLP_CONFIG`: `metricsSettings` + `encode/prom` pipeline after `preset-ingester` |
+| [`prometheus.yml`](./prometheus.yml) | Prometheus scrape config + `rule_files` |
+| [`netobserv-rules.yml`](./netobserv-rules.yml) | Recording rules and example alert |
+| [`docker-compose.yaml`](./docker-compose.yaml) | Prometheus service and config mounts |
+| [`run-example.sh`](./run-example.sh) | Starts Compose, exports env vars, runs the agent with `sudo -E` |
+
+## Prometheus without Docker
+
+Point a local `prometheus.yml` at `127.0.0.1:9102` instead of `host.docker.internal:9102`, and use the same `rule_files` entry with a path to [`netobserv-rules.yml`](./netobserv-rules.yml) on your machine.
diff --git a/examples/direct-flp/prometheus-local/docker-compose.yaml b/examples/direct-flp/prometheus-local/docker-compose.yaml
new file mode 100644
index 000000000..99fd0299a
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/docker-compose.yaml
@@ -0,0 +1,15 @@
+services:
+  prometheus:
+    image: docker.io/prom/prometheus:v2.53.3
+    ports:
+      # Host UI/API; agent + embedded FLP keep 9090 and 9102 on the host.
+      - "9091:9090"
+    volumes:
+      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
+      - ./netobserv-rules.yml:/etc/prometheus/netobserv-rules.yml:ro
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --storage.tsdb.retention.time=2h
+      - --web.enable-lifecycle
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
diff --git a/examples/direct-flp/prometheus-local/flp-prometheus.yaml b/examples/direct-flp/prometheus-local/flp-prometheus.yaml
new file mode 100644
index 000000000..bca341c7d
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/flp-prometheus.yaml
@@ -0,0 +1,111 @@
+# Flowlogs-pipeline config for EXPORT=direct-flp (ingest is provided by the agent).
+# Exposes Prometheus metrics on :9102/metrics (see metricsSettings).
+# Agent operational metrics stay on METRICS_SERVER_PORT (default 9090).
+log-level: info
+metricsSettings:
+  address: "0.0.0.0"
+  port: 9102
+pipeline:
+  - name: encode_prom
+    follows: preset-ingester
+parameters:
+  - name: encode_prom
+    encode:
+      type: prom
+      prom:
+        prefix: netobserv_
+        expiryTime: 5m
+        metrics:
+          - name: flow_bytes
+            type: gauge
+            help: Last observed byte count for a flow (high-cardinality labels; local lab only).
+            filters:
+              - key: Bytes
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - SrcPort
+              - DstPort
+              - Proto
+          - name: flow_packets
+            type: gauge
+            help: Last observed packet count for a flow.
+            filters:
+              - key: Packets
+                type: presence
+            valueKey: Packets
+            labels:
+              - SrcAddr
+              - DstAddr
+              - SrcPort
+              - DstPort
+              - Proto
+          - name: dns_latency_ms
+            type: gauge
+            help: DNS request/response latency in milliseconds when DNS tracking is enabled.
+            filters:
+              - key: DnsLatencyMs
+                type: presence
+            valueKey: DnsLatencyMs
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - DnsFlagsResponseCode
+          - name: tcp_flow_rtt_ns
+            type: gauge
+            help: TCP smoothed RTT in nanoseconds when RTT tracking is enabled.
+            filters:
+              - key: TimeFlowRttNs
+                type: presence
+            valueKey: TimeFlowRttNs
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+          - name: pkt_drop_packets
+            type: gauge
+            help: Kernel packet drops attributed to the flow when packet drop tracking is enabled.
+            filters:
+              - key: PktDropPackets
+                type: presence
+            valueKey: PktDropPackets
+            labels:
+              - SrcAddr
+              - DstAddr
+              - PktDropLatestDropCause
+          - name: ipsec_flow_bytes
+            type: gauge
+            help: Last observed bytes for flows where IPsec encryption was detected (ENABLE_IPSEC_TRACKING).
+            filters:
+              - key: Bytes
+                type: presence
+              - key: IPSecStatus
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+              - IPSecStatus
+              - IPSecRetCode
+          - name: tls_flow_bytes
+            type: gauge
+            help: Last observed bytes for flows where TLS metadata was captured (ENABLE_TLS_TRACKING).
+            filters:
+              - key: Bytes
+                type: presence
+              - key: TLSVersion
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+              - TLSVersion
+              - TLSCipherSuite
diff --git a/examples/direct-flp/prometheus-local/netobserv-rules.yml b/examples/direct-flp/prometheus-local/netobserv-rules.yml
new file mode 100644
index 000000000..2e5d25027
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/netobserv-rules.yml
@@ -0,0 +1,44 @@
+# Recording + alert rules for the netobserv direct-flp / prometheus-local example.
+# After reload, use Graph → metric name (e.g. netobserv:tcp_flow_rtt:milliseconds).
+
+groups:
+  - name: netobserv-recording
+    interval: 15s
+    rules:
+      # sum of drop packets attributed to flows, grouped by last kernel drop cause
+      - record: netobserv:pkt_drop_packets:sum_by_cause
+        expr: sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)
+
+      # TCP SRTT in milliseconds (source series is nanoseconds)
+      - record: netobserv:tcp_flow_rtt:milliseconds
+        expr: netobserv_tcp_flow_rtt_ns / 1e6
+
+      # per-series max DNS latency over 5m (requires ≥10s scrape; matches global interval)
+      - record: netobserv:dns_latency_ms:max_over_5m
+        expr: max_over_time(netobserv_dns_latency_ms[5m])
+
+      # up to 10 flows with largest last-reported byte gauge (high-cardinality labels preserved)
+      - record: netobserv:flow_bytes:top10
+        expr: topk(10, netobserv_flow_bytes)
+
+      # optional: rough “change” signal for byte gauges (can be noisy)
+      - record: netobserv:flow_bytes:deriv_2m
+        expr: deriv(netobserv_flow_bytes[2m])
+
+      - record: netobserv:ipsec_flow_bytes:sum_by_status
+        expr: sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)
+
+      - record: netobserv:tls_flow_bytes:count_by_version
+        expr: count by (TLSVersion) (netobserv_tls_flow_bytes)
+
+  - name: netobserv-alerts
+    rules:
+      # Fires when no netobserv flow byte series exist (scrape down, agent stopped, or no traffic yet).
+      - alert: NetobservNoFlowMetrics
+        expr: absent(netobserv_flow_bytes)
+        for: 3m
+        labels:
+          severity: warning
+        annotations:
+          summary: No netobserv_flow_bytes series scraped
+          description: Prometheus has not recorded netobserv_flow_bytes in 3m. Check agent, :9102/metrics, and host.docker.internal from the container.
diff --git a/examples/direct-flp/prometheus-local/prometheus.yml b/examples/direct-flp/prometheus-local/prometheus.yml
new file mode 100644
index 000000000..92d5a61f9
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/prometheus.yml
@@ -0,0 +1,13 @@
+global:
+  scrape_interval: 10s
+  evaluation_interval: 10s
+
+rule_files:
+  - /etc/prometheus/netobserv-rules.yml
+
+scrape_configs:
+  - job_name: netobserv-flowlogs-pipeline
+    metrics_path: /metrics
+    static_configs:
+      # From inside the Prometheus container, reach the host where the agent listens (9102).
+      - targets: ["host.docker.internal:9102"]
diff --git a/examples/direct-flp/prometheus-local/run-example.sh b/examples/direct-flp/prometheus-local/run-example.sh
new file mode 100755
index 000000000..7828522ec
--- /dev/null
+++ b/examples/direct-flp/prometheus-local/run-example.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# Run NetObserv eBPF agent on the host with direct-flp + Prometheus encode, alongside
+# a Prometheus instance in Docker that scrapes the embedded flowlogs-pipeline /metrics.
+#
+# Prerequisites:
+#   - Built agent: (cd repo root && make compile)
+#   - Docker with compose plugin
+#   - Caps: sudo, or CAP_BPF + CAP_PERFMON + CAP_NET_ADMIN (see README)
+#   - Packet drops: /sys/kernel/debug mounted read-write (often requires privileged / root)
+#   - IPsec/TLS: ENABLE_IPSEC_TRACKING / ENABLE_TLS_TRACKING (extra eBPF hooks; series only if traffic matches)
+#
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
+AGENT_BIN="${REPO_ROOT}/bin/netobserv-ebpf-agent"
+
+if [[ ! -x "${AGENT_BIN}" ]]; then
+  echo "Missing ${AGENT_BIN}.
Run: cd ${REPO_ROOT} && make compile" >&2
+  exit 1
+fi
+
+cd "${SCRIPT_DIR}"
+docker compose up -d
+
+export EXPORT=direct-flp
+export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+
+export ENABLE_RTT=true
+export ENABLE_PKT_DROPS=true
+export ENABLE_DNS_TRACKING=true
+export ENABLE_IPSEC_TRACKING=true
+export ENABLE_TLS_TRACKING=true
+
+# Embedded FLP serves Prometheus on 9102; keep agent metrics on 9090 (default).
+export METRICS_SERVER_PORT=9090
+
+echo "Prometheus UI: http://127.0.0.1:9091 (Graph: try netobserv:tcp_flow_rtt:milliseconds; Alerts: NetobservNoFlowMetrics)"
+echo "Embedded FLP metrics: http://127.0.0.1:9102/metrics"
+echo "Agent metrics: http://127.0.0.1:${METRICS_SERVER_PORT}/metrics"
+echo "Starting agent (sudo)…"
+
+exec sudo -E "${AGENT_BIN}"
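
Once the agent is running, one quick sanity check for this example is to count how many series each `netobserv_*` metric family currently exposes on the embedded FLP endpoint. The helper below is a minimal sketch and is not part of this change: `count_netobserv_series` is a hypothetical name, and it assumes the endpoint serves the standard Prometheus text exposition format, which it reads from stdin.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this change): reads Prometheus text
# exposition from stdin and prints "<metric_name> <series_count>" for each
# netobserv_* metric family, sorted by name.
count_netobserv_series() {
  # Keep only sample lines for netobserv_* metrics; "# HELP" / "# TYPE"
  # comment lines start with "#" and are dropped by the anchor.
  # "|| true" keeps the pipeline usable under "set -o pipefail" when
  # nothing matches.
  { grep -E '^netobserv_' || true; } \
    | sed -E 's/[{ ].*$//' \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

For example, `curl -fsS http://127.0.0.1:9102/metrics | count_netobserv_series` would be expected to list `netobserv_flow_bytes` once flows are being exported. An empty result corresponds to the situation in which the `NetobservNoFlowMetrics` alert fires after 3m.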