netobserv · jpinsonneau · Apr 17, 2026 · coderabbitai · Apr 17, 2026 · coderabbitai
diff --git a/README.md b/README.md
@@ -77,6 +77,8 @@ sudo -E bin/netobserv-ebpf-agent
 
 For more information about configuring flowlogs-pipeline, please refer to [its documentation](https://github.com/netobserv/flowlogs-pipeline).
 
+For **direct-flp** with embedded Prometheus metrics, optional RTT/DNS/packet-drop/IPsec/TLS features, and a local Prometheus (Docker) including recording rules and an example alert, see [examples/direct-flp/prometheus-local/README.md](./examples/direct-flp/prometheus-local/README.md).
+
 To deploy locally, use instructions from [flowlogs-dump (like tcpdump)](./examples/flowlogs-dump/README.md).    
 To deploy it as a Pod, you can check the [deployment examples](./deployments).
 

diff --git a/docs/config.md b/docs/config.md
@@ -84,7 +84,7 @@ The following environment variables are available to configure the NetObserv eBP
 * `PCA_FILTER` (default: `none`). Works only when `ENABLE_PCA` is set. Accepted format <protocol,portnumber>. Example 
   `PCA_FILTER=tcp,22`.
 * `PCA_SERVER_PORT` (default: 0). Works only when `ENABLE_PCA` is set. Agent opens PCA Server at this port. A collector can connect to it and recieve filtered packets as pcap stream. The filter is set using `PCA_FILTER`.
-* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g, for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options.
+* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g, for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options. For a documented end-to-end lab (direct-flp, RTT, DNS, packet drops, IPsec and TLS tracking, embedded Prometheus encode on port 9102, Docker Prometheus, recording rules, and an example alert), see [examples/direct-flp/prometheus-local/README.md](../examples/direct-flp/prometheus-local/README.md).
 * `METRICS_ENABLED` (default: `false`). If `true`, the agent will export metrics to the configured `EXPORT` endpoint.
   * `METRICS_SERVER_ADDRESS` Address of the server where the metrics will be exported.
   * `METRICS_SERVER_PORT` (default: 9090). Port of the server where the metrics will be exported.

diff --git a/examples/direct-flp/README.md b/examples/direct-flp/README.md
@@ -14,6 +14,17 @@ export EXPORT="direct-flp"
 sudo -E bin/netobserv-ebpf-agent
 ```
 
+## Local direct-flp with RTT, DNS, packet drops, and Prometheus (Docker)
+
+The directory [prometheus-local](./prometheus-local/) runs the agent with `EXPORT=direct-flp`, enables RTT, DNS, packet-drop, IPsec, and TLS tracking in the agent, embeds flowlogs-pipeline with an `encode/prom` stage on **port 9102**, and starts a dedicated Prometheus (UI on **9091**) that scrapes those metrics from the host. Full documentation (FLP metric names, Prometheus recording rules, example alert, PromQL, ports, and files) is in [prometheus-local/README.md](./prometheus-local/README.md).
+
+```bash
+make compile
+./examples/direct-flp/prometheus-local/run-example.sh
+```
+
+Packet drop tracking needs tracepoint access to `/sys/kernel/debug` (typically mount it read-write and run with sufficient privileges). See [docs/config.md](../../docs/config.md) and the main [README.md](../../README.md) for capability and troubleshooting notes.
+
 To start a collector, you can start another (standalone) instance of flowlogs-pipeline, configured with an IPFIX ingester and logging on stdout.
 
 Example for [FLP repo](https://github.com/netobserv/flowlogs-pipeline):

diff --git a/examples/direct-flp/prometheus-local/README.md b/examples/direct-flp/prometheus-local/README.md
@@ -0,0 +1,112 @@
+# direct-flp with Prometheus (local lab)
+
+This example runs the NetObserv eBPF agent on the host with **`EXPORT=direct-flp`**, embeds [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) using **`FLP_CONFIG`** from [`flp-prometheus.yaml`](./flp-prometheus.yaml), and starts a **Prometheus** container that scrapes the embedded pipeline’s `/metrics` endpoint.
+
+Optional features are turned on in the runner script for demonstration: **RTT** (`ENABLE_RTT`), **DNS tracking** (`ENABLE_DNS_TRACKING`), **packet drop tracking** (`ENABLE_PKT_DROPS`), **IPsec flow metadata** (`ENABLE_IPSEC_TRACKING`), and **TLS metadata per flow** (`ENABLE_TLS_TRACKING`). See [docs/config.md](../../../docs/config.md) and [pkg/config/config.go](../../../pkg/config/config.go) for all agent environment variables.
+
+IPsec and TLS hooks add eBPF work and only populate fields when matching traffic exists on the traced interfaces; they are enabled here so the corresponding `netobserv_*` series appear in Prometheus when you exercise those protocols. This is **TLS metadata from the agent’s eBPF TLS tracker** (`ENABLE_TLS_TRACKING`), not Kafka or gRPC TLS. Userspace OpenSSL correlation uses `ENABLE_OPENSSL_TRACKING` separately and is not turned on in this script.
+
+## Quick start
+
+From the repository root:
+
+```bash
+make compile
+./examples/direct-flp/prometheus-local/run-example.sh
+```
+
+- **Prometheus UI:** http://127.0.0.1:9091  
+- **Embedded FLP metrics:** http://127.0.0.1:9102/metrics  
+- **Agent operational metrics:** http://127.0.0.1:9090/metrics (default `METRICS_SERVER_PORT`)
+
+Stop Prometheus: `docker compose -f examples/direct-flp/prometheus-local/docker-compose.yaml down`
+
+## Why two ports (9090 and 9102)
+
+The agent exposes its own Prometheus listener on **`METRICS_SERVER_PORT`** (default **9090**). The embedded flowlogs-pipeline also starts a global metrics HTTP server; if its port is left unset it defaults to **9090** as well and would conflict. [`flp-prometheus.yaml`](./flp-prometheus.yaml) sets **`metricsSettings.port: 9102`** so flow encode metrics are served separately.
+
+## Flow metrics exposed by FLP (`flp-prometheus.yaml`)
+
+All names use the encode prefix **`netobserv_`** (see `prom.prefix` in the file).
+
+| Prometheus metric | Type | When populated | Main labels |
+|-------------------|------|----------------|-------------|
+| `netobserv_flow_bytes` | gauge | `Bytes` present on the flow | `SrcAddr`, `DstAddr`, `SrcPort`, `DstPort`, `Proto` |
+| `netobserv_flow_packets` | gauge | `Packets` present | same as above |
+| `netobserv_dns_latency_ms` | gauge | DNS tracking enabled, `DnsLatencyMs` present | `SrcAddr`, `DstAddr`, `DstPort`, `DnsFlagsResponseCode` |
+| `netobserv_tcp_flow_rtt_ns` | gauge | RTT enabled, `TimeFlowRttNs` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto` |
+| `netobserv_pkt_drop_packets` | gauge | Packet drop tracking enabled, `PktDropPackets` present | `SrcAddr`, `DstAddr`, `PktDropLatestDropCause` |
+| `netobserv_ipsec_flow_bytes` | gauge | IPsec tracking enabled; `IPSecStatus` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `IPSecStatus`, `IPSecRetCode` |
+| `netobserv_tls_flow_bytes` | gauge | TLS tracking enabled; `TLSVersion` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `TLSVersion`, `TLSCipherSuite` |
+
+These are **gauges** reflecting the last value for each label set (FLP `expiryTime: 5m` removes stale series). They are not counters, so prefer aggregations such as `max_over_time` or `topk`; `rate()` treats any decrease as a counter reset and gives misleading results on gauges.
+
+**Cardinality:** labels include IP addresses and ports. This is appropriate for a **local lab** only, not for a large production Prometheus.
+
+**Packet drops:** the agent needs access to kernel tracepoints (typically **`/sys/kernel/debug`** mounted and sufficient privileges). See the main [README.md](../../../README.md) and [docs/config.md](../../../docs/config.md).
+
+## Prometheus rules (`netobserv-rules.yml`)
+
+[`prometheus.yml`](./prometheus.yml) loads [`netobserv-rules.yml`](./netobserv-rules.yml) via `rule_files`.
+
+### Recording rules (precomputed series)
+
+| Recorded metric | Expression (summary) |
+|-----------------|----------------------|
+| `netobserv:pkt_drop_packets:sum_by_cause` | `sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)` |
+| `netobserv:tcp_flow_rtt:milliseconds` | `netobserv_tcp_flow_rtt_ns / 1e6` |
+| `netobserv:dns_latency_ms:max_over_5m` | `max_over_time(netobserv_dns_latency_ms[5m])` |
+| `netobserv:flow_bytes:top10` | `topk(10, netobserv_flow_bytes)` |
+| `netobserv:flow_bytes:deriv_2m` | `deriv(netobserv_flow_bytes[2m])` (noisy; exploratory) |
+| `netobserv:ipsec_flow_bytes:sum_by_status` | `sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)` |
+| `netobserv:tls_flow_bytes:count_by_version` | `count by (TLSVersion) (netobserv_tls_flow_bytes)` |
+
+In the Prometheus UI (**Graph**), you can query these names directly instead of typing the full expression.
+
+### Example alert
+
+| Alert | Condition | Meaning |
+|-------|-----------|---------|
+| `NetobservNoFlowMetrics` | `absent(netobserv_flow_bytes)` for 3m | No `netobserv_flow_bytes` series (scrape failure, agent stopped, or no flows exported yet). Check the agent, http://127.0.0.1:9102/metrics on the host, and Docker `host.docker.internal` reachability. |
+
+View rule evaluation status under **Status → Rules** and firing alerts under **Alerts**.
+
+## Ad hoc PromQL (raw FLP metrics)
+
+```promql
+{__name__=~"netobserv_.*"}
+```
+
+```promql
+topk(10, netobserv_flow_bytes)
+```
+
+```promql
+netobserv_tcp_flow_rtt_ns / 1e6
+```
+
+```promql
+sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)
+```
+
+```promql
+netobserv_ipsec_flow_bytes
+```
+
+```promql
+topk(5, netobserv_tls_flow_bytes)
+```
+
+## Files in this directory
+
+| File | Role |
+|------|------|
+| [`flp-prometheus.yaml`](./flp-prometheus.yaml) | `FLP_CONFIG`: `metricsSettings` + `encode/prom` pipeline after `preset-ingester` |
+| [`prometheus.yml`](./prometheus.yml) | Prometheus scrape config + `rule_files` |
+| [`netobserv-rules.yml`](./netobserv-rules.yml) | Recording rules and example alert |
+| [`docker-compose.yaml`](./docker-compose.yaml) | Prometheus service and config mounts |
+| [`run-example.sh`](./run-example.sh) | Starts Compose, exports env vars, runs the agent with `sudo -E` |
+
+## Prometheus without Docker
+
+Point a local `prometheus.yml` at `127.0.0.1:9102` instead of `host.docker.internal:9102`, and use the same `rule_files` entry with a path to [`netobserv-rules.yml`](./netobserv-rules.yml) on your machine.
diff --git a/examples/direct-flp/prometheus-local/docker-compose.yaml b/examples/direct-flp/prometheus-local/docker-compose.yaml
@@ -0,0 +1,15 @@
+services:
+  prometheus:
+    image: docker.io/prom/prometheus:v2.53.3
+    ports:
+      # Publish the Prometheus UI/API on host port 9091; the agent and embedded FLP already use 9090 and 9102 on the host.
+      - "9091:9090"
+    volumes:
+      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
+      - ./netobserv-rules.yml:/etc/prometheus/netobserv-rules.yml:ro
+    command:
+      - --config.file=/etc/prometheus/prometheus.yml
+      - --storage.tsdb.retention.time=2h
+      - --web.enable-lifecycle
+    extra_hosts:
+      - "host.docker.internal:host-gateway"
diff --git a/examples/direct-flp/prometheus-local/flp-prometheus.yaml b/examples/direct-flp/prometheus-local/flp-prometheus.yaml
@@ -0,0 +1,111 @@
+# Flowlogs-pipeline config for EXPORT=direct-flp (ingest is provided by the agent).
+# Exposes Prometheus metrics on :9102/metrics (see metricsSettings).
+# Agent operational metrics stay on METRICS_SERVER_PORT (default 9090).
+log-level: info
+metricsSettings:
+  address: "0.0.0.0"
+  port: 9102
+pipeline:
+  - name: encode_prom
+    follows: preset-ingester
+parameters:
+  - name: encode_prom
+    encode:
+      type: prom
+      prom:
+        prefix: netobserv_
+        expiryTime: 5m
+        metrics:
+          - name: flow_bytes
+            type: gauge
+            help: Last observed byte count for a flow (high-cardinality labels; local lab only).
+            filters:
+              - key: Bytes
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - SrcPort
+              - DstPort
+              - Proto
+          - name: flow_packets
+            type: gauge
+            help: Last observed packet count for a flow.
+            filters:
+              - key: Packets
+                type: presence
+            valueKey: Packets
+            labels:
+              - SrcAddr
+              - DstAddr
+              - SrcPort
+              - DstPort
+              - Proto
+          - name: dns_latency_ms
+            type: gauge
+            help: DNS request/response latency in milliseconds when DNS tracking is enabled.
+            filters:
+              - key: DnsLatencyMs
+                type: presence
+            valueKey: DnsLatencyMs
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - DnsFlagsResponseCode
+          - name: tcp_flow_rtt_ns
+            type: gauge
+            help: TCP smoothed RTT in nanoseconds when RTT tracking is enabled.
+            filters:
+              - key: TimeFlowRttNs
+                type: presence
+            valueKey: TimeFlowRttNs
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+          - name: pkt_drop_packets
+            type: gauge
+            help: Kernel packet drops attributed to the flow when packet drop tracking is enabled.
+            filters:
+              - key: PktDropPackets
+                type: presence
+            valueKey: PktDropPackets
+            labels:
+              - SrcAddr
+              - DstAddr
+              - PktDropLatestDropCause
+          - name: ipsec_flow_bytes
+            type: gauge
+            help: Last observed bytes for flows where IPsec encryption was detected (ENABLE_IPSEC_TRACKING).
+            filters:
+              - key: Bytes
+                type: presence
+              - key: IPSecStatus
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+              - IPSecStatus
+              - IPSecRetCode
+          - name: tls_flow_bytes
+            type: gauge
+            help: Last observed bytes for flows where TLS metadata was captured (ENABLE_TLS_TRACKING).
+            filters:
+              - key: Bytes
+                type: presence
+              - key: TLSVersion
+                type: presence
+            valueKey: Bytes
+            labels:
+              - SrcAddr
+              - DstAddr
+              - DstPort
+              - Proto
+              - TLSVersion
+              - TLSCipherSuite
diff --git a/examples/direct-flp/prometheus-local/netobserv-rules.yml b/examples/direct-flp/prometheus-local/netobserv-rules.yml
@@ -0,0 +1,44 @@
+# Recording + alert rules for the netobserv direct-flp / prom-local example.
+# After reload, use Graph → metric name (e.g. netobserv:tcp_flow_rtt:milliseconds).
+
+groups:
+  - name: netobserv-recording
+    interval: 15s
+    rules:
+      # sum of drop packets attributed to flows, grouped by last kernel drop cause
+      - record: netobserv:pkt_drop_packets:sum_by_cause
+        expr: sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)
+
+      # TCP SRTT in milliseconds (source series is nanoseconds)
+      - record: netobserv:tcp_flow_rtt:milliseconds
+        expr: netobserv_tcp_flow_rtt_ns / 1e6
+
+      # per-series max DNS latency over 5m (the window spans many samples at the 10s global scrape interval)
+      - record: netobserv:dns_latency_ms:max_over_5m
+        expr: max_over_time(netobserv_dns_latency_ms[5m])
+
+      # up to 10 flows with largest last-reported byte gauge (high-cardinality labels preserved)
+      - record: netobserv:flow_bytes:top10
+        expr: topk(10, netobserv_flow_bytes)
+
+      # optional: rough “change” signal for byte gauges (can be noisy)
+      - record: netobserv:flow_bytes:deriv_2m
+        expr: deriv(netobserv_flow_bytes[2m])
+
+      - record: netobserv:ipsec_flow_bytes:sum_by_status
+        expr: sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)
+
+      - record: netobserv:tls_flow_bytes:count_by_version
+        expr: count by (TLSVersion) (netobserv_tls_flow_bytes)
+
+  - name: netobserv-alerts
+    rules:
+      # Fires when no netobserv flow byte series exist (scrape down, agent stopped, or no traffic yet).
+      - alert: NetobservNoFlowMetrics
+        expr: absent(netobserv_flow_bytes)
+        for: 3m
+        labels:
+          severity: warning
+        annotations:
+          summary: No netobserv_flow_bytes series scraped
+          description: Prometheus has not recorded netobserv_flow_bytes in 3m. Check agent, :9102/metrics, and host.docker.internal from the container.
diff --git a/examples/direct-flp/prometheus-local/prometheus.yml b/examples/direct-flp/prometheus-local/prometheus.yml
@@ -0,0 +1,13 @@
+global:
+  scrape_interval: 10s
+  evaluation_interval: 10s
+
+rule_files:
+  - /etc/prometheus/netobserv-rules.yml
+
+scrape_configs:
+  - job_name: netobserv-flowlogs-pipeline
+    metrics_path: /metrics
+    static_configs:
+      # From inside the Prometheus container, reach the host where the agent listens (9102).
+      - targets: ["host.docker.internal:9102"]
diff --git a/examples/direct-flp/prometheus-local/run-example.sh b/examples/direct-flp/prometheus-local/run-example.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+# Run NetObserv eBPF agent on the host with direct-flp + Prometheus encode, alongside
+# a Prometheus instance in Docker that scrapes the embedded flowlogs-pipeline /metrics.
+#
+# Prerequisites:
+#   - Built agent: (cd repo root && make compile)
+#   - Docker with compose plugin
+#   - Caps: sudo, or CAP_BPF + CAP_PERFMON + CAP_NET_ADMIN (see README)
+#   - Packet drops: /sys/kernel/debug mounted read-write (often requires privileged / root)
+#   - IPsec/TLS: ENABLE_IPSEC_TRACKING / ENABLE_TLS_TRACKING (extra eBPF hooks; series only if traffic matches)
+#
+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
+AGENT_BIN="${REPO_ROOT}/bin/netobserv-ebpf-agent"
+
+if [[ ! -x "${AGENT_BIN}" ]]; then
+  echo "Missing ${AGENT_BIN}. Run: cd ${REPO_ROOT} && make compile" >&2
+  exit 1
+fi
+
+cd "${SCRIPT_DIR}"
+docker compose up -d
+
+export EXPORT=direct-flp
+FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"; export FLP_CONFIG  # assign first so a failing cat aborts under set -e
+
+export ENABLE_RTT=true
+export ENABLE_PKT_DROPS=true
+export ENABLE_DNS_TRACKING=true
+export ENABLE_IPSEC_TRACKING=true
+export ENABLE_TLS_TRACKING=true
+
+# Embedded FLP serves Prometheus on 9102; keep agent metrics on 9090 (default).
+export METRICS_SERVER_PORT=9090
+
+echo "Prometheus UI: http://127.0.0.1:9091 (Graph: try netobserv:tcp_flow_rtt:milliseconds; Alerts: NetobservNoFlowMetrics)"
+echo "Embedded FLP metrics: http://127.0.0.1:9102/metrics"
+echo "Agent metrics: http://127.0.0.1:${METRICS_SERVER_PORT}/metrics"
+echo "Starting agent (sudo)…"
+
+exec sudo -E "${AGENT_BIN}"
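
Once `run-example.sh` is up, one way to confirm which flow metrics the embedded FLP actually exposes is to reduce the `/metrics` output to its distinct `netobserv_` metric names. The helper below is a minimal sketch and is not part of this change; the function name `list_netobserv_metrics` is hypothetical, and it assumes the standard Prometheus text exposition format (comment lines start with `#`, sample lines start with the metric name).

```shell
#!/usr/bin/env bash
# Sketch only: list the distinct netobserv_* metric names found in Prometheus
# text exposition read from stdin. The function name is hypothetical.
list_netobserv_metrics() {
  # Skip comment lines (# HELP / # TYPE) and empty lines, cut the metric name
  # at the first '{' or space, keep names with the netobserv_ prefix, de-duplicate.
  awk '!/^#/ && NF { split($0, parts, /[{ ]/); if (parts[1] ~ /^netobserv_/) print parts[1] }' | sort -u
}

# Usage against the embedded FLP endpoint from this example
# (needs curl and a running agent, so it is left commented out here):
#   curl -s http://127.0.0.1:9102/metrics | list_netobserv_metrics
```

Series such as `netobserv_ipsec_flow_bytes` or `netobserv_tls_flow_bytes` only show up in this list after matching traffic has been seen on the traced interfaces, so an incomplete list does not by itself indicate a misconfiguration.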