2 changes: 2 additions & 0 deletions README.md
@@ -77,6 +77,8 @@ sudo -E bin/netobserv-ebpf-agent

For more information about configuring flowlogs-pipeline, please refer to [its documentation](https://github.com/netobserv/flowlogs-pipeline).

For **direct-flp** with embedded Prometheus metrics, optional RTT/DNS/packet-drop/IPsec/TLS features, and a local Prometheus (Docker) including recording rules and an example alert, see [examples/direct-flp/prometheus-local/README.md](./examples/direct-flp/prometheus-local/README.md).

To deploy locally, use instructions from [flowlogs-dump (like tcpdump)](./examples/flowlogs-dump/README.md).
To deploy it as a Pod, you can check the [deployment examples](./deployments).

2 changes: 1 addition & 1 deletion docs/config.md
@@ -84,7 +84,7 @@ The following environment variables are available to configure the NetObserv eBP
* `PCA_FILTER` (default: `none`). Works only when `ENABLE_PCA` is set. Accepted format <protocol,portnumber>. Example
`PCA_FILTER=tcp,22`.
* `PCA_SERVER_PORT` (default: 0). Works only when `ENABLE_PCA` is set. Agent opens PCA Server at this port. A collector can connect to it and receive filtered packets as a pcap stream. The filter is set using `PCA_FILTER`.
* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g., for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options.
* `FLP_CONFIG`: [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) configuration as YAML or JSON, used when `EXPORT` is `direct-flp`. The ingest stage must be omitted from this configuration, since it is handled internally by the agent. The first stage should follow "preset-ingester". E.g., for a minimal configuration printing on terminal: `{"pipeline":[{"name": "writer","follows": "preset-ingester"}],"parameters":[{"name": "writer","write": {"type": "stdout"}}]}`. Refer to flowlogs-pipeline documentation for more options. For a documented end-to-end lab (direct-flp, RTT, DNS, packet drops, IPsec and TLS tracking, embedded Prometheus encode on port 9102, Docker Prometheus, recording rules, and an example alert), see [examples/direct-flp/prometheus-local/README.md](../examples/direct-flp/prometheus-local/README.md).
* `METRICS_ENABLED` (default: `false`). If `true`, the agent will export metrics to the configured `EXPORT` endpoint.
* `METRICS_SERVER_ADDRESS` Address of the server where the metrics will be exported.
* `METRICS_SERVER_PORT` (default: 9090). Port of the server where the metrics will be exported.
11 changes: 11 additions & 0 deletions examples/direct-flp/README.md
@@ -14,6 +14,17 @@ export EXPORT="direct-flp"
sudo -E bin/netobserv-ebpf-agent
```

## Local direct-flp with RTT, DNS, packet drops, and Prometheus (Docker)

The directory [prometheus-local](./prometheus-local/) runs the agent with `EXPORT=direct-flp`, enables RTT, DNS, packet-drop, IPsec, and TLS tracking in the agent, embeds flowlogs-pipeline with an `encode/prom` stage on **port 9102**, and starts a dedicated Prometheus (UI on **9091**) that scrapes those metrics from the host. Full documentation (FLP metric names, Prometheus recording rules, example alert, PromQL, ports, and files) is in [prometheus-local/README.md](./prometheus-local/README.md).

```bash
make compile
./examples/direct-flp/prometheus-local/run-example.sh
```

Packet drop tracking needs tracepoint access to `/sys/kernel/debug` (typically mount it read-write and run with sufficient privileges). See [docs/config.md](../../docs/config.md) and the main [README.md](../../README.md) for capability and troubleshooting notes.

To start a collector, you can start another (standalone) instance of flowlogs-pipeline, configured with an IPFIX ingester and logging on stdout.

Example for [FLP repo](https://github.com/netobserv/flowlogs-pipeline):
112 changes: 112 additions & 0 deletions examples/direct-flp/prometheus-local/README.md
@@ -0,0 +1,112 @@
# direct-flp with Prometheus (local lab)

This example runs the NetObserv eBPF agent on the host with **`EXPORT=direct-flp`**, embeds [flowlogs-pipeline](https://github.com/netobserv/flowlogs-pipeline) using **`FLP_CONFIG`** from [`flp-prometheus.yaml`](./flp-prometheus.yaml), and starts a **Prometheus** container that scrapes the embedded pipeline’s `/metrics` endpoint.

Optional features are turned on in the runner script for demonstration: **RTT** (`ENABLE_RTT`), **DNS tracking** (`ENABLE_DNS_TRACKING`), **packet drop tracking** (`ENABLE_PKT_DROPS`), **IPsec flow metadata** (`ENABLE_IPSEC_TRACKING`), and **TLS metadata per flow** (`ENABLE_TLS_TRACKING`). See [docs/config.md](../../../docs/config.md) and [pkg/config/config.go](../../../pkg/config/config.go) for all agent environment variables.

IPsec and TLS hooks add eBPF work and only populate fields when matching traffic exists on the traced interfaces; they are enabled here so the corresponding `netobserv_*` series appear in Prometheus when you exercise those protocols. This is **TLS metadata from the agent’s eBPF TLS tracker** (`ENABLE_TLS_TRACKING`), not Kafka or gRPC TLS. Userspace OpenSSL correlation uses `ENABLE_OPENSSL_TRACKING` separately and is not turned on in this script.

## Quick start

From the repository root:

```bash
make compile
./examples/direct-flp/prometheus-local/run-example.sh
```

- **Prometheus UI:** http://127.0.0.1:9091
- **Embedded FLP metrics:** http://127.0.0.1:9102/metrics
- **Agent operational metrics:** http://127.0.0.1:9090/metrics (default `METRICS_SERVER_PORT`)

Stop Prometheus: `docker compose -f examples/direct-flp/prometheus-local/docker-compose.yaml down`

## Why two ports (9090 and 9102)

The agent exposes its own Prometheus listener on **`METRICS_SERVER_PORT`** (default **9090**). The embedded flowlogs-pipeline also starts a global metrics HTTP server; if its port is left unset it defaults to **9090** as well and would conflict. [`flp-prometheus.yaml`](./flp-prometheus.yaml) sets **`metricsSettings.port: 9102`** so flow encode metrics are served separately.
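
To see which listener serves which series, it can help to count the `netobserv_` samples on each port. The following is a minimal sketch, assuming both endpoints return the standard Prometheus text format; `count_series` is a hypothetical helper name, not something shipped with the agent or flowlogs-pipeline.

```bash
#!/usr/bin/env bash
# Sketch: count sample lines with a given prefix in a Prometheus text-format scrape.
# count_series is a hypothetical helper, not part of the agent or FLP.

# Reads Prometheus text exposition from stdin and prints how many sample
# lines (comment lines starting with '#' are skipped) begin with PREFIX.
count_series() {
  local prefix="$1"
  # grep -c exits non-zero when the count is 0, so tolerate that under `set -e`.
  grep -v '^#' | grep -c "^${prefix}" || true
}
```

With the example running, `curl -s http://127.0.0.1:9102/metrics | count_series netobserv_` should report the flow series, while the same pipeline against port 9090 would be expected to report none of them if the two listeners are separated as described.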

## Flow metrics exposed by FLP (`flp-prometheus.yaml`)

All names use the encode prefix **`netobserv_`** (see `prom.prefix` in the file).

| Prometheus metric | Type | When populated | Main labels |
|-------------------|------|----------------|-------------|
| `netobserv_flow_bytes` | gauge | `Bytes` present on the flow | `SrcAddr`, `DstAddr`, `SrcPort`, `DstPort`, `Proto` |
| `netobserv_flow_packets` | gauge | `Packets` present | same as above |
| `netobserv_dns_latency_ms` | gauge | DNS tracking enabled, `DnsLatencyMs` present | `SrcAddr`, `DstAddr`, `DstPort`, `DnsFlagsResponseCode` |
| `netobserv_tcp_flow_rtt_ns` | gauge | RTT enabled, `TimeFlowRttNs` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto` |
| `netobserv_pkt_drop_packets` | gauge | Packet drop tracking enabled, `PktDropPackets` present | `SrcAddr`, `DstAddr`, `PktDropLatestDropCause` |
| `netobserv_ipsec_flow_bytes` | gauge | IPsec tracking enabled; `IPSecStatus` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `IPSecStatus`, `IPSecRetCode` |
| `netobserv_tls_flow_bytes` | gauge | TLS tracking enabled; `TLSVersion` and `Bytes` present | `SrcAddr`, `DstAddr`, `DstPort`, `Proto`, `TLSVersion`, `TLSCipherSuite` |

These are **gauges** reflecting the last value for each label set (FLP's `expiryTime: 5m` removes stale series). They are not counters, so prefer aggregations such as `max_over_time` or `topk` over `rate()`, which assumes monotonically increasing counters.
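
The "last value per label set" behaviour can be illustrated with a simplified model. This is only a sketch of the semantics, not FLP's implementation: `last_value_per_flow` is a hypothetical helper, and the input format (one `<flow-key> <bytes>` observation per line, oldest first) is an assumption made for the illustration.

```bash
#!/usr/bin/env bash
# Simplified model of gauge semantics: keep only the latest value per key.
# Not FLP's actual code; names and input format are illustrative assumptions.

# Input: one observation per line as "<flow-key> <bytes>", oldest first.
# Output: one line per key with the most recent value, sorted by key.
last_value_per_flow() {
  awk '{ last[$1] = $2 } END { for (k in last) print k, last[k] }' | sort
}
```

Feeding it `flowA 100`, `flowB 5`, `flowA 40` yields `flowA 40` and `flowB 5`: the value for `flowA` went down, which `rate()` would interpret as a counter reset.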

**Cardinality:** labels include IP addresses and ports. This is appropriate for a **local lab** only, not for a large production Prometheus.

**Packet drops:** the agent needs access to kernel tracepoints (typically **`/sys/kernel/debug`** mounted and sufficient privileges). See the main [README.md](../../../README.md) and [docs/config.md](../../../docs/config.md).
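
A quick pre-flight check for the debugfs mount can be sketched as follows. It assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options); `debugfs_mounted_rw` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: check whether debugfs is mounted read-write at /sys/kernel/debug.
# debugfs_mounted_rw is a hypothetical helper; it reads /proc/mounts-style
# lines from stdin and exits 0 only if a matching rw mount is found.
debugfs_mounted_rw() {
  awk '$2 == "/sys/kernel/debug" && $3 == "debugfs" && $4 ~ /(^|,)rw(,|$)/ { found = 1 }
       END { exit found ? 0 : 1 }'
}
```

Typical use would be `debugfs_mounted_rw < /proc/mounts || sudo mount -t debugfs debugfs /sys/kernel/debug` before starting the agent.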

## Prometheus rules (`netobserv-rules.yml`)

[`prometheus.yml`](./prometheus.yml) loads [`netobserv-rules.yml`](./netobserv-rules.yml) via `rule_files`.

### Recording rules (precomputed series)

| Recorded metric | Expression (summary) |
|-----------------|----------------------|
| `netobserv:pkt_drop_packets:sum_by_cause` | `sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)` |
| `netobserv:tcp_flow_rtt:milliseconds` | `netobserv_tcp_flow_rtt_ns / 1e6` |
| `netobserv:dns_latency_ms:max_over_5m` | `max_over_time(netobserv_dns_latency_ms[5m])` |
| `netobserv:flow_bytes:top10` | `topk(10, netobserv_flow_bytes)` |
| `netobserv:flow_bytes:deriv_2m` | `deriv(netobserv_flow_bytes[2m])` (noisy; exploratory) |
| `netobserv:ipsec_flow_bytes:sum_by_status` | `sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)` |
| `netobserv:tls_flow_bytes:count_by_version` | `count by (TLSVersion) (netobserv_tls_flow_bytes)` |

In the Prometheus UI (**Graph**), you can query these names directly instead of typing the full expression.
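
To list the recorded names without opening the rules file, a small text filter is enough. This is a sketch that assumes each rule is written as `- record: <name>` on a single line, as in `netobserv-rules.yml`; `list_recorded_names` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: print the names of recording rules found in a Prometheus rules file.
# list_recorded_names is a hypothetical helper; it expects "- record: <name>"
# entries on stdin and ignores everything else (expr lines, comments, alerts).
list_recorded_names() {
  awk '$1 == "-" && $2 == "record:" { print $3 }'
}
```

For example, `list_recorded_names < examples/direct-flp/prometheus-local/netobserv-rules.yml` prints one recorded series name per line. Any of them can then be queried through the Prometheus HTTP API, for instance `curl -sG http://127.0.0.1:9091/api/v1/query --data-urlencode 'query=netobserv:tcp_flow_rtt:milliseconds'`.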

### Example alert

| Alert | Condition | Meaning |
|-------|-----------|---------|
| `NetobservNoFlowMetrics` | `absent(netobserv_flow_bytes)` for 3m | No `netobserv_flow_bytes` series (scrape failure, agent stopped, or no flows exported yet). Check the agent, http://127.0.0.1:9102/metrics on the host, and Docker `host.docker.internal` reachability. |

View rule evaluation status under **Status → Rules** and firing alerts under **Alerts**.

## Ad hoc PromQL (raw FLP metrics)

```promql
{__name__=~"netobserv_.*"}
```

```promql
topk(10, netobserv_flow_bytes)
```

```promql
netobserv_tcp_flow_rtt_ns / 1e6
```

```promql
sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)
```

```promql
netobserv_ipsec_flow_bytes
```

```promql
topk(5, netobserv_tls_flow_bytes)
```

## Files in this directory

| File | Role |
|------|------|
| [`flp-prometheus.yaml`](./flp-prometheus.yaml) | `FLP_CONFIG`: `metricsSettings` + `encode/prom` pipeline after `preset-ingester` |
| [`prometheus.yml`](./prometheus.yml) | Prometheus scrape config + `rule_files` |
| [`netobserv-rules.yml`](./netobserv-rules.yml) | Recording rules and example alert |
| [`docker-compose.yaml`](./docker-compose.yaml) | Prometheus service and config mounts |
| [`run-example.sh`](./run-example.sh) | Starts Compose, exports env vars, runs the agent with `sudo -E` |

## Prometheus without Docker

Point a local `prometheus.yml` at `127.0.0.1:9102` instead of `host.docker.internal:9102`, and use the same `rule_files` entry with a path to [`netobserv-rules.yml`](./netobserv-rules.yml) on your machine.
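
One way to derive such a local config from the Docker one is a pair of substitutions. The sketch below assumes the target and rules path appear exactly as in this directory's `prometheus.yml`; `localize_prometheus_config` is a hypothetical helper name, and the rules path passed to it must not contain `|` or `&`, since it is interpolated into a `sed` expression.

```bash
#!/usr/bin/env bash
# Sketch: rewrite the Docker-oriented prometheus.yml for a host-local Prometheus.
# localize_prometheus_config is a hypothetical helper. It reads the config on
# stdin, points the scrape target at 127.0.0.1:9102, and replaces the in-container
# rules path with RULES_PATH (first argument; must not contain '|' or '&').
localize_prometheus_config() {
  local rules_path="$1"
  sed -e 's/host\.docker\.internal:9102/127.0.0.1:9102/' \
      -e "s|/etc/prometheus/netobserv-rules.yml|${rules_path}|"
}
```

For example, `localize_prometheus_config "$PWD/netobserv-rules.yml" < prometheus.yml > prometheus-local.yml`, then start Prometheus with `--config.file=prometheus-local.yml`.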
15 changes: 15 additions & 0 deletions examples/direct-flp/prometheus-local/docker-compose.yaml
@@ -0,0 +1,15 @@
services:
  prometheus:
    image: docker.io/prom/prometheus:v2.53.3
    ports:
      # Host UI/API; agent + embedded FLP keep 9090 and 9102 on the host.
      - "9091:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
      - ./netobserv-rules.yml:/etc/prometheus/netobserv-rules.yml:ro
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.retention.time=2h
      - --web.enable-lifecycle
    extra_hosts:
      - "host.docker.internal:host-gateway"
111 changes: 111 additions & 0 deletions examples/direct-flp/prometheus-local/flp-prometheus.yaml
@@ -0,0 +1,111 @@
# Flowlogs-pipeline config for EXPORT=direct-flp (ingest is provided by the agent).
# Exposes Prometheus metrics on :9102/metrics (see metricsSettings).
# Agent operational metrics stay on METRICS_SERVER_PORT (default 9090).
log-level: info
metricsSettings:
  address: "0.0.0.0"
  port: 9102
pipeline:
  - name: encode_prom
    follows: preset-ingester
parameters:
  - name: encode_prom
    encode:
      type: prom
      prom:
        prefix: netobserv_
        expiryTime: 5m
        metrics:
          - name: flow_bytes
            type: gauge
            help: Last observed byte count for a flow (high-cardinality labels; local lab only).
            filters:
              - key: Bytes
                type: presence
            valueKey: Bytes
            labels:
              - SrcAddr
              - DstAddr
              - SrcPort
              - DstPort
              - Proto
          - name: flow_packets
            type: gauge
            help: Last observed packet count for a flow.
            filters:
              - key: Packets
                type: presence
            valueKey: Packets
            labels:
              - SrcAddr
              - DstAddr
              - SrcPort
              - DstPort
              - Proto
          - name: dns_latency_ms
            type: gauge
            help: DNS request/response latency in milliseconds when DNS tracking is enabled.
            filters:
              - key: DnsLatencyMs
                type: presence
            valueKey: DnsLatencyMs
            labels:
              - SrcAddr
              - DstAddr
              - DstPort
              - DnsFlagsResponseCode
          - name: tcp_flow_rtt_ns
            type: gauge
            help: TCP smoothed RTT in nanoseconds when RTT tracking is enabled.
            filters:
              - key: TimeFlowRttNs
                type: presence
            valueKey: TimeFlowRttNs
            labels:
              - SrcAddr
              - DstAddr
              - DstPort
              - Proto
          - name: pkt_drop_packets
            type: gauge
            help: Kernel packet drops attributed to the flow when packet drop tracking is enabled.
            filters:
              - key: PktDropPackets
                type: presence
            valueKey: PktDropPackets
            labels:
              - SrcAddr
              - DstAddr
              - PktDropLatestDropCause
          - name: ipsec_flow_bytes
            type: gauge
            help: Last observed bytes for flows where IPsec encryption was detected (ENABLE_IPSEC_TRACKING).
            filters:
              - key: Bytes
                type: presence
              - key: IPSecStatus
                type: presence
            valueKey: Bytes
            labels:
              - SrcAddr
              - DstAddr
              - DstPort
              - Proto
              - IPSecStatus
              - IPSecRetCode
          - name: tls_flow_bytes
            type: gauge
            help: Last observed bytes for flows where TLS metadata was captured (ENABLE_TLS_TRACKING).
            filters:
              - key: Bytes
                type: presence
              - key: TLSVersion
                type: presence
            valueKey: Bytes
            labels:
              - SrcAddr
              - DstAddr
              - DstPort
              - Proto
              - TLSVersion
              - TLSCipherSuite
44 changes: 44 additions & 0 deletions examples/direct-flp/prometheus-local/netobserv-rules.yml
@@ -0,0 +1,44 @@
# Recording + alert rules for the netobserv direct-flp / prom-local example.
# After reload, use Graph → metric name (e.g. netobserv:tcp_flow_rtt:milliseconds).

groups:
  - name: netobserv-recording
    interval: 15s
    rules:
      # sum of drop packets attributed to flows, grouped by last kernel drop cause
      - record: netobserv:pkt_drop_packets:sum_by_cause
        expr: sum by (PktDropLatestDropCause) (netobserv_pkt_drop_packets)

      # TCP SRTT in milliseconds (source series is nanoseconds)
      - record: netobserv:tcp_flow_rtt:milliseconds
        expr: netobserv_tcp_flow_rtt_ns / 1e6

      # per-series max DNS latency over 5m (requires ≥10s scrape; matches global interval)
      - record: netobserv:dns_latency_ms:max_over_5m
        expr: max_over_time(netobserv_dns_latency_ms[5m])

      # up to 10 flows with largest last-reported byte gauge (high-cardinality labels preserved)
      - record: netobserv:flow_bytes:top10
        expr: topk(10, netobserv_flow_bytes)

      # optional: rough “change” signal for byte gauges (can be noisy)
      - record: netobserv:flow_bytes:deriv_2m
        expr: deriv(netobserv_flow_bytes[2m])

      - record: netobserv:ipsec_flow_bytes:sum_by_status
        expr: sum by (IPSecStatus) (netobserv_ipsec_flow_bytes)

      - record: netobserv:tls_flow_bytes:count_by_version
        expr: count by (TLSVersion) (netobserv_tls_flow_bytes)
Comment on lines +31 to +32

⚠️ Potential issue | 🟡 Minor

count by counts series, not bytes — name mismatch.

count by (TLSVersion) (netobserv_tls_flow_bytes) yields the number of active series per TLS version, not a byte-related quantity. If the intent is byte volume per version, use sum by (TLSVersion) like the IPSec rule above; otherwise rename the record to something like netobserv:tls_flows:count_by_version to avoid confusion.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/netobserv-rules.yml` around lines 31 -
32, The metric name/semantics mismatch: the recording rule
"netobserv:tls_flow_bytes:count_by_version" uses the expression "count by
(TLSVersion) (netobserv_tls_flow_bytes)" which counts series rather than summing
bytes. Fix by either changing the expression to "sum by (TLSVersion)
(netobserv_tls_flow_bytes)" to produce byte volume per TLSVersion, or rename the
record to something like "netobserv:tls_flows:count_by_version" to reflect it is
a series count; update the recording rule name
"netobserv:tls_flow_bytes:count_by_version" and/or its expr accordingly.


  - name: netobserv-alerts
    rules:
      # Fires when no netobserv flow byte series exist (scrape down, agent stopped, or no traffic yet).
      - alert: NetobservNoFlowMetrics
        expr: absent(netobserv_flow_bytes)
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: No netobserv_flow_bytes series scraped
          description: Prometheus has not recorded netobserv_flow_bytes in 3m. Check agent, :9102/metrics, and host.docker.internal from the container.
13 changes: 13 additions & 0 deletions examples/direct-flp/prometheus-local/prometheus.yml
@@ -0,0 +1,13 @@
global:
  scrape_interval: 10s
  evaluation_interval: 10s

rule_files:
  - /etc/prometheus/netobserv-rules.yml

scrape_configs:
  - job_name: netobserv-flowlogs-pipeline
    metrics_path: /metrics
    static_configs:
      # From inside the Prometheus container, reach the host where the agent listens (9102).
      - targets: ["host.docker.internal:9102"]
43 changes: 43 additions & 0 deletions examples/direct-flp/prometheus-local/run-example.sh
@@ -0,0 +1,43 @@
#!/usr/bin/env bash
# Run NetObserv eBPF agent on the host with direct-flp + Prometheus encode, alongside
# a Prometheus instance in Docker that scrapes the embedded flowlogs-pipeline /metrics.
#
# Prerequisites:
# - Built agent: (cd repo root && make compile)
# - Docker with compose plugin
# - Caps: sudo, or CAP_BPF + CAP_PERFMON + CAP_NET_ADMIN (see README)
# - Packet drops: /sys/kernel/debug mounted read-write (often requires privileged / root)
# - IPsec/TLS: ENABLE_IPSEC_TRACKING / ENABLE_TLS_TRACKING (extra eBPF hooks; series only if traffic matches)
#
set -euo pipefail

SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "${SCRIPT_DIR}/../../.." && pwd)"
AGENT_BIN="${REPO_ROOT}/bin/netobserv-ebpf-agent"

if [[ ! -x "${AGENT_BIN}" ]]; then
  echo "Missing ${AGENT_BIN}. Run: cd ${REPO_ROOT} && make compile" >&2
  exit 1
fi

cd "${SCRIPT_DIR}"
docker compose up -d

export EXPORT=direct-flp
export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
Comment on lines +24 to +27

⚠️ Potential issue | 🟡 Minor

Compose stack leaks if the agent fails to start, and cat errors are swallowed.

  • docker compose up -d runs before any validation of flp-prometheus.yaml; if cat/agent fails later, Prometheus keeps running and the user has to tear it down manually.
  • Line 27 hits shellcheck SC2155: export FLP_CONFIG="$(cat ...)" masks cat's exit status, so a missing/empty config file won't abort the script despite set -e.
Proposed fix
-cd "${SCRIPT_DIR}"
-docker compose up -d
-
-export EXPORT=direct-flp
-export FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+cd "${SCRIPT_DIR}"
+
+FLP_CONFIG_CONTENT="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
+export FLP_CONFIG="${FLP_CONFIG_CONTENT}"
+export EXPORT=direct-flp
+
+docker compose up -d
+trap 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' EXIT
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 27-27: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/direct-flp/prometheus-local/run-example.sh` around lines 24 - 27,
Validate the flp-prometheus.yaml before starting the compose stack and stop
masking cat's exit status: check that "${SCRIPT_DIR}/flp-prometheus.yaml" exists
and is non-empty (e.g. [ -s "${SCRIPT_DIR}/flp-prometheus.yaml" ] || { echo
"missing"; exit 1; }), then read it into FLP_CONFIG with a separate assignment
that preserves failures (FLP_CONFIG="$(cat "${SCRIPT_DIR}/flp-prometheus.yaml")"
|| exit 1) and then export FLP_CONFIG; only after that run docker compose up -d;
also add a trap on ERR/EXIT to run docker compose down (or a cleanup function)
so the compose stack is torn down if the agent fails to start.


export ENABLE_RTT=true
export ENABLE_PKT_DROPS=true
export ENABLE_DNS_TRACKING=true
export ENABLE_IPSEC_TRACKING=true
export ENABLE_TLS_TRACKING=true

# Embedded FLP serves Prometheus on 9102; keep agent metrics on 9090 (default).
export METRICS_SERVER_PORT=9090

echo "Prometheus UI: http://127.0.0.1:9091 (Graph: try netobserv:tcp_flow_rtt:milliseconds; Alerts: NetobservNoFlowMetrics)"
echo "Embedded FLP metrics: http://127.0.0.1:9102/metrics"
echo "Agent metrics: http://127.0.0.1:${METRICS_SERVER_PORT}/metrics"
echo "Starting agent (sudo)…"

exec sudo -E "${AGENT_BIN}"
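
Note on the cleanup suggested in the review comment above: because the script ends with `exec sudo -E`, the shell process is replaced by the agent, so an `EXIT` trap registered in that shell would not run when the agent later exits. A minimal sketch of one alternative, assuming the agent is run as a child process rather than via `exec`; `run_with_cleanup` is a hypothetical helper name, not part of this PR.

```bash
#!/usr/bin/env bash
# Sketch: run a command as a child process and always run a cleanup command
# afterwards. run_with_cleanup is a hypothetical helper, not part of the PR.
#
# Usage: run_with_cleanup '<cleanup command string>' <command> [args...]
run_with_cleanup() {
  local cleanup_cmd="$1"
  shift
  (
    # The trap lives in this subshell and fires when the subshell exits.
    trap "${cleanup_cmd}" EXIT
    status=0
    # Run as a child; using `exec` here would replace the shell and skip the trap.
    "$@" || status=$?
    exit "${status}"
  )
}
```

In the runner script this could look like `run_with_cleanup 'docker compose -f "${SCRIPT_DIR}/docker-compose.yaml" down' sudo -E "${AGENT_BIN}"`, at the cost of keeping the wrapper shell alive as the agent's parent.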