BPFLNALCR/cyberwatch

cyberWatch

Autonomous internet measurement and topology mapping node.

Overview

cyberWatch runs active network measurements (traceroute, scamper, MTR/ping), stores hop-by-hop results in PostgreSQL, enriches hops with ASN metadata, and projects an AS-level graph into Neo4j. A Redis-backed queue feeds Python workers so probing can be continuous or on-demand.

DNS activity from a local resolver (e.g., Pi-hole) can be turned into measurement targets, allowing the system to follow the destinations that matter to the vantage point. A small FastAPI-based API, a looking-glass style UI, and Grafana dashboards expose paths, AS relationships, latency, and hopcount trends.

Key Use Cases

Features

Active Measurement

  • traceroute/scamper with hop-by-hop parsing; MTR endpoint for ad-hoc runs when mtr is installed.
  • Targets pulled from Redis queue; results inserted into PostgreSQL with hop RTTs and raw output.
  • Multi-worker architecture: Scalable traceroute processing with 2-4 parallel workers (configurable).
  • Rate limiting: Token bucket rate limiter prevents network abuse (default: 30 traceroutes/min per worker).
  • Automatic remeasurement: Periodic re-measurement of targets keeps data fresh (default: 24-hour interval).
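The rate limiting described above can be illustrated with a minimal token bucket sketch. This is not the project's implementation: the class name `TokenBucket`, its parameters, and the choice to let the bucket hold one minute's worth of tokens are assumptions made for illustration. The clock is injectable so the behaviour can be checked without sleeping.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Hypothetical token bucket: refills continuously, one token per probe."""

    def __init__(
        self,
        rate_per_minute: float,
        capacity: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate_per_second = rate_per_minute / 60.0
        # Assumption: the burst size equals one minute's allowance.
        self.capacity = capacity if capacity is not None else rate_per_minute
        self.tokens = self.capacity
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last call, capped at capacity.
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_second)

    def try_acquire(self) -> bool:
        """Spend one token and return True if available, otherwise return False."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A worker using a limiter like this would call `try_acquire()` before each traceroute and wait briefly when it returns False. With the default of 30 per minute, a drained bucket regains one token every two seconds.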

Enrichment & Topology

  • Multi-source ASN enrichment: Gathers data from Team Cymru WHOIS, PeeringDB, RIPE RIS, ip-api.com, and ipinfo.io for maximum coverage.
  • Dedicated ASN table: Stores comprehensive metadata including organization names, countries, neighbor counts, peering policies, facility counts, and statistics.
  • Automatic ASN discovery: Intelligently expands topology by sampling IPs from announced prefixes of interesting ASNs.
  • Neo4j AS graph builder that merges observed AS adjacencies with edge weights (observed_count, min/max RTT, last_seen).
  • Enhanced API responses: ASN endpoints return rich metadata from all sources with neighbor lists from Neo4j graph.
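To make the edge weights concrete, here is a small in-memory sketch of how observed AS adjacencies could be folded into per-edge statistics (observed_count, min/max RTT, last_seen). It is an illustration only and does not use Neo4j. The names `EdgeStats` and `merge_path`, the decision to skip unenriched hops, and the choice to attribute the RTT of the first hop in the next AS to the edge are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass
class EdgeStats:
    observed_count: int = 0
    min_rtt: Optional[float] = None
    max_rtt: Optional[float] = None
    last_seen: Optional[str] = None


Edge = Tuple[int, int]
Hop = Tuple[Optional[int], Optional[float]]  # (asn, rtt_ms)


def merge_path(edges: Dict[Edge, EdgeStats], hops: Sequence[Hop], seen_at: str) -> None:
    """Fold one traceroute's hop list into the edge table (mutates `edges`)."""
    # Assumption: hops without an ASN are skipped, bridging the ASes around them.
    known = [(asn, rtt) for asn, rtt in hops if asn is not None]
    for (src, _), (dst, rtt) in zip(known, known[1:]):
        if src == dst:
            continue  # consecutive hops inside the same AS are not an adjacency
        stats = edges.setdefault((src, dst), EdgeStats())
        stats.observed_count += 1
        if rtt is not None:
            stats.min_rtt = rtt if stats.min_rtt is None else min(stats.min_rtt, rtt)
            stats.max_rtt = rtt if stats.max_rtt is None else max(stats.max_rtt, rtt)
        stats.last_seen = seen_at
```

In a graph database the same update would typically be expressed as a merge on the edge followed by property updates; the dictionary here only stands in for that store.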

DNS Integration

  • Optional Pi-hole API or log tail ingestion; filters for suffixes (.local, .lan), qtypes (PTR), client allow/deny, max domain length.
  • Domains resolved to A/AAAA (configurable max IPs) and automatically enqueued for traceroute measurement.
  • Stored in dns_queries and dns_targets tables with full analytics.
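The filtering rules listed above can be sketched as a single predicate. This is a hypothetical illustration, not the collector's code: the class `DnsFilter`, its field names, and the default `max_domain_length` of 253 are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class DnsFilter:
    ignore_suffixes: Tuple[str, ...] = (".local", ".lan")
    ignore_qtypes: FrozenSet[str] = frozenset({"PTR"})
    ignore_clients: FrozenSet[str] = frozenset()
    max_domain_length: int = 253  # assumed default, not taken from the project

    def accepts(self, domain: str, qtype: str, client: str) -> bool:
        """Return True if a query should become a measurement target."""
        name = domain.rstrip(".").lower()  # normalise trailing dot and case
        if not name or len(name) > self.max_domain_length:
            return False
        if qtype.upper() in self.ignore_qtypes:
            return False
        if client in self.ignore_clients:
            return False
        # str.endswith accepts a tuple of suffixes
        return not name.endswith(self.ignore_suffixes)
```

Queries that pass a check like this would then be resolved to A/AAAA records and enqueued, as the bullets above describe.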

APIs & Looking Glass

  • FastAPI service with endpoints for traceroute/MTR, measurements, hops, target enqueue/list, ASN detail (with full enrichment), graph neighbors/path, DNS analytics.
  • UI pages for traceroute, ASN lookup (showing rich metadata), and graph neighbors using the same API.

Dashboards

  • Grafana JSON dashboards for latency, hopcount, and ASN performance sourced from PostgreSQL (measurements, hops, targets, asns).

Architecture Overview

The core runtime is a Debian VM running the Python services alongside Redis, PostgreSQL, and Neo4j. Systemd units manage the API, UI, enrichment loop, DNS collector, measurement workers, and remeasurement scheduler. Data flow:

DNS logs/API → DNS collector → Redis target queue → measurement workers (2-4) → PostgreSQL
                                ↓                          ↓                          ↓
                            targets table          ASN enrichment              hops/measurements
                                ↓                  (5+ sources)                        ↓
                      Worker rate limiting        ASN metadata table        Neo4j graph builder
                         (30/min default)             (asns)                         ↓
                                ↓                          ↓                    AS topology
                    Remeasurement scheduler    ASN IP discovery                     ↓
                       (24hr interval)         (prefix sampling)        API / UI / Grafana dashboards

Key Components:

  • DNS Collector: Captures Pi-hole queries, resolves domains to IPs, enqueues targets to Redis
  • Worker Pool (2-4 instances): Dequeues targets, runs traceroutes with rate limiting and concurrency control
  • Enrichment Service: Polls unenriched hops, enriches from 5+ sources (Team Cymru, PeeringDB, RIPE RIS, ip-api, ipinfo), populates ASN table
  • ASN Expander: Periodically discovers IPs within interesting ASNs by sampling from announced prefixes
  • Graph Builder: Converts enriched measurements into Neo4j AS topology with ROUTE edges
  • Remeasurement Scheduler: Re-enqueues stale targets for fresh measurements

See architecture.md for the full design and phased goals.

Data Model & What cyberWatch Shows You

  • Measurements: target, tool used, timestamps, success, raw output, enrichment status, graph build status.
  • Hops: hop number, IP, RTT ms, ASN, prefix, org, country (enriched from multiple sources).
  • ASNs: Dedicated table with comprehensive metadata: org name, country, neighbor count, prefix count, PeeringDB data (facility count, peering policy, traffic levels, IRR AS-SET), measurement statistics, timestamps.
  • DNS-derived targets: domains/IPs with first/last seen, query counts, last client/qtype.
  • AS graph edges (Neo4j): AS nodes with org/country, ROUTE edges holding observed_count, min/max RTT, last_seen.

How it appears:

  • API: /measurements/latest?target=1.1.1.1, /measurements/hops/{id}, /traceroute/run, /asn/{asn} (returns full enrichment), /graph/path?src_asn=64512&dst_asn=15169, /dns/top-domains, /dns/top-asns.
  • UI: traceroute form shows JSON hops; ASN view shows org/country/neighbors/facilities/peering policy; graph view lists neighbor edges.
  • Grafana: latency time series (mean, P95), hopcount distribution and over time, RTT by ASN and observed edge counts.

Installation (Debian)

Assumed platform: Debian 12+ (VM/LXC on Proxmox or similar). The installer is idempotent and will prompt before applying schemas.

  1. Clone the repo and enter it:
git clone <repo-url>
cd cyberwatch
  2. Run the installer (installs system packages, creates venv, installs Python deps, applies schemas if approved, installs systemd units):
./install-cyberWatch.sh
  3. Installer actions (from install-cyberWatch.sh):

Configuration

  • DNS collector config lives at /etc/cyberwatch/dns.yaml (installed from config/cyberwatch_dns.example.yaml). Key fields:
    • enabled: toggle collector.
    • source: pihole (HTTP API) or logfile (tail FTL logs).
    • poll_interval_seconds: per-source polling cadence.
    • filters: suffix ignore list (.local, .lan), qtypes to drop (e.g., PTR), clients to ignore, max_domain_length.
    • dns_resolution: enable/disable resolution, timeout, max_ips_per_domain.
  • Core environment variables (loaded by systemd from /etc/cyberwatch/cyberwatch.env):
    • CYBERWATCH_PG_DSN, CYBERWATCH_REDIS_URL (queue), NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD for API/enrichment/collector.
    • CYBERWATCH_API_BASE for the UI to reach the API.
    • CYBERWATCH_DNS_CONFIG to point the collector to a non-default config path.
  • Runtime settings stored in PostgreSQL (configured during installation, adjustable via SQL or API):
    • Worker settings (worker_settings key):
      • rate_limit_per_minute: Max traceroutes per minute per worker (default: 30)
      • max_concurrent_traceroutes: Max parallel traceroutes per worker (default: 5)
      • worker_count: Number of worker instances (default: 2)
    • Enrichment settings (enrichment_settings key):
      • poll_interval_seconds: Enrichment polling frequency (default: 10)
      • batch_size: Hops to process per batch (default: 200)
      • asn_expansion_enabled: Enable automatic ASN IP discovery (default: true)
      • asn_expansion_interval_minutes: How often to expand ASNs (default: 60)
      • asn_min_neighbor_count: Only expand ASNs with this many neighbors (default: 5)
      • asn_max_ips_per_asn: Max IPs to sample per ASN (default: 10)
    • Remeasurement settings (remeasurement_settings key):
      • enabled: Enable periodic remeasurement (default: true)
      • interval_hours: How often to remeasure targets (default: 24)
      • batch_size: Targets to process per batch (default: 100)
      • targets_per_run: Max targets to remeasure per cycle (default: 500)
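To show how the remeasurement settings interact, here is a hedged sketch of selecting stale targets. It assumes a target is stale when its last measurement is older than `interval_hours` (or when it has never been measured) and that at most `targets_per_run` targets are picked per cycle. The function name and the input shape are hypothetical; the real scheduler reads from PostgreSQL.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def select_stale_targets(
    last_measured: Dict[str, Optional[datetime]],
    now: datetime,
    interval_hours: int = 24,
    targets_per_run: int = 500,
) -> List[str]:
    """Return targets due for remeasurement, never-measured and oldest first."""
    cutoff = now - timedelta(hours=interval_hours)
    stale = [
        (ts, target)
        for target, ts in last_measured.items()
        if ts is None or ts <= cutoff
    ]
    # Sort key: never-measured targets first, then by age, then by name for stability.
    stale.sort(key=lambda item: (item[0] is not None,
                                 item[0].timestamp() if item[0] else 0.0,
                                 item[1]))
    return [target for _, target in stale[:targets_per_run]]
```

The selected targets would be pushed back onto the Redis queue in chunks of `batch_size`.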

Adjusting Settings:

-- Increase worker rate limits.
-- The || operator merges into the existing JSON, so keys not listed here
-- (such as worker_count) keep their values. Assumes value is a jsonb column.
UPDATE settings
SET value = value || '{"rate_limit_per_minute": 60, "max_concurrent_traceroutes": 10}'::jsonb
WHERE key = 'worker_settings';

-- Adjust ASN expansion
UPDATE settings
SET value = value || '{"asn_expansion_enabled": true, "asn_expansion_interval_minutes": 30, "asn_max_ips_per_asn": 20}'::jsonb
WHERE key = 'enrichment_settings';

# Then restart affected services (shell)
sudo systemctl restart 'cyberWatch-worker@*'
sudo systemctl restart cyberWatch-enrichment

Logging Configuration

cyberWatch includes comprehensive structured logging in JSONL format for all components. Logging is enabled by default and can be configured via environment variables:

Environment Variables:

  • CYBERWATCH_LOG_LEVEL: Set log level (DEBUG, INFO, WARNING, ERROR, CRITICAL). Default: INFO
  • CYBERWATCH_LOG_FILE: Path to JSONL log file. Default: logs/cyberwatch.jsonl
  • CYBERWATCH_LOG_MAX_BYTES: Maximum bytes per log file before rotation. Default: 104857600 (100MB)

Log Output Format: Logs are written in JSON Lines format with structured fields:

{"timestamp": "2025-12-25T10:30:45.123456Z", "level": "INFO", "component": "api", "logger": "cyberwatch.api", "message": "Request completed", "request_id": "abc-123", "method": "POST", "path": "/traceroute/run", "status_code": 200, "duration": 1234.56, "outcome": "success"}

What's Logged:

  • API: All HTTP requests with method, path, query params, status codes, duration, and unique request IDs for correlation
  • Workers: Task processing, subprocess execution (traceroute/scamper commands), stdout/stderr, exit codes, parsing results
  • Database: Connection pool creation, query execution timing, transaction outcomes, bulk operation row counts
  • DNS Collector: Collection cycles, query filtering, DNS resolution results, target enqueueing
  • Enrichment: ASN lookups, PeeringDB API calls, batch processing progress, graph building operations
  • Errors: Full exception tracebacks with context for debugging

Log Rotation: Logs automatically rotate when reaching CYBERWATCH_LOG_MAX_BYTES (default 100MB), keeping 10 backup files by default. Old logs are named cyberwatch.jsonl.1, cyberwatch.jsonl.2, etc.
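The logging setup described above can be approximated with the standard library alone. The sketch below is an assumption-laden illustration, not the project's logging module: `JsonlFormatter` and `build_logger` are hypothetical names, and only a subset of the documented fields is emitted. It reads the three documented environment variables and uses `RotatingFileHandler` with 10 backups.

```python
import json
import logging
import os
from datetime import datetime, timezone
from logging.handlers import RotatingFileHandler
from typing import Optional


class JsonlFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def __init__(self, component: str) -> None:
        super().__init__()
        self.component = component

    def format(self, record: logging.LogRecord) -> str:
        ts = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "timestamp": ts.isoformat().replace("+00:00", "Z"),
            "level": record.levelname,
            "component": self.component,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def build_logger(component: str, path: Optional[str] = None) -> logging.Logger:
    """Create a rotating JSONL logger configured from the environment."""
    path = path or os.environ.get("CYBERWATCH_LOG_FILE", "logs/cyberwatch.jsonl")
    max_bytes = int(os.environ.get("CYBERWATCH_LOG_MAX_BYTES", "104857600"))
    level = os.environ.get("CYBERWATCH_LOG_LEVEL", "INFO").upper()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=10)
    handler.setFormatter(JsonlFormatter(component))
    logger = logging.getLogger(f"cyberwatch.{component}")
    logger.setLevel(level)
    logger.addHandler(handler)
    return logger
```

With `backupCount=10`, the handler renames the active file to `.1`, shifts older backups up, and discards anything beyond the tenth, which matches the naming pattern described above.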

Example Configuration in Systemd: Add to your systemd unit files (e.g., /etc/systemd/system/cyberWatch-api.service):

[Service]
Environment="CYBERWATCH_LOG_LEVEL=DEBUG"
Environment="CYBERWATCH_LOG_FILE=/var/log/cyberwatch/api.jsonl"
Environment="CYBERWATCH_LOG_MAX_BYTES=52428800"

Viewing Logs:

# View all logs (JSONL format)
cat logs/cyberwatch.jsonl

# Follow logs in real-time
tail -f logs/cyberwatch.jsonl

# Parse and pretty-print JSON logs
tail -f logs/cyberwatch.jsonl | jq '.'

# Filter by component (jq matches on the parsed field, so spacing in the JSON does not matter)
jq 'select(.component == "api")' logs/cyberwatch.jsonl

# Filter by log level
jq 'select(.level == "ERROR")' logs/cyberwatch.jsonl

# Find all logs for a specific request
jq 'select(.request_id == "abc-123")' logs/cyberwatch.jsonl

# Find all failed traceroute attempts
grep traceroute logs/cyberwatch.jsonl | jq 'select(.outcome == "error")'
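For scripted analysis, the same filtering can be done in Python by parsing each line rather than matching text. The helper below is a hypothetical sketch (the name `filter_records` is not part of cyberWatch); it matches on exact field values and silently skips lines that are not valid JSON objects.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def filter_records(lines: Iterable[str], **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield parsed log records whose fields equal all given criteria."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        if all(record.get(key) == value for key, value in criteria.items()):
            yield record


# Example usage against the default log path:
#   with open("logs/cyberwatch.jsonl") as fh:
#       for rec in filter_records(fh, component="api", level="ERROR"):
#           print(rec["timestamp"], rec["message"])
```

Because it compares parsed values, this approach is insensitive to whitespace or key ordering in the serialized JSON.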

Security Note: Sensitive data (passwords, API tokens) is automatically redacted from logs. You'll see ***REDACTED*** in place of these values.

Running cyberWatch

Systemd (installed by the script)

  • API: sudo systemctl status|start|stop cyberWatch-api.service (FastAPI on port 8000).
  • UI: sudo systemctl status|start|stop cyberWatch-ui.service (Uvicorn on port 8080).
  • Enrichment loop: sudo systemctl status|start|stop cyberWatch-enrichment.service (ASN enrichment + graph builder + ASN expander).
  • DNS collector (optional): sudo systemctl status|start|stop cyberWatch-dns-collector.service.
  • Workers: sudo systemctl status|start|stop 'cyberWatch-worker@*.service' (measurement workers @1, @2, etc.).
  • Remeasurement: sudo systemctl status|start|stop cyberWatch-remeasure.service (periodic target remeasurement).

Scaling Workers:

# Start additional worker instances
sudo systemctl start cyberWatch-worker@3.service
sudo systemctl enable cyberWatch-worker@3.service

# Check all workers
systemctl list-units 'cyberWatch-worker@*'

# View worker logs
sudo journalctl -u 'cyberWatch-worker@*' -f

Monitoring:

# Check Redis queue depth
redis-cli LLEN cyberwatch:targets

# View all logs
sudo journalctl -u 'cyberWatch-*' -f

# Check database stats
psql -U cyberwatch -d cyberwatch -c "
SELECT 
  (SELECT COUNT(*) FROM measurements) as measurements,
  (SELECT COUNT(*) FROM asns) as asns,
  (SELECT COUNT(*) FROM hops WHERE asn IS NOT NULL) as enriched_hops
"

Manual/dev mode

# Activate virtualenv
source .venv/bin/activate
# Run API (add --reload for auto-reload during development)
uvicorn cyberWatch.api.server:app --host 0.0.0.0 --port 8000
# Run UI
CYBERWATCH_API_BASE=http://localhost:8000 uvicorn cyberWatch.ui.server:app --host 0.0.0.0 --port 8080
# Run enrichment loop
python -m cyberWatch.enrichment.run_enrichment
# Run DNS collector
python -m cyberWatch.collector.dns_collector --config /etc/cyberwatch/dns.yaml
# Run a worker manually
python -m cyberWatch.workers.worker
# Run remeasurement scheduler
python -m cyberWatch.scheduler.remeasure

Using the API and UI

Example API calls:

# On-demand traceroute
curl -X POST http://localhost:8000/traceroute/run -H "Content-Type: application/json" -d '{"target":"8.8.8.8"}'

# Latest measurement for a target
curl "http://localhost:8000/measurements/latest?target=8.8.8.8"

# Hops for a measurement
curl http://localhost:8000/measurements/hops/1

# Enqueue a target
curl -X POST http://localhost:8000/targets/enqueue -H "Content-Type: application/json" -d '{"target":"1.1.1.1","source":"api"}'

# ASN detail
curl http://localhost:8000/asn/13335

# Graph neighbors
curl http://localhost:8000/graph/neighbors/13335

# Shortest AS path
curl "http://localhost:8000/graph/path?src_asn=13335&dst_asn=15169"

# DNS analytics
curl http://localhost:8000/dns/top-domains
curl http://localhost:8000/dns/top-asns
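The curl calls above translate directly to Python's standard library. The sketch below only builds the requests for two of the documented endpoints; the helper names are hypothetical, and the shape of the JSON responses is not assumed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "http://localhost:8000"  # same host and port as the curl examples


def path_request(src_asn: int, dst_asn: int, base: str = API_BASE) -> Request:
    """Build a GET request for the shortest AS path endpoint."""
    query = urlencode({"src_asn": src_asn, "dst_asn": dst_asn})
    return Request(f"{base}/graph/path?{query}", method="GET")


def enqueue_request(target: str, source: str = "api", base: str = API_BASE) -> Request:
    """Build a POST request that enqueues a measurement target."""
    body = json.dumps({"target": target, "source": source}).encode("utf-8")
    return Request(
        f"{base}/targets/enqueue",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send a request, pass it to `urllib.request.urlopen` and decode the body with `json.load`, for example `json.load(urlopen(path_request(13335, 15169)))` against a running API.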

UI (serving on http://<host>:8080):

  • Traceroute: enter target, JSON hops returned.
  • ASN Explorer: enter ASN, see org/country and neighbor list.
  • Graph View: list neighbors for an ASN with edge stats.

Grafana Dashboards

Dashboard JSON lives in grafana/dashboards:

Import into Grafana and configure a PostgreSQL datasource pointing at the cyberWatch database. Queries expect tables targets, measurements, and hops with timestamps in started_at and RTT in rtt_ms.

Uninstallation

Use uninstall-cyberWatch.sh:

./uninstall-cyberWatch.sh            # prompts before dropping tables or removing DNS config
./uninstall-cyberWatch.sh --purge    # additionally purges redis-server, postgresql-client, traceroute, scamper

What it does:

  • Stops/disables systemd units (API, UI, enrichment, DNS collector) and removes their unit files.
  • Removes .venv and cleans /var/lib/cyberWatch if present.
  • Optional prompt to drop PostgreSQL tables (dns_queries, dns_targets, hops, measurements, targets).
  • Optional prompt to remove /etc/cyberwatch/dns.yaml.

Roadmap / Future Work

From architecture.md:

  • Privacy hardening for DNS-derived targets (hashing/anonymization), TLS, and access controls.
  • Richer scheduling/rate limiting (✓ implemented) and expanded probe set (MTR/ping integration beyond ad-hoc).
  • Broader metadata ingestion (BGP/IXP datasets, RouteViews, RIS) to deepen the AS graph.
  • Monitoring/security hardening (network isolation, auth for UI/Grafana, encrypted channels).
  • AS relationship classification (peer, transit, customer).
  • Geographic diversity and latency anomaly detection.

Recently Completed:

  • ✅ Multi-worker architecture with scalable traceroute processing
  • ✅ Rate limiting and concurrency control for workers
  • ✅ Multi-source ASN enrichment (Team Cymru, PeeringDB, RIPE RIS, ip-api, ipinfo)
  • ✅ Dedicated ASN metadata table with comprehensive fields
  • ✅ Automatic ASN IP discovery via prefix sampling
  • ✅ Periodic remeasurement of stale targets
  • ✅ Enhanced API responses with full ASN enrichment data

License

TBD (no LICENSE file present).
