Circle Cover Problem - PySpark Implementation

Greedy thresholding algorithm for the circle cover problem with 1/2(1-ε) approximation guarantee, implemented using Apache Spark for distributed computing.

Algorithm

The circle cover problem is a geometric variant of set cover: given a set of points P and candidate circles C, find k circles that cover the maximum number of points.

This implementation uses:

Greedy thresholding with logarithmic guesses for unknown OPT
Submodular maximization (coverage function is monotone submodular)
PySpark for parallelizing marginal gain computation

Quick Start

Local Mode (Windows/Linux)

# Create virtual environment
python -m venv .venv
.venv/Scripts/activate  # Windows
source .venv/bin/activate  # Linux

# Install dependencies
pip install -r requirements.txt

# Run locally with small dataset
python main.py --size small

# Run with visualization server (in separate terminals)
python viz_server.py  # Terminal 1
python main.py --ui-url http://localhost:8000 --size medium  # Terminal 2

Cluster Mode (Raspberry Pi)

# SSH to master node
ssh pi@raspberrypi

# Run on cluster
~/spark-env/bin/python ~/main.py --cluster --size medium

Raspberry Pi Cluster

Cluster Information

Node	Role	IP	Cores	Memory
raspberrypi	Master + Worker	192.168.1.223	4	6GB
applepi	Worker	192.168.1.250	4	6GB

Total Resources: 8 cores, 12GB memory

Web Interfaces

Service	URL
Spark Master UI	http://raspberrypi:8080
Spark Application UI	http://raspberrypi:4040 (when app running)

Cluster Management

Start Cluster

# Start master
ssh pi@raspberrypi "export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-arm64 && ~/spark/sbin/start-master.sh"

# Start worker on raspberrypi
ssh pi@raspberrypi "export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-arm64 && ~/spark/sbin/start-worker.sh spark://192.168.1.223:7077"

# Start worker on applepi
ssh pi@applepi "export JAVA_HOME=/usr/lib/jvm/java-21-openjdk-arm64 && ~/spark/sbin/start-worker.sh spark://192.168.1.223:7077"

Stop Cluster

# Stop all (from master)
ssh pi@raspberrypi "~/spark/sbin/stop-all.sh"

# Or stop individually
ssh pi@raspberrypi "~/spark/sbin/stop-worker.sh"
ssh pi@raspberrypi "~/spark/sbin/stop-master.sh"
ssh pi@applepi "~/spark/sbin/stop-worker.sh"

Check Status

# Check running Java processes
ssh pi@raspberrypi "jps"
ssh pi@applepi "jps"

# View logs
ssh pi@raspberrypi "tail -50 ~/spark/logs/spark-pi-org.apache.spark.deploy.master.Master-1-raspberrypi.out"

Deploy Application

# Copy script to Pi
scp main.py pi@raspberrypi:~/main.py

# Run on cluster (without visualization)
ssh pi@raspberrypi "~/spark-env/bin/python ~/main.py --cluster --size medium"

# Run with visualization (requires viz_server running)
ssh pi@raspberrypi "~/spark-env/bin/python ~/main.py --cluster --ui-url http://127.0.0.1:8000 --size large"

# Or run directly from the UI dashboard (recommended)
# Navigate to http://raspberrypi:8000 and click "Run New Simulation"

Visualization Dashboard

The project includes a real-time web visualization dashboard with interactive controls.

Start the Server:

ssh pi@raspberrypi
# Start the visualization server (runs on port 8000)
~/spark-env/bin/python ~/viz_server.py

Run the Algorithm with Visualization:

# In a separate terminal, run the algorithm with the --ui-url flag
ssh pi@raspberrypi "~/spark-env/bin/python ~/main.py --cluster --ui-url http://127.0.0.1:8000 --size medium"

Access the UI: Open http://raspberrypi:8000 in your browser.

Features:

Real-time Visualization: See points and circles as they are covered during exploration.
Final Solution Highlighting: Best solution circles rendered in gold with enhanced glow, distinguishing them from intermediate guesses (cyan).
Dataset Size Selection: Choose from small (2K), medium (25K), or large (100K) point datasets.
Smart Sampling: Automatic stratified/uniform sampling for large datasets to maintain UI performance.
Cluster Control: Start new simulations directly from the UI with customizable dataset sizes.
Live Worker Stats: Monitor core/memory usage of connected nodes in real-time.
Algorithm Metrics: Track marginal gain and total coverage graphs across all guesses.

What You'll See:

The visualization progresses through multiple threshold guesses (j=0 to log k):

Exploration phase - Cyan circles appear as the algorithm tests different thresholds
Final result - Screen clears and the best solution is displayed in gold
Status shows: FINAL SOLUTION: X% Coverage with the winning guess number

Configuration

Dataset Size

Choose from three predefined configurations via command line:

# Small dataset (2K points, 500 circles, 8 clusters)
python main.py --size small

# Medium dataset (25K points, 2K circles, 20 clusters)
python main.py --size medium

# Large dataset (100K points, 5K circles, 50 clusters)
python main.py --size large

Algorithm Parameters

Edit in main.py (applies to all sizes):

K = 10              # Maximum circles to select
EPSILON = 0.2       # Approximation parameter (affects # of guesses)
BOUNDS = (0, 0, 1000, 1000)    # Spatial bounds
RADIUS_RANGE = (30, 100)        # Circle radius range

Cluster Settings

SPARK_MASTER_URL = "spark://192.168.1.223:7077"

Or use --cluster flag to enable cluster mode (auto-detects Pi cluster environment).

Project Structure

PI-Cluster/
├── main.py              # Main algorithm implementation
├── viz_server.py        # Visualization backend (FastAPI)
├── ui/                  # Frontend assets
│   ├── index.html       # Dashboard UI
│   ├── style.css        # Glassmorphism styles
│   └── script.js        # Visualization logic
├── requirements.txt     # Python dependencies
├── README.md            # This file
└── CLUSTER_SETUP.md     # Detailed cluster setup guide

Requirements

Local Development

Python 3.11+
Java 17+ (for Spark)
See requirements.txt

Raspberry Pi Cluster

Raspberry Pi 5 (8GB RAM recommended)
Raspberry Pi OS (Debian Trixie)
Java 21
Python 3.13

References

Apache Spark Documentation
Submodular maximization for set cover problems
Greedy thresholding algorithm with logarithmic guesses

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
docs		docs
ui		ui
.gitignore		.gitignore
CLUSTER_SETUP.md		CLUSTER_SETUP.md
README.md		README.md
deploy.ps1		deploy.ps1
deploy.sh		deploy.sh
main.py		main.py
requirements.txt		requirements.txt
viz_server.py		viz_server.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Circle Cover Problem - PySpark Implementation

Algorithm

Quick Start

Local Mode (Windows/Linux)

Cluster Mode (Raspberry Pi)

Raspberry Pi Cluster

Cluster Information

Web Interfaces

Cluster Management

Start Cluster

Stop Cluster

Check Status

Deploy Application

Visualization Dashboard

Configuration

Dataset Size

Algorithm Parameters

Cluster Settings

Project Structure

Requirements

Local Development

Raspberry Pi Cluster

References

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Circle Cover Problem - PySpark Implementation

Algorithm

Quick Start

Local Mode (Windows/Linux)

Cluster Mode (Raspberry Pi)

Raspberry Pi Cluster

Cluster Information

Web Interfaces

Cluster Management

Start Cluster

Stop Cluster

Check Status

Deploy Application

Visualization Dashboard

Configuration

Dataset Size

Algorithm Parameters

Cluster Settings

Project Structure

Requirements

Local Development

Raspberry Pi Cluster

References

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages