Run Apache Spark workloads seamlessly on Armada, a multi-cluster Kubernetes batch scheduler
armada-spark is an open-source integration designed to streamline deployment and management of Apache Spark workloads on Armada. It provides preconfigured Docker images, tooling for efficient image management, and example workflows to simplify local and production deployments.
- Java 8/11/17
- Scala 2.12/2.13
- Apache Maven 3.9.6+
- (Optional) kind for local clusters
- An accessible Armada Server and Lookout endpoint (check Armada Operator for the Quickstart guide)
By default, the project targets Spark 3.5.3 and Scala 2.13.15. To change versions:
./scripts/set-version.sh <spark-version> <scala-version>
Example:
./scripts/set-version.sh 3.5.3 2.13.15
After setting your desired Spark and Scala versions, build the Armada Spark project with Maven by running the following command:
mvn clean package
Once your project is built, create the Docker image using:
./scripts/createImage.sh [-i image-name] [-m armada-master-url] [-q armada-queue] [-l armada-lookout-url]
Options:
| Flag | Description | Example |
|---|---|---|
| -i | Docker image name | spark:armada |
| -m | Armada master URL | armada://localhost:30002 |
| -q | Armada queue | default |
| -l | Armada Lookout URL | http://localhost:30000 |
| -p | Include Python | |
| -h | Display help | |
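To illustrate how these flags combine, here is a minimal sketch that assembles a createImage.sh command line from the example values in the table and prints it instead of running it. The helper name `create_image_cmd` is hypothetical and not part of the project. The sketch also assumes that -p is a plain switch that takes no argument.

```shell
#!/bin/sh
# Sketch only: create_image_cmd is a hypothetical helper, not part of armada-spark.
# It prints a createImage.sh command line built from the flags in the table above.
# Arguments: image name, Armada master URL, queue, Lookout URL, optional "true" for Python.
create_image_cmd() {
  cmd="./scripts/createImage.sh -i $1 -m $2 -q $3 -l $4"
  # Assumption: -p is a switch with no argument; add it only when requested.
  if [ "${5:-false}" = "true" ]; then
    cmd="$cmd -p"
  fi
  printf '%s\n' "$cmd"
}

create_image_cmd "spark:armada" "armada://localhost:30002" "default" "http://localhost:30000" true
```

Printing the command first makes it easy to review the values before building an image.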
To avoid passing these flags on every run, you can store the values in scripts/config.sh:
export IMAGE_NAME="spark:armada"
export ARMADA_MASTER="armada://localhost:30002"
export ARMADA_QUEUE="default"
export ARMADA_LOOKOUT_URL="http://localhost:30000"
export INCLUDE_PYTHON=true
We recommend using kind for local testing.
If you are using the Armada Operator Quickstart, it is already based on kind.
Run the following command to load the Armada Spark image into your local kind cluster:
kind load docker-image $IMAGE_NAME --name armada
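A small guard around this step can catch an unset image name early. The following is only a sketch: the helper name `kind_load_cmd` is hypothetical, and it prints the command rather than running it so that it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: kind_load_cmd is a hypothetical helper, not part of armada-spark.
# Prints the kind command that loads an image into the cluster named "armada".
# The image comes from the first argument, falling back to $IMAGE_NAME.
kind_load_cmd() {
  image="${1:-${IMAGE_NAME:-}}"
  if [ -z "$image" ]; then
    echo "IMAGE_NAME is not set; source scripts/config.sh or pass an image name" >&2
    return 1
  fi
  printf 'kind load docker-image %s --name armada\n' "$image"
}

kind_load_cmd "spark:armada"
```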
Before submitting a pull request, please ensure that your code adheres to the project's coding standards and passes all tests.
To run the tests, use the following command:
mvn test
To check the code for linting issues, use the following command:
mvn spotless:check
To automatically apply linting fixes, use:
mvn spotless:apply
Make sure that the SparkPi job successfully runs on your Armada cluster before submitting a pull request.
The project includes a ready-to-use SparkPi job to test your setup:
./scripts/submitSparkPi.sh
This job uses the same configuration parameters (ARMADA_MASTER, ARMADA_QUEUE, ARMADA_LOOKOUT_URL) that are defined in scripts/config.sh.
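Because the job depends on those three variables, it can help to confirm they are set before submitting. The sketch below shows one way to do that. The function name `check_armada_config` is hypothetical, and the project's scripts may already perform their own validation.

```shell
#!/bin/sh
# Sketch only: check_armada_config is a hypothetical helper, not part of armada-spark.
# Returns non-zero and names each required variable that is unset or empty.
check_armada_config() {
  missing=0
  for name in ARMADA_MASTER ARMADA_QUEUE ARMADA_LOOKOUT_URL; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example values matching the scripts/config.sh sample.
ARMADA_MASTER="armada://localhost:30002"
ARMADA_QUEUE="default"
ARMADA_LOOKOUT_URL="http://localhost:30000"
check_armada_config && echo "configuration looks complete"
```

In practice, a check like this would run after sourcing scripts/config.sh and before calling the submit script.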