This repository provides the following preserved starter templates, updated for kedro==1.3.1:
- sklearn-iris trains a Logistic Regression model using scikit-learn.
- sklearn-mlflow-iris adds experiment tracking using MLflow.
Pipeline visualized by Kedro-viz
Iris dataset is included and used by default.
- Modification: setosa is encoded as 0, versicolor is encoded as 1, and virginica samples are removed.
- Split: for each species, the first 25 samples are included in train.csv, and the last 25 samples are included in test.csv.
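The modification and split described above can be sketched in plain Python. This is a minimal illustration, not the code used to build the bundled CSV files: the function name `split_iris`, the `LABELS` mapping, and the `(features, species)` row layout are all assumptions made for the example.

```python
# Hypothetical helper: names and row layout are assumptions, not the repo's code.
LABELS = {"setosa": 0, "versicolor": 1}  # virginica is absent, so it gets dropped


def split_iris(rows, n_train=25):
    """Split rows of (features, species), given in original dataset order.

    Returns (train, test) lists of (features, label) tuples.
    """
    by_species = {}
    for features, species in rows:
        if species not in LABELS:
            continue  # remove virginica (and any other unknown species)
        by_species.setdefault(species, []).append((features, LABELS[species]))

    train, test = [], []
    for samples in by_species.values():
        train.extend(samples[:n_train])  # first 25 samples of each species
        test.extend(samples[n_train:])   # remaining samples (the last 25 of 50)
    return train, test
```

With the standard 50 samples per species, each output holds 25 setosa and 25 versicolor rows.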
- Install dependencies.

      pip install "kedro==1.3.1" pandas scikit-learn

- Generate your Kedro starter project from the sklearn-iris directory.

      kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-iris

  As explained in the Kedro documentation, enter project_name, repo_name, and python_package.

  Note: for your Python package name, choose a unique name and avoid generic names such as test or sklearn that are already used by another package. You can see the list of importable packages by running python -c "help('modules')".

- Change the current directory to the generated project directory.

      cd /path/to/project/directory

- Install project dependencies and run the project.

      pip install -r requirements.txt
      kedro run
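As a quicker alternative to scanning the full module list mentioned in the package-name note, a candidate name can be checked directly. The sketch below uses only the standard library; the helper name `is_name_available` is hypothetical and is not part of Kedro.

```python
import importlib.util
import keyword


def is_name_available(name):
    """Return True if `name` is a usable package name that nothing importable already uses."""
    # Package names must be valid, non-keyword Python identifiers.
    if not name.isidentifier() or keyword.iskeyword(name):
        return False
    try:
        # find_spec returns None when no importable module has this name.
        return importlib.util.find_spec(name) is None
    except (ImportError, ValueError):
        # Unusual entries can make find_spec raise; treat such names as taken.
        return False
```

Calling `is_name_available("sklearn")` in an environment where scikit-learn is installed should return `False`, which signals that a different name is needed.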
- Download the Kaggle Titanic dataset.
- Replace train.csv and test.csv in the /path/to/project/directory/data/01_raw directory.
- Modify /path/to/project/directory/conf/base/parameters.yml to set parameters appropriate for the dataset (commented out by default).
This template integrates MLflow into Kedro using PipelineX. Even without writing MLflow code, you can:
- configure MLflow Tracking
- log inputs and outputs of Python functions set up as Kedro nodes as parameters (for example, features used to train the model) and metrics (for example, F1 score)
- log execution time for each Kedro node and dataset loading/saving as metrics
- log artifacts such as models, execution time Gantt charts visualized by Plotly, and parameters.yml
In this template, MLflow logging is configured in Python code at src/<python_package>/hooks.py.
See here for details.
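The idea of logging per-node execution time as a metric can be illustrated without MLflow or PipelineX. The sketch below is an assumption-laden illustration only: the class `NodeTimer`, its `before`/`after` methods, and the `time.<node>` metric key are hypothetical and do not reflect PipelineX's actual hooks or naming.

```python
import time


class NodeTimer:
    """Hypothetical illustration of hook-style timing; not PipelineX's implementation."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock    # injectable clock, which makes the class easy to test
        self._started = {}     # node name -> start timestamp
        self.metrics = {}      # metric key -> elapsed seconds

    def before(self, name):
        """Call just before a node runs."""
        self._started[name] = self._clock()

    def after(self, name):
        """Call just after a node finishes; records the elapsed time."""
        elapsed = self._clock() - self._started.pop(name)
        self.metrics[f"time.{name}"] = elapsed
```

In a real hook, each recorded value could then be passed to `mlflow.log_metric` instead of being kept in a dictionary.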
- Install dependencies.

      pip install "kedro==1.3.1" pandas scikit-learn mlflow "pipelinex==0.8.0" plotly

- Generate your Kedro starter project from the sklearn-mlflow-iris directory.

      kedro new --starter https://github.com/Minyus/kedro-starters-sklearn.git --directory sklearn-mlflow-iris

- Follow the same steps as the sklearn-iris template.
To access the MLflow web UI, launch the MLflow server.
mlflow server --host 127.0.0.1 --port 8080 --backend-store-uri sqlite:///mlruns/sqlite.db --default-artifact-root ./mlruns
Logged metrics shown in MLflow's UI
Gantt chart for execution time, generated using Plotly, shown in MLflow's UI
- Both starters preserve the repo's original Iris-focused examples.
- The MLflow starter keeps the PipelineX-based hook integration, pinned to pipelinex==0.8.0.