An end-to-end decision system for logistics planning that combines demand forecasting, stochastic simulation, and network optimization.
The project is designed to showcase production-oriented applied science and engineering skills at the intersection of:
- Operations Research
- Machine Learning
- Data Engineering
- MLOps
- API-based deployment
Build a scalable logistics decision engine that can:
- Generate or ingest historical shipment and demand data
- Forecast future demand — independently per destination
- Simulate uncertain logistics scenarios
- Optimize origin-destination flows under capacity and cost constraints
- Expose the full pipeline through an API
This repository reflects how real-world planning systems are built: not only with mathematical models, but also with robust data pipelines, modular software design, and deployable services.
graph TD
subgraph "Data Layer"
RAW[Raw Data / Synthetic Generation] --> PROC[Data Processing]
PROC --> DF[Polars DataFrame]
end
subgraph "Forecasting Layer"
DF --> PART[Partition by Destination]
PART --> |"dest A"| WA[Worker A]
PART --> |"dest B"| WB[Worker B]
PART --> |"dest N"| WN[Worker N]
subgraph "Per-Destination Pipeline (parallel via joblib)"
WA --> SPLIT[Train/Test Split]
SPLIT --> FIT[Fit Models]
FIT --> EVAL[Evaluate]
EVAL --> SEL[Select Best Model]
end
SEL --> FR[ForecastResult per Destination]
end
subgraph "Optimization Layer"
FR --> AGG[Aggregate Demand]
AGG --> OPT[Min-Cost Transportation LP]
OPT --> FLOW[Optimal Flow Allocation]
end
The forecasting system uses a local model architecture: each destination gets its own independently trained, evaluated, and selected model. This captures local demand patterns (seasonality, trend, volatility) that a single global model may average away.
Input DataFrame (date, destination_id, demand)
│
├── Partition by destination_id
│
├── For each destination (parallelizable):
│ ├── Sort by date
│ ├── Split train/test (chronological)
│ ├── For each model in registry:
│ │ ├── Fit on train
│ │ ├── Predict on test
│ │ └── Evaluate (WAPE, MAE, RMSE, MAPE, MSE)
│ └── Select best model (minimize configurable metric)
│
└── Aggregate results → AggregatedPipelineResult
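The flow above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's implementation: the two toy forecasters, the `MODELS` dictionary, and `select_per_destination` are hypothetical names, only WAPE is computed, and each destination is assumed to have more than `test_size` observations.

```python
import math
from collections import defaultdict
from statistics import fmean


def naive_forecast(train, horizon):
    """Repeat the last observed value."""
    return [train[-1]] * horizon


def moving_average_forecast(train, horizon, window=7):
    """Repeat the mean of the last `window` observations."""
    return [fmean(train[-window:])] * horizon


def wape(actual, predicted):
    """Weighted absolute percentage error: sum|a - p| / sum|a|."""
    denom = sum(abs(a) for a in actual)
    if denom == 0:
        return math.nan
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / denom


# Hypothetical registry: model name -> forecasting function (order matters for ties)
MODELS = {"naive": naive_forecast, "ma_7": moving_average_forecast}


def select_per_destination(rows, test_size=7):
    """rows: iterable of (date, destination_id, demand) with ISO date strings."""
    by_dest = defaultdict(list)
    for date, dest, demand in rows:
        by_dest[dest].append((date, demand))

    results = {}
    for dest, series in sorted(by_dest.items()):
        series.sort()  # chronological order, independent of input row order
        values = [v for _, v in series]
        train, test = values[:-test_size], values[-test_size:]
        scores = {name: wape(test, fn(train, len(test))) for name, fn in MODELS.items()}
        # NaN scores never win; ties go to the first model in registry order
        valid = {name: s for name, s in scores.items() if not math.isnan(s)}
        best = min(valid, key=valid.get) if valid else None
        results[dest] = (best, scores)
    return results


# D01 is flat (both models tie, so the first one wins); D02 alternates 90/110
rows = [(f"2024-01-{day:02d}", "D01", 100.0) for day in range(1, 22)]
rows += [(f"2024-01-{day:02d}", "D02", 90.0 if day % 2 else 110.0) for day in range(1, 22)]
results = select_per_destination(rows)
for dest, (best, scores) in results.items():
    print(dest, best, {name: round(s, 3) for name, s in scores.items()})
```

Because every destination is sorted and scored on its own slice of the data, shuffling the input rows leaves the selected models unchanged.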
The optimization layer solves minimum-cost transportation problems to allocate supply from origins to destinations. It supports two modes through a unified OptimizerInterface:
Single-Period Optimization — answers: "Given today's demand, how should we allocate supply right now?"
Inputs:
- demand_df: [destination_id, demand] (point-in-time demand)
- origins_df: [origin_id, daily_capacity]
- lanes_df: [origin_id, destination_id, unit_cost]
Objective: minimize Σ unit_cost(o,d) × flow(o,d)
Subject to:
- Demand satisfaction: Σ_o flow(o,d) ≥ demand(d) ∀ destinations
- Capacity limits: Σ_d flow(o,d) ≤ capacity(o) ∀ origins
- Non-negativity: flow(o,d) ≥ 0
Output: OptimizationResult (flows + total_cost)
Multi-Period Optimization — answers: "Over the next N days, how should we ship and store inventory to minimize total cost?"
Inputs:
- demand_ts: [destination_id, date, demand] (time-indexed demand)
- origins_df: [origin_id, daily_capacity]
- lanes_df: [origin_id, destination_id, unit_cost]
- destinations_df: [destination_id, holding_cost]
- planning_horizon: [date_1, date_2, ..., date_T]
- initial_inventory: {destination_id: quantity}
Objective: minimize Σ_{o,d,t} unit_cost(o,d) × flow(o,d,t) + Σ_{d,t} holding_cost(d) × inv(d,t)
Subject to:
- Inventory balance: inv(d,t) = inv(d,t-1) + inflow(d,t) - demand(d,t), where inflow(d,t) = Σ_o flow(o,d,t) and inv(d,0) = initial_inventory(d)
- Capacity limits: Σ_d flow(o,d,t) ≤ capacity(o) ∀ origins, periods
- Non-negativity: flow(o,d,t) ≥ 0, inv(d,t) ≥ 0
Output: MultiPeriodResult (time-indexed flows + inventory levels + total_cost)
The key difference: single-period treats each day independently (myopic), while multi-period jointly optimizes across the entire horizon, trading off shipping costs against holding costs and anticipating future demand.
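To make the inventory balance and the shipping-versus-holding trade-off concrete, the following sketch rolls a candidate plan forward through the horizon and prices it. It does not solve the optimization; `evaluate_plan` and its dictionary-based inputs are hypothetical stand-ins for the DataFrames described above.

```python
def evaluate_plan(flows, demand, capacity, unit_cost, holding_cost,
                  initial_inventory, horizon):
    """Check a multi-period plan against the constraints and return its cost.

    flows:  {(origin_id, destination_id, date): quantity}
    demand: {(destination_id, date): quantity}
    Returns (inventory, total_cost); raises ValueError on a violated constraint.
    """
    # Capacity limits per origin and period
    shipped = {}
    for (o, _d, t), q in flows.items():
        shipped[o, t] = shipped.get((o, t), 0.0) + q
    for (o, t), q in shipped.items():
        if q > capacity[o] + 1e-9:
            raise ValueError(f"origin {o} exceeds capacity on {t}")

    # Shipping cost
    total = sum(unit_cost[o, d] * q for (o, d, _t), q in flows.items())

    # Inventory balance: inv(d,t) = inv(d,t-1) + inflow(d,t) - demand(d,t)
    inventory = {}
    previous = dict(initial_inventory)
    destinations = sorted({d for d, _t in demand})
    for t in horizon:
        for d in destinations:
            inflow = sum(q for (_o, dd, tt), q in flows.items() if dd == d and tt == t)
            level = previous.get(d, 0.0) + inflow - demand.get((d, t), 0.0)
            if level < -1e-9:
                raise ValueError(f"destination {d} runs short on {t}")
            inventory[d, t] = level
            total += holding_cost[d] * level
            previous[d] = level
    return inventory, total


horizon = ["2024-01-01", "2024-01-02"]
demand = {("D1", "2024-01-01"): 50.0, ("D1", "2024-01-02"): 120.0}
# Day 2 demand (120) exceeds daily capacity (100), so 20 units are shipped early
plan = {("O1", "D1", "2024-01-01"): 70.0, ("O1", "D1", "2024-01-02"): 100.0}
inventory, total_cost = evaluate_plan(
    plan, demand,
    capacity={"O1": 100.0},
    unit_cost={("O1", "D1"): 2.0},
    holding_cost={"D1": 0.5},
    initial_inventory={"D1": 0.0},
    horizon=horizon,
)
print(inventory, total_cost)
```

A day-by-day plan that ships exactly each day's demand would need 120 units on the second day and fail the capacity check, which is the situation the joint multi-period model is meant to anticipate.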
- Synthetic logistics data generation (dedicated `scripts/synthetic_data.py` module)
- Data processing with Polars via module-level validation functions
- Efficient storage in Parquet format
- Explicit `__all__` exports in all packages
- Per-destination model training — one model per destination, independently selected
- Model Registry — factory pattern for dynamic model instantiation
- Unified ModelSelector — single implementation supporting both DataFrame and tuple-list inputs, with NaN handling and first-in-order tiebreaking
- Supported models: Naive, Seasonal Naive, Rolling Window (Moving Average), ETS, ARIMA/SARIMAX
- Evaluation: WAPE, MAE, RMSE, MAPE, MSE per destination per model (pure, side-effect-free)
- Model selection: automatic best-model selection per destination by configurable metric
- Pipeline Protocol: `PipelineProtocol` (structural subtyping); both `ForecastingPipeline` and `PerDestinationPipeline` conform
- Parallel execution: joblib-based parallelism across destinations (configurable workers)
- Fault tolerance: individual destination failures don't block the pipeline
- Reproducibility: deterministic results regardless of row ordering or parallelism level
- Persistence interface: abstract storage layer (ready for S3, database, filesystem)
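Two of the ideas in this list, the registry as a factory and conformance by structural subtyping, might look roughly like the sketch below. Every name in it (`Forecaster`, `ModelRegistry`, the two forecaster classes) is hypothetical and chosen for illustration; the repository's own classes and signatures may differ.

```python
from typing import Callable, Protocol, Sequence, runtime_checkable


@runtime_checkable
class Forecaster(Protocol):
    """Structural contract: any object with fit/predict conforms, no base class needed."""

    def fit(self, history: Sequence[float]) -> "Forecaster": ...

    def predict(self, horizon: int) -> list[float]: ...


class ModelRegistry:
    """Factory: maps a model name to a callable that builds a fresh model."""

    def __init__(self) -> None:
        self._factories: dict[str, Callable[..., Forecaster]] = {}

    def register(self, name: str):
        def decorator(factory):
            if name in self._factories:
                raise ValueError(f"model {name!r} is already registered")
            self._factories[name] = factory
            return factory
        return decorator

    def create(self, name: str, **kwargs) -> Forecaster:
        return self._factories[name](**kwargs)

    def names(self) -> list[str]:
        return list(self._factories)


registry = ModelRegistry()


@registry.register("naive")
class NaiveForecaster:
    def fit(self, history):
        self._last = history[-1]
        return self

    def predict(self, horizon):
        return [self._last] * horizon


@registry.register("seasonal_naive")
class SeasonalNaiveForecaster:
    def __init__(self, season_length=7):
        self.season_length = season_length

    def fit(self, history):
        self._season = list(history[-self.season_length:])
        return self

    def predict(self, horizon):
        # Repeat the last observed season cyclically
        return [self._season[i % len(self._season)] for i in range(horizon)]


model = registry.create("seasonal_naive", season_length=3).fit([1, 2, 3, 4, 5, 6])
forecast = model.predict(5)
print(registry.names(), forecast, isinstance(model, Forecaster))
```

Adding a new model then means registering one more class; the code that iterates over `registry.names()` does not change, which is the open-closed property described later in this document.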
- Single-period: minimum-cost transportation LP — allocate supply to meet demand now
- Multi-period: joint optimization over a planning horizon with inventory tracking and holding costs
- Unified `OptimizerInterface` dispatches to the appropriate solver mode
- Shared validation module (`optimization.validation`): common checks used by both optimizers
- OR-Tools backend (GLOP for LP, CBC for MIP)
- Capacity-constrained origin-to-destination flow assignment
- Pre-solve feasibility checks (unreachable destinations, insufficient capacity, negative costs, non-positive capacities)
- Integration of forecast-derived demand into downstream optimization
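The pre-solve feasibility checks named above can be approximated with a few dictionary passes before any solver is invoked. This is a hedged sketch: `presolve_checks` is a hypothetical function, not the contents of `optimization.validation`, and the aggregate capacity test is only a necessary condition for feasibility, not a sufficient one.

```python
def presolve_checks(demand, capacity, unit_cost):
    """Return a list of problems found; an empty list means none were detected.

    demand:    {destination_id: demand}
    capacity:  {origin_id: daily_capacity}
    unit_cost: {(origin_id, destination_id): unit_cost}, one entry per lane
    """
    problems = []

    # Non-positive capacities
    for origin, cap in capacity.items():
        if cap <= 0:
            problems.append(f"origin {origin}: non-positive capacity {cap}")

    # Negative costs
    for (origin, dest), cost in unit_cost.items():
        if cost < 0:
            problems.append(f"lane {origin}->{dest}: negative cost {cost}")

    # Unreachable destinations: no lane from an origin with usable capacity
    reachable = {dest for (origin, dest) in unit_cost if capacity.get(origin, 0) > 0}
    for dest, qty in demand.items():
        if qty > 0 and dest not in reachable:
            problems.append(f"destination {dest}: unreachable")

    # Insufficient capacity in aggregate (necessary, not sufficient, for feasibility)
    total_demand = sum(demand.values())
    usable_capacity = sum(cap for cap in capacity.values() if cap > 0)
    if total_demand > usable_capacity:
        problems.append(
            f"total demand {total_demand} exceeds usable capacity {usable_capacity}"
        )
    return problems


issues = presolve_checks(
    demand={"D1": 60, "D2": 70, "D3": 10},
    capacity={"O1": 100, "O2": 0},
    unit_cost={("O1", "D1"): 2.0, ("O1", "D2"): -1.0, ("O2", "D3"): 3.0},
)
for issue in issues:
    print(issue)
```

Reporting all problems at once, rather than failing on the first, tends to give more useful diagnostics than a solver's bare "infeasible" status.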
- `SimulationInterface` ABC with `SimulationResult` dataclass
- Ready for event-driven simulation of shipment arrivals, delays, and processing
- Stochastic demand generation
- Scenario analysis under uncertainty
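Since the simulation layer is defined only as an interface, the sketch below shows one way an ABC plus result dataclass could be filled in for stochastic demand generation. The method name `run`, the `scenarios` field, and `GaussianDemandSimulation` are all assumptions made for this example and are not taken from the repository.

```python
import random
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class SimulationResult:
    # Hypothetical field: one {destination_id: demand} mapping per scenario
    scenarios: list[dict[str, float]]


class SimulationInterface(ABC):
    # Hypothetical signature; the real interface may expose different methods
    @abstractmethod
    def run(self, n_scenarios: int, seed: int) -> SimulationResult:
        """Generate `n_scenarios` demand scenarios reproducibly from `seed`."""


class GaussianDemandSimulation(SimulationInterface):
    """Perturbs a point forecast with Gaussian noise, clipped at zero."""

    def __init__(self, forecast: dict[str, float], rel_std: float = 0.1) -> None:
        self.forecast = forecast
        self.rel_std = rel_std

    def run(self, n_scenarios: int, seed: int) -> SimulationResult:
        rng = random.Random(seed)  # local generator keeps runs reproducible
        scenarios = [
            {
                dest: max(0.0, rng.gauss(mean, self.rel_std * mean))
                for dest, mean in self.forecast.items()
            }
            for _ in range(n_scenarios)
        ]
        return SimulationResult(scenarios)


simulation = GaussianDemandSimulation({"D01": 100.0, "D02": 50.0}, rel_std=0.1)
result = simulation.run(n_scenarios=5, seed=42)
print(len(result.scenarios), result.scenarios[0])
```

Each scenario could then be passed to the optimizer as a demand input to compare allocations under uncertainty.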
- `APIInterface` ABC with `forecast` and `optimize` abstract methods
- Ready for FastAPI endpoints for simulation, forecasting, and optimization
| Category | Tools |
|---|---|
| Language | Python 3.11+ |
| DataFrames | Polars |
| Optimization | OR-Tools (GLOP, CBC) |
| Statistical Models | statsmodels (ETS, ARIMA) |
| Metrics | scikit-learn |
| Parallelism | joblib |
| Numerics | NumPy |
| Visualization | Matplotlib |
| Configuration | PyYAML |
| Testing | pytest, Hypothesis (property-based testing) |
# Clone and setup
git clone https://github.com/<your-username>/decision-intelligence-logistics-engine.git
cd decision-intelligence-logistics-engine
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Run the full pipeline demo
python scripts/example_end_to_end_pipeline.py
# Run tests
python -m pytest tests/ -v
Running the per-destination pipeline on synthetic data with 4 destinations:
Destination D01 -> Best model: seasonal_forecaster (WAPE: 0.027)
Destination D02 -> Best model: ma_7_forecaster (WAPE: 0.063)
Destination D03 -> Best model: ma_7_forecaster (WAPE: 0.180)
Destination D04 -> Best model: ma_7_forecaster (WAPE: 0.067)
Each destination independently selects the model best suited to its demand pattern.
- Explicit destination isolation — no data leakage between destinations
- No global model selection — each destination has its own best model
- Row-order independence — results are deterministic regardless of input ordering
- Fault tolerance — one destination's failure doesn't block others
- Open-closed architecture — add new models (LightGBM, Prophet, DeepAR) without modifying pipeline code
- Composition over inheritance — module-level functions and protocols over deep class hierarchies
- Explicit contracts: `Protocol`, `ABC`, and `__all__` declare what is public
- Pure functions where possible — no hidden state mutation during computation
- Property-based testing — 13 formal correctness properties verified with Hypothesis
The project uses pytest with Hypothesis for property-based testing:
python -m pytest tests/ -v
# 188 passed
Key correctness properties verified:
- Data isolation between destinations
- Temporal split correctness (no future leakage)
- Row-order independence
- Model selection minimality with tiebreaking
- Fault tolerance completeness
- Determinism across executions
- Pipeline protocol conformance
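Two of these properties, row-order independence and data isolation, can be illustrated with a dependency-free randomized check. This is a stand-in and not the project's test suite: it uses `random` shuffles instead of Hypothesis (where the loop would typically become a `@given` test drawing from `st.permutations`), and `last_value_by_destination` is a hypothetical toy pipeline.

```python
import random
from collections import defaultdict


def last_value_by_destination(rows):
    """Toy per-destination pipeline: sort by date, return the latest demand."""
    by_dest = defaultdict(list)
    for date, dest, demand in rows:
        by_dest[dest].append((date, demand))
    return {dest: sorted(series)[-1][1] for dest, series in by_dest.items()}


def check_row_order_independence(rows, trials=50, seed=0):
    """Property: any permutation of the input rows yields the same result."""
    rng = random.Random(seed)
    expected = last_value_by_destination(rows)
    for _ in range(trials):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        assert last_value_by_destination(shuffled) == expected


def check_destination_isolation(rows, target):
    """Property: changing other destinations' data leaves `target` untouched."""
    baseline = last_value_by_destination(rows)[target]
    perturbed = [
        (date, dest, demand if dest == target else demand * 10 + 1)
        for date, dest, demand in rows
    ]
    assert last_value_by_destination(perturbed)[target] == baseline


rows = [
    ("2024-01-03", "D01", 12.0),
    ("2024-01-01", "D01", 10.0),
    ("2024-01-02", "D02", 7.0),
    ("2024-01-02", "D01", 11.0),
    ("2024-01-01", "D02", 5.0),
]
check_row_order_independence(rows)
check_destination_isolation(rows, target="D01")
print(last_value_by_destination(rows))
```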
- FastAPI endpoints for end-to-end execution (interface defined via `APIInterface`)
- Stochastic simulation layer implementation (interface defined via `SimulationInterface`)
- MLflow experiment tracking
- Docker support
- ML model integration (LightGBM, XGBoost, Prophet)
- Hierarchical forecasting
- Performance benchmarking
- Visualization config support (show/save via YAML)
Christian Piermarini Applied Scientist / Operations Research / Machine Learning