A PyTorch implementation of L0 regularization based on Louizos, Welling, & Kingma (2017), designed for survey calibration and sparse regression.
```bash
pip install l0-python
```

For development:

```bash
git clone https://github.com/PolicyEngine/L0.git
cd L0
pip install -e .[dev]
```

The original Hard Concrete formulation uses temperature (β) during training to control the sharpness of the stochastic gates. At test time, there is a design choice: whether to include temperature in the deterministic gate computation.
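For reference, the training-time gate in the Hard Concrete formulation is stochastic: uniform noise is passed through a temperature-scaled sigmoid, then stretched to [γ, ζ] and clamped to [0, 1]. Here is a minimal sketch of that sampling step (the helper name and default values are illustrative, not this package's API):

```python
import torch

def sample_hard_concrete(log_alpha, beta=0.35, gamma=-0.1, zeta=1.1):
    """Stochastic training-time gate from Louizos et al. (2017).

    Defaults mirror the calibration example below and are illustrative.
    """
    u = torch.rand_like(log_alpha)  # u ~ Uniform(0, 1)
    s = torch.sigmoid((torch.log(u) - torch.log(1 - u) + log_alpha) / beta)
    return torch.clamp(s * (zeta - gamma) + gamma, 0.0, 1.0)  # stretch, then clamp
```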
We include temperature at test time:
```python
# Our approach: include temperature
z = sigmoid(log_alpha / beta) * (zeta - gamma) + gamma

# Alternative: omit temperature
z = sigmoid(log_alpha) * (zeta - gamma) + gamma
```

Including temperature produces sharper 0/1 decisions, which we find beneficial for achieving clean sparsity in our applications. See `examples/sparse_regression_demo.py` for a demonstration on a 4-variable regression problem.
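The difference is easy to see numerically. The sketch below compares the two deterministic gates using the β, γ, ζ values from the calibration example further down (`hard_concrete_gate` is a hypothetical helper; the final clamp to [0, 1] follows the Hard Concrete paper):

```python
import numpy as np

def hard_concrete_gate(log_alpha, beta=0.35, gamma=-0.1, zeta=1.1, use_temperature=True):
    """Deterministic test-time gate: stretched sigmoid, clamped to [0, 1]."""
    pre = log_alpha / beta if use_temperature else log_alpha
    s = 1.0 / (1.0 + np.exp(-pre))
    return np.clip(s * (zeta - gamma) + gamma, 0.0, 1.0)

log_alpha = np.linspace(-3.0, 3.0, 7)
# With temperature the gates snap to exact 0s and 1s (0.5 only at log_alpha = 0);
# without it, they ramp smoothly through intermediate values.
print(hard_concrete_gate(log_alpha))
print(hard_concrete_gate(log_alpha, use_temperature=False))
```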
This package was developed for PolicyEngine's survey calibration, where we select a sparse subset of survey households while matching population targets.
```python
import numpy as np
from scipy import sparse as sp

from l0.calibration import SparseCalibrationWeights

# Setup: Q targets, N households
Q, N = 200, 10000
M = sp.random(Q, N, density=0.3, format="csr")  # Household characteristics
y = np.random.uniform(1e6, 1e8, size=Q)  # Population targets

# Initialize model
model = SparseCalibrationWeights(
    n_features=N,
    beta=0.35,
    gamma=-0.1,
    zeta=1.1,
    init_keep_prob=0.5,
    init_weights=1.0,
    log_weight_jitter_sd=0.05,
    device="cuda",
)

# Train with L0+L2 regularization
model.fit(
    M=M,
    y=y,
    lambda_l0=1e-6,
    lambda_l2=1e-8,
    lr=0.15,
    epochs=2000,
    loss_type="relative",
    verbose=True,
)

# Get results
active = model.get_active_weights()
print(f"Selected {active['count']} of {N} households")
print(f"Sparsity: {model.get_sparsity():.1%}")
```

- Non-negative weights: Constrained via log-space parameterization (see the sketch after this list)
- L0 sparsity: Directly minimizes the count of active weights
- Relative loss: Scale-invariant for targets spanning orders of magnitude
- Group-wise averaging: Balance loss across target groups with different sizes
- GPU support: CUDA acceleration for large problems
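Two of these are easy to illustrate in a few lines. The sketch below shows the log-space parameterization for non-negative weights and the relative loss (illustrative PyTorch only; the names are hypothetical, and this is not the package's internal code):

```python
import torch

# Non-negative weights: optimize an unconstrained log-weight, then
# exponentiate, so the effective weight is positive by construction.
log_w = torch.zeros(10_000, requires_grad=True)
w = torch.exp(log_w)

# Relative loss: divide each residual by its target, so targets of 1e6
# and 1e8 contribute on comparable scales.
y = torch.tensor([1e6, 5e7, 1e8])            # population targets
y_hat = torch.tensor([1.1e6, 4.5e7, 9.8e7])  # model estimates
relative_loss = (((y_hat - y) / y) ** 2).mean()
```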
For sparse linear regression with scipy sparse matrices:
```python
import numpy as np
from scipy import sparse as sp

from l0.sparse import SparseL0Linear

# Sparse design matrix
X = sp.random(1000, 500, density=0.1, format="csr")
y = np.random.randn(1000)

model = SparseL0Linear(n_features=500)
model.fit(X, y, lambda_l0=0.001, epochs=1000)

# Get sparse coefficients
coef = model.get_coefficients(threshold=0.01)
```
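If `get_coefficients` returns a dense NumPy vector (an assumption about the return type, not something documented above), the thresholded coefficients can be used directly for prediction:

```python
# Assumption: `coef` is a dense array of shape (500,).
y_pred = X @ coef  # sparse design matrix times dense vector -> dense predictions
rmse = np.sqrt(np.mean((y - y_pred) ** 2))
```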
The `examples/sparse_regression_demo.py` script demonstrates L0 regularization on a simple problem where the true coefficients are [1, 0, -2, 0]:

```bash
python examples/sparse_regression_demo.py
```

Output:
```
True coefficients: [ 1. 0. -2. 0.]
Recovered coefficients: [ 1.039 0. -2.069 -0. ]
Gates: [1. 0. 1. 0.]
```
The model correctly identifies that only variables 1 and 3 contribute to the outcome.
```bash
pytest tests/ -v --cov=l0
```

```bibtex
@article{louizos2017learning,
  title={Learning Sparse Neural Networks through L0 Regularization},
  author={Louizos, Christos and Welling, Max and Kingma, Diederik P},
  journal={arXiv preprint arXiv:1712.01312},
  year={2017}
}
```

MIT License - see LICENSE for details.