chore(licenses): add dependency license policy + CI check #724
cbcoutinho wants to merge 1 commit into
Adds an automated license compliance gate so future deps stay compatible with both AGPL-3 distribution and the planned commercial dual-licensed build (incl. SaaS offering, since AGPL §13 makes network use a commercial concern).

- `.licenses/policy.toml`: allowlist of permissive licenses, per-package exceptions for AGPL/dual-licensed deps, and metadata overrides for packages whose wheel metadata mis-classifies the actual license.
- `scripts/check-licenses.py`: walks every installed package via `pip-licenses`, classifies it against the policy, exits non-zero on policy violations, and emits a Markdown report suitable for `$GITHUB_STEP_SUMMARY`.
- `.github/workflows/license-check.yml`: runs the script on PRs that touch dependency manifests and uploads a JSON report artifact.
- `CONTRIBUTING.md`: documents the workflow contributors must follow when adding/upgrading a dep.
- `pip-licenses` added as a dev dependency.

Initial run flags four packages for follow-up (tracked separately):

* pymupdf-layout: proprietary commercial-only, unused → remove
* PyMuPDF / pymupdf4llm: Artifex commercial license needed for non-AGPL
* icalendar-searcher: pure AGPL, blocks any commercial channel
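For readers who want a feel for the classification step, here is a minimal sketch. It is not the code in `scripts/check-licenses.py`: the `ALLOWLIST`, `OVERRIDES`, and `EXCEPTIONS` structures, the function names, the hypothetical packages `some-mit-lib` and `mystery-lib`, the placeholder versions, and the `UNKNOWN` metadata value are all assumptions for illustration. The input records mimic the `Name`/`Version`/`License` keys of `pip-licenses --format=json`, simplified to SPDX-style identifiers.

```python
from dataclasses import dataclass

# Hypothetical policy data; the real `.licenses/policy.toml` layout may differ.
ALLOWLIST = {"MIT", "BSD-3-Clause", "Apache-2.0", "ISC"}
OVERRIDES = {"fastembed": "Apache-2.0"}  # corrects mis-classified wheel metadata
EXCEPTIONS = {"icalendar-searcher": "pure AGPL, acceptable for the AGPL build only"}


@dataclass
class Finding:
    name: str
    license: str
    status: str  # "allowed", "exception", or "violation"


def classify(packages):
    """Classify pip-licenses style records against the policy above."""
    findings = []
    for pkg in packages:
        name = pkg["Name"]
        # A metadata override wins over whatever the package reports.
        license_id = OVERRIDES.get(name, pkg["License"])
        if name in EXCEPTIONS:
            status = "exception"
        elif license_id in ALLOWLIST:
            status = "allowed"
        else:
            status = "violation"
        findings.append(Finding(name, license_id, status))
    return findings


def to_markdown(findings):
    """Render a table that could be appended to $GITHUB_STEP_SUMMARY."""
    lines = ["| Package | License | Status |", "| --- | --- | --- |"]
    lines += [f"| {f.name} | {f.license} | {f.status} |" for f in findings]
    return "\n".join(lines)


def exit_code(findings):
    """Non-zero when at least one package violates the policy."""
    return 1 if any(f.status == "violation" for f in findings) else 0


# Placeholder records; versions and the "UNKNOWN" value are made up.
sample = [
    {"Name": "some-mit-lib", "Version": "0.0.0", "License": "MIT"},
    {"Name": "fastembed", "Version": "0.0.0", "License": "UNKNOWN"},
    {"Name": "icalendar-searcher", "Version": "0.0.0", "License": "AGPL-3.0-or-later"},
    {"Name": "mystery-lib", "Version": "0.0.0", "License": "Proprietary"},
]
findings = classify(sample)
print(to_markdown(findings))
print("exit code:", exit_code(findings))
```

Checking exceptions by package name before consulting the allowlist keeps a reviewed exception visible in the report even if its license later lands on the allowlist.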
License compliance gate: review

Overall this is a well-structured PR. The policy file is clearly documented, the SPDX composite-expression handling is non-trivial and mostly correct, and pinning all GitHub Actions by commit hash is a good security practice. A few things worth addressing before merging:

Bug:
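As background for the SPDX composite-expression handling mentioned in the review, the sketch below shows one possible way to evaluate `OR`/`AND` expressions against an allowlist. It is an illustration only, not the PR's implementation: `is_acceptable`, `tokenize`, and the allowlist subset are hypothetical names, and the `WITH` exception operator is deliberately unsupported (it raises `ValueError`).

```python
import re

ALLOWLIST = {"MIT", "Apache-2.0", "BSD-3-Clause"}  # illustrative subset


def tokenize(expr):
    """Split an expression into parentheses, operators, and license ids."""
    return re.findall(r"\(|\)|[^\s()]+", expr)


def is_acceptable(expr, allowlist=ALLOWLIST):
    """True if the expression can be satisfied using allowlisted licenses only.

    OR needs one acceptable side, AND needs both; AND binds tighter than OR.
    """
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        result = parse_and()
        while pos < len(tokens) and tokens[pos].upper() == "OR":
            pos += 1
            rhs = parse_and()  # always parse the right side, then combine
            result = result or rhs
        return result

    def parse_and():
        nonlocal pos
        result = parse_atom()
        while pos < len(tokens) and tokens[pos].upper() == "AND":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            result = parse_or()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("unbalanced parentheses")
            pos += 1
            return result
        if tok == ")" or tok.upper() in ("AND", "OR"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in allowlist

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return result


print(is_acceptable("MIT OR AGPL-3.0-or-later"))
print(is_acceptable("MIT AND AGPL-3.0-or-later"))
```

Under this reading, a dual-licensed package passes when at least one alternative is permissive, while a package that combines a permissive license with a copyleft one via `AND` does not.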
Summary
- CI check (`License check` workflow) that runs on every PR touching `pyproject.toml`/`uv.lock`. It fails fast on any new dep whose license is not on the allowlist or registered as an explicit per-package exception.
- Policy in `.licenses/policy.toml`. Permissive licenses (MIT/BSD/Apache-2.0/ISC/PSF/MPL-2.0/LGPL via dynamic linking) are auto-accepted; everything else needs a per-package exception with `rationale`, `allowed_for`, and `review_required_for` fields.
- Contributor workflow documented in `CONTRIBUTING.md`.

Initial audit (193 packages)
- Metadata overrides: `fastembed` (actually Apache-2.0), `matplotlib-inline` (actually BSD-3), `py_rust_stemmers` (actually MIT).
- `pymupdf-layout`: Artifex proprietary commercial-only; not actually imported, candidate for removal.
- `PyMuPDF` + `pymupdf4llm`: AGPL-3 OR Artifex Commercial; need an Artifex agreement for non-AGPL distribution.
- `icalendar-searcher`: pure AGPL-3-or-later; blocks any commercial channel (incl. proprietary SaaS).

Why SaaS matters here
AGPL §13 makes network use trigger source-disclosure obligations, so a third party offering SaaS based on this code is bound by AGPL unless they hold a separate commercial license from us. The same dependency constraints therefore apply to commercial SaaS as to embedded/proprietary distribution: both are part of the "commercial" channel. The policy file flags this distinction with `allowed_for = ["agpl"]` vs `["agpl", "commercial"]`.

Test plan
- `uv run scripts/check-licenses.py` passes locally (0 failures, 4 reviewed exceptions, 3 metadata overrides).
- `License check` workflow runs green on this PR.

This PR was generated with the help of AI, and reviewed by a Human