fix(models/ocr): use OS trust store for model downloads; route polars via lts-cpu wrapper #1175
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| # ============================================================================= | ||
| # MIT License | ||
| # Copyright (c) 2026 Aparavi Software AG | ||
| # ============================================================================= | ||
| """ | ||
| Polars wrapper that ensures polars-lts-cpu is the active install on x86_64. | ||
|
|
||
| The default `polars` PyPI wheel requires AVX2/FMA/BMI1/BMI2/etc. and crashes | ||
| (SEH 0xc000001d / SIGILL) on x86_64 hosts without those features. The | ||
| `polars-lts-cpu` wheel ships an AVX2-free binary under the same `polars` | ||
| import name. Same Python API; GPU acceleration in the engine flows through | ||
| PyTorch (ai.common.torch) and is independent of this choice. | ||
|
|
||
| The wrinkle: img2table and other libs declare `polars` as a hard dependency. | ||
| Pip/uv resolve them as separate distributions, both writing into the same | ||
| `polars/` namespace in site-packages. If the regular `polars` wheel ends up | ||
| authoritative for the compiled `_polars.pyd` / `_polars.abi3.so`, you crash; | ||
| if the .py files come from one version and the binary from another you get | ||
| ImportErrors like "cannot import name 'POLARS_STORAGE_CONFIG_KEYS'". | ||
|
|
||
| This module follows the same pattern as ai.common.opencv (which solves the | ||
| identical problem for cv2's four conflicting PyPI wheels): | ||
| 1. Install polars-lts-cpu via the requirements file. | ||
| 2. Uninstall any plain `polars` that came in as a transitive dep. | ||
| 3. Force-reinstall polars-lts-cpu so its files are unambiguously on disk. | ||
| 4. Reset any cached `polars` modules so the next import is clean. | ||
|
|
||
| ARM hosts (Linux aarch64, macOS arm64) don't need this — their default | ||
| `polars` wheel has no AVX requirement — so the cleanup is x86_64-only. | ||
|
|
||
| Usage: | ||
| from ai.common.polars import pl | ||
| df = pl.DataFrame(...) | ||
|
|
||
| Import this BEFORE any module that touches polars (img2table, deltalake, etc.) | ||
| so the right binary is in place when those modules load. | ||
| """ | ||
|
|
||
| import os | ||
| import platform | ||
| import sys | ||
|
|
||
| from depends import depends, pip | ||
|
|
||
| # polars-lts-cpu only matters on x86_64; ARM wheels don't ship AVX2 code paths. | ||
| _NEEDS_LTS = platform.machine().lower() in ('x86_64', 'amd64') | ||
|
|
||
| requirements = os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt' | ||
| depends(requirements) | ||
|
|
||
| if _NEEDS_LTS: | ||
| try: | ||
| import importlib.metadata as _md | ||
|
|
||
| _has_plain_polars = False | ||
| try: | ||
| _md.version('polars') | ||
| _has_plain_polars = True | ||
| except _md.PackageNotFoundError: | ||
| # Plain `polars` not installed — only polars-lts-cpu is on disk, | ||
| # which is exactly the desired state. No cleanup needed. | ||
| pass | ||
|
|
||
| if _has_plain_polars: | ||
| # Plain `polars` was pulled in transitively (img2table etc.). | ||
| # Drop it and force-reinstall lts-cpu so its binary wins on disk. | ||
| pip('uninstall', '-y', 'polars') | ||
| pip('install', '--force-reinstall', '--no-deps', 'polars-lts-cpu') | ||
|
|
||
| # Drop any already-loaded polars modules so the next import | ||
| # picks up the freshly-written files instead of cached state. | ||
| for _mod in [m for m in list(sys.modules) if m == 'polars' or m.startswith('polars.')]: | ||
| sys.modules.pop(_mod, None) | ||
| except Exception: | ||
| # Best-effort cleanup. If it fails, the import below will surface | ||
| # the underlying issue with a real traceback. | ||
| pass | ||
|
|
||
| import polars as pl # noqa: E402 | ||
|
|
||
| __all__ = ['pl'] | ||
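The checks the module above performs before touching pip can be exercised in isolation. The following is a minimal sketch, not the PR's code: the helper names `needs_lts`, `is_installed`, and `purge_modules` are hypothetical, and the sketch deliberately leaves out the `depends` / `pip` helpers, whose behavior is not shown in this diff.

```python
import importlib.metadata as md
import platform


def needs_lts(machine: str) -> bool:
    """True for x86_64 hosts, the only ones the module applies the cleanup to."""
    return machine.lower() in ("x86_64", "amd64")


def is_installed(dist_name: str) -> bool:
    """Report whether a distribution has metadata in the current environment."""
    try:
        md.version(dist_name)
        return True
    except md.PackageNotFoundError:
        return False


def purge_modules(modules: dict, package: str) -> list:
    """Remove `package` and its submodules from a sys.modules-like mapping.

    Returns the names that were removed, in their original order.
    """
    doomed = [
        name for name in list(modules)
        if name == package or name.startswith(package + ".")
    ]
    for name in doomed:
        modules.pop(name, None)
    return doomed


if __name__ == "__main__":
    print("needs lts:", needs_lts(platform.machine()))
    print("plain polars present:", is_installed("polars"))
```

Passing the mapping in as an argument (rather than reaching for `sys.modules` directly) keeps the purge step testable without disturbing the running interpreter. Note that the prefix match uses `package + "."`, so an unrelated module such as `polars_extra` is left alone.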
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| polars-lts-cpu; platform_machine == "x86_64" or platform_machine == "AMD64" |
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,61 @@ | ||||||||||||||||||||||
| # ============================================================================= | ||||||||||||||||||||||
| # MIT License | ||||||||||||||||||||||
| # Copyright (c) 2026 Aparavi Software AG | ||||||||||||||||||||||
| # ============================================================================= | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
| SSL trust store integration. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Embedded Python on Windows ships a default SSL context that loads only a | ||||||||||||||||||||||
| narrow subset of the Windows ROOT store (often <30 CAs in practice on | ||||||||||||||||||||||
| locked-down corporate machines). That breaks any model loader that | ||||||||||||||||||||||
| downloads weights from a public CDN — TLS validation fails with | ||||||||||||||||||||||
| "unable to get local issuer certificate" because the CA that signed the | ||||||||||||||||||||||
| server's chain isn't in the loaded subset. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| This module installs `truststore` and patches Python's default SSL context | ||||||||||||||||||||||
| to use the OS trust store directly (SChannel on Windows, SecureTransport on | ||||||||||||||||||||||
| macOS, OpenSSL system roots on Linux). Effects: | ||||||||||||||||||||||
| - All public CAs in the OS store are trusted, not just the subset | ||||||||||||||||||||||
| `load_default_certs()` exposes. | ||||||||||||||||||||||
| - Corporate root CAs deployed via Group Policy / MDM are picked up | ||||||||||||||||||||||
| automatically — needed for any environment with TLS-intercepting proxies | ||||||||||||||||||||||
| (Zscaler, Netskope, BlueCoat, etc.). | ||||||||||||||||||||||
| - urllib, requests, httpx, and anything using a default SSL context all | ||||||||||||||||||||||
| benefit from the same patch — no per-callsite changes needed. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Usage: | ||||||||||||||||||||||
| import ai.common.ssl # noqa: F401 - patches default SSL context | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| Import this once, early, in any module that triggers HTTPS downloads. | ||||||||||||||||||||||
| The `ai.common.models` package imports it at the top of its __init__.py, | ||||||||||||||||||||||
| so any model loader is covered transitively. | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| If truststore can't be installed or injected (e.g. very old Python), this | ||||||||||||||||||||||
| module falls back to pointing OpenSSL at certifi's bundle — better than | ||||||||||||||||||||||
| the partial Windows store, but won't pick up corporate CAs. | ||||||||||||||||||||||
| """ | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| import os | ||||||||||||||||||||||
| from depends import depends | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| requirements = os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt' | ||||||||||||||||||||||
| depends(requirements) | ||||||||||||||||||||||
|
Comment on lines +41 to +42

Contributor

Lines 41-42 can raise before the fallback logic starts, so a failed dependency install would bypass the fallback. Move them inside the `try` block.

Suggested fix

-requirements = os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt'
-depends(requirements)
-
 try:
+    requirements = os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt'
+    depends(requirements)
     import truststore
     truststore.inject_into_ssl()
 except Exception:

Based on learnings, import-time …
||||||||||||||||||||||
|
|
||||||||||||||||||||||
| try: | ||||||||||||||||||||||
| import truststore | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| truststore.inject_into_ssl() | ||||||||||||||||||||||
| except Exception: | ||||||||||||||||||||||
| # Fallback: point Python at certifi's CA bundle. Catches the "embedded | ||||||||||||||||||||||
| # Python's default trust store is too small" case but won't help with | ||||||||||||||||||||||
| # corporate TLS interception. Better than nothing. | ||||||||||||||||||||||
| try: | ||||||||||||||||||||||
| import certifi | ||||||||||||||||||||||
|
|
||||||||||||||||||||||
| os.environ.setdefault('SSL_CERT_FILE', certifi.where()) | ||||||||||||||||||||||
| os.environ.setdefault('REQUESTS_CA_BUNDLE', certifi.where()) | ||||||||||||||||||||||
| except Exception: | ||||||||||||||||||||||
| # Both truststore and certifi fallback failed. Leave the default SSL | ||||||||||||||||||||||
| # context untouched — downstream HTTPS calls will surface their own | ||||||||||||||||||||||
| # error with a real traceback if validation fails. | ||||||||||||||||||||||
| pass | ||||||||||||||||||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| truststore |
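The truststore-then-certifi fallback in the module above, together with the reviewer's point that every step that can raise should sit inside the guarded path, can be sketched as an ordered chain of strategies. This is an illustrative sketch only: `first_working`, `use_truststore`, and `use_certifi` are hypothetical names, and the sketch assumes that any dependency-install step would live inside the strategy callable so that its failure also falls through to the next option.

```python
import os
from typing import Callable, Iterable, Tuple


def first_working(strategies: Iterable[Tuple[str, Callable[[], None]]]) -> str:
    """Run strategies in order; return the name of the first that does not raise.

    Returns "default" when every strategy fails, meaning the interpreter's
    stock SSL configuration is left untouched.
    """
    for name, apply in strategies:
        try:
            apply()
            return name
        except Exception:
            continue
    return "default"


def use_truststore() -> None:
    # Third-party package; an ImportError here triggers the next strategy.
    import truststore

    truststore.inject_into_ssl()


def use_certifi(env=os.environ) -> None:
    # Third-party package; setdefault() leaves operator-provided values alone.
    import certifi

    env.setdefault("SSL_CERT_FILE", certifi.where())
    env.setdefault("REQUESTS_CA_BUNDLE", certifi.where())


if __name__ == "__main__":
    chosen = first_working([("truststore", use_truststore), ("certifi", use_certifi)])
    print("trust configuration:", chosen)
```

Returning the name of the strategy that applied makes it easy to log which trust configuration is in effect, which may help when diagnosing certificate failures on locked-down machines.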
Fail fast when polars remediation does not succeed.

The x86_64 remediation path treats uninstall/reinstall as best-effort and swallows failures. Because `pip(...)` returns a boolean, a failed cleanup can silently leave an incompatible `polars` active, which reintroduces the crash class this module is meant to prevent.

Suggested fix

 if _NEEDS_LTS:
     try:
         import importlib.metadata as _md
@@
         if _has_plain_polars:
@@
-            pip('uninstall', '-y', 'polars')
-            pip('install', '--force-reinstall', '--no-deps', 'polars-lts-cpu')
+            uninstalled = pip('uninstall', '-y', 'polars')
+            installed = pip('install', '--force-reinstall', '--no-deps', 'polars-lts-cpu')
+            if not (uninstalled and installed):
+                raise RuntimeError('Failed to enforce polars-lts-cpu on x86_64 host')
@@
-    except Exception:
-        # Best-effort cleanup. If it fails, the import below will surface
-        # the underlying issue with a real traceback.
-        pass
+    except Exception as exc:
+        raise RuntimeError('Polars runtime remediation failed before import') from exc
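The fail-fast behavior the reviewer asks for can be tested without invoking pip by injecting the installer as a callable. The sketch below is a variation under stated assumptions, not the suggested patch itself: `enforce_lts` is a hypothetical name, the `pip` parameter stands in for the project's helper (assumed, per the review, to return a boolean), and unlike the suggested fix it stops at the first failing step instead of running both steps and checking afterwards.

```python
from typing import Callable, Tuple


def enforce_lts(pip: Callable[..., bool]) -> None:
    """Run the uninstall/reinstall pair; raise as soon as a step reports failure."""
    steps: Tuple[Tuple[str, ...], ...] = (
        ("uninstall", "-y", "polars"),
        ("install", "--force-reinstall", "--no-deps", "polars-lts-cpu"),
    )
    for args in steps:
        if not pip(*args):
            # Name the failing command so the traceback points at the cause.
            raise RuntimeError("pip " + " ".join(args) + " failed")
```

Stopping at the first failure avoids a force-reinstall on top of a half-removed package, at the cost of diverging from the reviewer's run-both-then-check ordering. Which is preferable depends on how the real `pip` helper behaves after a failed uninstall.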