feat: dynamically create sf-hamilton-core package #1376
### `pandas` and `numpy` dependencies
Hamilton was initially created for workflows that used `pandas` and `numpy` heavily. For this reason, `numpy` and `pandas` are imported at the top-level of module `hamilton.base`. Because of the package structure, as a Hamilton user, you're importing `pandas` and `numpy` every time you import `hamilton`.

A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
Suggested change:
- A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
+ A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and unsatisfactory.
        from . import htypes, node
    except ImportError:
        import node
    if TYPE_CHECKING:
comment as to importance of this
Changes:
- I moved the imports `pandas`, `numpy` and `pandas.core.indexes.extension` from the top level to the code paths that actually use these dependencies. There should be no behavior change, but it allows `import hamilton.base` without loading pandas each time.
- `if TYPE_CHECKING` is the standard Python approach to import packages that are only relevant for annotating signatures. For example, `mypy` will import `pandas` when doing type checking, but doing `from hamilton import base` won't import `pandas`.
- `pandas` and `numpy` are in the type checking block because they are used in some function signatures. `pandas.core.indexes.extension` is not because it isn't used in type annotations.
- moved `hamilton.node` to the `TYPE_CHECKING` block since it's only used for annotations
- moved `htypes` to a top-level import; it should not have been in a `try/except` in the first place because a code path of `SimplePythonDataFrameGraphAdapter` depends on it and will fail if `htypes` isn't imported
I don't have the "why" for this code:

    try:
        from . import htypes, node
    except ImportError:
        import node

- The `try/except` was introduced in 2022, but there are no clear indications why.
- Looking at the source code of the file at the time, it was probably a brute-force solution to avoid circular imports.
- The code could have been in a `TYPE_CHECKING` block (introduced in Python 3.5) since it was only ever used for annotations
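For illustration, here is a minimal sketch of the `TYPE_CHECKING` pattern discussed in this comment. The helper `column_means` is hypothetical and not part of Hamilton; the point is only that string annotations let the module be imported without `pandas` installed, while type checkers still see the real types.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers evaluate this branch; it never runs at import time.
    import pandas as pd


def column_means(df: "pd.DataFrame") -> "pd.Series":
    """Hypothetical helper: annotations are strings, so pandas is not needed to define it."""
    # Works with any object exposing .mean(); pandas is never imported by this module.
    return df.mean()
```

Importing a module written this way does not trigger `import pandas`; callers that pass a DataFrame have necessarily imported pandas themselves.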
Yeah sorry I meant in the code leave a note/comment as to the importance :)
E.g. this is for hamilton-core to work...
I don't know much about releases to pypi. Can I ask if it is possible to sign the releases using something like https://www.python.org/downloads/metadata/sigstore/ or something similar? If you sign the pypi release, could you sign it with the same key that you use to sign the ASF-compliant source releases? The signing keys for source releases need to be maintained on secure hardware and generally, most ASF releases are done by running jobs on a private laptop and signed with GPG (or equivalent).
@pjfanning I have little experience with pypi, but I want to highlight that we have multiple options for distribution:
Actually, you can even try
I believe the "hamilton-core" PR is a temporary solution to something we should fix in a major release. Users who care about this issue (a few already reacted positively on Slack) are probably OK with following a few extra steps for installation.
@zilto the ASF frowns on encouraging users to use latest code in git. We aim to do official releases and have reviews and votes to improve the confidence about the release being stable. I think the pypi releases should be done with the source release and be based on the exact git commit that was accepted for the release.
Makes sense. Though, introducing
I'm not too concerned. Tooling should make this simpler. Once we hit 2.0, we have options to stop I think. I'm not too worried -- and if we always set that expectation I think we'll be good. On that note, in the
Hey guys, what is the plan for this PR? I would very much like to see a Hamilton version that doesn't require pandas/numpy. BTW, now that we have prek, perhaps this opens up some possibilities related to hooks/CI.
Some suggestions by Claude:
Dependency & Structure Improvements for Hamilton
1. Remove
| `typing_inspect` function | Replacement |
|---|---|
| `get_origin()` / `get_args()` | Already handled in `hamilton/htypes.py` via `typing.get_origin` / `typing_extensions.get_origin` |
| `is_optional_type(tp)` | `get_origin(tp) is Union and type(None) in get_args(tp)` |
| `is_union_type(tp)` | `get_origin(tp) is Union` |
| `is_generic_type(tp)` | `get_origin(tp) is not None` |
| `is_tuple_type(tp)` | `get_origin(tp) is tuple` |
| `is_typevar(tp)` | `isinstance(tp, TypeVar)` |
| `is_literal_type(tp)` | `get_origin(tp) is Literal` |
These are used across hamilton/htypes.py, hamilton/node.py, hamilton/function_modifiers/dependencies.py, hamilton/function_modifiers/expanders.py, hamilton/function_modifiers/adapters.py, and hamilton/experimental/h_cache.py. A small helper module (or additions to htypes.py) could centralize these ~5 one-liner functions and eliminate the dependency entirely.
Impact: Removes 1 of 4 required dependencies.
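As a rough sketch of what such a helper module might look like, the following uses only the standard `typing` module. It covers a subset of the table; the function names mirror `typing_inspect`, but this is not a drop-in replacement and edge cases may differ. Treating PEP 604 unions (`int | None`) as unions is an added assumption that the table above does not mention.

```python
import types
from typing import Literal, Optional, Tuple, TypeVar, Union, get_args, get_origin

# Assumption: PEP 604 unions (`int | None`) should count as unions too.
# On Python 3.10+ their origin can be types.UnionType rather than typing.Union.
_UNION_ORIGINS = (Union, getattr(types, "UnionType", Union))


def is_union_type(tp) -> bool:
    return get_origin(tp) in _UNION_ORIGINS


def is_optional_type(tp) -> bool:
    # Optional[X] is Union[X, None], so NoneType appears among the arguments.
    return is_union_type(tp) and type(None) in get_args(tp)


def is_tuple_type(tp) -> bool:
    return get_origin(tp) is tuple


def is_typevar(tp) -> bool:
    return isinstance(tp, TypeVar)


def is_literal_type(tp) -> bool:
    return get_origin(tp) is Literal


# Small self-check on a few representative annotations.
checks = {
    "Optional[int] is optional": is_optional_type(Optional[int]),
    "int is not optional": not is_optional_type(int),
    "Union[int, str] is a union": is_union_type(Union[int, str]),
    "Tuple[int, str] is a tuple": is_tuple_type(Tuple[int, str]),
    "TypeVar is a typevar": is_typevar(TypeVar("T")),
    "Literal['a'] is a literal": is_literal_type(Literal["a"]),
}
print(checks)
```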
2. Make remaining top-level pandas/numpy imports lazy
base.py was already fixed, but 3 files still have top-level imports:
| File | Import | Fix |
|---|---|---|
| `hamilton/driver.py:48` | `import pandas as pd` | Move inside the functions that use it (it's only used in the `__main__` block) |
| `hamilton/log_setup.py:21` | `import numpy as np` | Move inside `setup_logging()`; it's only used on line 47 for `np.seterr()` |
| `hamilton/models.py:21` | `import pandas as pd` | Move to `TYPE_CHECKING` block; only used for type annotations in `predict()` signature |
These are the remaining blockers for hamilton-core working cleanly without pandas/numpy installed. Without these fixes, import hamilton.driver or import hamilton.log_setup will crash in a sf-hamilton-core environment.
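A minimal sketch of the function-level import suggested for `log_setup.py` could look like the following. This is a hypothetical stand-in, not Hamilton's actual `setup_logging()`: the `np.seterr` arguments and the choice to skip silently when numpy is missing are both assumptions made for illustration.

```python
import logging


def setup_logging(log_level: int = logging.INFO) -> None:
    """Hypothetical stand-in for a logging setup function with a deferred numpy import."""
    logging.basicConfig(level=log_level)
    try:
        # Deferred import: numpy is only loaded when logging is actually configured.
        import numpy as np
    except ImportError:
        # Assumption: without numpy there is nothing to configure, so skip quietly.
        return
    # Assumed settings, chosen only to show where the call would live.
    np.seterr(divide="ignore", invalid="ignore")
```

With this shape, importing the module costs nothing extra, and the function runs whether or not numpy is installed.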
3. typing_extensions — keep for now
This dependency is minimal (used for Annotated, NotRequired, get_origin, get_args, is_typeddict — all backports for Python < 3.11). It's lightweight and can't be removed until the minimum Python version is bumped to 3.11+. The current constraint > 4.0.0 is correct.
Minor fix needed: hamilton/function_modifiers/expanders.py:762 imports typing_extensions.is_typeddict unconditionally; it should add a version check like the other imports:

    import sys

    if sys.version_info >= (3, 10):
        from typing import is_typeddict
    else:
        from typing_extensions import is_typeddict

4. Improve the hamilton-core build with uv workspaces
The current hamilton-core/setup.py copies the entire hamilton/ source tree into hamilton/_hamilton/ at build time via shutil.copytree. This works but is fragile:
- Stale copies if `setup.py` isn't re-run
- `_hamilton/` is `.gitignore`d, so debugging the installed package is confusing
- The proxy module in `hamilton-core/hamilton/__init__.py` adds runtime overhead and complexity
Better approach with uv workspaces:

    # Root pyproject.toml: add workspace config
    [tool.uv.workspace]
    members = ["hamilton-core"]

    # hamilton-core/pyproject.toml: replace setup.py entirely
    [project]
    name = "sf-hamilton-core"
    dynamic = ["version"]
    dependencies = [
        "typing_extensions > 4.0.0",
        # no pandas, numpy, or typing_inspect
    ]

    [tool.uv.sources]
    sf-hamilton = { workspace = true }

This eliminates the copy step, the proxy module, and the dynamic setup.py entirely. The hamilton package is referenced directly from the workspace root.
5. Use prek workspace mode for pre-commit hooks
With prek now on main, you can use its workspace feature to manage hooks per-package:

    # hamilton-core/.pre-commit-config.yaml
    orphan: true  # isolate from root hooks
    repos:
      - repo: https://github.com/astral-sh/ruff-pre-commit
        rev: v0.15.1
        hooks:
          - id: ruff-check
          - id: ruff-format

prek auto-discovers this config and scopes the hooks to hamilton-core/ files only. This keeps linting/formatting consistent while allowing each sub-package to customize.
6. Drop networkx from visualization extra
The current main pyproject.toml lists networkx as part of the visualization extra, but the hamilton-core setup.py already drops it:

    **{"visualization": ["graphviz"]},  # drop networkx

If networkx is genuinely unused in the visualization code path, this change should be applied to the main pyproject.toml as well.
Summary — final dependency picture
| | sf-hamilton (current) | sf-hamilton-core (current) | After proposed changes |
|---|---|---|---|
| pandas | required | removed | removed |
| numpy | required | removed | removed |
| typing_inspect | required | required | removed |
| typing_extensions | required | required | required (until Python 3.11 min) |
| Total core deps | 4 | 2 | 1 |
The single remaining hard dependency would be typing_extensions, and even that goes away once Python < 3.11 is dropped. At that point, sf-hamilton-core would have zero third-party dependencies.
Adds the directory `hamilton-core/` with a mechanism to dynamically patch `hamilton/` and bundle a library named `sf-hamilton-core`, which could be pushed to pypi.

It makes 2 targeted changes: `pandas` and `numpy` optional dependencies; and remove `networkx` dependency (currently unused).

This makes the Hamilton package a much lighter install and solves long library loading time. See the file `hamilton-core/README.md` for details.

Changes
- `hamilton-core/` directory
- `hamilton-core/setup.py` copies the code of `hamilton/` to `hamilton-core/hamilton/_hamilton`
- `hamilton-core/hamilton/__init__.py` is the entry point to `sf-hamilton-core`, which proxies everything directly to the source code of `hamilton` stored in `hamilton-core/hamilton/_hamilton`
- `hamilton/base.py` to lazily import `pandas` and `numpy`. This shouldn't affect users in any way

How I tested this
- `sf-hamilton-core` and runs Hamilton's unit tests

TODO
- `hamilton-core/setup.py` and add linting + formatting

Checklist
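To make the proxying idea in the description concrete, here is a generic sketch of one way a package entry point can forward attribute access to a vendored copy, using a module-level `__getattr__` (PEP 562). It is not the PR's actual `__init__.py`; the module names `public_pkg` and `_vendored_pkg` and their contents are hypothetical stand-ins built in memory so the example runs on its own.

```python
import sys
import types

# Stand-in for the vendored source copy (hypothetical content, illustration only).
vendored = types.ModuleType("_vendored_pkg")
vendored.VERSION = (1, 0, 0)
vendored.greet = lambda name: f"hello {name}"
sys.modules["_vendored_pkg"] = vendored

# Stand-in for the public entry-point module that users import.
proxy = types.ModuleType("public_pkg")


def _proxy_getattr(name: str):
    # PEP 562: invoked only when normal attribute lookup on the module fails.
    try:
        return getattr(sys.modules["_vendored_pkg"], name)
    except AttributeError:
        raise AttributeError(f"module 'public_pkg' has no attribute {name!r}") from None


proxy.__getattr__ = _proxy_getattr
sys.modules["public_pkg"] = proxy

import public_pkg  # resolved from sys.modules, no file on disk needed

print(public_pkg.greet("core"))
```

Every attribute miss on the public module pays for one extra lookup, which is the kind of runtime indirection a proxy layer introduces.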