feat: dynamically create sf-hamilton-core package #1376

Open
zilto wants to merge 8 commits into main from feat/hamilton-core

@zilto
Contributor

@zilto zilto commented Sep 6, 2025

Adds the directory hamilton-core/ with mechanism to dynamically patch hamilton/ and bundle a library named sf-hamilton-core which could be pushed to pypi.

It makes 2 targeted changes:

  • disable plugin autoloading
  • make pandas and numpy optional dependencies; and remove networkx dependency (currently unused).

This makes the Hamilton package a much lighter install and fixes the long library loading time. See the file hamilton-core/README.md for details.

Changes

  • Add hamilton-core/ directory
  • hamilton-core/setup.py copies the code of hamilton/ to hamilton-core/hamilton/_hamilton
  • hamilton-core/hamilton/__init__.py is the entry point to sf-hamilton-core, which proxies everything directly to the source code of hamilton stored in hamilton-core/hamilton/_hamilton
  • modify hamilton/base.py to lazily import pandas and numpy. This shouldn't affect users in any way
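
As a rough illustration of the proxying idea in the list above, here is a minimal sketch that uses a module-level `__getattr__` (PEP 562). The names `public_pkg`, `_vendored_pkg`, and `greet` are hypothetical stand-ins, and the actual hamilton-core/hamilton/__init__.py may be implemented quite differently.

```python
import sys
import types

# Hypothetical stand-in for the vendored copy of the source tree
# (the PR places the real one under hamilton-core/hamilton/_hamilton).
vendored = types.ModuleType("_vendored_pkg")
vendored.greet = lambda name: f"hello {name}"

# Hypothetical public package that owns no code of its own.
proxy = types.ModuleType("public_pkg")


def _forward(name: str):
    """Resolve attributes missing on the proxy from the vendored module."""
    try:
        return getattr(vendored, name)
    except AttributeError:
        raise AttributeError(f"module 'public_pkg' has no attribute {name!r}") from None


# PEP 562: a module-level __getattr__ is called for attributes not found normally.
proxy.__getattr__ = _forward
sys.modules["public_pkg"] = proxy

import public_pkg  # resolved from sys.modules, no file on disk needed

print(public_pkg.greet("world"))
```

Attribute forwarding alone does not cover submodule imports such as `import public_pkg.sub`; a real proxy would need to handle those separately.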

How I tested this

  • successfully ran all core unit tests locally
  • added CI workflow that installs sf-hamilton-core and runs Hamilton's unit tests

TODO

  • remove networkx dependency
  • add info to README about how it works
  • clean up hamilton-core/setup.py and add linting + formatting
  • potentially add docs pages

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • Project documentation has been updated if adding/changing functionality.

@zilto zilto self-assigned this Sep 6, 2025
@zilto zilto added the core-work Work that is "core". Likely overseen by core team in most cases. label Sep 6, 2025
@zilto zilto requested review from elijahbenizzy and skrawcz and removed request for elijahbenizzy September 6, 2025 01:24
Contributor

@skrawcz skrawcz left a comment

So does this copy everything in sf-hamilton? Including the LICENSE file and NOTICE files? We need a few things to always be there for Apache purposes. If so, we're good I think. If not you'll need to add them.

### `pandas` and `numpy` dependencies
Hamilton was initially created for workflows that used `pandas` and `numpy` heavily. For this reason, `numpy` and `pandas` are imported at the top-level of module `hamilton.base`. Because of the package structure, as a Hamilton user, you're importing `pandas` and `numpy` every time you import `hamilton`.

A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
Contributor

Suggested change
- A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and insatisfactory.
+ A reasonable change would be to move `numpy` and `pandas` to a "lazy" location. Then, dependencies would only be imported when features requiring them are used and they could be removed from `pyproject.toml`. Unfortunately, plugin autoloading defaults make this solution a significant breaking change and unsatisfactory.

    from . import htypes, node
except ImportError:
    import node
if TYPE_CHECKING:
Contributor

comment as to importance of this

Contributor Author

@zilto zilto Sep 7, 2025

Changes:

  • I moved the imports of pandas, numpy and pandas.core.indexes.extension from the top level to the code paths that actually use these dependencies. There should be no behavior change, but it allows importing hamilton.base without loading pandas each time.
  • if TYPE_CHECKING is the standard Python approach for importing packages that are only relevant for annotating signatures. For example, mypy will import pandas when doing type checking, but doing from hamilton import base won't import pandas.
  • pandas and numpy are in the type checking block because they are used in some function signatures. pandas.core.indexes.extension is not because it isn't used in type annotations.
  • moved hamilton.node to the TYPE_CHECKING block since it's only used for annotations
  • moved htypes to a top-level import; it should not have been in a try/except in the first place because a code path of SimplePythonDataFrameGraphAdapter depends on it and will fail with an error if htypes isn't imported
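
The pattern described in these bullets can be sketched roughly as follows. This is a simplified illustration, not the code in hamilton/base.py; `build_frame` is a hypothetical helper, and the return annotation is written as a string so it is never evaluated at runtime.

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy, never at runtime.
    import pandas as pd


def build_frame(columns: Dict[str, Any]) -> "pd.DataFrame":
    """Hypothetical helper: pandas is imported only when this is called."""
    import pandas as pd  # deferred import, paid on the first call

    return pd.DataFrame(columns)


# Defining (or importing) the function does not need pandas to be installed;
# the annotation stays a plain string until something resolves it.
print(build_frame.__annotations__["return"])
```

Calling `build_frame` still requires pandas, which is the intended trade-off: the cost and the dependency are pushed to the code path that needs them.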

I don't have the "why" for this code:

try:
  from . import htypes, node
except ImportError:
  import node
  • The try/except was introduced in 2022, but there is no clear indication why.
  • Looking at the source code of the file at the time, it was probably a brute force solution to avoid circular imports.
  • The code could have been in a TYPE_CHECKING block (introduced in Python 3.5) since it was only ever used for annotations

Contributor

Yeah sorry I meant in the code leave a note/comment as to the importance :)

Contributor

@skrawcz skrawcz Sep 7, 2025

E.g. this is for hamilton-core to work...

@pjfanning
Member

pjfanning commented Sep 6, 2025

I don't know much about releases to pypi. Can I ask if it is possible to sign the releases using something like https://www.python.org/downloads/metadata/sigstore/ or something similar?

If you sign the pypi release, could you sign it with the same key that you use to sign the ASF compliant source releases?

The signing keys for source releases need to be maintained on secure hardware and generally, most ASF releases are done by running jobs on a private laptop and signed with GPG (or equivalent).
https://infra.apache.org/release-signing.html

@zilto
Contributor Author

zilto commented Sep 7, 2025

@pjfanning I have little experience with pypi, but I want to highlight that we have multiple options for distribution:

  • pip allows you to install from GitHub: pip install git+https://github.com/apache/hamilton.git
  • @ lets you specify a tag or branch. We could have pip install git+https://github.com/apache/hamilton.git@core
  • you can also specify subdirectories. We could have pip install git+https://github.com/apache/hamilton.git#subdirectory=hamilton-core
  • the GitHub release workflow could include core versions in the release Assets
  • These install instructions are valid in requirements.txt or pyproject.toml

Actually, you can even try hamilton-core right now from this PR via

pip install git+https://github.com/apache/hamilton.git@feat/hamilton-core#subdirectory=hamilton-core

I believe the "hamilton-core" PR is a temporary solution to something we should fix in a major release. Users that care about this issue (a few have already reacted positively on Slack) are probably ok with following a few extra steps for installation.

@pjfanning
Member

@zilto the ASF frowns on encouraging users to use latest code in git. We aim to do official releases and have reviews and votes to improve the confidence about the release being stable. I think the pypi releases should be done with the source release and be based on the exact git commit that was accepted for the release.

@zilto
Contributor Author

zilto commented Sep 7, 2025

> @zilto the ASF frowns on encouraging users to use latest code in git. We aim to do official releases and have reviews and votes to improve the confidence about the release being stable. I think the pypi releases should be done with the source release and be based on the exact git commit that was accepted for the release.

Makes sense. Though, introducing sf-hamilton-core and having people depend on it means we'll have to maintain that pypi target "forever". If the fixes included in sf-hamilton-core are solved in Hamilton 2.0.0, doing pip install sf-hamilton or pip install sf-hamilton-core would do exactly the same thing, which could be confusing.

@skrawcz
Contributor

skrawcz commented Sep 8, 2025

> we'll have to maintain that pypi target "forever"

I'm not too concerned. Tooling should make this simpler. Once we hit 2.0, we have options to stop, I think. I'm not too worried, and if we always set that expectation I think we'll be good.

On that note, in the __init__.py can you log a warning that sf-hamilton-core will go away in 2.0?
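
A minimal sketch of what such a warning could look like, assuming the standard library logging module is used. The logger name, the function name, and the message wording are all assumptions made for illustration and are not taken from the PR.

```python
import logging

# Hypothetical logger name; the real package may use a different one.
logger = logging.getLogger("hamilton")

# Assumed wording; the maintainers would choose the final message.
CORE_NOTICE = (
    "sf-hamilton-core is a temporary package and is expected to go away "
    "in Hamilton 2.0."
)


def warn_core_is_temporary() -> None:
    """Log the notice; intended to be called once from the package __init__.py."""
    logger.warning(CORE_NOTICE)


warn_core_is_temporary()
```

An alternative would be `warnings.warn(..., FutureWarning)`, which lets users filter the message with the standard warnings machinery instead of logging configuration.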

@Dev-iL
Collaborator

Dev-iL commented Feb 16, 2026

Hey guys, what is the plan for this PR? I would very much like to see a Hamilton version that doesn't require pandas/numpy. BTW, now that we have prek, perhaps this opens up some possibilities related to hooks/CI.

@Dev-iL
Collaborator

Dev-iL commented Feb 16, 2026

Some suggestions by Claude:

Dependency & Structure Improvements for Hamilton

1. Remove typing_inspect dependency

typing_inspect is used in 6 files but all its functions have stdlib equivalents (Python 3.8+):

| typing_inspect function | Replacement |
|---|---|
| get_origin() / get_args() | Already handled in hamilton/htypes.py via typing.get_origin / typing_extensions.get_origin |
| is_optional_type(tp) | get_origin(tp) is Union and type(None) in get_args(tp) |
| is_union_type(tp) | get_origin(tp) is Union |
| is_generic_type(tp) | get_origin(tp) is not None |
| is_tuple_type(tp) | get_origin(tp) is tuple |
| is_typevar(tp) | isinstance(tp, TypeVar) |
| is_literal_type(tp) | get_origin(tp) is Literal |

These are used across hamilton/htypes.py, hamilton/node.py, hamilton/function_modifiers/dependencies.py, hamilton/function_modifiers/expanders.py, hamilton/function_modifiers/adapters.py, and hamilton/experimental/h_cache.py. A small helper module (or additions to htypes.py) could centralize these ~5 one-liner functions and eliminate the dependency entirely.

Impact: Removes 1 of 4 required dependencies.
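
A hedged sketch of what such a helper module might contain, written directly from the replacements listed above. The function names mirror typing_inspect for readability, but this is an illustration only: it has not been checked against typing_inspect for every input, and unions written with the `X | None` syntax can report a different origin on some Python versions.

```python
from typing import Any, Literal, Optional, Tuple, TypeVar, Union, get_args, get_origin


def is_union_type(tp: Any) -> bool:
    return get_origin(tp) is Union


def is_optional_type(tp: Any) -> bool:
    # Optional[X] is represented as Union[X, None].
    return get_origin(tp) is Union and type(None) in get_args(tp)


def is_tuple_type(tp: Any) -> bool:
    return get_origin(tp) is tuple


def is_typevar(tp: Any) -> bool:
    return isinstance(tp, TypeVar)


def is_literal_type(tp: Any) -> bool:
    return get_origin(tp) is Literal


T = TypeVar("T")
print(is_optional_type(Optional[int]), is_tuple_type(Tuple[int, str]), is_typevar(T))
```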


2. Make remaining top-level pandas/numpy imports lazy

base.py was already fixed, but 3 files still have top-level imports:

| File | Import | Fix |
|---|---|---|
| hamilton/driver.py:48 | import pandas as pd | Move inside the functions that use it (it's only used in the __main__ block) |
| hamilton/log_setup.py:21 | import numpy as np | Move inside setup_logging(); it's only used on line 47 for np.seterr() |
| hamilton/models.py:21 | import pandas as pd | Move to TYPE_CHECKING block; only used for type annotations in predict() signature |

These are the remaining blockers for hamilton-core working cleanly without pandas/numpy installed. Without these fixes, import hamilton.driver or import hamilton.log_setup will crash in a sf-hamilton-core environment.
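
One way to check a claim like this is to import the module in a fresh interpreter and inspect `sys.modules`. The helper below is a hypothetical sketch (`import_pulls_in` is not part of Hamilton), and the demonstration uses standard-library modules as stand-ins so that it runs without Hamilton installed.

```python
import subprocess
import sys


def import_pulls_in(module: str, heavy: str) -> bool:
    """Return True if importing `module` in a fresh interpreter also loads `heavy`."""
    code = f"import sys, {module}; print({heavy!r} in sys.modules)"
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    # Use the last line in case the imported module prints something itself.
    return result.stdout.strip().splitlines()[-1] == "True"


# Stdlib stand-ins: check whether importing json also loads wave.
print(import_pulls_in("json", "wave"))
```

In a Hamilton checkout, the analogous call would be `import_pulls_in("hamilton.driver", "pandas")`. Note that `check=True` makes the helper raise if the import itself fails, for example when pandas is missing and still imported at the top level.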


3. typing_extensions — keep for now

This dependency is minimal (used for Annotated, NotRequired, get_origin, get_args, is_typeddict — all backports for Python < 3.11). It's lightweight and can't be removed until the minimum Python version is bumped to 3.11+. The current constraint > 4.0.0 is correct.

Minor fix needed: hamilton/function_modifiers/expanders.py:762 imports typing_extensions.is_typeddict unconditionally — should add a version check like the other imports:

import sys

# typing.is_typeddict is available in the standard library from Python 3.10.
if sys.version_info >= (3, 10):
    from typing import is_typeddict
else:
    from typing_extensions import is_typeddict

4. Improve the hamilton-core build with uv workspaces

The current hamilton-core/setup.py copies the entire hamilton/ source tree into hamilton/_hamilton/ at build time via shutil.copytree. This works but is fragile:

  • Stale copies if setup.py isn't re-run
  • _hamilton/ is .gitignored, so debugging the installed package is confusing
  • The proxy module in hamilton-core/hamilton/__init__.py adds runtime overhead and complexity
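
For context, a copy step of the kind described above might look roughly like this. Only the use of `shutil.copytree` comes from the description; `vendor_source`, the temporary paths, and the choice to delete the previous copy first are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def vendor_source(src: Path, dst: Path) -> None:
    """Copy src into dst, removing any previous copy so no stale files survive."""
    if dst.exists():
        shutil.rmtree(dst)
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns("__pycache__"))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    src = root / "hamilton"
    src.mkdir()
    (src / "base.py").write_text("VALUE = 1\n")

    # Mirrors the layout named in the PR description.
    dst = root / "hamilton-core" / "hamilton" / "_hamilton"
    vendor_source(src, dst)
    print(sorted(p.name for p in dst.iterdir()))
```

Deleting the destination first only guards against leftover files from an earlier copy; it does not help if the copy step is never re-run after the source changes, which is the staleness concern raised in the list.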

Better approach with uv workspaces:

# Root pyproject.toml — add workspace config
[tool.uv.workspace]
members = ["hamilton-core"]

# hamilton-core/pyproject.toml — replace setup.py entirely
[project]
name = "sf-hamilton-core"
dynamic = ["version"]
dependencies = [
    "typing_extensions > 4.0.0",
    # no pandas, numpy, or typing_inspect
]

[tool.uv.sources]
sf-hamilton = { workspace = true }

This eliminates the copy step, the proxy module, and the dynamic setup.py entirely. The hamilton package is referenced directly from the workspace root.


5. Use prek workspace mode for pre-commit hooks

With prek now on main, you can use its workspace feature to manage hooks per-package:

# hamilton-core/.pre-commit-config.yaml
orphan: true  # isolate from root hooks
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.15.1
    hooks:
      - id: ruff-check
      - id: ruff-format

prek auto-discovers this config and scopes the hooks to hamilton-core/ files only. This keeps linting/formatting consistent while allowing each sub-package to customize.


6. Drop networkx from visualization extra

The current main pyproject.toml lists networkx as part of the visualization extra, but the hamilton-core setup.py already drops it:

**{"visualization": ["graphviz"]},  # drop networkx

If networkx is genuinely unused in the visualization code path, this change should be applied to the main pyproject.toml as well.


Summary — final dependency picture

| Dependency | sf-hamilton (current) | sf-hamilton-core (current) | After proposed changes |
|---|---|---|---|
| pandas | required | removed | removed |
| numpy | required | removed | removed |
| typing_inspect | required | required | removed |
| typing_extensions | required | required | required (until Python 3.11 min) |
| Total core deps | 4 | 2 | 1 |

The single remaining hard dependency would be typing_extensions, and even that goes away once Python < 3.11 is dropped. At that point, sf-hamilton-core would have zero third-party dependencies.

@skrawcz
Contributor

skrawcz commented Feb 21, 2026

@Dev-iL those are good points. @zilto what do you think?

Labels

core-work Work that is "core". Likely overseen by core team in most cases.

4 participants