@geetu040
Collaborator

Towards #1575

This PR sets up the core folder and file structure along with base scaffolding for the API v1 → v2 migration.

It includes:

  • Skeleton for the HTTP client, backend, and API context
  • Abstract resource interfaces and versioned stubs (*V1, *V2)
  • Minimal wiring to allow future version switching and fallback support

No functional endpoints are migrated yet. This PR establishes a stable foundation for subsequent migration and refactor work.

@geetu040 geetu040 mentioned this pull request Dec 30, 2025
25 tasks
@codecov-commenter

codecov-commenter commented Dec 31, 2025

Codecov Report

❌ Patch coverage is 55.27859% with 305 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.10%. Comparing base (d421b9e) to head (bfb2d3e).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| openml/_api/clients/http.py | 24.46% | 142 Missing ⚠️ |
| openml/_api/resources/base/versions.py | 24.71% | 67 Missing ⚠️ |
| openml/_api/resources/base/fallback.py | 26.31% | 28 Missing ⚠️ |
| openml/_api/setup/backend.py | 62.50% | 24 Missing ⚠️ |
| openml/testing.py | 51.11% | 22 Missing ⚠️ |
| openml/_api/setup/_utils.py | 56.00% | 11 Missing ⚠️ |
| openml/_api/setup/builder.py | 75.86% | 7 Missing ⚠️ |
| openml/_api/resources/base/base.py | 78.57% | 3 Missing ⚠️ |
| openml/config.py | 93.33% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1576      +/-   ##
==========================================
+ Coverage   52.04%   53.10%   +1.05%     
==========================================
  Files          36       63      +27     
  Lines        4333     5015     +682     
==========================================
+ Hits         2255     2663     +408     
- Misses       2078     2352     +274     

cache: CacheConfig


settings = Settings(
Collaborator

I would move the settings to the individual classes. I think this design couples the classes too tightly to this file. You cannot move the classes around, or add a new API version, without making non-extensible changes to this file, because APISettings will require a constructor change and new classes that it accepts.

Instead, a better design is to apply the strategy pattern cleanly to the different API definitions - v1 and v2 - and move the config either to their __init__, or a set_config (or similar) method.
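
To make the suggestion concrete, here is a minimal sketch of the strategy pattern with the config held by each instance. It is not the PR's actual code: `ResourceAPI`, `ResourceV1`, `ResourceV2`, `Backend`, the `APIConfig` fields, and the URL paths are all hypothetical names chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class APIConfig:
    # Hypothetical fields; the real APIConfig in the PR may differ.
    server: str
    key: str = ""


class ResourceAPI(ABC):
    """Strategy interface: each API version supplies its own implementation."""

    def __init__(self, config: APIConfig) -> None:
        # The config lives on the instance, not in a shared settings module.
        self.config = config

    @abstractmethod
    def dataset_url(self, dataset_id: int) -> str:
        """Return the URL for one dataset (paths below are made up)."""


class ResourceV1(ResourceAPI):
    def dataset_url(self, dataset_id: int) -> str:
        return f"{self.config.server}data/{dataset_id}"


class ResourceV2(ResourceAPI):
    def dataset_url(self, dataset_id: int) -> str:
        return f"{self.config.server}datasets/{dataset_id}"


class Backend:
    """Context that delegates to whichever strategy it was given."""

    def __init__(self, api: ResourceAPI) -> None:
        self.api = api

    def dataset_url(self, dataset_id: int) -> str:
        return self.api.dataset_url(dataset_id)
```

With this shape, adding a further version would mean adding one more subclass and passing it to `Backend`; no central settings constructor has to change.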

Collaborator Author

I have updated the Config class so that it can handle multiple API versions in the future. A constructor change can still be expected, not for new API versions but for new config values. Let me know if that is still problematic.

Collaborator

@fkiraly fkiraly left a comment

Overall really great, I have a design suggestion related to the configs.

The config.py file and the coupling to it break an otherwise nice strategy pattern.

I recommend following the strategy pattern cleanly instead and moving the configs into the class instances, see above.

This will make the backend API much more extensible and cohesive.

key="...",
),
v2=APIConfig(
server="http://127.0.0.1:8001/",
Collaborator

Should this be hardcoded? I guess this is just for your local development.

Collaborator Author

This is hard-coded; these are the default values. The local endpoints will be replaced by the remote server once it is deployed, hopefully before this is merged into main.

Collaborator Author

Well, currently I suppose this is the central place for the API-related default values. Do you think it's fine to keep it this way?

@geetu040 geetu040 changed the title from "[ENH] Migration: set up core/base structure" to "[ENH] V1 → V2 API Migration - core structure" Jan 9, 2026
@geetu040 geetu040 marked this pull request as draft January 12, 2026 18:47
Collaborator

@PGijsbers PGijsbers left a comment

Overall I agree with the suggested changes. This seems like a reasonable way to provide a unified interface for two different backends, and also to separate out some concerns that were previously coupled or scattered more than they should be (e.g., caching, configurations).

My main concern is with the change to caching behavior. I also have minor concerns over the indirection APIContext introduces (perhaps I misunderstood its purpose) and over allowing Response return values.

In my comments you will also find some things that may already have been "wrong" in the old implementation. In that case, I think it simply makes sense to make the change now so I repeat it here for convenience.

from openml._api.config import APIConfig


class CacheMixin:
Collaborator

The ttl should probably depend heavily on the path. If we do end up caching at this level, we should use the Cache-Control HTTP response header so the server can tell us how long to keep a response in cache (something that, I believe, neither server does right now). The result of a dataset search query can change whenever any dataset description changes (a dataset may now be included or excluded), so caching probably shouldn't even be on by default for that type of query. Dataset descriptions might change, but likely not very frequently. Dataset data files or computed qualities should (almost?) never change. This is the reason that the current implementation only caches the description, features, qualities, and the dataset itself.
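
As a rough illustration of the Cache-Control idea, the sketch below derives a TTL from response headers. The function name `ttl_from_cache_control` and the `default` parameter are hypothetical, and treating `no-cache` the same as `no-store` (do not reuse) is a simplification rather than full HTTP caching semantics.

```python
import re
from typing import Mapping, Optional


def ttl_from_cache_control(
    headers: Mapping[str, str], default: Optional[int] = None
) -> Optional[int]:
    """Return a TTL in seconds from a Cache-Control header, or `default`."""
    value = ""
    for name, header_value in headers.items():
        # Header names are case-insensitive.
        if name.lower() == "cache-control":
            value = header_value
            break

    directives = [d.strip().lower() for d in value.split(",") if d.strip()]

    # Simplification: treat both directives as "do not serve from cache".
    if "no-store" in directives or "no-cache" in directives:
        return 0

    for directive in directives:
        match = re.fullmatch(r'max-age\s*=\s*"?(\d+)"?', directive)
        if match:
            return int(match.group(1))

    # The server gave no usable hint; fall back to the caller's choice.
    return default
```

A per-path policy could then pass `default=0` for search queries and a larger value for data files, which keeps the decision out of a single cache-wide setting.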

With this implementation, you also introduce some new issues:

  • What if the paths change, or even the query parameters? There are now dead cache entries. Do we now add cache cleanup routines? How does openml-python know what is no longer valid if the responses had a high TTL?
  • URLs may be (much) longer than the default maximum path length on Windows (260 characters). If I'm not mistaken, this will lead to an issue unless you specifically work around it.
  • More of an implementation detail, but authenticated and unauthenticated requests are not differentiated. If a user accidentally makes an unauthenticated request, gets an error, and then authenticates, they would still get an error.
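
One possible way to address the path-length and authentication points is to hash the request into a fixed-length cache key. This is only a sketch: `cache_path` and its parameters are hypothetical, a boolean `authenticated` flag is the simplest possible scoping (a real implementation might scope per API key), and it does nothing about cleaning up dead entries.

```python
import hashlib
from pathlib import Path
from typing import Mapping, Optional
from urllib.parse import urlencode


def cache_path(
    cache_dir: Path,
    url: str,
    params: Optional[Mapping[str, str]] = None,
    authenticated: bool = False,
) -> Path:
    """Map a request to a cache file whose name has a fixed length."""
    # Sort the parameters so that their order does not change the key.
    query = urlencode(sorted((params or {}).items()))
    # Keep authenticated and anonymous responses in separate scopes.
    scope = "auth" if authenticated else "anon"
    digest = hashlib.sha256(f"{scope}\n{url}\n{query}".encode("utf-8")).hexdigest()
    # The file name is always 64 hex characters, however long the URL is.
    return cache_dir / scope / digest[:2] / digest
```

The trade-off is that cache files are no longer human-readable, so the original URL would have to be stored alongside the response if it is needed for debugging or cleanup.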

@dataclass
class CacheConfig:
    dir: str = "~/.openml/cache"
    ttl: int = 60 * 60 * 24 * 7  # one week
Collaborator

nit: Considering the TTL in the HTTP standard is already defined in seconds, maybe it is fine to leave the unit out of the variable name? Though, as noted above, there is a discussion to be had about having this as a cache-level property in the first place.
For future reference, setting the value to timedelta(weeks=1).total_seconds() is preferred over the arithmetic+comment.
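
For illustration, the timedelta form could look like the sketch below. One detail to be aware of: `total_seconds()` returns a float, so the `int(...)` cast is an addition made here to keep the value consistent with the `int` annotation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class CacheConfig:
    dir: str = "~/.openml/cache"
    # total_seconds() returns a float; cast so the value matches the annotation.
    ttl: int = int(timedelta(weeks=1).total_seconds())
```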

Collaborator Author

Regarding timedelta, I have updated the code to use it.

As for the seconds suffix, timeout is also interpreted as seconds by the requests standard, so I don't think we need the suffix here either. For reference: #1576 (comment)

As for having ttl at the cache level, I'll answer under your previous comment: #1576 (comment)

JATAYU000 added a commit to JATAYU000/openml-python that referenced this pull request Feb 2, 2026