Apply pulsar capability snapshots to BYOC destinations #29

Open
mvdbeek wants to merge 757 commits into pulsar-byoc from pulsar-byoc-capabilities

mvdbeek (Owner) commented May 14, 2026

Summary

When PulsarMQBYOCJobRunner builds a job for a BYOC pulsar destination, it now fetches the remote pulsar's capability snapshot from the relay (cached in-memory, 60s TTL) and adjusts the destination params to reflect what the remote can actually do:

  • jobs_directory auto-fill: filled from the snapshot's staging_directory when unset (or set to the PARAMETER_SPECIFICATION_REQUIRED sentinel). With BYOC there is no shared FS between Galaxy and pulsar, so jobs_directory and staging_directory must agree for path rewrites to work; an operator-supplied value that differs from the snapshot logs a loud warning.
  • Container-runtime clear-only: clears docker_enabled / singularity_enabled / apptainer_enabled individually if the binary isn't on the remote's PATH; clears remote_container_handling if no container runtime at all.
  • Conda dependency-resolution downgrade: demotes dependency_resolution=remote to 'none' (NOT 'local') when the remote reports no conda. 'local' would just shift the failure mode without fixing it because BYOC has no shared FS.

Downgrades are clear-only for boolean features (they never auto-enable anything TPV didn't ask for); jobs_directory is the sole auto-fill because it only supplies data and does not enable a feature.
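
A minimal sketch of the downgrade rules listed above, written as a standalone function over plain dicts. The snapshot keys `container_runtimes` and `conda`, the sentinel string, and the choice to "clear" a flag by removing it are assumptions made for illustration; only `staging_directory` and the destination param names come from the description.

```python
import logging
from typing import Any, Optional

log = logging.getLogger(__name__)

# Hypothetical sentinel value; the real constant lives in the Galaxy/Pulsar code.
SENTINEL = "__PARAMETER_SPECIFICATION_REQUIRED__"

# Destination flag -> binary name looked up in the (assumed) snapshot layout.
CONTAINER_FLAGS = {
    "docker_enabled": "docker",
    "singularity_enabled": "singularity",
    "apptainer_enabled": "apptainer",
}


def apply_capability_downgrades(
    params: dict[str, Any], snapshot: Optional[dict[str, Any]]
) -> dict[str, Any]:
    """Mutate ``params`` in place so it only asks for what the remote reports."""
    if snapshot is None:
        # Capabilities are advisory: without a snapshot, trust the operator.
        return params

    staging = snapshot.get("staging_directory")
    current = params.get("jobs_directory")
    if staging and current in (None, SENTINEL):
        params["jobs_directory"] = staging  # the only auto-fill
    elif staging and current != staging:
        log.warning("jobs_directory %r differs from remote staging_directory %r", current, staging)

    runtimes = snapshot.get("container_runtimes", {})
    for flag, binary in CONTAINER_FLAGS.items():
        if params.get(flag) and not runtimes.get(binary, False):
            params.pop(flag)  # clear only; a flag is never switched on here
    if not any(runtimes.get(binary, False) for binary in CONTAINER_FLAGS.values()):
        params.pop("remote_container_handling", None)

    if params.get("dependency_resolution") == "remote" and not snapshot.get("conda", False):
        params["dependency_resolution"] = "none"  # not "local": there is no shared FS
    return params
```

Passing `None` as the snapshot leaves the params untouched, which matches the "advisory" behaviour described under Architecture.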

Why

Pulsar publishes a static capability snapshot once on startup to a per-pulsar relay topic (pulsar-side commit shipped on master). Galaxy currently dispatches BYOC jobs blindly — if the operator's TPV rule sets docker_enabled=true but the remote pulsar has no docker installed, jobs fail at runtime with an opaque error. This PR turns that into a clean warning + graceful fallback.

Architecture

  • RelayCapabilitiesCache (in-memory TTL, no DB) keyed by (relay_url, manager_name).
  • PulsarByocManager.capabilities_for(resource, *, user) — wraps the cache, doing a refresh-token exchange + HttpRelayClient.fetch_messages only on cache miss; rotated tokens are vaulted back to the same key the runner reads from.
  • PulsarMQBYOCJobRunner._apply_capability_downgrades(params, user) — called from get_client_from_wrapper before super, mutates destination_params in place. The static __remote_container_handling / __dependency_resolution helpers then read the downgraded values when __prepare_job runs.

capabilities_for returning None (older pulsar version, network glitch, schema mismatch, missing vault token) is treated as "trust operator params verbatim" — capabilities are advisory.
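
The cache-then-fetch flow can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the class and function names, the injected `fetch` callable (standing in for the token exchange plus `fetch_messages`), and the injectable clock are all hypothetical.

```python
import time
from collections.abc import Callable
from typing import Any, Optional

Key = tuple[str, str]  # (relay_url, manager_name)


class TTLCapabilitiesCache:
    """In-memory cache whose entries expire after ``ttl_seconds``."""

    def __init__(self, ttl_seconds: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[Key, tuple[float, dict[str, Any]]] = {}

    def get(self, relay_url: str, manager_name: str) -> Optional[dict[str, Any]]:
        key = (relay_url, manager_name)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, snapshot = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return snapshot

    def put(self, relay_url: str, manager_name: str, snapshot: dict[str, Any]) -> None:
        self._entries[(relay_url, manager_name)] = (self._clock(), snapshot)


def capabilities_for(
    cache: TTLCapabilitiesCache,
    relay_url: str,
    manager_name: str,
    fetch: Callable[[], Optional[dict[str, Any]]],
) -> Optional[dict[str, Any]]:
    """Return a cached snapshot, calling ``fetch`` only on a cache miss."""
    cached = cache.get(relay_url, manager_name)
    if cached is not None:
        return cached
    try:
        snapshot = fetch()
    except Exception:
        return None  # advisory: any failure means "trust operator params"
    if snapshot is not None:
        cache.put(relay_url, manager_name, snapshot)
    return snapshot
```

Injecting the clock keeps TTL tests deterministic without sleeping.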

Dependencies

Blocked on the pulsar-relay-client 0.2.2 release (galaxyproject/pulsar-relay#10), which adds HttpRelayClient.fetch_messages. The Galaxy pyproject.toml pin (pulsar-relay-client>=1.0,<2) is a pre-existing placeholder unrelated to this PR; it should be changed to >=0.2.2 once the BYOC branch corrects its pin to match the versions actually published.

Test plan

  • test/unit/app/managers/test_pulsar_capabilities_cache.py (new, 19 tests): cache TTL semantics, payload validator, schema-version filter, topic-name convention.
  • test/unit/app/managers/test_PulsarByocManager.py (extended, +7 tests): capabilities_for happy path, anonymous, no-vault, empty topic, unknown schema, TTL caching, refresh-token rotation persistence.
  • test/unit/app/jobs/test_pulsar_byoc_runner.py (extended, +18 tests): _apply_capability_downgrades covering every container-runtime path, dependency-resolution paths, jobs_directory auto-fill / mismatch / sentinel cases, no-op on absent resource id / null snapshot.
  • All 80 tests in the four touched files pass locally.
  • CI green once pulsar-relay-client>=0.2.2 is published.

Optional follow-up (not in this PR): extend test/integration/pulsar_byoc/test_byoc_tool_execution.py to exercise the path-rewrite-via-jobs_directory end-to-end against a real pulsar.

mvdbeek force-pushed the pulsar-byoc-capabilities branch 3 times, most recently from c7c9e22 to b004283, on May 15, 2026 13:47
davelopez and others added 27 commits May 20, 2026 18:37
Improves clarity and detail for quota availability changes by
replacing a generic dictionary with a structured list of transfers.
This provides explicit source and target object store IDs for each
quota impact, making the information more precise for the user
interface. It enhances the accuracy and reviewability of storage
operation previews.

Clarifies user understanding of the implications when datasets
move from private to shareable storage. The updated message is
more informative about the potential for sharing after the
operation, making the warning less ambiguous.

Creates a dedicated Vue component to display the storage operation
preview. This component centralizes the complex rendering logic,
improving maintainability and allowing for a clearer separation
of concerns within the storage operation wizard.

Simplifies the storage operation wizard by offloading preview
rendering to a dedicated component. This reduces the modal's
complexity, removes duplicated logic, and improves the overall
user experience with clearer instructions and titles.

Prevents dataset transfer attempts when the source data is unavailable
by detecting missing files beforehand and recording a specific failure.
Improves error reporting and robustness against purged or absent datasets.

Co-authored-by: Copilot <copilot@github.com>

Introduces new helper methods to support preview, execute,
and monitor bulk storage operations within test populators.

Introduces comprehensive integration tests covering preview, execution, and run status for bulk storage operations across distributed object store backends. Validates eligibility, error handling, warnings, run lifecycle, collection support, and edge cases including quota transfers and privacy warnings.
mvdbeek force-pushed the pulsar-byoc-capabilities branch 2 times, most recently from f954a15 to 2c5533d, on May 27, 2026 10:27
jmchilton (Collaborator) commented:

Needs a rebase.

mvdbeek added 26 commits May 28, 2026 20:49
Two new tables backing user-self-registered Pulsar compute resources
("Bring Your Own Compute"):

* pulsar_byoc_resource — one row per registered BYOC, holding the relay
  endpoint, manager_name (= relay user sub claim), and lifecycle status
  (pending/active/disabled/deleted). Refresh token lives in the Galaxy
  vault under pulsar_byoc/<id>/relay_refresh_token, never on the row.
* pulsar_byoc_bootstrap_token — short-TTL single-use ticket authorising
  the POST /api/pulsar_byoc/bootstrap callback. Synthetic int id PK +
  unique-indexed token column (Galaxy convention).

Migration fe9dd3972993 chains from f5a73c8b9d12 and uses the
galaxy.model.migrations.util.create_table / drop_table helpers.

* enable_pulsar_byoc (bool, default false): gates the /api/pulsar_byoc
  endpoints and the multi-tenant pulsar_byoc job runner.
* pulsar_byoc_relay_url (str, optional): operator-configured relay URL
  surfaced in the registration response and embedded in the one-liner
  the user pastes into ``pulsar-config register-with-galaxy``.

PulsarByocManager (lib/galaxy/managers/pulsar_byoc.py) owns the BYOC
domain logic — bootstrap-token lifecycle, JWT sub-claim verification,
DB state machine, vault writes, rate limiting (5 mints/hour rolling
window). Wired onto UniverseApplication as ``app.byoc_manager`` so TPV
rules can call ``app.byoc_manager.get_active_for(user)`` at job
dispatch time.
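
For illustration, a rolling-window limit such as "5 mints/hour" can be sketched as below. This is an in-memory sketch under assumptions (class name, per-user deque, injectable clock); the manager described here may well count mints differently, for example from DB rows.

```python
import time
from collections import deque
from collections.abc import Callable


class RollingWindowLimiter:
    """Allow at most ``max_events`` per user within the last ``window_seconds``."""

    def __init__(
        self,
        max_events: int = 5,
        window_seconds: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._max = max_events
        self._window = window_seconds
        self._clock = clock
        self._events: dict[int, deque[float]] = {}

    def try_acquire(self, user_id: int) -> bool:
        now = self._clock()
        events = self._events.setdefault(user_id, deque())
        # Drop timestamps that have slid out of the window.
        while events and now - events[0] >= self._window:
            events.popleft()
        if len(events) >= self._max:
            return False  # caller would raise RegistrationRateLimited here
        events.append(now)
        return True
```

Unlike a fixed hourly bucket, the window slides with each request, so a burst at the end of one hour still counts against the start of the next.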

The HttpRelayClient adapter (lib/galaxy/managers/pulsar_byoc_relay.py)
isolates all relay HTTP traffic — refresh-token exchange, /auth/me, and
the three-topic ACL-pinning dance with create-or-verify-ownership on
conflict. Defined as a RelayClient Protocol + a default factory so the
manager can be unit-tested against an in-memory fake without monkey-
patching ``requests``.
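
A rough sketch of the Protocol-plus-fake idea, assuming hypothetical method names (`exchange_refresh_token`, `subject_for`) and a local stand-in for the RelayVerificationFailed exception; the real RelayClient Protocol has its own surface.

```python
from typing import Protocol


class RelayVerificationFailed(Exception):
    """Local stand-in for the domain exception named in this commit."""


class RelayClient(Protocol):
    # Hypothetical method names; only the Protocol approach is from the commit.
    def exchange_refresh_token(self, refresh_token: str) -> str: ...

    def subject_for(self, access_token: str) -> str: ...


class FakeRelayClient:
    """In-memory fake: maps refresh tokens to the ``sub`` claim they carry."""

    def __init__(self, subjects: dict[str, str]):
        self._subjects = subjects

    def exchange_refresh_token(self, refresh_token: str) -> str:
        return f"access-for-{refresh_token}"

    def subject_for(self, access_token: str) -> str:
        return self._subjects[access_token.removeprefix("access-for-")]


def verify_manager_name(client: RelayClient, refresh_token: str, expected: str) -> None:
    """Raise unless the relay reports ``expected`` as the token's sub claim."""
    access_token = client.exchange_refresh_token(refresh_token)
    subject = client.subject_for(access_token)
    if subject != expected:
        raise RelayVerificationFailed(f"sub {subject!r} != manager_name {expected!r}")
```

Because `FakeRelayClient` satisfies the Protocol structurally, the manager logic can be tested without touching `requests` at all.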

Manager methods raise domain exceptions (BootstrapTokenInvalid,
BootstrapTokenExpired, RelayVerificationFailed, RegistrationRateLimited,
ResourceHasRunningJobs) translated to HTTP at the service boundary in a
follow-up commit.

Topic-pinning tests are contract tests of HttpRelayClient against
``responses``-stubbed relay endpoints — replacing the previous
suite that monkey-patched ``requests.{get,post}`` on a manager instance
built via ``object.__new__``.

One statically-registered runner serves every BYOC resource. At startup
it holds no relay credentials and no client manager; those are
materialised lazily — keyed by ``(relay_url, manager_name)`` and guarded
by an RLock — when a job's destination params point at a user's BYOC
resource. The lazy path reads the per-user relay refresh token from the
vault, builds a Pulsar ClientManager via an injected factory (defaults
to ``pulsar.client.build_client_manager``), and wires a rotation
callback so refreshed tokens persist back into the vault. Recovery
fails any in-flight job cleanly if its BYOC row has been deleted while
the job was running.
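
The lazy, lock-guarded materialisation might look roughly like this. The registry class, its method names, and the two-argument factory signature are assumptions for illustration; the real factory defaults to ``pulsar.client.build_client_manager`` and takes its own arguments.

```python
import threading
from collections.abc import Callable
from typing import Any


class LazyClientManagerRegistry:
    """Builds one client manager per (relay_url, manager_name) on first use."""

    def __init__(self, factory: Callable[[str, str], Any]):
        self._factory = factory
        self._lock = threading.RLock()
        self._managers: dict[tuple[str, str], Any] = {}

    def get_or_create(self, relay_url: str, manager_name: str) -> Any:
        key = (relay_url, manager_name)
        with self._lock:
            manager = self._managers.get(key)
            if manager is None:
                # First job for this resource: materialise the client manager now.
                manager = self._factory(relay_url, manager_name)
                self._managers[key] = manager
            return manager

    def shutdown(self) -> None:
        with self._lock:
            managers = list(self._managers.values())
            self._managers.clear()
        for manager in managers:
            manager.shutdown()
```

Injecting the factory is what lets tests pass a recording fake and assert on its state afterwards.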

Renames the base class's ``__async_update`` to ``_async_update`` so the
subclass can reference it without name-mangling — no behaviour change
for existing runners.
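
A small standalone demonstration of why the rename matters (the class names are made up; only the method name comes from the commit). Python rewrites ``__name`` inside a class body to ``_ClassName__name``, so a subclass cannot reach the base-class method under the double-underscore spelling.

```python
class OldBase:
    def __async_update(self):  # stored on the class as _OldBase__async_update
        return "updated"


class OldChild(OldBase):
    def poll(self):
        # Inside OldChild this is rewritten to self._OldChild__async_update,
        # which does not exist, so the call raises AttributeError.
        return self.__async_update()


class NewBase:
    def _async_update(self):  # single underscore: no name mangling
        return "updated"


class NewChild(NewBase):
    def poll(self):
        return self._async_update()
```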

Tests inject a recording FakeClientManagerFactory and verify state on
the resulting fakes (status-callback wiring, shutdown count) instead of
asserting on ``MagicMock.assert_called_once`` interactions.

Six endpoints under /api/pulsar_byoc, all gated behind a
``Depends(_require_enabled)`` dependency that 404s when
``enable_pulsar_byoc`` is off:

  POST   /api/pulsar_byoc/registration       — mint bootstrap ticket
  POST   /api/pulsar_byoc/bootstrap          — host-side callback
  GET    /api/pulsar_byoc                    — list user's resources
  GET    /api/pulsar_byoc/{id}               — resource detail
  DELETE /api/pulsar_byoc/{id}               — soft delete (disable)
  POST   /api/pulsar_byoc/{id}/purge         — hard delete (drain + vault)

PulsarByocService owns the API-shaped glue: model→Pydantic mapping, the
``one_liner`` command-string formatting, config reads, and domain-
exception → HTTPException translation. The controller is pure
delegation. Pydantic schemas in lib/galaxy/schema/pulsar_byoc.py.

Routes are auto-registered via ``include_all_package_routers`` in
fast_app.py; no buildapp.py wiring needed.

A single admin-shipped TPV destination ``pulsar_byoc`` whose ``params``
are filled in at job dispatch time by a rule consulting
``app.byoc_manager.get_active_for(user)``. No per-BYOC YAML
regeneration is required — the rule's f-string expressions resolve
against the live manager when a job is mapped.

The unit test exercises the full TPV→manager→rule path with a stub
manager: a user with an active resource routes to ``pulsar_byoc`` with
the correct params injected; users without one fall back to the
default destination.
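
In plain Python, the routing decision the rule makes could be sketched like this. The real rule is a TPV YAML expression; the `Resource` dataclass, the stub manager, and the exact set of injected params are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Resource:
    """Hypothetical view of an active BYOC row."""

    id: int
    relay_url: str
    manager_name: str


class StubManager:
    """Stands in for app.byoc_manager in a unit test."""

    def __init__(self, active: dict[str, Resource]):
        self._active = active

    def get_active_for(self, user: str) -> Optional[Resource]:
        return self._active.get(user)


def route(manager: StubManager, user: str) -> tuple[str, dict[str, Any]]:
    """Return (destination id, destination params) for a job owned by ``user``."""
    resource = manager.get_active_for(user)
    if resource is None:
        return "default", {}  # no active resource: fall back
    return "pulsar_byoc", {
        "pulsar_byoc_resource_id": resource.id,
        "relay_url": resource.relay_url,
        "manager_name": resource.manager_name,
    }
```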
Brings up Keycloak (compose) + pulsar-relay (subprocess from local
checkout) + Pulsar (subprocess from local checkout) + Galaxy
(in-process via IntegrationTestCase). Skips cleanly when Docker isn't
reachable or the pulsar-relay / pulsar source trees can't be located —
override the search path with ``PULSAR_RELAY_REPO`` / ``PULSAR_REPO``.

Two test files under the e2e marker:

* test_byoc_e2e.py drives the full device-flow with pair=true against
  Keycloak, exercises HttpRelayClient.pin_topics_for_manager against
  the live relay, and verifies the two refresh tokens rotate on
  independent chains. Includes a cross-user defence test confirming
  the bootstrap admin can't seize a BYOC user's topics.

* test_byoc_tool_execution.py is the heavyweight sibling: spawns a
  real Pulsar daemon against the relay using the device-flow primary
  refresh token, then submits a framework tool through Galaxy via TPV
  → pulsar_byoc → relay → Pulsar and asserts the tool completes ok
  with destination_params carrying the right resource id + manager
  name.

RUNBOOK.md documents how a Docker-enabled session can run the full
suite, what passes look like, and how to triage failures. README.md
covers the simpler "what's in here" question.

The ``e2e`` pytest marker is added to pytest.ini.

Adds RelayCapabilitiesCache + PulsarByocManager.capabilities_for, and
hooks _apply_capability_downgrades into PulsarMQBYOCJobRunner so that
when Galaxy builds a job for a BYOC pulsar destination it fetches the
remote's capability snapshot from the relay (cached in-memory, 60s TTL)
and adjusts the destination params to match what the remote can
actually do:

- Auto-fills jobs_directory from the snapshot's staging_directory when
  unset (or set to the destination_default sentinel). With BYOC there
  is no shared FS between Galaxy and pulsar, so these MUST agree —
  warns loudly on operator-supplied mismatches.
- Clears docker_enabled / singularity_enabled / apptainer_enabled
  individually if the binary isn't on the remote's PATH.
- Clears remote_container_handling if no container runtime is present.
- Demotes dependency_resolution=remote to 'none' (NOT 'local') if the
  remote reports no conda — 'local' is meaningless without a shared FS.

Downgrades are clear-only for boolean features; jobs_directory is the
sole auto-fill (it's pure data-supply, not a downgrade).

Pulsar publishes the snapshot once on startup to a per-pulsar relay
topic (pulsar-side commits cae2241, 006958a, 2db26b7 and c822898 on master);
this PR consumes it through HttpRelayClient.fetch_messages, added in
pulsar-relay-client 0.2.2.

Blocked on pulsar-relay-client 0.2.2 release
(galaxyproject/pulsar-relay#10) for the SDK pin bump.

The previous pin matched no published version (PyPI has 0.2.x, not
1.x), so every Galaxy CI run on this branch failed at install time
with ModuleNotFoundError on pulsar_relay_client. Replace it with a pin
that resolves to a real release: 0.2.2, because that's the version
adding HttpRelayClient.fetch_messages, which this PR depends on
(galaxyproject/pulsar-relay#10).

CI on this branch will stay red until 0.2.2 publishes to PyPI; after
that it should go green without further changes here.

CI was still failing with ModuleNotFoundError on pulsar_relay_client
because Galaxy's CI installs from pinned-requirements.txt (the lockfile),
not from pyproject.toml directly — and the BYOC base branch never added
the pin to the lock. Add pulsar-relay-client==0.2.2 (now published)
right next to pulsar-galaxy-lib so it gets installed in the test env.

Also fixes ruff UP035: ``Callable`` should come from collections.abc,
not typing — Galaxy's ruff config flags this even on 3.10+ where the
typing version is still valid.

Runs the three Galaxy CI gates locally and addresses each:

- ``tox -e mypy``:
  * test_pulsar_capabilities_cache: loosen ``_msg`` payload to ``Any``
    (the test deliberately passes a non-dict to verify the validator
    rejects it); guard ``out["v"]`` access with a None check.
  * test_pulsar_byoc_runner: explicitly type the ``params`` dicts and
    the ``**kwargs`` snapshot-builder unpacks as ``dict[str, Any]`` so
    mypy can satisfy the heterogeneous value types.
  * runners/pulsar.py: ``# type: ignore[attr-defined]`` on
    ``self.app.byoc_manager`` — the runner is typed against the
    broader ``GalaxyManagerApplication`` but the BYOC manager is set
    on ``UniverseApplication``. The runner only ever dispatches
    inside a UniverseApplication so the attribute is present.
  * Drive-by: rename a shadowed ``client`` variable in the existing
    ``test_byoc_e2e.py`` to ``http_client`` so the with-block doesn't
    rebind the outer ``HttpRelayClient`` name to ``httpx.Client``.

- ``tox -e lint``: already clean after the earlier ruff Callable fix.

- ``make update-client-api-schema``: the base BYOC PR added
  ``/api/pulsar_byoc/...`` endpoints to FastAPI but never regenerated
  ``client/src/api/schema/schema.ts``. Regenerated via
  ``openapi-typescript`` + Galaxy's prettier config; the diff is +453
  lines, all of which describe the BYOC endpoints.

Galaxy's CI runs format as a separate gate from lint. Apply isort and
black to bring the new BYOC capability files (and a few pre-existing
BYOC files isort had been quietly unhappy with) up to the project's
formatting standard.

The packages/app pinned-requirements.txt is a symlink back to
lib/galaxy/dependencies/, so the lib-side pin already covers it. But
when packages are tested in isolation (Test Galaxy packages), pip
resolves deps from setup.cfg's install_requires — and that listed only
pulsar-galaxy-lib, not pulsar-relay-client. Add it next to the existing
pulsar dep so the isolated package install pulls in the relay client.

The Test Galaxy packages CI shard installs galaxy-app in isolation
(only the install_requires from setup.cfg). tpv is a dev/test dep, not
a runtime dep, so the package-isolated install doesn't have it — and
the existing test_pulsar_byoc_tpv_integration imports tpv at module
level, which crashes pytest collection.

importorskip skips the whole module when tpv is missing, restoring
collection without changing behavior in environments where tpv IS
installed (lib/dev install).

The runner block had no uncommented fields, so YAML parsed
runners.pulsar_byoc as None — failing schema validation
('Value None is not a dict at /runners/pulsar_byoc') and breaking
13 packages-isolation tests in test_job_configuration.py.

…sing

The 2 BYOC e2e setup errors weren't 'docker not running' — Docker is
available on CI. The actual error in the relay_against_keycloak fixture
was ModuleNotFoundError on pulsar_relay because the *server* package
(distinct from the pulsar-relay-client SDK already pinned) wasn't
installed. The fixture launches it under uvicorn:
    python -m uvicorn pulsar_relay.main:app

Add pulsar-relay>=0.2.0 to the test dependency-group so CI installs
both halves of the relay (server + client) and the e2e suite can boot
the server in-process.

Also add a module-level pytest.skip in the BYOC conftest gated on
importlib.util.find_spec('pulsar_relay') — for local dev runs where
a tester might not have the server package installed, the suite now
skips cleanly instead of erroring at fixture setup.
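
The gating check itself needs only the standard library. A minimal sketch, with a hypothetical helper name; in a conftest the result would feed ``pytest.skip(..., allow_module_level=True)``, shown here only as a comment so the snippet runs without pytest.

```python
import importlib.util


def missing_modules(*names: str) -> list[str]:
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# In a conftest.py one might then write (not executed here):
#
#     missing = missing_modules("pulsar_relay")
#     if missing:
#         pytest.skip(f"missing modules: {missing}", allow_module_level=True)
```

``find_spec`` only locates the module; it does not import it, so the check stays cheap even when the server package is installed.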
…service / API / tests

Renames the user-facing surface from "Pulsar BYOC" (an implementation
detail and project codename) to "compute resources" — the industry-standard
term used by Cromwell, Nextflow, Kubernetes, AWS, etc. The Pulsar daemon
remains the under-the-hood backend; the surface presented to users and
operators is neutral about it.

Scope:

* DB tables: pulsar_byoc_resource → compute_resource, pulsar_byoc_bootstrap_token
  → compute_resource_registration. Migration filename + content updated;
  feature has not shipped, so the original migration is just renamed in place.
* ORM models: PulsarByocResource → ComputeResource, PulsarByocBootstrapToken
  → ComputeResourceRegistration.
* Manager module + class: lib/galaxy/managers/pulsar_byoc.py →
  lib/galaxy/managers/compute_resources.py;
  PulsarByocManager → ComputeResourceManager. Exposed as
  app.compute_resource_manager (was app.byoc_manager) and declared on
  MinimalManagerApp. Exceptions renamed: BootstrapTokenInvalid →
  RegistrationTokenInvalid, BootstrapTokenExpired →
  RegistrationTokenExpired, PulsarByocError → ComputeResourceError.
* Service module + class: pulsar_byoc.py → compute_resources.py;
  PulsarByocService → ComputeResourceService.
* API: lib/galaxy/webapps/galaxy/api/pulsar_byoc.py → compute_resources.py;
  router tags=["compute_resources"]; routes:
    POST   /api/compute_resources/registrations
    POST   /api/compute_resources/registrations/complete
    GET    /api/compute_resources
    GET    /api/compute_resources/{id}
    DELETE /api/compute_resources/{id}
    POST   /api/compute_resources/{id}/purge
* Schema module: pulsar_byoc.py → compute_resources.py;
  PulsarByocResourceSummary → ComputeResourceSummary;
  BootstrapPayload → RegistrationCompletionPayload.
* Config options: enable_pulsar_byoc → enable_compute_resources;
  pulsar_byoc_relay_url → compute_resource_relay_url.
* TPV destination param: pulsar_byoc_resource_id → compute_resource_id.
  Sample tpv file renamed: tpv/byoc.yml.sample → tpv/compute_resources.yml.sample.
* Job runner: PulsarMQBYOCJobRunner class name preserved (it IS a
  Pulsar MQ runner using the BYOC pattern); registered runner id is
  now `compute_resource`. BYOCClientManagerRegistry →
  ComputeResourceClientManagerRegistry.
* Vault path: pulsar_byoc/{id}/relay_refresh_token →
  compute_resource/{id}/relay_refresh_token.
* Integration test dir + filenames: test/integration/pulsar_byoc/ →
  test/integration/compute_resources/; test_byoc_* → test_compute_resource_*;
  template files dropped the byoc_ prefix.
* Client TS schema regenerated via openapi-typescript.

Verified: tox -e lint, -e format, -e mypy all green; 90 BYOC-related
unit tests pass.

pulsar-relay 0.2.0 transitively required starlette<1.0.0 via
prometheus-fastapi-instrumentator (every available version of that
package pins starlette<1.0.0). Galaxy pins starlette==1.0.0 in
pinned-requirements.txt, so uv could not resolve the test dep group
on any CI shard — every Python test job died at install time with
``× No solution found when resolving dependencies``.

pulsar-relay 0.2.1 makes prometheus-fastapi-instrumentator an
optional dependency under a new ``[metrics]`` extra and guards the
``Instrumentator().instrument().expose()`` call in main.py with a
try/except ImportError. Galaxy doesn't need the auto-exposed
``/metrics`` route — the relay's prometheus_client-based counters
(used by the API code paths) still record without it — so installing
plain ``pulsar-relay>=0.2.1`` is enough and lets the resolver
succeed against starlette==1.0.0.
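
The optional-import guard described above follows a common pattern, sketched here. This is not the relay's main.py; the helper name `maybe_expose_metrics` is hypothetical, and the snippet runs whether or not the instrumentator package is installed.

```python
try:
    # Only present when the optional metrics dependency is installed.
    from prometheus_fastapi_instrumentator import Instrumentator
except ImportError:
    Instrumentator = None


def maybe_expose_metrics(app) -> bool:
    """Attach the auto-exposed /metrics route if the optional dependency exists."""
    if Instrumentator is None:
        return False  # plain install: skip the route, keep everything else working
    Instrumentator().instrument(app).expose(app)
    return True
```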
…runner

Extended metadata has pulsar write the post-job model store on the
remote host, which Galaxy then collects from the destination's staging
directory. The compute-resource (multi-tenant pulsar) runner has no
shared filesystem between Galaxy and the user's pulsar — staged outputs
can't be read back via path-based access regardless of operator config —
and the remote pulsar typically doesn't ship Galaxy's metadata writer
either. The job would die opaquely on the remote.

Refuse at submit time inside ``get_client_from_wrapper`` so the
operator gets a clear, actionable error before any job-prep work
happens. ``recover`` / ``stop`` aren't gated — those operate on
already-submitted jobs that necessarily passed the check.

Three new unit tests in test_compute_resource_runner.py cover the
positive case (extended → raise) and the negative parametric case
(directory / legacy / None all pass through).
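
The submit-time refusal amounts to a small guard. A sketch under assumptions: the exception class and function name are hypothetical, and the real check lives inside ``get_client_from_wrapper``.

```python
from typing import Optional


class ExtendedMetadataUnsupported(Exception):
    """Hypothetical error type for the refusal described above."""


def check_metadata_strategy(strategy: Optional[str]) -> None:
    """Refuse ``extended`` metadata; let every other strategy pass through."""
    if strategy == "extended":
        raise ExtendedMetadataUnsupported(
            "extended metadata needs a shared filesystem, which the "
            "compute-resource runner does not have"
        )
```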
…ient device flow

Replaces two test-internal shortcuts with the production code paths they
were standing in for:

* The heavy e2e test was manually calling ``model.ComputeResource(...)`` +
  ``UserVaultWrapper.write_secret(...)`` to wire up the resource. The
  Galaxy-side ``ComputeResourceManager.complete_registration``
  (token-exchange → sub-claim validation → topic pinning → DB insert →
  vault write) was therefore entirely uncovered by integration tests.
  Drop the shortcut and instead drive Galaxy's real bootstrap endpoints
  (``POST /api/compute_resources/registrations`` and
  ``/registrations/complete``) from ``setUp``. That's the same path
  ``pulsar-config register-with-galaxy`` exercises on the host side.

* The RFC 8628 device-flow polling loop was reimplemented in
  ``_device_flow.py`` even though ``pulsar_relay_client`` ships
  ``RelayDeviceFlowAuthenticator`` for exactly this purpose, with an
  ``on_user_code`` hook designed for "tests / alternative UIs". Use the
  library helper directly; pass the Keycloak operator-login as the
  ``on_user_code`` callback. Removes ~130 lines of duplicated
  RFC-8628-spec implementation.

``_pre_create_topics`` stays in ``_prepare_galaxy`` to keep Pulsar's
long-poll subscriptions from blocking before Galaxy boots — but
``complete_registration``'s own ``create_or_verify_topic`` loop is
still exercised in ``setUp`` (the duplicate calls are idempotent under
same-owner topic ownership).

The harness was rolling its own ports, docker-availability check, tempdir,
compose orchestration, and lifecycle hooks. Each of those has a shared
counterpart used elsewhere in the suite:

* ``_free_port`` → ``galaxy.util.sockets.unused_port``
* ``_docker_running`` + ``_compose_cmd`` → ``@integration_util.skip_unless_docker()``
  class decorator (same gating ``test_auth_oidc.py`` uses)
* ``COMPUTE_RESOURCE_E2E_TMP`` env-var override + ad-hoc ``tempfile.mkdtemp``
  → ``cls._test_driver.galaxy_test_tmp_dir`` (per-class, auto-cleaned by
  ``IntegrationTestCase.tearDownClass`` via ``cleanup_directory``)
* Pointless empty ``_configure_app`` override → drop
* Docker-compose-based Keycloak bring-up → ``docker run`` mirroring
  ``test/integration/oidc/test_auth_oidc.py:start_keycloak_docker``. The
  YAML+compose-shim machinery (``_compose_cmd`` variant detection,
  ``teardown_compose``, ``docker-compose.yml``, the ``compose_env`` field
  on ``KeycloakHandle``) is gone.

Net -73 lines across the three touched files.

OIDC and compute_resources both stood up Keycloak via ``docker run`` with
near-identical boilerplate but had drifted on image (``keycloak/keycloak:26.2``
vs ``quay.io/keycloak/keycloak:26.0``), env-var style (``KC_BOOTSTRAP_ADMIN_*``
vs deprecated ``KEYCLOAK_ADMIN*``), and ready-probe (``/.well-known/...`` vs
``/realms/master``).

Lift the shared bits into a new module with two start flavours that share
image + ready-probe + teardown:

* ``start_keycloak_https_with_realm`` — production mode, TLS, realm
  imported from a mounted directory. Used by ``test_auth_oidc.py``.
* ``start_keycloak_http_dev`` — dev mode, HTTP, no realm import; the
  consuming suite provisions the realm dynamically via the admin API.
  Used by the compute_resources harness.

Both call ``wait_till_keycloak_ready`` (``/realms/master``, optional cert
verification). Teardown is one ``stop_keycloak_docker`` for both. A bump
of ``KEYCLOAK_IMAGE`` now touches both suites in one place; both move
onto ``quay.io/keycloak/keycloak:26.2`` and the modern bootstrap env vars.

Lets CI exercise the pulsar-side compute_resources URL rename (PR galaxyproject#459)
against this branch without cutting a pulsar release. Two pieces:

* requirements.txt (pinned-requirements.txt): pin pulsar-galaxy-lib at the
  PR branch via a git URL instead of ==0.15.14.
* common_startup.sh: export PULSAR_GALAXY_LIB=1 before the dependency
  install. pulsar's setup.py names the dist "pulsar-app" by default and
  only "pulsar-galaxy-lib" when this env var is set; without it the git
  source build produces a dist whose name doesn't match the requirement
  and the install fails ("Package metadata name `pulsar-app` does not
  match given name `pulsar-galaxy-lib`"). Harmless for the normal
  released-wheel path — the env var only affects building from source.
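
The name switch can be pictured with a small sketch. This is not pulsar's actual setup.py; the function name is hypothetical, and treating exactly ``"1"`` as "set" is an assumption based on the ``export PULSAR_GALAXY_LIB=1`` line above.

```python
import os
from collections.abc import Mapping


def distribution_name(environ: Mapping[str, str] = os.environ) -> str:
    """Pick the dist name a source build would produce for this environment."""
    # Assumption: only the value "1" switches to the library name.
    if environ.get("PULSAR_GALAXY_LIB") == "1":
        return "pulsar-galaxy-lib"
    return "pulsar-app"
```

A requirement line naming ``pulsar-galaxy-lib`` only matches the built metadata in the first case, which is why the export has to happen before the dependency install.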

REVERT before merge (restore the ==<version> pin once galaxyproject#459 is released).
mvdbeek force-pushed the pulsar-byoc-capabilities branch from 2c5533d to 297437a on May 28, 2026 18:51
10 participants