
Add -DRIVERHOST and -SUBHOST support for distributed Schrödinger jobs#14

Open
chrisdag wants to merge 1 commit into
MolecularAI:public from
chrisdag:feature/driverhost-subhost


@chrisdag


Summary

Adds optional driverhost and subhost parameters to the Schrodinger base class in maize/steps/mai/common/schrodinger.py, enabling separate host routing for the driver/coordinator process and compute-intensive subjobs in distributed Schrödinger jobs.

Resolves #13.

Motivation

Schrödinger's Job Control supports three host-routing flags for distributed jobs (Glide with -NJOBS > 1, FEP+, IFD, etc.):

  • -HOST — sets both driver and subjob hosts (current maize behavior)
  • -DRIVERHOST — overrides for the lightweight coordinator/driver process
  • -SUBHOST — overrides for compute-intensive subjobs
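The override rule in this list can be modelled in a few lines. The sketch below is only an illustration of the documented semantics: resolve_hosts and HostRouting are hypothetical names, not part of maize or of Schrödinger's API, and None stands for "flag not given".

```python
from typing import NamedTuple, Optional


class HostRouting(NamedTuple):
    """Effective hosts for the two roles in a distributed job (hypothetical type)."""
    driver: Optional[str]
    subjobs: Optional[str]


def resolve_hosts(
    host: Optional[str] = None,
    driverhost: Optional[str] = None,
    subhost: Optional[str] = None,
) -> HostRouting:
    """Illustrative model of the flag semantics, not Job Control code.

    -HOST applies to both roles; -DRIVERHOST and -SUBHOST, when given,
    override -HOST for the driver and for the subjobs respectively.
    """
    return HostRouting(
        driver=driverhost if driverhost is not None else host,
        subjobs=subhost if subhost is not None else host,
    )
```

For example, resolve_hosts(host="cpu", driverhost="driver") gives HostRouting(driver='driver', subjobs='cpu'): the driver moves to its own host while the subjobs stay on the compute host.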

On HPC clusters with dedicated driver partitions (a standard Schrödinger deployment pattern; their sample hosts.yml ships a driver entry), the lightweight driver process occupies a full compute slot when only -HOST is available, because -HOST places the driver on the same host as the subjobs. This matters on small partitions where every slot counts.

Changes

  • Two new optional parameters on the Schrodinger class, driverhost and subhost, with docstrings referencing the Schrödinger docs
  • Token cleanup loop: extended to strip -DRIVERHOST and -SUBHOST alongside -JOBNAME, -HOST, and -NJOBS
  • Command construction: after -NJOBS injection, conditionally emits -DRIVERHOST and/or -SUBHOST when the parameters are set
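A minimal sketch of the cleanup and construction steps above, assuming each managed flag takes exactly one value token and that Job Control options go directly after the executable. strip_managed_flags and build_command are hypothetical helpers for illustration; the actual change lives in _run_schrodinger_job() and may order tokens differently.

```python
from typing import List, Optional

# Flags the wrapper injects itself; user-supplied copies are removed first.
# Assumption: every one of these flags is followed by exactly one value token.
MANAGED_FLAGS = {"-JOBNAME", "-HOST", "-NJOBS", "-DRIVERHOST", "-SUBHOST"}


def strip_managed_flags(tokens: List[str]) -> List[str]:
    """Drop each managed flag together with the value token that follows it."""
    cleaned: List[str] = []
    skip_next = False
    for token in tokens:
        if skip_next:
            skip_next = False
            continue
        if token in MANAGED_FLAGS:
            skip_next = True
            continue
        cleaned.append(token)
    return cleaned


def build_command(
    base: List[str],
    jobname: str,
    host: str,
    n_jobs: int,
    driverhost: Optional[str] = None,
    subhost: Optional[str] = None,
) -> List[str]:
    """Rebuild the command; the two new flags are appended only when set."""
    cleaned = strip_managed_flags(base)
    options = ["-JOBNAME", jobname, "-HOST", host, "-NJOBS", str(n_jobs)]
    # Emitted after -NJOBS and only when set, so the default command is unchanged.
    if driverhost is not None:
        options += ["-DRIVERHOST", driverhost]
    if subhost is not None:
        options += ["-SUBHOST", subhost]
    # Assumption: options sit between the executable and its remaining arguments.
    return cleaned[:1] + options + cleaned[1:]
```

With driverhost and subhost left as None, build_command(["glide", "-HOST", "old", "dock.in"], "dock", "cpu", 4) returns ['glide', '-JOBNAME', 'dock', '-HOST', 'cpu', '-NJOBS', '4', 'dock.in']: the stale user-supplied "-HOST old" pair is dropped and no new flags appear.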

Backward Compatibility

When driverhost and subhost are not set (the default), the constructed command is identical to the one the current code produces, so existing workflows are unaffected.

Testing

Validated on an AWS ParallelCluster (pcluster 3.11.1, Ubuntu 22.04) running Schrödinger Suite 2025-4 with REINVENT4 and Maize 0.9.4:

  • End-to-end Maize → LigPrep → Glide (HTVS, n_jobs=4) pipeline confirmed working with -HOST routing to a remote Job Server
  • Distributed Glide correctly spawns driver + 4 worker subjobs via the Job Server's Slurm queue integration
  • Patch applied to production environment alongside existing maize patches (remote jobserver, retry logic, timeout adjustments)

Schrödinger Documentation Reference

  • "The HOST, DRIVERHOST, and SUBHOST Options" (jobs/running_jobs_command_line_host_options.htm, 2025-4)
  • "Running Distributed Schrödinger Jobs" (jobs/distributed_jobs.htm, 2025-4) — Table 1 confirms Glide and LigPrep use "Standard" driver location

Add optional driverhost and subhost parameters to the Schrodinger base
class, enabling separate host routing for driver/coordinator processes
and compute-intensive subjobs. This follows Schrodinger Job Control
conventions where -DRIVERHOST overrides -HOST for the driver and
-SUBHOST overrides -HOST for subjobs.

When neither parameter is set, behavior is identical to the current
code (fully backward compatible).

Resolves MolecularAI#13
@chrisdag

Author

Note on test coverage: The Schrodinger base class does not currently have a TestSuiteSchrodinger test class (unlike e.g. TestSuiteGlide in glide.py). Testing the base class requires a live Schrödinger Job Server installation, which limits what can be covered in unit tests.

The new parameters follow the exact same pattern as the existing host and n_jobs parameters (same type, same injection point in _run_schrodinger_job()), and have been validated end-to-end on a production HPC cluster with Schrödinger 2025-4 and Maize 0.9.4.

Happy to add a mock-based unit test for the command construction logic if that would be preferred.
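One possible shape for such a test, sketched under the assumption that the step hands its final command to a single method that can be patched. StubSchrodingerStep, run_command, and run_job are hypothetical stand-ins, not maize's actual class or methods; the point is that unittest.mock captures the command without needing a live Job Server.

```python
import unittest
from typing import List, Optional
from unittest import mock


class StubSchrodingerStep:
    """Hypothetical stand-in for the real step, not maize's Schrodinger class."""

    def __init__(self, host: str, n_jobs: int,
                 driverhost: Optional[str] = None,
                 subhost: Optional[str] = None) -> None:
        self.host = host
        self.n_jobs = n_jobs
        self.driverhost = driverhost
        self.subhost = subhost

    def run_command(self, command: List[str]) -> None:
        # In a real step this would submit to a Job Server; tests always mock it.
        raise RuntimeError("would contact a Job Server")

    def run_job(self, executable: str, input_file: str) -> None:
        command = [executable, "-HOST", self.host, "-NJOBS", str(self.n_jobs)]
        if self.driverhost is not None:
            command += ["-DRIVERHOST", self.driverhost]
        if self.subhost is not None:
            command += ["-SUBHOST", self.subhost]
        self.run_command(command + [input_file])


class TestHostRouting(unittest.TestCase):
    def _captured(self, step: StubSchrodingerStep) -> List[str]:
        # Replace the submission method and return the command it received.
        with mock.patch.object(step, "run_command") as run:
            step.run_job("glide", "dock.in")
        run.assert_called_once()
        return run.call_args.args[0]

    def test_defaults_unchanged(self) -> None:
        cmd = self._captured(StubSchrodingerStep(host="cpu", n_jobs=4))
        self.assertEqual(cmd, ["glide", "-HOST", "cpu", "-NJOBS", "4", "dock.in"])

    def test_driver_and_subhost_emitted(self) -> None:
        step = StubSchrodingerStep("cpu", 4, driverhost="driver", subhost="compute")
        cmd = self._captured(step)
        self.assertEqual(cmd[cmd.index("-DRIVERHOST") + 1], "driver")
        self.assertEqual(cmd[cmd.index("-SUBHOST") + 1], "compute")
```

Saved as a test module, this runs with python -m unittest and needs no Schrödinger installation.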



Development

Successfully merging this pull request may close these issues.

Support for -DRIVERHOST and -SUBHOST in Schrödinger job submission

1 participant