A lightweight command-line package to generate and manage CellProfiler analysis jobs for HPC clusters. cptools2 builds image lists, splits work into jobs, creates space-optimized plate batches, generates submission scripts, and can join and optionally transfer result CSVs after analysis.
- Packaging consolidated under `pyproject.toml` with modern metadata and direct references to supporting parser utilities.
- Scratch-space batching defaults increased to 75% utilisation with a 30% per-plate overhead buffer.
- Installation instructions updated for both `pip` and `uv` workflows.
- NEW: Automatic LoadData metadata enrichment integrated into the join workflow: output CSVs now include complete well, site, and plate metadata.
- Release highlights (v1.0.0)
- Installation
- Quick usage
- Metadata enrichment
- YAML configuration
- Behavior notes
- Developer quickstart & testing
- HPC validation checklist
- Contributing
- License
```shell
python -m venv .venv
source .venv/bin/activate   # On Windows: .venv\Scripts\activate
pip install --upgrade pip
pip install -e .[dev]
```

Notes:

- Runtime dependencies (`pandas`, `pyyaml`, `parserix`, `scissorhands`) are resolved automatically via `pyproject.toml`.
- Use `pip install .` for a pure runtime install without developer extras.

With `uv`:

```shell
uv venv
uv pip install .[dev]   # or: uv pip install .
```

To run commands without activating the environment explicitly, prefix with `uv run`, e.g. `uv run pytest`.
Generate a full workflow (creates loaddata, per-plate command files and a master submit script):

```shell
cptools2 generate config.yml
```

Join chunked CSV outputs after analysis (one or more patterns):

```shell
cptools2 join --location /path/to/location --patterns Image.csv Cells.csv
```

The join command automatically enriches output files with LoadData metadata before concatenation, ensuring complete metadata columns in the final output.
When you run `cptools2 join`, the tool automatically enriches all output CSV files with metadata from the corresponding LoadData files. This happens before concatenation to preserve metadata integrity.
Why this matters:
Each CellProfiler chunk has sequential ImageNumber values (1, 2, 3, ...). If chunks are concatenated first, these ImageNumbers become duplicated and lose meaning. By enriching each chunk individually before concatenation, every row retains its correct metadata (well, site, plate, etc.).
Example workflow:

```text
Input structure:
  loaddata/plate_0.csv        <- Contains Metadata_well, Metadata_site, etc.
  loaddata/plate_1.csv
  raw_data/plate_0/Image.csv  <- No metadata, just measurements
  raw_data/plate_1/Image.csv

Processing:
  1. Enrich chunk 0: join Image.csv with loaddata/plate_0.csv on ImageNumber
  2. Enrich chunk 1: join Image.csv with loaddata/plate_1.csv on ImageNumber
  3. Concatenate enriched chunks into joined_files/plate_Image.csv

Output:
  joined_files/plate_Image.csv  <- Complete metadata preserved!
```
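The per-chunk enrichment described above boils down to a pandas left join on `ImageNumber`, performed before any concatenation. A minimal sketch (the toy DataFrames and the `Cells_Count` column are illustrative stand-ins, not cptools2 internals):

```python
import pandas as pd

# Toy stand-ins for one chunk's measurement CSV and its LoadData file.
measurements = pd.DataFrame({"ImageNumber": [1, 2], "Cells_Count": [40, 55]})
loaddata = pd.DataFrame({
    "ImageNumber": [1, 2],
    "Metadata_well": ["A01", "A02"],
    "Metadata_site": [1, 1],
})

def enrich_chunk(measurements, loaddata):
    """Left-join metadata onto measurements; rows without a matching
    LoadData entry end up with null metadata."""
    return measurements.merge(loaddata, on="ImageNumber", how="left")

# Enrich each chunk *before* pd.concat, so the per-chunk ImageNumbers
# (which restart at 1 in every chunk) keep their correct metadata.
enriched = enrich_chunk(measurements, loaddata)
```

Enriching first and only then concatenating (with `pd.concat(..., ignore_index=True)`) is what keeps the duplicate per-chunk ImageNumbers from colliding.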
To disable metadata enrichment (for debugging), use `--no-enrich-metadata`:

```shell
cptools2 join --location /path/to/location --patterns Image.csv --no-enrich-metadata
```

Troubleshooting:

- If metadata enrichment fails, check that LoadData CSV files exist in `location/loaddata/` with filenames matching the chunk pattern (e.g., `plate_0.csv`).
- Verify that both the LoadData and output CSVs have an `ImageNumber` column.
- Check for ImageNumber mismatches: output CSVs with more rows than their corresponding LoadData CSV will have null metadata for the unmatched rows.
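The last point can be checked quickly with pandas before joining. A diagnostic sketch (not a cptools2 command; the inline CSVs are toy data):

```python
import io
import pandas as pd

def unmatched_image_numbers(measurements, loaddata):
    """ImageNumbers present in the output CSV but absent from LoadData;
    these rows receive null metadata after enrichment."""
    return sorted(set(measurements["ImageNumber"]) - set(loaddata["ImageNumber"]))

# Toy example: the output CSV has one more row than its LoadData file.
measurements = pd.read_csv(io.StringIO("ImageNumber,Cells_Count\n1,40\n2,55\n3,12\n"))
loaddata = pd.read_csv(io.StringIO("ImageNumber,Metadata_well\n1,A01\n2,A02\n"))
print(unmatched_image_numbers(measurements, loaddata))  # -> [3]
```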
cptools2 accepts a YAML configuration file describing the experiment, pipeline, and optional features such as batching and transfer. A canonical example is included at `tests/new_config.yaml`.

Sanitized example (matches `tests/new_config.yaml`):

```yaml
chunk: 96
join_files:
  - Image.csv
location: /path/to/scratch/$USER/project/outputs
commands location: /path/to/scratch/$USER/project/commands
pipeline: /path/to/pipeline.cppipe
add plate:
  - experiment: /path/to/imagexpress/experiment
    plates:
      - plate_1
      - plate_2
data_destination: /path/to/datastore/project/data
```

Common fields:

- `experiment`/`add plate`: where to find image data and which plates to include
- `chunk`: desired images-per-job (integer)
- `pipeline`: path to the `.cppipe` CellProfiler pipeline
- `location`: base location for image outputs
- `commands location`: directory to write command files
- `join_files`: list of CSV filenames to join after analysis
- `data_destination`: optional path for post-join transfer
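Since `pyyaml` is already a runtime dependency, a config can be sanity-checked before running `generate`. A sketch (the required-key list is an assumption based on the fields above, not cptools2's actual validation logic):

```python
import yaml

# Assumed minimum keys, taken from the "Common fields" list above.
REQUIRED = ("pipeline", "location", "commands location", "chunk")

def validate_config(text):
    """Parse a cptools2-style YAML config and check the required keys."""
    config = yaml.safe_load(text)
    missing = [key for key in REQUIRED if key not in config]
    if missing:
        raise ValueError(f"config is missing keys: {missing}")
    return config

example = """
chunk: 96
pipeline: /path/to/pipeline.cppipe
location: /path/to/outputs
commands location: /path/to/commands
join_files:
  - Image.csv
"""
config = validate_config(example)
```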
Advanced sections:

- `batching`: overrides for automatic batching (if supported)
- `transfer`: transfer provider configuration (S3 or other); cptools2 will write transfer metadata/commands, but actual transfer depends on runner hooks
- `generate` will discover plates under the `experiment`, create image lists, split jobs according to `chunk`, apply batching overrides (if present), and write command files into `commands location`.
- `join` concatenates the chunked CSV outputs after analysis; provide filename patterns to target.
- Transfer entries in the config are optional. `generate` records transfer commands/metadata; running transfers typically requires cluster-side hooks or CI steps that read the produced metadata.
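Conceptually, `join` gathers each pattern's per-chunk files and concatenates them (with enrichment applied per chunk first). A simplified sketch, assuming the `raw_data/plate_*/` layout shown earlier and skipping the enrichment step:

```python
import tempfile
from pathlib import Path
import pandas as pd

def join_pattern(location, pattern):
    """Concatenate every per-chunk CSV matching `pattern` under location/raw_data."""
    files = sorted(Path(location, "raw_data").glob(f"*/{pattern}"))
    return pd.concat((pd.read_csv(f) for f in files), ignore_index=True)

# Demo on a throwaway directory tree mimicking the layout above.
root = Path(tempfile.mkdtemp())
for i in range(2):
    chunk_dir = root / "raw_data" / f"plate_{i}"
    chunk_dir.mkdir(parents=True)
    (chunk_dir / "Image.csv").write_text(f"ImageNumber,Cells_Count\n1,{10 + i}\n")

joined = join_pattern(root, "Image.csv")
print(len(joined))  # -> 2, one row per chunk
```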
Run tests:
```shell
uv run pytest   # or: pytest
```

Run linters/formatters (configured via pre-commit):

```shell
pre-commit run --all-files
```

These steps mirror the checks typically performed on the Eddie HPC cluster:
- Scratch quota assessment – run `cptools2 generate` with real experiment configs to confirm 75% utilisation and overhead handling fits within assigned scratch space.
- Batch command generation – verify per-batch command files (`staging_batch_*.txt`, `cp_commands_batch_*.txt`) are produced and reference the expected plates.
- CellProfiler dry run – execute one batch via the HPC queue to confirm staging, CellProfiler invocation, and cleanup complete without exceeding scratch limits.
- Packaging install test – from a clean node, run `pip install git+https://github.com/CarragherLab/cptools2@v1.0.0` (or sync via `uv`) to ensure dependencies resolve correctly.
- Post-analysis join – validate `cptools2 join` against batch outputs for consistency with historical runs.
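The 75% utilisation / 30% overhead budget in the first checklist item can be estimated with back-of-the-envelope arithmetic. An illustrative sketch of that estimate (it mirrors the documented defaults, not the actual batching code):

```python
def plates_per_batch(scratch_bytes, plate_bytes,
                     utilisation=0.75, overhead=0.30):
    """How many plates fit in one batch: spend at most `utilisation` of
    scratch, budgeting each plate at (1 + overhead) x its raw size."""
    budget = scratch_bytes * utilisation
    per_plate = plate_bytes * (1 + overhead)
    return int(budget // per_plate)

# e.g. 2 TB of scratch and 150 GB plates
print(plates_per_batch(2e12, 150e9))  # -> 7
```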
Document outcomes for each release to maintain an audit trail.
- Open issues or PRs describing bugs or enhancements.
- Keep changes small and focused; tests should accompany functional changes.
This project is distributed under the MIT License. See LICENSE.