@wutimot (Collaborator) commented Nov 19, 2025

Issue #, if available:

Description of changes: Adding EMR Serverless Step to SageMaker Pipelines

Testing done: Unit and integration tests both pass locally

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • [x] I have read the CONTRIBUTING doc
  • [x] I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • [x] I used the commit message format described in CONTRIBUTING
  • [ ] I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • [ ] I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • [x] I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • [x] I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • [x] I have checked that my tests are not configured for a specific region or account (if appropriate)
  • [x] I have used unique_name_from_base to create resource names in integ tests (if appropriate)
  • [ ] If adding any dependency in requirements.txt files, I have spell checked and ensured they exist in PyPi

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot and others added 30 commits January 29, 2025 19:27
* fix: skip TF tests for unsupported versions

* flake8
* feat: add pytorch-tgi-inference 2.4.0

* add tgi 3.0.1 image

* skip faulty test

* formatting

* formatting

* add hf pytorch training 4.46

* update version alias

* add py311 to training version

* update tests with pyversion 311

* formatting

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
* Fix ssh host policy

* Filter policy by algo-

* Add docstring

* Fix pylint

* Fix docstyle summary

* Unit test

* Fix unit test

* Change to unit test

* Fix unit tests

* Test comment out flaky tests

* Readd the flaky tests

* Remove flaky asserts

* Remove flaky asserts

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

* change: Allow telemetry only in supported regions

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* implemented multi-node distribution with @remote function

* completed unit tests

* added distributed training with CPU and torchrun

* backwards compatibility nproc_per_node

* fixing code: permissions for non-root users, integration tests

* fixed docstyle

* refactor nproc_per_node for backwards compatibility

* refactor nproc_per_node for backwards compatibility

* pylint fix, newlines

* added unit tests for bootstrap_environment remote

* added mpirun protocol for distributed training with @remote decorator

* aligned mpi_utils_remote.py to mpi_utils.py for estimator

* updated docstring for sagemaker sdk doc

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
* feat: Add support for deepseek recipes

* pylint

* add unit test
…ws#5002)

* fix: fix ValueError when updating a data quality monitoring schedule

* Add unit test

* black formatting

---------

Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: parknate@ <[email protected]>
* Add cleanup logic to model builder integ tests for endpoints

* Fix endpoint api call
…lly (aws#5014)

* fix: bug in get latest version was getting the max sorted alphabetically instead of sem-ver

* handle invalid sem-ver and incompatible sagemaker versions

---------

Co-authored-by: Eli Davidson <[email protected]>
Co-authored-by: parknate@ <[email protected]>
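
To illustrate the sem-ver fix described in the commit above, here is a minimal sketch of why taking the maximum of version strings alphabetically gives the wrong answer, and how comparing parsed numeric components avoids it. The function names (`parse_version`, `latest_version`) and the sample version strings are hypothetical and are not taken from the SDK; the actual fix may be implemented differently.

```python
from typing import List, Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string into integers (raises ValueError if invalid)."""
    return tuple(int(part) for part in version.split("."))


def latest_version(versions: List[str]) -> Optional[str]:
    """Return the highest version by numeric comparison, skipping invalid entries."""
    parsed = []
    for version in versions:
        try:
            parsed.append((parse_version(version), version))
        except ValueError:
            # Hypothetical handling: ignore strings that are not dotted integers.
            continue
    if not parsed:
        return None
    # Tuples compare element by element, so (1, 10, 0) > (1, 9, 0).
    return max(parsed)[1]


versions = ["1.9.0", "1.10.0", "1.2.3"]
alphabetical_max = max(versions)       # string comparison: "9" sorts after "1"
semver_max = latest_version(versions)  # numeric comparison of each component
```

With these inputs, the plain string maximum is "1.9.0", while the numeric comparison returns "1.10.0".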
* Fix sourcedir.tar.gz filenames in docstrings

* Fix pylint

---------

Co-authored-by: pintaoz <[email protected]>
* Fix all type hint and docstrings for callable

* Fix codestyle

---------

Co-authored-by: pintaoz <[email protected]>
* fix: keep sagemaker_session from being overridden to None, add unit/integ tests

* remove commented code

* fix styling issues

---------

Co-authored-by: Zhaoqi <[email protected]>
rsareddy0329 and others added 25 commits December 10, 2025 18:12
* fix: Fix the recipe selection for multiple recipe scenario

* fix: Fix the recipe selection for multiple recipe scenario

* fix: Hyperparameter issue fixes, validate s3 output path, additional unit tests

* Fix: Add validation to bedrock reward models

* Fix: Add validation to bedrock reward models

* Fix: Add allow list for bedrock eval models

* Fix: Add allow list for bedrock eval models

* Fix: Bug fixes for s3 path validation, mlflow app creation

* Fix: Update Legal verbiage, and allowed reward model ids based on region

* Fix: Update model_package_group_name to model_package_group in all trainers to maintain consistency

* Fix: fix sagemaker-serve tests

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* bug fix for hmac key and remove remote function from train

* Remove remaining REMOTE_FUNCTION_SECRET_KEY references from tests

* Add back remote function folder

---------

Co-authored-by: Zhaoqi <[email protected]>
* feat: Add support to trainer object for model parameter in Evaluator

* feat: Evaluator handshake with trainer

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* add evaluator tagging for jumpstart models

* fix bug for extending tags

* bug fix for js tags

* add unit test for js evaluator tagging

---------

Co-authored-by: aviruthen <[email protected]>
…#5425)

* feat: Add support to trainer object for model parameter in Evaluator

* feat: Evaluator handshake with trainer

* fix: update evaluate_base_model as False, minor change to README

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* Update image_uri_config, fw_utils and image_uris.py in sagemaker-core

* Add ModelTrainer updates

- Used latest code in commit: aws@9f70fb2#diff-6643c001ac6e4e110393f1a33700adf2054cc594e5ff1e3e2630131d2c6c0551

* Update s3 bucket check in session_helper.py

Code change is based on commit: aws@903cb8a

* fix: Map llama models to correct script

Based on commit: aws/sagemaker-python-sdk-staging@67a3e5a

* fix: honor json serialization of HPs

aws/sagemaker-python-sdk-staging@246d560

* fix: clarify model monitor one time schedule bug

From commit: aws@ddc54d2

* fix: Allow import failure for internal _hashlib module

From commit: aws/sagemaker-python-sdk-staging@5198f28

* Remove duplicate model_trainer.py

* Add ignore_patterns in ModelTrainer to ignore specific files/folders

For commit: aws@829030a

* Update instance type regex to also include hyphens

For commit: aws/sagemaker-python-sdk-staging@824675b

* chore: domain support for eu-isoe-west-1

For commit: aws@d0bd4f7

* Fix: Object of type ModelLifeCycle is not JSON serializable

For commit: aws@844b558

* fix: sanitize git clone repo input url

For commit: aws/sagemaker-python-sdk-staging@ed143b7

* Add support for MetricDefinitions in ModelTrainer

For commit: aws@0215512

* feat: support pipeline versioning

For commit: aws/sagemaker-python-sdk-staging@9bfe85a

* add eval custom lambda arn to hyperparameter

For commit: aws/sagemaker-python-sdk-staging@bcd5348

* Add Numpy 2.0 support

For commit: aws/sagemaker-python-sdk-staging@99210b2

Tested by running sagemaker-serve unit tests

* fix: update get_execution_role to directly return the ExecutionRoleArn if it is present in the resource metadata file

For commit: aws/sagemaker-python-sdk-staging@b9df334

* HF Optimum Neuron 0.4.1 DLCs

For commit: aws@5d3f175

* Fix import error

* Fix llama_v3 in sm_recipes

* Remove duplicate json in image_retriever

* Add todo notes in pipeline class

* Add V2 image_config_url unit tests

---------

Co-authored-by: aviruthen <[email protected]>
* Add input validation and resource management improvements V3

* Allowing for sym-links, better refactoring

* Removing home path and adding additional validation

* Including check for root directory

* Adding root directory validation to other helpers
* Update CHANGELOG.md sagemaker meta

* Update CHANGELOG.md sagemaker-core

* Update CHANGELOG.md sagemaker-train

* Update CHANGELOG.md sagemaker-serve

* Update CHANGELOG.md sagemaker-mlops

* Update VERSION sagemaker-core

* Update VERSION sagemaker-train

* Update VERSION sagemaker-serve

* Update pyproject.toml sagemaker-train

* Update pyproject.toml sagemaker-train

* Update VERSION sagemaker-mlops

* Update pyproject.toml sagemaker-mlops

* Update VERSION meta

* Update pyproject.toml meta
* Add aws batch implementation (works with example notebook)

* fixing unit tests and adding integration test

* add example notebook

* Adding missing dependencies for aws_batch

* Fixing indentation bug in source code

* comment out delete resources in example notebook

* Add notebook png and remove extraneous comments

* Add in png correctly

* Removing logs_from_job from session_helper

* Adding helpers for logging

* Make helper methods internal

* Adding back nest asyncio dependency

* Updating unit tests for internal-external method changes
* Initialize framework and version in post_init

* Fix model registry notebook

* removing redundant condition check
* fix: Update telemetry constants to include MLOPS as well

* fix: Update telemetry constants to include MLOPS as well

* fix: Update telemetry constants to include MLOPS as well

---------

Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: Remove tags from ProcessingJob creation in Processor

The ProcessingJob resource doesn't accept tags parameter during
initialization. Remove tags from the transformed dict before
passing to ProcessingJob constructor to prevent errors.

* Unit tests
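
The tags fix described in the commit above can be sketched roughly as follows. This is only an illustration under stated assumptions: `build_processing_job_kwargs` and the sample dictionary keys are hypothetical and do not come from the SDK; the sketch only shows the idea of dropping the `tags` key from a transformed dict before it is passed on to a constructor.

```python
from typing import Any, Dict


def build_processing_job_kwargs(transformed: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the transformed dict without the 'tags' key.

    Hypothetical helper: the commit only states that tags are removed
    before the dict reaches the ProcessingJob constructor.
    """
    kwargs = dict(transformed)  # shallow copy so the caller's dict is untouched
    kwargs.pop("tags", None)    # no error if 'tags' is absent
    return kwargs


# Example input with made-up keys and values.
transformed = {
    "processing_job_name": "example-job",
    "tags": [{"Key": "team", "Value": "ml"}],
}
job_kwargs = build_processing_job_kwargs(transformed)
```

Copying before popping keeps the original dict intact, which matters if the tags are still needed elsewhere, for example in the create request.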
@wutimot wutimot force-pushed the feature/emr-serverless-step branch from ce56c53 to b0dcfe9 Compare January 6, 2026 23:50
@wutimot wutimot deployed to auto-approve January 7, 2026 00:00 — with GitHub Actions Active