feature: add emr-serverless step for SageMaker Pipelines #5325
Open
wutimot wants to merge 159 commits into aws:master-v2 from wutimot:feature/emr-serverless-step
+390,684
−384,875
* fix: skip TF tests for unsupported versions * flake8
* feat: add pytorch-tgi-inference 2.4.0 * add tgi 3.0.1 image * skip faulty test * formatting * formatting * add hf pytorch training 4.46 * update version alias * add py311 to training version * update tests with pyversion 311 * formatting --------- Co-authored-by: Erick Benitez-Ramos <[email protected]>
…mage (aws#4992) Co-authored-by: Erick Benitez-Ramos <[email protected]>
* Fix ssh host policy * Filter policy by algo- * Add docstring * Fix pylint * Fix docstyle summary * Unit test * Fix unit test * Change to unit test * Fix unit tests * Test comment out flaky tests * Readd the flaky tests * Remove flaky asserts * Remove flaky asserts --------- Co-authored-by: Erick Benitez-Ramos <[email protected]>
* change: Allow telemetry only in supported regions * change: Allow telemetry only in supported regions * change: Allow telemetry only in supported regions * change: Allow telemetry only in supported regions * change: Allow telemetry only in supported regions --------- Co-authored-by: Roja Reddy Sareddy <[email protected]>
* implemented multi-node distribution with @remote function * completed unit tests * added distributed training with CPU and torchrun * backwards compatibility nproc_per_node * fixing code: permissions for non-root users, integration tests * fixed docstyle * refactor nproc_per_node for backwards compatibility * refactor nproc_per_node for backwards compatibility * pylint fix, newlines * added unit tests for bootstrap_environment remote * added mpirun protocol for distributed training with @remote decorator * aligned mpi_utils_remote.py to mpi_utils.py for estimator * updated docstring for sagemaker sdk doc --------- Co-authored-by: Erick Benitez-Ramos <[email protected]>
* feat: Add support for deepseek recipes * pylint * add unit test
…ws#5002) * fix: fix ValueError when updating a data quality monitoring schedule * Add unit test * black formatting --------- Co-authored-by: Erick Benitez-Ramos <[email protected]> Co-authored-by: parknate@ <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
* Add cleanup logic to model builder integ tests for endpoints * Fix endpoint api call
…lly (aws#5014) * fix: bug in get latest version was getting the max sorted alphabetically instead of sem-ver * handle invalid sem-ver and incompatible sagemaker versions --------- Co-authored-by: Eli Davidson <[email protected]> Co-authored-by: parknate@ <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* Fix sourcedir.tar.gz filenames in docstrings * Fix pylint --------- Co-authored-by: pintaoz <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* Fix all type hint and docstrings for callable * Fix codestyle --------- Co-authored-by: pintaoz <[email protected]>
* fix: keep sagemaker_session from being overridden to None, add unit/integ tests * remove commented code * fix styling issues --------- Co-authored-by: Zhaoqi <[email protected]>
* fix: Fix the recipe selection for multiple recipe scenario * fix: Fix the recipe selection for multiple recipe scenario * fix: Hyperparameter issue fixes, validate s3 output path, additional unit tests * Fix: Add validation to bedrock reward models * Fix: Add validation to bedrock reward models * Fix: Add allow list for bedrock eval models * Fix: Add allow list for bedrock eval models * Fix: Bug fixes for s3 path validation, mlflow app creation * Fix: Update Legal verbiage, and allowed reward model ids based on region * Fix: Update model_package_group_name to model_package_group in all trainers to maintain consistency * Fix: fix sagemaker-serve tests --------- Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: Yadan Wei <[email protected]>
* bug fix for hmac key and remove remote function from train * Remove remaining REMOTE_FUNCTION_SECRET_KEY references from tests * Add back remote function folder --------- Co-authored-by: Zhaoqi <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* feat: Add support to trainer object for model parameter in Evaluator * feat: Evaluator handshake with trainer --------- Co-authored-by: Roja Reddy Sareddy <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* add evaluator tagging for jumpstart models * fix bug for extending tags * bug fix for js tags * add unit test for js evaluator tagging --------- Co-authored-by: aviruthen <[email protected]>
…#5425) * feat: Add support to trainer object for model parameter in Evaluator * feat: Evaluator handshake with trainer * fix: update evaluate_base_model as False, minor change to README --------- Co-authored-by: Roja Reddy Sareddy <[email protected]>
* Update image_uri_config, fw_utils and image_uris.py in sagemaker-core * Add ModelTrainer updates - Used latest code in commit: aws@9f70fb2#diff-6643c001ac6e4e110393f1a33700adf2054cc594e5ff1e3e2630131d2c6c0551 * Update s3 bucket check in session_helper.py Code change is based on commit: aws@903cb8a * fix: Map llama models to correct script Based on commit: aws/sagemaker-python-sdk-staging@67a3e5a * fix: honor json serialization of HPs aws/sagemaker-python-sdk-staging@246d560 * fix: clarify model monitor one time schedule bug From commit: aws@ddc54d2 * fix: Allow import failure for internal _hashlib module From commit: aws/sagemaker-python-sdk-staging@5198f28 * Remove duplicate model_trainer.py * Add ignore_patterns in ModelTrainer to ignore specific files/folders For commit: aws@829030a * Update instance type regex to also include hyphens For commit: aws/sagemaker-python-sdk-staging@824675b * chore: domain support for eu-isoe-west-1 For commit: aws@d0bd4f7 * Fix: Object of type ModelLifeCycle is not JSON serializable For commit: aws@844b558 * fix: sanitize git clone repo input url For commit: aws/sagemaker-python-sdk-staging@ed143b7 * Add support for MetricDefinitions in ModelTrainer For commit: aws@0215512 * feat: support pipeline versioning For commit: aws/sagemaker-python-sdk-staging@9bfe85a * add eval custom lambda arn to hyperparameter For commit: aws/sagemaker-python-sdk-staging@bcd5348 * Add Numpy 2.0 support For commit: aws/sagemaker-python-sdk-staging@99210b2 Tested by running sagemaker-serve unit tests * fix: update get_execution_role to directly return the ExecutionRoleArn if it presents in the resource metadata file For commit: aws/sagemaker-python-sdk-staging@b9df334 * HF Optimum Neuron 0.4.1 DLCs For commit: aws@5d3f175 * Fix import error * Fix llama_v3 in sm_recipes * Remove duplicate json in image_retriever * Add todo notes in pipeline class * Add V2 image_config_url unit tests --------- Co-authored-by: aviruthen <[email protected]>
* Add input validation and resource management improvements V3 * Allowing for sym-links, better refactoring * Removing home path and adding additional validation * Including check for root directory * Adding root directory validation to other helpers
* Update CHANGELOG.md sagemaker meta * Update CHANGELOG.md sagemaker-core * Update CHANGELOG.md sagemaker-train * Update CHANGELOG.md sagemaker-serve * Update CHANGELOG.md sagemaker-mlops * Update VERSION sagemaker-core * Update VERSION sagemaker-train * Update VERSION sagemaker-serve * Update pyproject.toml sagemaker-train * Update pyproject.toml sagemaker-train * Update VERSION sagemaker-mlops * Update pyproject.toml sagemaker-mlops * Update VERSION meta * Update pyproject.toml meta
* Add aws batch implementation (works with example notebook) * fixing unit tests and adding integration test * add example notebook * Adding missing dependencies for aws_batch * Fixing indentation bug in source code * comment out delete resources in example notebook * Add notebook png and remove extraneous comments * Add in png correctly * Removing logs_from_job from session_helper * Adding helpers for logging * Make helper methods internal * Adding back nest asyncio dependency * Updating unit tests for internal-external method changes
* Initialize framework and version in post_init * Fix model registry notebook * removing redundant condition check
Co-authored-by: pintaoz <[email protected]>
Co-authored-by: pintaoz <[email protected]>
* fix: Update telemetry constants to include MLOPS as well * fix: Update telemetry constants to include MLOPS as well * fix: Update telemetry constants to include MLOPS as well --------- Co-authored-by: Roja Reddy Sareddy <[email protected]>
* fix: Remove tags from ProcessingJob creation in Processor The ProcessingJob resource doesn't accept tags parameter during initialization. Remove tags from the transformed dict before passing to ProcessingJob constructor to prevent errors. * Unit tests
ce56c53 to b0dcfe9
Issue #, if available:
Description of changes: Adding EMR Serverless Step to SageMaker Pipelines
Testing done: Unit test and integration test both pass locally
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.