This module builds and configures SageMaker inference models and endpoints.
This repository is a READ-ONLY sub-tree split. See https://github.com/FriendsOfTerraform/modules to create issues or submit pull requests.
## Example Usage

```terraform
module "basic_usage" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  # manages multiple models
  models = {
    # the keys of the map are model names
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      # manages multiple container definitions
      container_definitions = {
        # the keys of the map are DNS names for the containers
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  # manages multiple endpoints
  endpoints = {
    # the keys of the map are endpoint names
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          # must refer to models created by this module
          demo-model = {
            instance_type = "ml.m5.large"

            auto_scaling = {
              policies = {
                # the keys of the map are policy names
                builtin-policy            = { expression = "SageMakerVariantInvocationsPerInstance = 1000" }
                keep-invocations-near-100 = { expression = "Invocations average = 100" }
              }
            }

            cloudwatch_alarms = {
              # the keys of the map are alarm names
              invocations-greater-than-1000         = { expression = "Invocations average > 1000" }
              invocation-5xx-errors-greater-than-10 = { expression = "Invocation5XXErrors average >= 10" }
            }
          }
        }
      }
    }
  }
}
```

## Inputs

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(models)) | `models` | | Deploy multiple models. Since: 1.0.0 |
| map(string) | `additional_tags_all` | `{}` | Additional tags for all resources deployed with this module. Since: 1.0.0 |
| map(object(endpoints)) | `endpoints` | `{}` | Configures multiple endpoints. Since: 1.0.0 |
### async_invocation_config

Specifies configuration for how an endpoint performs asynchronous inference. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `s3_output_path` | | Location to upload the response output on success. Must be an S3 URL (s3 path). Since: 1.0.0 |
| string | `encryption_key` | `null` | Specify an existing KMS key's ARN to encrypt your response output in S3. Since: 1.0.0 |
| string | `error_notification_location` | `null` | SNS topic to post a notification to when inference fails. If no topic is provided, no notification is sent. Since: 1.0.0 |
| number | `max_concurrent_invocations_per_instance` | `null` | The maximum number of concurrent requests sent to the model container. If no value is provided, SageMaker chooses an optimal value. Since: 1.0.0 |
| string | `s3_failure_path` | `null` | Location to upload the response output on failure. Must be an S3 URL (s3 path). Since: 1.0.0 |
| string | `success_notification_location` | `null` | SNS topic to post a notification to when inference completes successfully. If no topic is provided, no notification is sent. Since: 1.0.0 |
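Putting these attributes together, a minimal sketch of a provisioned endpoint with asynchronous inference; the bucket paths, topic ARNs, and names are placeholders, not values from this module's examples:

```terraform
endpoints = {
  async-endpoint = {
    provisioned = {
      production_variants = {
        demo-model = { instance_type = "ml.m5.large" }
      }

      async_invocation_config = {
        # required: where successful responses are written
        s3_output_path  = "s3://demo-bucket/async/output"
        s3_failure_path = "s3://demo-bucket/async/failure"

        # optional notifications (placeholder topic ARNs)
        success_notification_location = "arn:aws:sns:us-east-1:111122223333:inference-success"
        error_notification_location   = "arn:aws:sns:us-east-1:111122223333:inference-error"

        max_concurrent_invocations_per_instance = 4
      }
    }
  }
}
```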
### auto_scaling

Enables auto scaling. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(policies)) | `policies` | | Manages multiple auto scaling policies. Since: 1.0.0 |
| number | `maximum_capacity` | `1` | Specify the maximum number of EC2 instances to maintain. Since: 1.0.0 |
| number | `minimum_capacity` | `1` | Specify the minimum number of EC2 instances to maintain. Since: 1.0.0 |
### capture_content_type

The content type headers to capture. Must specify one of `csv_text` or `json`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| list(string) | `csv_text` | `null` | The CSV content type headers to capture. Since: 1.0.0 |
| list(string) | `json` | `null` | The JSON content type headers to capture. Since: 1.0.0 |
### cloudwatch_alarms

Configures multiple CloudWatch alarms. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `expression` | | The alarm expression, e.g. `"Invocations average > 1000"`. Since: 1.0.0 |
| string | `description` | `null` | The description of the alarm. Since: 1.0.0 |
| number | `evaluation_periods` | `1` | The number of periods over which data is compared to the specified threshold. Since: 1.0.0 |
| string | `notification_sns_topic` | `null` | The SNS topic where notifications will be sent. Since: 1.0.0 |
| string | `period` | `"1 minute"` | The period over which the specified statistic is applied. Since: 1.0.0 |
### container_definitions

Container images containing inference code that are used when the model is deployed for predictions. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `image` | | The registry path where the inference code image is stored in Amazon ECR. Since: 1.0.0 |
| string | `compression_type` | `"CompressedModel"` | Specify the model compression type. Since: 1.0.0 |
| map(string) | `environment_variables` | `{}` | Environment variables for the container. Since: 1.0.0 |
| string | `model_data_location` | `null` | The URL where model artifacts are stored in S3. Since: 1.0.0 |
| object(use_multiple_models) | `use_multiple_models` | `null` | Configures this container to host multiple models. Since: 1.0.0 |
### data_capture_options

Specifies what data to capture. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| bool | `prediction_request` | `true` | Capture prediction requests (input). Since: 1.0.0 |
| bool | `prediction_response` | `true` | Capture prediction responses (output). Since: 1.0.0 |
### enable_data_capture

Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `s3_location_to_store_data_collected` | | Amazon SageMaker will save the prediction requests and responses, along with metadata for your endpoint, at this location. Since: 1.0.0 |
| number | `sampling_percentage` | `30` | Amazon SageMaker will randomly sample and save the specified percentage of traffic to your endpoint. Since: 1.0.0 |
| object(capture_content_type) | `capture_content_type` | `null` | The content type headers to capture. Must specify one of `csv_text` or `json`. Since: 1.0.0 |
| object(data_capture_options) | `data_capture_options` | `{}` | Specifies what data to capture. Since: 1.0.0 |
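A sketch of data capture on a provisioned endpoint using the attributes above; the bucket path and content types are placeholders:

```terraform
endpoints = {
  captured-endpoint = {
    provisioned = {
      production_variants = {
        demo-model = { instance_type = "ml.m5.large" }
      }

      enable_data_capture = {
        # required: where SageMaker saves captured requests and responses
        s3_location_to_store_data_collected = "s3://demo-bucket/data-capture"
        sampling_percentage                 = 30

        capture_content_type = {
          json = ["application/json"] # capture only JSON payloads
        }

        data_capture_options = {
          prediction_request  = true # capture input
          prediction_response = true # capture output
        }
      }
    }
  }
}
```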
### endpoints

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(string) | `additional_tags` | `{}` | Additional tags for the endpoint. Since: 1.0.0 |
| string | `encryption_key` | `null` | Specify an existing KMS key's ARN to encrypt your response output in S3. Since: 1.0.0 |
| object(provisioned) | `provisioned` | `null` | Creates a provisioned endpoint, mutually exclusive to `serverless`. Since: 1.0.0 |
| object(serverless) | `serverless` | `null` | Creates a serverless endpoint, mutually exclusive to `provisioned`. Since: 1.0.0 |
### inference_execution_config

Specifies details of how containers in a multi-container endpoint are called. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `mode` | `"Serial"` | How containers in a multi-container endpoint are run. Since: 1.0.0 |
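For a multi-container model, `inference_execution_config` sits alongside `container_definitions`. A sketch, assuming `"Direct"` is a valid mode; the table above only documents the default `"Serial"`, so `"Direct"` comes from the SageMaker API and is an assumption here, as are the role and image values:

```terraform
models = {
  multi-container-model = {
    iam_role_arn = "arn:aws:iam::111122223333:role/demo-role" # placeholder role

    # invoke each container directly rather than chaining them serially
    inference_execution_config = {
      mode = "Direct" # assumed value; the module default is "Serial"
    }

    container_definitions = {
      preprocessor = {
        image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
      }
      predictor = {
        image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
        model_data_location = "s3://demo-bucket/demo-model.tar.gz"
      }
    }
  }
}
```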
### models

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `iam_role_arn` | | A role that SageMaker AI can assume to access model artifacts and Docker images for deployment. Since: 1.0.0 |
| map(object(container_definitions)) | `container_definitions` | | Container images containing inference code that are used when the model is deployed for predictions. Since: 1.0.0 |
| object(inference_execution_config) | `inference_execution_config` | `{}` | Specifies details of how containers in a multi-container endpoint are called. Since: 1.0.0 |
| map(string) | `additional_tags` | `{}` | Additional tags for the model. Since: 1.0.0 |
| bool | `enable_network_isolation` | `false` | If enabled, containers cannot make any outbound network calls. Since: 1.0.0 |
| object(vpc_config) | `vpc_config` | `null` | Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform. Since: 1.0.0 |
### policies

Manages multiple auto scaling policies. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `expression` | | The scaling expression, e.g. `"Invocations average = 100"`. Since: 1.0.0 |
| bool | `enable_scale_in` | `true` | Allow this auto scaling policy to scale in (remove EC2 instances). Since: 1.0.0 |
| string | `scale_in_cooldown_period` | `"5 minutes"` | Specify how long to wait between scale-in actions. Since: 1.0.0 |
| string | `scale_out_cooldown_period` | `"5 minutes"` | Specify how long to wait between scale-out actions. Since: 1.0.0 |
### production_variants

Configure multiple production variants, one for each model that you want to host at this endpoint. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `instance_type` | | The EC2 instance type. Since: 1.0.0 |
| string | `container_startup_timeout` | `null` | The timeout for the inference container to pass the health check by SageMaker AI Hosting. Since: 1.0.0 |
| number | `initial_instance_count` | `1` | Specify the initial number of instances used for auto scaling. Since: 1.0.0 |
| number | `initial_weight` | `1` | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. Since: 1.0.0 |
| string | `model_data_download_timeout` | `null` | The timeout to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. Since: 1.0.0 |
| number | `volume_size` | `null` | The size, in GB, of the ML storage volume attached to individual inference instances associated with the production variant. Since: 1.0.0 |
| object(auto_scaling) | `auto_scaling` | `null` | Enables auto scaling. Since: 1.0.0 |
| map(object(cloudwatch_alarms)) | `cloudwatch_alarms` | `{}` | Configures multiple CloudWatch alarms. Since: 1.0.0 |
### provisioned

Creates a provisioned endpoint, mutually exclusive to `serverless`. Must specify one of `provisioned` or `serverless`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(production_variants)) | `production_variants` | | Configure multiple production variants, one for each model that you want to host at this endpoint. Since: 1.0.0 |
| object(async_invocation_config) | `async_invocation_config` | `null` | Specifies configuration for how an endpoint performs asynchronous inference. Since: 1.0.0 |
| object(enable_data_capture) | `enable_data_capture` | `null` | Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location. Since: 1.0.0 |
| map(object(shadow_variants)) | `shadow_variants` | `{}` | Specify shadow variants to receive production traffic replicated from the model specified on `production_variants`. Since: 1.0.0 |
### serverless

Creates a serverless endpoint, mutually exclusive to `provisioned`. Must specify one of `provisioned` or `serverless`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| object(variant) | `variant` | | Configures the variant for this endpoint. Since: 1.0.0 |
### shadow_variants

Specify shadow variants to receive production traffic replicated from the model specified on `production_variants`. If you use this field, you can only specify one variant for `production_variants` and one variant for `shadow_variants`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `instance_type` | | The EC2 instance type. Since: 1.0.0 |
| number | `container_startup_timeout` | `null` | The timeout for the inference container to pass the health check by SageMaker AI Hosting. Since: 1.0.0 |
| number | `initial_instance_count` | `1` | Specify the initial number of instances used for auto scaling. Since: 1.0.0 |
| number | `initial_weight` | `1` | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. Since: 1.0.0 |
| number | `model_data_download_timeout` | `null` | The timeout to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. Since: 1.0.0 |
| number | `volume_size` | `null` | The size, in GB, of the ML storage volume attached to individual inference instances associated with the production variant. Since: 1.0.0 |
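A sketch of shadow testing, following the one-variant-each constraint described above; the endpoint and model names are placeholders, and both models are assumed to be managed by this module:

```terraform
endpoints = {
  shadow-test-endpoint = {
    provisioned = {
      # only one production variant is allowed when shadow_variants is used
      production_variants = {
        prod-model = { instance_type = "ml.m5.large" }
      }

      # receives a replica of production traffic for comparison
      shadow_variants = {
        candidate-model = { instance_type = "ml.m5.large" }
      }
    }
  }
}
```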
### use_multiple_models

Configures this container to host multiple models. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| bool | `enable_model_caching` | `true` | Whether to cache models for a multi-model endpoint. By default, multi-model endpoints cache models so that a model does not have to be loaded into memory each time it is invoked. Some use cases do not benefit from model caching. For example, if an endpoint hosts a large number of models that are each invoked infrequently, the endpoint might perform better if you disable model caching. Since: 1.0.0 |
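A sketch of a multi-model container; the S3 prefix is a placeholder, and caching is disabled here only to illustrate the many-models, infrequent-invocation case described above:

```terraform
container_definitions = {
  container1 = {
    image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"

    # placeholder prefix holding many model artifacts
    model_data_location = "s3://demo-bucket/multi-model/"

    use_multiple_models = {
      enable_model_caching = false # many models, each invoked infrequently
    }
  }
}
```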
### variant

Configures the variant for this endpoint. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `model_name` | | The name of the model to be used for this endpoint. The model specified must be managed by this module. Since: 1.0.0 |
| number | `max_concurrency` | `20` | The maximum number of concurrent invocations your serverless endpoint can process. Since: 1.0.0 |
| number | `memory_size` | `1024` | The memory size of your serverless endpoint. Since: 1.0.0 |
| number | `provisioned_concurrency` | `null` | Provisioned concurrency enables you to deploy models on serverless endpoints with predictable performance and high scalability. For the set number of concurrent invocations, SageMaker will keep the underlying compute warm and ready to respond instantaneously without cold starts. Since: 1.0.0 |
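A sketch of a serverless endpoint using the attributes above; the `memory_size` of 2048 assumes sizes above the 1024 default are allowed, which the table does not confirm:

```terraform
endpoints = {
  serverless-endpoint = {
    serverless = {
      variant = {
        model_name      = "demo-model" # must be a model managed by this module
        max_concurrency = 20
        memory_size     = 2048 # assumed valid size; table only shows the 1024 default

        # keep 5 concurrent invocations warm to avoid cold starts
        provisioned_concurrency = 5
      }
    }
  }
}
```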
### vpc_config

Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| list(string) | `security_group_ids` | | List of security group IDs the models use to access private resources. Since: 1.0.0 |
| list(string) | `subnet_ids` | | List of subnet IDs to be used for this VPC connection. Since: 1.0.0 |
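A sketch of a model attached to a VPC, combining `vpc_config` with `enable_network_isolation`; the role ARN, security group, and subnet IDs are placeholders:

```terraform
models = {
  private-model = {
    iam_role_arn = "arn:aws:iam::111122223333:role/demo-role" # placeholder role

    # containers cannot make outbound network calls
    enable_network_isolation = true

    vpc_config = {
      security_group_ids = ["sg-0123456789abcdef0"] # placeholder IDs
      subnet_ids         = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
    }

    container_definitions = {
      container1 = {
        image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
      }
    }
  }
}
```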