FriendsOfTerraform/aws-sagemaker-inference

SageMaker Inference Module

This module builds and configures SageMaker inference models and endpoints.

This repository is a READ-ONLY sub-tree split. See https://github.com/FriendsOfTerraform/modules to create issues or submit pull requests.

Example Usage

Basic Usage

module "basic_usage" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  # manages multiple models
  models = {
    # The keys of the map are model names
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      # manages multiple container definitions
      container_definitions = {
        # the keys of the map are the DNS names of the containers
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  # manages multiple endpoints
  endpoints = {
    # the keys of the map are endpoint names
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          # must refer to models created by this module
          demo-model = {
            instance_type = "ml.m5.large"

            auto_scaling = {
              policies = {
                # the keys of the map are policy names
                builtin-policy            = { expression = "SageMakerVariantInvocationsPerInstance = 1000" }
                keep-invocations-near-100 = { expression = "Invocations average = 100" }
              }
            }

            cloudwatch_alarms = {
              # the keys of the map are alarm names
              invocations-greater-than-1000         = { expression = "Invocations average > 1000" }
              invocation-5xx-errors-greater-than-10 = { expression = "Invocation5XXErrors average >= 10" }
            }
          }
        }
      }
    }
  }
}

Inputs

Required

Type Name Default Value
map(object(models)) models

Deploy multiple models.

Since: 1.0.0

Optional

Type Name Default Value
map(string) additional_tags_all {}

Additional tags for all resources deployed with this module

Since: 1.0.0

map(object(endpoints)) endpoints {}

Configures multiple endpoints

Since: 1.0.0

Objects

async_invocation_config

Specifies configuration for how an endpoint performs asynchronous inference

Since: 1.0.0

Type Name Default Value
string s3_output_path

Location to upload response output on success. Must be an S3 URL (S3 path).

Since: 1.0.0

string encryption_key null

Specify an existing KMS key's ARN to encrypt your response output in S3.

Since: 1.0.0

string error_notification_location null

SNS topic to notify when inference fails. If no topic is provided, no notification is sent.

Since: 1.0.0

number max_concurrent_invocations_per_instance null

The maximum number of concurrent requests sent to the model container. If no value is provided, SageMaker chooses an optimal value.

Since: 1.0.0

string s3_failure_path null

Location to upload response output on failure. Must be an S3 URL (S3 path).

Since: 1.0.0

string success_notification_location null

SNS topic to notify when inference completes successfully. If no topic is provided, no notification is sent.

Since: 1.0.0
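
The fields above can be combined into an asynchronous endpoint as in the following minimal sketch. The bucket paths and the SNS topic are placeholders, and passing the topic as an ARN is an assumption; the field descriptions only say "SNS topic".

```hcl
module "async_endpoint" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    async-endpoint = {
      provisioned = {
        production_variants = {
          demo-model = { instance_type = "ml.m5.large" }
        }

        async_invocation_config = {
          # Placeholder S3 locations for successful and failed responses
          s3_output_path  = "s3://demo-bucket/async/output/"
          s3_failure_path = "s3://demo-bucket/async/failure/"

          # Assumption: the topic is referenced by its ARN
          error_notification_location = "arn:aws:sns:us-east-1:111122223333:inference-errors"

          # Omit to let SageMaker choose a value
          max_concurrent_invocations_per_instance = 4
        }
      }
    }
  }
}
```

Only s3_output_path is required; the other fields default to null.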

auto_scaling

Enables auto scaling

Since: 1.0.0

Type Name Default Value
map(object(policies)) policies

Manages multiple auto scaling policies

Since: 1.0.0

number maximum_capacity 1

Specify the maximum number of EC2 instances to maintain.

Since: 1.0.0

number minimum_capacity 1

Specify the minimum number of EC2 instances to maintain.

Since: 1.0.0

capture_content_type

The content type headers to capture. Must specify one of csv_text or json

Since: 1.0.0

Type Name Default Value
list(string) csv_text null

The CSV content type headers to capture.

Since: 1.0.0

list(string) json null

The JSON content type headers to capture.

Since: 1.0.0

cloudwatch_alarms

Configures multiple CloudWatch alarms.

Since: 1.0.0

Type Name Default Value
string expression

The expression in <metric_name> <statistic> <comparison_operator> <threshold> format. For example: "Invocations average >= 100"

Since: 1.0.0

string description null

The description of the alarm

Since: 1.0.0

number evaluation_periods 1

The number of periods over which data is compared to the specified threshold.

Since: 1.0.0

string notification_sns_topic null

The SNS topic to which notifications are sent.

Since: 1.0.0

string period "1 minute"

The period over which the specified statistic is applied. Valid values: "1 minute" - "6 hours"

Since: 1.0.0
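
As a rough illustration of the optional alarm fields, the sketch below sets every field on one alarm. The alarm name, description, and SNS topic are hypothetical; referencing the topic by ARN and writing the period as "5 minutes" are assumptions based on the documented default of "1 minute".

```hcl
module "alarm_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          demo-model = {
            instance_type = "ml.m5.large"

            cloudwatch_alarms = {
              # <metric_name> <statistic> <comparison_operator> <threshold>
              invocation-5xx-errors-sustained = {
                expression         = "Invocation5XXErrors average >= 10"
                description        = "5XX errors stayed at or above 10 for three periods"
                evaluation_periods = 3
                period             = "5 minutes" # assumed format, within "1 minute" - "6 hours"

                # Assumption: the topic is referenced by its ARN
                notification_sns_topic = "arn:aws:sns:us-east-1:111122223333:inference-alarms"
              }
            }
          }
        }
      }
    }
  }
}
```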

container_definitions

Container images containing inference code that are used when the model is deployed for predictions.

Since: 1.0.0

Type Name Default Value
string image

The registry path where the inference code image is stored in Amazon ECR

Since: 1.0.0

string compression_type "CompressedModel"

Specify the model compression type.

Allowed Values:

  • CompressedModel
  • UncompressedModel

Since: 1.0.0

map(string) environment_variables {}

Environment variables for the container

Since: 1.0.0

string model_data_location null

The URL where model artifacts are stored in S3

Since: 1.0.0

object(use_multiple_models) use_multiple_models null

Configure this container to host multiple models

Since: 1.0.0

data_capture_options

Specifies what data to capture.

Since: 1.0.0

Type Name Default Value
bool prediction_request true

Capture prediction requests (Input)

Since: 1.0.0

bool prediction_response true

Capture prediction responses (Output)

Since: 1.0.0

enable_data_capture

Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location

Since: 1.0.0

Type Name Default Value
string s3_location_to_store_data_collected

Amazon SageMaker will save the prediction requests and responses along with metadata for your endpoint at this location.

Since: 1.0.0

number sampling_percentage 30

Amazon SageMaker will randomly sample and save the specified percentage of traffic to your endpoint.

Since: 1.0.0

object(capture_content_type) capture_content_type null

The content type headers to capture. Must specify one of csv_text or json

Since: 1.0.0

object(data_capture_options) data_capture_options {}

Specifies what data to capture.

Since: 1.0.0
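
A minimal sketch of data capture on a provisioned endpoint follows. The S3 location and the "application/json" content type are illustrative values, not defaults of this module.

```hcl
module "data_capture_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          demo-model = { instance_type = "ml.m5.large" }
        }

        enable_data_capture = {
          # Placeholder bucket prefix for captured traffic
          s3_location_to_store_data_collected = "s3://demo-bucket/data-capture/"
          sampling_percentage                 = 50

          # Specify one of csv_text or json
          capture_content_type = {
            json = ["application/json"]
          }

          # Capture requests only; both options default to true
          data_capture_options = {
            prediction_request  = true
            prediction_response = false
          }
        }
      }
    }
  }
}
```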

endpoints

Type Name Default Value
map(string) additional_tags {}

Additional tags for the endpoint

Since: 1.0.0

string encryption_key null

Specify an existing KMS key's ARN to encrypt your response output in S3.

Since: 1.0.0

object(provisioned) provisioned null

Creates a provisioned endpoint, mutually exclusive with serverless. Must specify one of provisioned or serverless.

Since: 1.0.0

object(serverless) serverless null

Creates a serverless endpoint, mutually exclusive with provisioned. Must specify one of provisioned or serverless.

Since: 1.0.0

inference_execution_config

Specifies details of how containers in a multi-container endpoint are called.

Since: 1.0.0

Type Name Default Value
string mode "Serial"

How containers in a multi-container endpoint are run.

  • Serial: containers run as a serial pipeline.
  • Direct: only the individual container that you specify is run.

Allowed Values:

  • Serial
  • Direct

Since: 1.0.0
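
The sketch below shows where mode is set for a model with two containers. The second container's name, image, and model data location are hypothetical placeholders.

```hcl
module "multi_container_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }

        # Hypothetical second container
        container2 = {
          image               = "111122223333.dkr.ecr.us-east-1.amazonaws.com/demo-inference:latest"
          model_data_location = "s3://demo-bucket/demo-model-2.tar.gz"
        }
      }

      # "Serial" (the default) runs the containers as a pipeline;
      # "Direct" runs only the container you specify
      inference_execution_config = {
        mode = "Direct"
      }
    }
  }
}
```

The endpoints input is optional, so it is omitted here.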

models

Type Name Default Value
string iam_role_arn

A role that SageMaker AI can assume to access model artifacts and Docker images for deployment.

Since: 1.0.0

map(object(container_definitions)) container_definitions

Container images containing inference code that are used when the model is deployed for predictions.

Since: 1.0.0

object(inference_execution_config) inference_execution_config {}

Specifies details of how containers in a multi-container endpoint are called.

Since: 1.0.0

map(string) additional_tags {}

Additional tags for the model

Since: 1.0.0

bool enable_network_isolation false

If enabled, containers cannot make any outbound network calls.

Since: 1.0.0

object(vpc_config) vpc_config null

Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform.

Since: 1.0.0

policies

Manages multiple auto scaling policies

Since: 1.0.0

Type Name Default Value
string expression

The expression in <metric_name> <statistic> = <TargetValue> format. For example: "Invocations average = 100". If using a predefined metric such as SageMakerVariantInvocationsPerInstance, you can omit <statistic> from the expression. For example: "SageMakerVariantInvocationsPerInstance = 100"

Since: 1.0.0

bool enable_scale_in true

Allow this Auto Scaling policy to scale in (remove EC2 instances).

Since: 1.0.0

string scale_in_cooldown_period "5 minutes"

Specify the amount of time to wait between scale-in actions, for example "5 minutes".

Since: 1.0.0

string scale_out_cooldown_period "5 minutes"

Specify the amount of time to wait between scale-out actions, for example "5 minutes".

Since: 1.0.0
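
A possible auto scaling configuration that combines capacity limits with the optional policy fields is sketched below. The policy name and cooldown values are illustrative; the "10 minutes" and "2 minutes" strings assume the same format as the documented default of "5 minutes".

```hcl
module "auto_scaling_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          demo-model = {
            instance_type = "ml.m5.large"

            auto_scaling = {
              minimum_capacity = 1
              maximum_capacity = 4

              policies = {
                # Predefined metric, so <statistic> is omitted
                invocations-per-instance = {
                  expression                = "SageMakerVariantInvocationsPerInstance = 1000"
                  enable_scale_in           = true
                  scale_in_cooldown_period  = "10 minutes" # assumed format
                  scale_out_cooldown_period = "2 minutes"  # assumed format
                }
              }
            }
          }
        }
      }
    }
  }
}
```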

production_variants

Configure multiple production variants, one for each model that you want to host at this endpoint.

Since: 1.0.0

Type Name Default Value
string instance_type

The EC2 instance type

Since: 1.0.0

string container_startup_timeout null

The timeout value for the inference container to pass health check by SageMaker AI Hosting.

Valid values: "1 minute" - "1 hour"

Since: 1.0.0

number initial_instance_count 1

Specify the initial number of instances used for auto-scaling.

Since: 1.0.0

number initial_weight 1

Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.

Since: 1.0.0

string model_data_download_timeout null

The timeout value to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant.

Valid values: "1 minute" - "1 hour"

Since: 1.0.0

number volume_size null

The size, in GB, of the ML storage volume attached to individual inference instance associated with the production variant.

Valid values: 1 - 512

Since: 1.0.0

object(auto_scaling) auto_scaling null

Enables auto scaling

Since: 1.0.0

map(object(cloudwatch_alarms)) cloudwatch_alarms {}

Configures multiple CloudWatch alarms.

Since: 1.0.0

provisioned

Creates a provisioned endpoint, mutually exclusive with serverless. Must specify one of provisioned or serverless.

Since: 1.0.0

Type Name Default Value
map(object(production_variants)) production_variants

Configure multiple production variants, one for each model that you want to host at this endpoint.

Since: 1.0.0

object(async_invocation_config) async_invocation_config null

Specifies configuration for how an endpoint performs asynchronous inference

Since: 1.0.0

object(enable_data_capture) enable_data_capture null

Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location

Since: 1.0.0

map(object(shadow_variants)) shadow_variants {}

Specify shadow variants to receive production traffic replicated from the model specified on production_variants. If you use this field, you can only specify one variant for production_variants and one variant for shadow_variants.

Since: 1.0.0

serverless

Creates a serverless endpoint, mutually exclusive with provisioned. Must specify one of provisioned or serverless.

Since: 1.0.0

Type Name Default Value
object(variant) variant

Configures variant for this endpoint

Since: 1.0.0

shadow_variants

Specify shadow variants to receive production traffic replicated from the model specified on production_variants. If you use this field, you can only specify one variant for production_variants and one variant for shadow_variants.

Since: 1.0.0

Type Name Default Value
string instance_type

The EC2 instance type

Since: 1.0.0

string container_startup_timeout null

The timeout value for the inference container to pass health check by SageMaker AI Hosting. Valid values: "1 minute" - "1 hour"

Since: 1.0.0

number initial_instance_count 1

Specify the initial number of instances used for auto-scaling.

Since: 1.0.0

number initial_weight 1

Determines initial traffic distribution among all of the models that you specify in the endpoint configuration.

Since: 1.0.0

string model_data_download_timeout null

The timeout value to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. Valid values: "1 minute" - "1 hour".

Since: 1.0.0

number volume_size null

The size, in GB, of the ML storage volume attached to individual inference instance associated with the production variant. Valid values: 1 - 512.

Since: 1.0.0
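
One way a shadow variant might be declared is sketched below. It assumes that the keys of shadow_variants are model names created by this module, mirroring production_variants; this section does not state that explicitly. The second model, demo-model-v2, and its artifact path are hypothetical.

```hcl
module "shadow_variant_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }

    # Hypothetical candidate model that receives replicated traffic
    demo-model-v2 = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model-v2.tar.gz"
        }
      }
    }
  }

  endpoints = {
    shadow-test-endpoint = {
      provisioned = {
        # Only one production variant is allowed when shadow_variants is used
        production_variants = {
          demo-model = { instance_type = "ml.m5.large" }
        }

        # Only one shadow variant is allowed
        # Assumption: the key refers to a model created by this module
        shadow_variants = {
          demo-model-v2 = {
            instance_type          = "ml.m5.large"
            initial_instance_count = 1
          }
        }
      }
    }
  }
}
```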

use_multiple_models

Configure this container to host multiple models

Since: 1.0.0

Type Name Default Value
bool enable_model_caching true

Whether to cache models for a multi-model endpoint. By default, multi-model endpoints cache models so that a model does not have to be loaded into memory each time it is invoked. Some use cases do not benefit from model caching. For example, if an endpoint hosts a large number of models that are each invoked infrequently, the endpoint might perform better if you disable model caching.

Since: 1.0.0
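
A container that hosts multiple models with caching disabled might look like the sketch below. Pointing model_data_location at an S3 prefix instead of a single archive is an assumption based on how SageMaker multi-model endpoints are usually configured, and the prefix is a placeholder.

```hcl
module "multi_model_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"

          # Assumption: an S3 prefix that holds several model archives
          model_data_location = "s3://demo-bucket/models/"

          use_multiple_models = {
            # Defaults to true; disabling may suit many rarely invoked models
            enable_model_caching = false
          }
        }
      }
    }
  }
}
```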

variant

Configures variant for this endpoint

Since: 1.0.0

Type Name Default Value
string model_name

The name of the model to be used for this endpoint. The model specified must be managed by the same module

Since: 1.0.0

number max_concurrency 20

The maximum number of concurrent invocations your serverless endpoint can process. Valid values: 1 - 200

Since: 1.0.0

number memory_size 1024

The memory size of your serverless endpoint.

Allowed Values:

  • 1024
  • 2048
  • 3072
  • 4096
  • 5120
  • 6144

Since: 1.0.0

number provisioned_concurrency null

Provisioned concurrency enables you to deploy models on serverless endpoints with predictable performance and high scalability. For the set number of concurrent invocations, SageMaker will keep underlying compute warm and ready to respond instantaneously without cold starts. Must be less than or equal to max_concurrency.

Since: 1.0.0
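
A serverless endpoint built from these fields could be declared as in the following minimal sketch. The endpoint name and the concurrency and memory values are illustrative.

```hcl
module "serverless_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  endpoints = {
    serverless-endpoint = {
      serverless = {
        variant = {
          # Must name a model managed by this module
          model_name = "demo-model"

          max_concurrency = 10   # valid values: 1 - 200
          memory_size     = 2048 # one of the allowed memory sizes

          # Must not exceed max_concurrency
          provisioned_concurrency = 5
        }
      }
    }
  }
}
```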

vpc_config

Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform.

Since: 1.0.0

Type Name Default Value
list(string) security_group_ids

List of security group IDs the models use to access private resources

Since: 1.0.0

list(string) subnet_ids

List of subnet IDs to be used for this VPC connection

Since: 1.0.0
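
The sketch below attaches a model to a VPC and enables network isolation. The security group and subnet IDs are placeholders; substitute IDs from your own VPC.

```hcl
module "vpc_example" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  models = {
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      # Containers cannot make outbound network calls when enabled
      enable_network_isolation = true

      vpc_config = {
        # Placeholder IDs
        security_group_ids = ["sg-0123456789abcdef0"]
        subnet_ids         = ["subnet-0123456789abcdef0", "subnet-0fedcba9876543210"]
      }

      container_definitions = {
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }
}
```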
