This module builds and configures SageMaker inference models and endpoints.
This repository is a READ-ONLY sub-tree split. See https://github.com/FriendsOfTerraform/modules to create issues or submit pull requests.
## Example Usage

```terraform
module "basic_usage" {
  source = "github.com/FriendsOfTerraform/aws-sagemaker-inference.git?ref=v1.0.0"

  # manages multiple models
  models = {
    # the keys of the map are model names
    demo-model = {
      iam_role_arn = "arn:aws:iam::111122223333:role/service-role/AmazonSageMakerServiceCatalogProductsExecutionRole"

      # manages multiple container definitions
      container_definitions = {
        # the keys of the map are DNS names for the containers
        container1 = {
          image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
          model_data_location = "s3://demo-bucket/demo-model.tar.gz"
        }
      }
    }
  }

  # manages multiple endpoints
  endpoints = {
    # the keys of the map are endpoint names
    realtime-endpoint = {
      provisioned = {
        production_variants = {
          # must refer to models created by this module
          demo-model = {
            instance_type = "ml.m5.large"

            auto_scaling = {
              policies = {
                # the keys of the map are policy names
                builtin-policy            = { expression = "SageMakerVariantInvocationsPerInstance = 1000" }
                keep-invocations-near-100 = { expression = "Invocations average = 100" }
              }
            }

            cloudwatch_alarms = {
              # the keys of the map are alarm names
              invocations-greater-than-1000         = { expression = "Invocations average > 1000" }
              invocation-5xx-errors-greater-than-10 = { expression = "Invocation5XXErrors average >= 10" }
            }
          }
        }
      }
    }
  }
}
```

## Inputs

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(models)) | `models` | | Deploy multiple models. Since: 1.0.0 |
| map(string) | `additional_tags_all` | `{}` | Additional tags for all resources deployed with this module. Since: 1.0.0 |
| map(object(endpoints)) | `endpoints` | `{}` | Configures multiple endpoints. Since: 1.0.0 |
### async_invocation_config

Specifies configuration for how an endpoint performs asynchronous inference. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `s3_output_path` | | Location to upload the response output on success. Must be an S3 URL (s3 path). Since: 1.0.0 |
| string | `encryption_key` | `null` | Specify an existing KMS key's ARN to encrypt your response output in S3. Since: 1.0.0 |
| string | `error_notification_location` | `null` | SNS topic to post a notification to when inference fails. If no topic is provided, no notification is sent. Since: 1.0.0 |
| number | `max_concurrent_invocations_per_instance` | `null` | The maximum number of concurrent requests sent to the model container. If no value is provided, SageMaker chooses an optimal value. Since: 1.0.0 |
| string | `s3_failure_path` | `null` | Location to upload the response output on failure. Must be an S3 URL (s3 path). Since: 1.0.0 |
| string | `success_notification_location` | `null` | SNS topic to post a notification to when inference completes successfully. If no topic is provided, no notification is sent. Since: 1.0.0 |
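Putting these attributes together, a minimal sketch of a provisioned endpoint with asynchronous inference; the bucket paths, topic ARNs, and names are placeholders, not values from this module's examples:

```terraform
endpoints = {
  async-endpoint = {
    provisioned = {
      production_variants = {
        demo-model = { instance_type = "ml.m5.large" }
      }

      async_invocation_config = {
        # required: where successful responses are written
        s3_output_path  = "s3://demo-bucket/async/output"
        s3_failure_path = "s3://demo-bucket/async/failure"

        # optional notifications (placeholder topic ARNs)
        success_notification_location = "arn:aws:sns:us-east-1:111122223333:inference-success"
        error_notification_location   = "arn:aws:sns:us-east-1:111122223333:inference-error"

        max_concurrent_invocations_per_instance = 4
      }
    }
  }
}
```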
### auto_scaling

Enables auto scaling. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(policies)) | `policies` | | Manages multiple auto scaling policies. Since: 1.0.0 |
| number | `maximum_capacity` | `1` | Specify the maximum number of EC2 instances to maintain. Since: 1.0.0 |
| number | `minimum_capacity` | `1` | Specify the minimum number of EC2 instances to maintain. Since: 1.0.0 |
### capture_content_type

The content type headers to capture. Must specify one of `csv_text` or `json`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| list(string) | `csv_text` | `null` | The CSV content type headers to capture. Since: 1.0.0 |
| list(string) | `json` | `null` | The JSON content type headers to capture. Since: 1.0.0 |
### cloudwatch_alarms

Configures multiple CloudWatch alarms. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `expression` | | The alarm expression, e.g. `"Invocations average > 1000"`. Since: 1.0.0 |
| string | `description` | `null` | The description of the alarm. Since: 1.0.0 |
| number | `evaluation_periods` | `1` | The number of periods over which data is compared to the specified threshold. Since: 1.0.0 |
| string | `notification_sns_topic` | `null` | The SNS topic where notifications will be sent. Since: 1.0.0 |
| string | `period` | `"1 minute"` | The period over which the specified statistic is applied. Since: 1.0.0 |
### container_definitions

Container images containing inference code that are used when the model is deployed for predictions. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `image` | | The registry path where the inference code image is stored in Amazon ECR. Since: 1.0.0 |
| string | `compression_type` | `"CompressedModel"` | Specify the model compression type. Since: 1.0.0 |
| map(string) | `environment_variables` | `{}` | Environment variables for the container. Since: 1.0.0 |
| string | `model_data_location` | `null` | The URL where model artifacts are stored in S3. Since: 1.0.0 |
| object(use_multiple_models) | `use_multiple_models` | `null` | Configures this container to host multiple models. Since: 1.0.0 |
### data_capture_options

Specifies what data to capture. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| bool | `prediction_request` | `true` | Capture prediction requests (input). Since: 1.0.0 |
| bool | `prediction_response` | `true` | Capture prediction responses (output). Since: 1.0.0 |
### enable_data_capture

Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `s3_location_to_store_data_collected` | | Amazon SageMaker will save the prediction requests and responses, along with metadata for your endpoint, at this location. Since: 1.0.0 |
| number | `sampling_percentage` | `30` | Amazon SageMaker will randomly sample and save the specified percentage of traffic to your endpoint. Since: 1.0.0 |
| object(capture_content_type) | `capture_content_type` | `null` | The content type headers to capture. Must specify one of `csv_text` or `json`. Since: 1.0.0 |
| object(data_capture_options) | `data_capture_options` | `{}` | Specifies what data to capture. Since: 1.0.0 |
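A sketch of data capture on a provisioned endpoint using the attributes above; the bucket path and content types are placeholders:

```terraform
endpoints = {
  captured-endpoint = {
    provisioned = {
      production_variants = {
        demo-model = { instance_type = "ml.m5.large" }
      }

      enable_data_capture = {
        # required: where SageMaker saves captured requests and responses
        s3_location_to_store_data_collected = "s3://demo-bucket/data-capture"
        sampling_percentage                 = 30

        capture_content_type = {
          json = ["application/json"] # capture only JSON payloads
        }

        data_capture_options = {
          prediction_request  = true # capture input
          prediction_response = true # capture output
        }
      }
    }
  }
}
```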
### endpoints

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(string) | `additional_tags` | `{}` | Additional tags for the endpoint. Since: 1.0.0 |
| string | `encryption_key` | `null` | Specify an existing KMS key's ARN to encrypt your response output in S3. Since: 1.0.0 |
| object(provisioned) | `provisioned` | `null` | Creates a provisioned endpoint, mutually exclusive to `serverless`. Since: 1.0.0 |
| object(serverless) | `serverless` | `null` | Creates a serverless endpoint, mutually exclusive to `provisioned`. Since: 1.0.0 |
### inference_execution_config

Specifies details of how containers in a multi-container endpoint are called. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `mode` | `"Serial"` | How containers in a multi-container endpoint are run. Since: 1.0.0 |
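For a multi-container model, `inference_execution_config` sits alongside `container_definitions`. A sketch, assuming `"Direct"` is a valid mode; the table above only documents the default `"Serial"`, so `"Direct"` comes from the SageMaker API and is an assumption here, as are the role and image values:

```terraform
models = {
  multi-container-model = {
    iam_role_arn = "arn:aws:iam::111122223333:role/demo-role" # placeholder role

    # invoke each container directly rather than chaining them serially
    inference_execution_config = {
      mode = "Direct" # assumed value; the module default is "Serial"
    }

    container_definitions = {
      preprocessor = {
        image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
      }
      predictor = {
        image               = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
        model_data_location = "s3://demo-bucket/demo-model.tar.gz"
      }
    }
  }
}
```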
### models

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `iam_role_arn` | | A role that SageMaker AI can assume to access model artifacts and Docker images for deployment. Since: 1.0.0 |
| map(object(container_definitions)) | `container_definitions` | | Container images containing inference code that are used when the model is deployed for predictions. Since: 1.0.0 |
| object(inference_execution_config) | `inference_execution_config` | `{}` | Specifies details of how containers in a multi-container endpoint are called. Since: 1.0.0 |
| map(string) | `additional_tags` | `{}` | Additional tags for the model. Since: 1.0.0 |
| bool | `enable_network_isolation` | `false` | If enabled, containers cannot make any outbound network calls. Since: 1.0.0 |
| object(vpc_config) | `vpc_config` | `null` | Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform. Since: 1.0.0 |
### policies

Manages multiple auto scaling policies. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `expression` | | The scaling expression, e.g. `"Invocations average = 100"`. Since: 1.0.0 |
| bool | `enable_scale_in` | `true` | Allow this auto scaling policy to scale in (remove EC2 instances). Since: 1.0.0 |
| string | `scale_in_cooldown_period` | `"5 minutes"` | Specify how long to wait between scale-in actions. Since: 1.0.0 |
| string | `scale_out_cooldown_period` | `"5 minutes"` | Specify how long to wait between scale-out actions. Since: 1.0.0 |
### production_variants

Configure multiple production variants, one for each model that you want to host at this endpoint. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `instance_type` | | The EC2 instance type. Since: 1.0.0 |
| string | `container_startup_timeout` | `null` | The timeout for the inference container to pass the health check by SageMaker AI Hosting. Since: 1.0.0 |
| number | `initial_instance_count` | `1` | Specify the initial number of instances used for auto scaling. Since: 1.0.0 |
| number | `initial_weight` | `1` | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. Since: 1.0.0 |
| string | `model_data_download_timeout` | `null` | The timeout to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. Since: 1.0.0 |
| number | `volume_size` | `null` | The size, in GB, of the ML storage volume attached to individual inference instances associated with the production variant. Since: 1.0.0 |
| object(auto_scaling) | `auto_scaling` | `null` | Enables auto scaling. Since: 1.0.0 |
| map(object(cloudwatch_alarms)) | `cloudwatch_alarms` | `{}` | Configures multiple CloudWatch alarms. Since: 1.0.0 |
### provisioned

Creates a provisioned endpoint, mutually exclusive to `serverless`. Must specify one of `provisioned` or `serverless`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| map(object(production_variants)) | `production_variants` | | Configure multiple production variants, one for each model that you want to host at this endpoint. Since: 1.0.0 |
| object(async_invocation_config) | `async_invocation_config` | `null` | Specifies configuration for how an endpoint performs asynchronous inference. Since: 1.0.0 |
| object(enable_data_capture) | `enable_data_capture` | `null` | Enables data capture, where SageMaker can save prediction request and prediction response information from your endpoint to a specified location. Since: 1.0.0 |
| map(object(shadow_variants)) | `shadow_variants` | `{}` | Specify shadow variants to receive production traffic replicated from the model specified on `production_variants`. Since: 1.0.0 |
### serverless

Creates a serverless endpoint, mutually exclusive to `provisioned`. Must specify one of `provisioned` or `serverless`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| object(variant) | `variant` | | Configures the variant for this endpoint. Since: 1.0.0 |
### shadow_variants

Specify shadow variants to receive production traffic replicated from the model specified on `production_variants`. If you use this field, you can only specify one variant for `production_variants` and one variant for `shadow_variants`. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `instance_type` | | The EC2 instance type. Since: 1.0.0 |
| number | `container_startup_timeout` | `null` | The timeout for the inference container to pass the health check by SageMaker AI Hosting. Since: 1.0.0 |
| number | `initial_instance_count` | `1` | Specify the initial number of instances used for auto scaling. Since: 1.0.0 |
| number | `initial_weight` | `1` | Determines initial traffic distribution among all of the models that you specify in the endpoint configuration. Since: 1.0.0 |
| number | `model_data_download_timeout` | `null` | The timeout to download and extract the model that you want to host from Amazon S3 to the individual inference instance associated with this production variant. Since: 1.0.0 |
| number | `volume_size` | `null` | The size, in GB, of the ML storage volume attached to individual inference instances associated with the production variant. Since: 1.0.0 |
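A sketch of shadow testing, following the one-variant-each constraint described above; the endpoint and model names are placeholders, and both models are assumed to be managed by this module:

```terraform
endpoints = {
  shadow-test-endpoint = {
    provisioned = {
      # only one production variant is allowed when shadow_variants is used
      production_variants = {
        prod-model = { instance_type = "ml.m5.large" }
      }

      # receives a replica of production traffic for comparison
      shadow_variants = {
        candidate-model = { instance_type = "ml.m5.large" }
      }
    }
  }
}
```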
### use_multiple_models

Configures this container to host multiple models. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| bool | `enable_model_caching` | `true` | Whether to cache models for a multi-model endpoint. By default, multi-model endpoints cache models so that a model does not have to be loaded into memory each time it is invoked. Some use cases do not benefit from model caching. For example, if an endpoint hosts a large number of models that are each invoked infrequently, the endpoint might perform better if you disable model caching. Since: 1.0.0 |
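A sketch of a multi-model container; the S3 prefix is a placeholder, and caching is disabled here only to illustrate the many-models, infrequent-invocation case described above:

```terraform
container_definitions = {
  container1 = {
    image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"

    # placeholder prefix holding many model artifacts
    model_data_location = "s3://demo-bucket/multi-model/"

    use_multiple_models = {
      enable_model_caching = false # many models, each invoked infrequently
    }
  }
}
```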
### variant

Configures the variant for this endpoint. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| string | `model_name` | | The name of the model to be used for this endpoint. The model specified must be managed by this module. Since: 1.0.0 |
| number | `max_concurrency` | `20` | The maximum number of concurrent invocations your serverless endpoint can process. Since: 1.0.0 |
| number | `memory_size` | `1024` | The memory size of your serverless endpoint. Since: 1.0.0 |
| number | `provisioned_concurrency` | `null` | Provisioned concurrency enables you to deploy models on serverless endpoints with predictable performance and high scalability. For the set number of concurrent invocations, SageMaker will keep the underlying compute warm and ready to respond instantaneously without cold starts. Since: 1.0.0 |
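A sketch of a serverless endpoint using the attributes above; the `memory_size` of 2048 assumes sizes above the 1024 default are allowed, which the table does not confirm:

```terraform
endpoints = {
  serverless-endpoint = {
    serverless = {
      variant = {
        model_name      = "demo-model" # must be a model managed by this module
        max_concurrency = 20
        memory_size     = 2048 # assumed valid size; table only shows the 1024 default

        # keep 5 concurrent invocations warm to avoid cold starts
        provisioned_concurrency = 5
      }
    }
  }
}
```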
### vpc_config

Specifies the VPC that you want your model to connect to. This is used in hosting services and in batch transform. Since: 1.0.0

| Type | Name | Default Value | Description |
|---|---|---|---|
| list(string) | `security_group_ids` | | List of security group IDs the models use to access private resources. Since: 1.0.0 |
| list(string) | `subnet_ids` | | List of subnet IDs to be used for this VPC connection. Since: 1.0.0 |
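A sketch of a model attached to a VPC, combining `vpc_config` with `enable_network_isolation`; the role ARN, security group, and subnet IDs are placeholders:

```terraform
models = {
  private-model = {
    iam_role_arn = "arn:aws:iam::111122223333:role/demo-role" # placeholder role

    # containers cannot make outbound network calls
    enable_network_isolation = true

    vpc_config = {
      security_group_ids = ["sg-0123456789abcdef0"] # placeholder IDs
      subnet_ids         = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
    }

    container_definitions = {
      container1 = {
        image = "763104351884.dkr.ecr.us-east-1.amazonaws.com/tensorflow-inference:2.19.0"
      }
    }
  }
}
```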