S3 provider for static asset placement #281

@ArthurCRodrigues

S3 Provider for Static Asset Placement

Context

Issue #280 introduces static asset placement for sandboxed assignments with a provider abstraction and an MVP local filesystem provider.

This issue defines the next step: an S3-backed provider for source resolution.

Goal

Add an S3 asset provider so root-level setup assets can be resolved from S3 and injected into sandbox target paths before execution.

Proposed Contract

Keep the assignment-facing assets contract stable and provider-agnostic:

{
  "assets": [
    {
      "source": "s3://autograder-assets/datasets/tp2/RESTAURANTES.CSV",
      "target": "/tmp/RESTAURANTES.CSV",
      "read_only": true
    }
  ]
}

Parsing rules

  • source supports:
    • s3://<bucket>/<key>
    • optional future alias scheme like asset://... (not required in the first S3 iteration)
  • Keep existing local source support unchanged.
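
The s3://<bucket>/<key> rule above could be parsed roughly as follows. This is a minimal sketch, not the project's implementation: `S3Location` and `parse_s3_uri` are hypothetical names, and the code splits the string by hand so that keys containing `?` or `#` are preserved verbatim.

```python
from dataclasses import dataclass

S3_PREFIX = "s3://"


@dataclass(frozen=True)
class S3Location:
    """Hypothetical value object for a parsed S3 source."""
    bucket: str
    key: str


def parse_s3_uri(source: str) -> S3Location:
    """Parse s3://<bucket>/<key>; raise ValueError for anything else."""
    if not source.startswith(S3_PREFIX):
        raise ValueError(f"not an S3 asset source: {source!r}")
    # Everything up to the first "/" is the bucket; the rest is the key.
    bucket, _, key = source[len(S3_PREFIX):].partition("/")
    if not bucket or not key:
        raise ValueError(
            f"malformed S3 URI (expected s3://<bucket>/<key>): {source!r}"
        )
    return S3Location(bucket=bucket, key=key)
```

With the example from the contract, `parse_s3_uri("s3://autograder-assets/datasets/tp2/RESTAURANTES.CSV")` yields bucket `autograder-assets` and key `datasets/tp2/RESTAURANTES.CSV`.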

Implementation Plan

1) Provider abstraction extension

  • Extend AssetSourceResolver with an S3 provider implementation (for example S3AssetProvider).
  • Choose provider by URI scheme in source.
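
Scheme-based provider selection might look like the sketch below. The `AssetProvider` protocol, the `register`/`provider_for` methods, and the rule that a source without `://` is a local path are all assumptions made for illustration; the actual `AssetSourceResolver` interface from #280 may differ.

```python
from typing import Dict, Protocol


class AssetProvider(Protocol):
    """Hypothetical provider interface: turn a source string into bytes."""

    def fetch(self, source: str) -> bytes: ...


class AssetSourceResolver:
    """Illustrative resolver that maps a URI scheme to a registered provider."""

    def __init__(self, local_provider: AssetProvider) -> None:
        self._local = local_provider
        self._by_scheme: Dict[str, AssetProvider] = {}

    def register(self, scheme: str, provider: AssetProvider) -> None:
        self._by_scheme[scheme.lower()] = provider

    def provider_for(self, source: str) -> AssetProvider:
        scheme, sep, _ = source.partition("://")
        if not sep:
            # Assumption: scheme-less sources keep going to the local provider.
            return self._local
        try:
            return self._by_scheme[scheme.lower()]
        except KeyError:
            raise ValueError(
                f"unsupported asset source scheme: {scheme!r}"
            ) from None
```

Registering the S3 provider under `"s3"` leaves local sources untouched, and any other scheme is rejected up front.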

2) S3 client integration

  • Use the AWS SDK for Python (boto3) with its default credential chain (IAM role, environment variables, shared credentials file).
  • Resolve object by bucket/key and stream bytes for injection.
  • Capture metadata needed for observability (size, etag/version_id when available).
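
A rough sketch of the fetch step, under the assumption that the caller passes in an S3 client (in production, something like `boto3.client("s3")`). `stream_s3_object` and `CHUNK_SIZE` are hypothetical names. The code relies only on `get_object` returning a dict with a readable `Body` plus optional `ETag` and `VersionId` entries.

```python
from typing import Any, BinaryIO, Dict

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def stream_s3_object(
    client: Any, bucket: str, key: str, dest: BinaryIO
) -> Dict[str, Any]:
    """Copy an S3 object into `dest` chunk by chunk and return metadata."""
    response = client.get_object(Bucket=bucket, Key=key)
    body = response["Body"]
    written = 0
    while True:
        chunk = body.read(CHUNK_SIZE)
        if not chunk:
            break
        dest.write(chunk)
        written += len(chunk)
    return {
        "bucket": bucket,
        "key": key,
        "size": written,
        "etag": response.get("ETag"),
        # VersionId is only present when bucket versioning applies.
        "version_id": response.get("VersionId"),
    }
```

Reading in fixed-size chunks keeps host memory bounded for large datasets, and the returned dict can be attached to logs for observability.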

3) Security controls

  • Optional bucket allowlist (ASSET_S3_ALLOWED_BUCKETS).
  • Optional key prefix allowlist per environment (ASSET_S3_ALLOWED_PREFIXES).
  • Reject unsupported schemes and malformed URIs.
  • Enforce max size limits before full download when possible (using head_object).
  • No credentials passed into sandbox containers.
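
The allowlist and size checks could be sketched as below. The comma-separated format for `ASSET_S3_ALLOWED_BUCKETS` and `ASSET_S3_ALLOWED_PREFIXES`, and the rule that an unset variable means "no restriction", are assumptions; the function names are hypothetical. The size check uses the `ContentLength` field that `head_object` returns.

```python
from typing import Any, List, Mapping


def _csv_env(env: Mapping[str, str], name: str) -> List[str]:
    """Read a comma-separated variable; an unset variable gives an empty list."""
    return [item.strip() for item in env.get(name, "").split(",") if item.strip()]


def check_asset_allowed(bucket: str, key: str, env: Mapping[str, str]) -> None:
    """Raise PermissionError if the bucket or key falls outside the allowlists."""
    allowed_buckets = _csv_env(env, "ASSET_S3_ALLOWED_BUCKETS")
    if allowed_buckets and bucket not in allowed_buckets:
        raise PermissionError(f"bucket not allowed for assets: {bucket!r}")
    allowed_prefixes = _csv_env(env, "ASSET_S3_ALLOWED_PREFIXES")
    if allowed_prefixes and not key.startswith(tuple(allowed_prefixes)):
        raise PermissionError(f"key prefix not allowed for assets: {key!r}")


def check_asset_size(client: Any, bucket: str, key: str, max_bytes: int) -> int:
    """Reject oversized objects using head_object, before any download."""
    size = client.head_object(Bucket=bucket, Key=key)["ContentLength"]
    if size > max_bytes:
        raise ValueError(
            f"asset s3://{bucket}/{key} is {size} bytes; limit is {max_bytes}"
        )
    return size
```

Both checks run on the host before injection, so nothing about the allowlists or credentials needs to reach the sandbox container.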

4) Reliability and performance

  • Add bounded retries with backoff for transient S3 errors.
  • Distinguish retryable vs non-retryable failures.
  • Keep deterministic pre-flight failure messages for missing objects, access-denied errors, and timeouts.
  • Optional (future) host-side cache by bucket/key/version to reduce repeated downloads.
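
One way to bound retries and separate retryable from non-retryable failures is sketched here. It assumes errors carry a botocore-style `response["Error"]["Code"]`; the set of codes treated as transient, the attempt count, and the delays are illustrative choices, and `call_with_retries` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: codes treated as transient; adjust for the deployment.
RETRYABLE_CODES = frozenset(
    {"SlowDown", "RequestTimeout", "InternalError", "ServiceUnavailable"}
)


def error_code(exc: Exception) -> str:
    """Extract the error code from a ClientError-shaped exception, if any."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def call_with_retries(
    operation: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return operation()
        except Exception as exc:
            # Non-retryable errors and the final attempt propagate unchanged.
            if error_code(exc) not in RETRYABLE_CODES or attempt >= max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
            attempt += 1
```

Errors such as `AccessDenied` or `NoSuchKey` surface immediately, which keeps pre-flight failures fast and deterministic. botocore also ships its own configurable retry handling, so it is worth checking whether that is sufficient before layering a custom loop on top.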

5) Pipeline integration

6) Config and ops

  • Document required IAM permissions:
    • s3:GetObject on allowed asset prefixes
    • s3:ListBucket only if needed
    • s3:GetObjectVersion if versioned objects are used
  • Add environment variables and defaults for:
    • request timeout
    • max asset size
    • allowed buckets/prefixes
  • Provide deployment notes for staging/prod credential setup.
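
The environment-driven settings could be loaded along these lines. Only `ASSET_S3_ALLOWED_BUCKETS` and `ASSET_S3_ALLOWED_PREFIXES` come from this issue; `ASSET_S3_REQUEST_TIMEOUT_SECONDS`, `ASSET_S3_MAX_ASSET_BYTES`, the `S3AssetSettings` class, and the default values are placeholders chosen for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class S3AssetSettings:
    """Hypothetical settings container for the S3 asset provider."""
    request_timeout_seconds: float
    max_asset_bytes: int
    allowed_buckets: Tuple[str, ...]
    allowed_prefixes: Tuple[str, ...]


def _split_csv(value: str) -> Tuple[str, ...]:
    return tuple(part.strip() for part in value.split(",") if part.strip())


def load_settings(env: Optional[Mapping[str, str]] = None) -> S3AssetSettings:
    """Build settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    return S3AssetSettings(
        # Placeholder variable names and defaults (not defined in this issue).
        request_timeout_seconds=float(
            env.get("ASSET_S3_REQUEST_TIMEOUT_SECONDS", "10")
        ),
        max_asset_bytes=int(
            env.get("ASSET_S3_MAX_ASSET_BYTES", str(100 * 1024 * 1024))
        ),
        # Empty tuples mean "no restriction" in this sketch.
        allowed_buckets=_split_csv(env.get("ASSET_S3_ALLOWED_BUCKETS", "")),
        allowed_prefixes=_split_csv(env.get("ASSET_S3_ALLOWED_PREFIXES", "")),
    )
```

Passing a plain dict as `env` makes the loader easy to exercise in unit tests without touching the process environment.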

7) Tests

  • Unit tests for URI parsing and provider selection.
  • Unit tests for S3 provider behavior (mocked boto3):
    • success path
    • object not found
    • access denied
    • timeout/retry handling
    • size limit rejection
  • Integration tests (if infra allows):
    • success path using a test bucket/object
    • end-to-end pipeline with S3-sourced asset consumed by student code.
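
A mocked unit test for the not-found and access-denied cases might be shaped like this. To stay runnable without the AWS SDK, it uses a stand-in exception instead of botocore's `ClientError`; `preflight_message` is a hypothetical function under test, and the error codes it matches (`"404"`/`"NoSuchKey"`, `"403"`/`"AccessDenied"`) are assumptions to verify against the real SDK.

```python
import unittest
from unittest.mock import MagicMock


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError, which carries a `response` dict."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.response = {"Error": {"Code": code}}


def preflight_message(client, bucket: str, key: str):
    """Hypothetical check: None on success, else a stable failure message."""
    try:
        client.head_object(Bucket=bucket, Key=key)
    except FakeClientError as exc:  # real code would catch ClientError
        code = exc.response["Error"]["Code"]
        if code in ("404", "NoSuchKey"):
            return f"asset not found: s3://{bucket}/{key}"
        if code in ("403", "AccessDenied"):
            return f"access denied: s3://{bucket}/{key}"
        raise
    return None


class PreflightTests(unittest.TestCase):
    def test_success(self):
        client = MagicMock()
        self.assertIsNone(preflight_message(client, "b", "k"))
        client.head_object.assert_called_once_with(Bucket="b", Key="k")

    def test_object_not_found(self):
        client = MagicMock()
        client.head_object.side_effect = FakeClientError("404")
        self.assertEqual(
            preflight_message(client, "b", "k"), "asset not found: s3://b/k"
        )

    def test_access_denied(self):
        client = MagicMock()
        client.head_object.side_effect = FakeClientError("403")
        self.assertEqual(
            preflight_message(client, "b", "k"), "access denied: s3://b/k"
        )
```

Saved as a test module, this runs with `python -m unittest`; the timeout/retry and size-limit cases can follow the same pattern with different `side_effect` and `return_value` settings.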

8) Documentation

  • Update setup-config docs with S3 examples.
  • Add operational runbook for IAM and bucket layout.
  • Add troubleshooting section for common S3 failures.

Acceptance Criteria

  • source: s3://bucket/key is supported for global root-level assets.
  • Assets are fetched from S3 and injected before setup commands/grading.
  • Failures are reported explicitly and abort the run safely during pre-flight.
  • Security constraints (allowlists/limits) are enforced.
  • Tests and docs are updated.

Dependency
