Make EcsCredentialProvider retry behavior configurable #3279

Open
kieranbrown wants to merge 2 commits into aws:master from kieranbrown:retry-ecs-credentials-on-429
kieranbrown commented Apr 25, 2026

Description of changes

Replaces the hardcoded retry condition in EcsCredentialProvider with two configurable allowlists, addressing review feedback that retry behavior for HTTP/container credential providers is largely undefined across AWS SDKs and shouldn't be expanded by default.

Two new constructor options:

  • retryable_exceptions — array of exception class names. Defaults to [ConnectException::class], preserving existing behavior.
  • retryable_error_codes — array of HTTP status codes to retry when returned in a Guzzle BadResponseException. Defaults to [].

The retry decision now lives in a private isRetryable() helper that consults both lists.
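
For readers who want a concrete picture, here is a minimal sketch of how a helper like this could consult the two lists. It is not the code from this PR: the stand-in exception classes, the free-function form, and the getStatusCode() accessor are assumptions made so the example runs without Guzzle installed.

```php
<?php
// Hypothetical stand-ins for Guzzle's ConnectException and
// BadResponseException, so the sketch runs on its own.
class FakeConnectException extends RuntimeException {}

class FakeBadResponseException extends RuntimeException
{
    private $status;

    public function __construct(string $message, int $status)
    {
        parent::__construct($message);
        $this->status = $status;
    }

    // Assumed accessor; the real provider would read the status
    // from the response attached to the exception.
    public function getStatusCode(): int
    {
        return $this->status;
    }
}

// Returns true when $reason matches either allowlist.
function isRetryable(
    Throwable $reason,
    array $retryableExceptions,
    array $retryableErrorCodes
): bool {
    // 1. Exception class allowlist (instanceof accepts a class-name string).
    foreach ($retryableExceptions as $class) {
        if ($reason instanceof $class) {
            return true;
        }
    }

    // 2. HTTP status allowlist, only for bad-response exceptions.
    if ($reason instanceof FakeBadResponseException) {
        return in_array($reason->getStatusCode(), $retryableErrorCodes, true);
    }

    return false;
}
```

With the defaults described above (the connect exception allowlisted, no status codes), a 429 would not be retried; passing [429] as the third argument opts in.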

Motivation

The EKS Pod Identity Agent's rate limiter returns HTTP 429 when its token bucket is empty. Previously these surfaced immediately as CredentialsException. Callers in that environment can now opt in:

new EcsCredentialProvider([
    'retryable_error_codes' => [429],
]);

This mirrors the explicit allowlist used by aws-sdk-go-v2's endpointcreds client, without changing default behavior for any existing caller.

Tests

Added coverage for: opt-in 429 retry succeeds, opt-in 429 retry exhausts attempts, custom exception class added via config, custom config replacing defaults, 429 not retried by default. Existing ConnectException retry and 401 non-retryable cases unchanged.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.


Generated by AI tools (Claude), and reviewed by Kieran Brown.

The EKS Pod Identity Agent rate-limits the credentials endpoint with a
token-bucket limiter and returns 429 when the bucket is empty. The
previous retry check only matched ConnectException, so HTTP-level
throttling responses bubbled up immediately and crashed the client.

Treat a Guzzle BadResponseException whose status is 429 as retryable in
addition to ConnectException, mirroring the explicit allowlist used by
aws-sdk-go-v2's endpointcreds client. Other 4xx responses (e.g. 401)
remain non-retryable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stobrien89 (Member) commented May 2, 2026

Hi @kieranbrown,

Thanks for the contribution. We're hesitant to make this default behavior, only because HTTP/Container credential provider retry behavior is largely undefined across AWS SDKs. We'd be open to something like a configurable 'retryable_exceptions' and/or 'retryable_error_codes' in $config that gets passed at construction and set as properties which are checked where $isRetryable is currently set. I think this would warrant a private isRetryable() helper method too.

If you're open to implementing something like that, we'd accept changes as long as they come with sufficient test coverage. If not, feel free to open a feature request in our issues section and we could implement this when able.

Add retryable_exceptions and retryable_error_codes options to the
constructor config, replacing the hardcoded ConnectException + 429
check with a private isRetryable() helper that consults both lists.

retryable_exceptions defaults to [ConnectException::class], preserving
existing behavior. retryable_error_codes defaults to [], so HTTP 429
responses are no longer retried by default — callers running against
the EKS Pod Identity Agent opt in via ['retryable_error_codes' => [429]].

Addresses review feedback on aws#3279 — the SDK has not
previously specified retry behavior for HTTP/container credential
providers, so making 429 retry default would set a precedent the
maintainers want to avoid.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
kieranbrown force-pushed the retry-ecs-credentials-on-429 branch from f5e6fe6 to 9be2775 on May 3, 2026 20:27
kieranbrown changed the title from "Retry EcsCredentialProvider on HTTP 429 responses" to "Make EcsCredentialProvider retry behavior configurable" on May 3, 2026
kieranbrown (Author) commented

Hey @stobrien89, totally understandable. I've made the changes you requested.

stobrien89 (Member) left a comment

Looks good, just a suggestion and a question.

private $attempts;

/** @var string[] */
private $retryableExceptions;

This should have a default value of [ConnectException::class], and any exceptions passed in retryable_exceptions should be added to it. Open to alternatives if your use case requires the current pattern.
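
One way to read this suggestion is sketched below, as an illustration only. The resolveRetryableExceptions() helper is hypothetical and not part of the PR, and plain strings stand in for the ::class names so the snippet runs without Guzzle.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
// Caller-supplied classes are appended to the defaults, never replacing them.
function resolveRetryableExceptions(array $config, array $defaults): array
{
    $extra = $config['retryable_exceptions'] ?? [];

    // Defaults come first; array_unique drops repeats and
    // array_values reindexes the result from zero.
    return array_values(array_unique(array_merge($defaults, $extra)));
}

// 'ConnectException' stands in for GuzzleHttp\Exception\ConnectException::class;
// 'MyTimeoutException' is a made-up caller-supplied class name.
$merged = resolveRetryableExceptions(
    ['retryable_exceptions' => ['MyTimeoutException', 'ConnectException']],
    ['ConnectException']
);
```

The trade-off with this additive pattern is that a caller can no longer turn off the default ConnectException retry, which is the difference from the replace-the-defaults behavior currently in the PR.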

}
}

if ($reason instanceof BadResponseException

Is RequestException too broad?
