Fix: Preserve TaskInstance history during Kubernetes API rate limiting errors - Task Instance Fix #55159
Fix looks reasonable but tests don’t agree. This should include a test case too.
When implementing unit tests for the new orphaned task detection logic in the
Original problem
Solution
@HsiuChuanHsu this PR combines changes to airflow core and k8s provider. If these changes are not coupled can you please separate? Providers and core have different release cycles
@eladkal Sure, will work on it.
- Handle 429 errors in KubernetesExecutor task publishing retry logic
- Detect orphaned tasks and record TaskInstanceHistory in failure handler
- Add detailed logging for rate limiting scenarios
Move orphaned task detection before end_date assignment to ensure TaskInstanceHistory is recorded for tasks that become detached during scheduler restarts due to Kubernetes API 429 errors.
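The ordering described in this commit message can be illustrated with a small, self-contained sketch. It is not the PR's actual code: `TaskInstanceSketch`, `is_orphaned`, and `handle_failure` are hypothetical names, and a plain list stands in for the TaskInstanceHistory table. The point it tries to show is that the orphan check has to run before `end_date` is assigned, because the check depends on `end_date` still being unset.

```python
import copy
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TaskInstanceSketch:
    """Hypothetical stand-in for a task instance; not Airflow's real model."""

    task_id: str
    state: Optional[str]
    start_date: Optional[datetime] = None
    end_date: Optional[datetime] = None


def is_orphaned(ti: TaskInstanceSketch) -> bool:
    # Condition quoted in the PR description:
    # state=None + start_date set + end_date unset.
    return ti.state is None and ti.start_date is not None and ti.end_date is None


def handle_failure(
    ti: TaskInstanceSketch,
    history: List[TaskInstanceSketch],
    now: Optional[datetime] = None,
) -> bool:
    """Record a history snapshot for orphaned tasks, then mark the task failed."""
    now = now or datetime.now(timezone.utc)

    # Evaluate the orphan condition BEFORE touching end_date; once end_date
    # is assigned, the condition can no longer be true.
    orphaned = is_orphaned(ti)
    if orphaned:
        # Assumption: a shallow copy stands in for writing a history row.
        history.append(copy.copy(ti))

    ti.end_date = now
    ti.state = "failed"
    return orphaned
```

If the two steps were swapped, `is_orphaned` would always see a populated `end_date` and no history snapshot would be taken, which is one way to read the record loss this commit addresses.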
Description
This PR fixes issue #49517 where TaskInstanceHistory records were lost when Kubernetes API rate limiting (429 errors) prevented task adoption during scheduler restarts.
Problem
When using KubernetesExecutor or CeleryKubernetesExecutor:
None
RUNNING
Solution
KubernetesExecutor: Add 429 error handling to retry logic and detailed logging for adoption failures
TaskInstance: Detect orphaned tasks (state=None + start_date set + end_date unset) and record TaskInstanceHistory
Impact
Before:
After:
Fixes: #49517
Related: #49244
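As a rough illustration of the 429 handling mentioned in the Solution, the sketch below retries a call with exponential backoff when it fails with HTTP 429 and re-raises any other error immediately. Everything here is an assumption made for illustration: `ApiErrorSketch` is a hypothetical exception that only mimics carrying an HTTP status, and `call_with_429_retry`, `max_attempts`, and `base_delay` are invented names, not the KubernetesExecutor's real retry logic or settings.

```python
import logging
import time

log = logging.getLogger(__name__)


class ApiErrorSketch(Exception):
    """Hypothetical API error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"API error {status}")
        self.status = status


def call_with_429_retry(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a 429 error, wait and retry up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ApiErrorSketch as exc:
            # Only rate limiting is retried; give up on the last attempt.
            if exc.status != 429 or attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1x, 2x, 4x, ...
            log.warning(
                "Rate limited (429) on attempt %d/%d; retrying in %.1fs",
                attempt,
                max_attempts,
                delay,
            )
            sleep(delay)
```

The `sleep` parameter is injectable so the backoff can be exercised in a test without real waiting, for example by passing `delays.append` for a list `delays`.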