Create a cluster with more than one node, then gracefully shut down at least one node in that node pool.
Logs:
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:58:54,560 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:58:54,917 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:58:54,917 - ERROR - Encountered unusual error. Trying again.
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:58:59,596 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:58:59,895 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:58:59,896 - ERROR - Encountered unusual error. Trying again.
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:59:04,592 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:04,949 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:04,949 - ERROR - Encountered unusual error. Trying again.
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:59:09,595 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:09,886 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:09,887 - ERROR - Encountered unusual error. Trying again.
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:59:14,619 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:14,966 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:14,966 - ERROR - Encountered unusual error. Trying again.
[gke-underscore-sky-burst-underscore-us-central1-underscore-test-cluster-v1 - k8 Manager] - 2024-06-16 15:59:19,644 - ERROR - Unexpected error: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:19,952 - ERROR - Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 195, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 32, in heartbeat_error_handler
yield
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 86, in run
self.controller_loop()
File "/home/alex/Documents/skyflow/skyflow/skylet/cluster_controller.py", line 93, in controller_loop
cluster_status = self.manager_api.get_cluster_status()
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 218, in get_cluster_status
allocatable_capacity=self.allocatable_resources,
File "/home/alex/Documents/skyflow/skyflow/cluster_manager/kubernetes/kubernetes_manager.py", line 295, in allocatable_resources
assert node_name in available_resources.keys(), (
AssertionError: Node gke-test-cluster-v1-cpu-pool-118cb08c-rhft not found in cluster resources.
[gke_sky-burst_us-central1_test-cluster-v1 - Cluster Controller] - 2024-06-16 15:59:19,952 - ERROR - Encountered unusual error. Trying again.
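The traceback points at the assertion in `allocatable_resources` (`assert node_name in available_resources.keys()`), which seems to fire on every controller loop once the node is gone. A possible mitigation might be to skip entries that refer to a node no longer in the resource map instead of asserting. The sketch below is only an illustration under assumptions: `subtract_pod_requests`, the `(node_name, requests)` tuple shape, and the resource keys are hypothetical and are not taken from the skyflow code.

```python
from typing import Dict, Iterable, Tuple

Resources = Dict[str, float]


def subtract_pod_requests(
    available_resources: Dict[str, Resources],
    pod_requests: Iterable[Tuple[str, Resources]],
) -> Dict[str, Resources]:
    """Return per-node resources after subtracting pod requests.

    Hypothetical helper: pods whose node is missing from
    `available_resources` (for example, a node that was just shut down)
    are skipped instead of triggering an AssertionError.
    """
    # Copy the input so the caller's mapping is left unchanged.
    remaining = {node: dict(res) for node, res in available_resources.items()}
    for node_name, requests in pod_requests:
        if node_name not in remaining:
            # Stale pod on a node that has left the cluster: ignore it.
            continue
        for resource, amount in requests.items():
            current = remaining[node_name].get(resource, 0.0)
            # Clamp at zero so over-requested resources never go negative.
            remaining[node_name][resource] = max(current - amount, 0.0)
    return remaining


# Example with made-up node names and resource keys.
nodes = {"node-a": {"cpus": 4.0, "memory": 16.0}}
pods = [("node-a", {"cpus": 1.0}), ("node-gone", {"cpus": 2.0})]
result = subtract_pod_requests(nodes, pods)
```

Here the pod bound to `node-gone` is ignored and only `node-a` is adjusted. Whether silently skipping is right, or whether the stale node should be logged as a warning, depends on how the controller is expected to report a shrinking node pool.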