Description
Problem
InstancesService.create blocks for up to 180 seconds by default while waiting for an instance to enter the provisioning state. While the timeout can be adjusted, the wait cannot be disabled. This behavior causes the following issues in dstack:
- Instance loss
  We have encountered multiple cases where our application lost track of an instance due to an exception raised during the wait. This can happen, for example, when:
  - the instance takes longer than 180 seconds to enter the provisioning state, or
  - a network error occurs during one of the get_by_id calls.
  Because these exceptions are raised before InstancesService.create returns the instance ID, our application has no opportunity to persist the ID. As a result, the instance continues running and accruing charges, but is no longer tracked by the application.
- Performance
  Our application architecture is optimized for short-lived transactions. Blocking for up to 180 seconds can lead to performance issues, such as database connection pool exhaustion.
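To make the instance-loss case concrete, here is a minimal, self-contained simulation. `FakeBlockingService` is a hypothetical stand-in, not the real SDK; it only mimics the behavior described above, where the instance already exists on the provider side by the time the wait raises.

```python
class FakeBlockingService:
    """Hypothetical stand-in for a client whose create() blocks; not the real SDK."""

    def __init__(self):
        # What exists (and is billed) on the provider side.
        self.remote_instances = {}

    def create(self, name):
        instance_id = f"id-{len(self.remote_instances) + 1}"
        self.remote_instances[instance_id] = name  # the instance now exists remotely
        # Simulate the wait failing (timeout or network error while polling).
        # The ID is never returned to the caller.
        raise TimeoutError("instance did not reach the provisioning state in time")


service = FakeBlockingService()
tracked_ids = []  # what our application managed to persist

try:
    tracked_ids.append(service.create("worker"))
except TimeoutError:
    pass  # the exception carries no instance ID, so nothing can be persisted

untracked = set(service.remote_instances) - set(tracked_ids)
print(untracked)  # {'id-1'}
```

The instance in `untracked` keeps running, but the application has no record of it.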
Solution
Allow instance creation without blocking.
Ideally, the instance creation method should return immediately once the instance ID is known, without performing any subsequent API calls. This would allow us to persist the instance ID reliably and manage instance state polling using our own implementation.
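A rough sketch of the flow we have in mind. Almost everything in it is hypothetical: `FakeInstancesService`, `create_nowait`, the `"pending"` state, and the `database` list are illustrative names, not existing SDK API. Only `get_by_id` and the provisioning state come from the description above.

```python
import time


class FakeInstancesService:
    """Hypothetical in-memory stand-in; not the real SDK."""

    def __init__(self):
        self._polls = {}  # instance ID -> number of get_by_id calls so far

    def create_nowait(self, name):
        # Hypothetical method: returns as soon as the ID is known,
        # without any follow-up API calls.
        instance_id = f"id-{len(self._polls) + 1}"
        self._polls[instance_id] = 0
        return instance_id

    def get_by_id(self, instance_id):
        # Simulate one transient network error, then a state change.
        self._polls[instance_id] += 1
        calls = self._polls[instance_id]
        if calls == 1:
            raise ConnectionError("transient network error")
        return "provisioning" if calls >= 3 else "pending"


def wait_for_provisioning(service, instance_id, attempts=10, interval=0.0):
    """Caller-owned polling: errors here cannot lose the already-persisted ID."""
    for _ in range(attempts):
        try:
            if service.get_by_id(instance_id) == "provisioning":
                return True
        except ConnectionError:
            pass  # retry on the next attempt
        time.sleep(interval)
    return False


service = FakeInstancesService()
database = []  # placeholder for the application's storage

instance_id = service.create_nowait("worker")
database.append(instance_id)  # short transaction: persist before any polling

ready = wait_for_provisioning(service, instance_id)
```

With this split, polling can run in a separate background job, and a timeout or network error only delays the status update instead of orphaning the instance.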