-
Notifications
You must be signed in to change notification settings - Fork 115
RFC: Custom backends and policies for Dynamic Selection #2220
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| The `get_submission_group` is not easily implemented in a meaningful way. It must return a type that | ||
| defines a member function `wait`. But since there is no way to know how to wait on all previous submissions | ||
| for an arbitrary backend resource, this function will likely need to return a dummy type or | ||
| `get_submission_group` be undefined for the default backend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we could provide a way to have the user give us the wait type, do we have a shot? Or maybe it's better to leave this out to make the default case simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets consider if there's a meaningful default or not. I'm not sure that there is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a meaningful default is to call wait on all resources, if they have a wait function.
danhoeflinger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks pretty good. Mostly just quite minor changes for now.
danhoeflinger
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
first pass at feedback
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_policies.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
rfcs/proposed/dynamic_selection_customization/custom_backends.md
Outdated
Show resolved
Hide resolved
99152b4 to
80ad6b8
Compare
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
rfcs/experimental/dynamic_selection/customization/custom_backends.md
Outdated
Show resolved
Hide resolved
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
| - `b` an arbitrary identifier of type `T` | ||
| - `args` an arbitrary parameter pack of types `typename… Args` | ||
| - `s` is of type `S` and satisfies *Selection* and `is_same_v<resource_t<S>, resource_t<T>>` is `true` | ||
| - `f` a function object with signature `/*ret_type*/ fun(resource_t<T>, Args…);` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "fun" be "f"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed here and in another place, also fixed ret_type.
|
|
||
| ## Proposed Design to Enable Easier Customization of Backends | ||
|
|
||
| This proposal presents a flexible backend system based on a `backend_base` template class and a `core_resource_backend` template that can be used for most resource types. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` should exist for that resource. This is not possible to be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| This proposal presents a flexible backend system based on a `backend_base` template class and a `core_resource_backend` template that can be used for most resource types. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` should exist for that resource. This is not possible to be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible. | |
| This proposal presents a flexible backend system that can be used for most resource types. It is based on two template classes: `backend_base` and `core_resource_backend`. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` is necessary since reporting cannot be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
|
|
||
| ### Key Components | ||
|
|
||
| 1. **`backend_base<ResourceType>`**: A proposed base class template that implements core backend functionality to inherit from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| 1. **`backend_base<ResourceType>`**: A proposed base class template that implements core backend functionality to inherit from. | |
| 1. **`backend_base<ResourceType>`**: A proposed base class template that implements common backend functionality to inherit from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we have a "core_resource_backend" lets not overload the meaning of "core".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed this and in other places for backends.
|
|
||
| ### Reporting Requirements and Scratch Space Contract | ||
|
|
||
| Backends must now explicitly accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Backends must now explicitly accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`). | |
| Backends must accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"now" and "explicitly" seem unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| - Compile-time checks: The backend implementation should `static_assert` if any type in `ReportingReqs...` is not a supported reporting requirement for that backend. This makes unsupported combinations a hard error at compile time. | ||
| - Resource filtering: Some reporting requirements imply properties of the underlying resource or device (for example, timing via `task_time_t` may require device support for profiling tags and queues created with profiling enabled). Backends must examine the provided resources (or query devices when default-initializing resources) and filter out any resources that do not support all requested reporting requirements. Any special resource properties required to implement a reporting requirement must be checked here (for instance, checking `device.has(sycl::aspect::ext_oneapi_queue_profiling_tag)` in addition to creating queues with `sycl::property::queue::enable_profiling()` when `task_time_t` is requested). If after filtering the set of candidate resources there are no resources left that satisfy all requested reporting requirements, the backend must throw a `std::runtime_error` documenting that the requested reporting requirements cannot be satisfied on the available resources. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"may require device support for profiling tags and queues created with profiling enabled" this is very SYCL specific. Profiling tags or profiling enabled has no generic meaning. Can you be more generic, for example "may limit to only devices that support profiling".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rewritten to be more generic
| }; | ||
| ``` | ||
|
|
||
| When a policy tracks execution information (like task timing or completion), the backend needs to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| When a policy tracks execution information (like task timing or completion), the backend needs to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead. | |
| When a policy tracks execution information (like task timing or completion), the backend may need to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
taken
| Returns the vector of resources stored during construction. Resources must be copyable/movable into a `std::vector` and remain valid throughout the backend's lifetime. | ||
| #### `get_submission_group()` Implementation | ||
| Returns a group object that waits on all resources. The `default_submission_group` attempts to call `wait()` on the `CoreResourceType` by applying `adapter` to the `ResourceType` object. The `CoreResourceType` must provide a `wait()` method that blocks until all work on that resource is complete. Note that the default implementation waits on each resource, not each submission (works for SYCL queues, oneTBB `task_group` objects, etc.). Adapters can enable types without a `wait()` method; for example, `[](auto pointer){ return *pointer; }` allows `sycl::queue*` to work by adapting to `sycl::queue`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not completely accurate. get_submission_group should wait for all outstanding submissions made via this backend instance. One way to accomplish that is to wait on all resources, such as all queues. But we shouldn't require that it waits on the resources, only that all submissions are done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignore this previous comment, I was reacting to the 1st sentence as if it described the purpose of get_submission_group. But in context, this falls under the default implementation section and given the text at the end of the paragraph, it's ok.
| Policies specify reporting requirements (`task_time_t`, `task_submission_t`, `task_completion_t` from `oneapi::dpl::experimental::execution_info`) via the `policy_base` constructor. These are passed to the backend constructor, which filters devices based on feature availability. Examples: `auto_tune_policy` requires `task_time_t`; `dynamic_load_policy` requires `task_submission_t` and `task_completion_t`. | ||
| Selection handles must include a `scratch_space` member of type `backend_traits<Backend>::template selection_scratch_t<reqs...>` to provide backend storage for instrumentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are selection handles described anywhere? For a policy that needs task_time_t, task_submission_t or task_completion_t reported its not clear how those values are received by the policy via the handle. I suppose the example demonstrates it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its a good point, I think its a term that is under described, its been on the edge of "implementation details" where "selection" is actually defined, but with the addition of selection_scratch_t, I think we need to define it more clearly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a selection handle section to custom_policies.md and then linked to it from custom_backends.md.
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
Signed-off-by: Dan Hoeflinger <[email protected]>
vossmjp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I can't click approve since I'm one of the authors, but I approve!
This PR describes changes to the experimental dynamic selection design to improve the tools to create custom backends and custom policies for dynamic selection.
Major Changes:
policy_baseto handle much of the boilerplate code for new custom policies, anddefault_backendto do the same for new custom backends.ResourceAdaptersupport provides the ability to serve different flavors of resource with the same backend. (sycl::queuevssycl::queue*)