RFC: Custom backends and policies for Dynamic Selection #2220


Open

vossmjp wants to merge 85 commits into main from dev/vossmjp/ds_custom_backends

Contributor

vossmjp commented May 1, 2025 (edited by danhoeflinger)

This PR describes changes to the experimental Dynamic Selection design that improve the tools for creating custom backends and custom policies.

Major changes:

- Addition of `policy_base`, which handles much of the boilerplate code for new custom policies, and `default_backend`, which does the same for new custom backends.
- Addition of `ResourceAdapter` support, which makes it possible to serve different flavors of a resource with the same backend (for example, `sycl::queue` vs. `sycl::queue*`).
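As a rough illustration of the `ResourceAdapter` idea, the C++17 sketch below stores resources in whatever form the user supplies (a value or a pointer) and applies an adapter to reach the resource the backend actually works with. All names here (`mock_queue`, `identity_adapter`, `adapted_backend`) are hypothetical stand-ins, not oneDPL's API; `mock_queue` only plays the role of `sycl::queue`.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a resource such as sycl::queue (not a real API).
struct mock_queue {
    int id;
    // Pretend to run work; the result identifies which queue ran it.
    int submit(int x) const { return id * 100 + x; }
};

// Default adapter: the stored resource already is the core resource.
struct identity_adapter {
    template <typename T>
    T& operator()(T& r) const { return r; }
};

// Hypothetical backend: stores ResourceType exactly as the user supplied it
// (a value or a pointer) and applies Adapter to reach the core resource.
template <typename ResourceType, typename Adapter = identity_adapter>
class adapted_backend {
public:
    explicit adapted_backend(std::vector<ResourceType> resources, Adapter adapter = {})
        : resources_(std::move(resources)), adapter_(std::move(adapter)) {}

    // Run work on the i-th resource through the adapted (core) resource.
    int submit(std::size_t i, int x) { return adapter_(resources_[i]).submit(x); }

    const std::vector<ResourceType>& get_resources() const { return resources_; }

private:
    std::vector<ResourceType> resources_;
    Adapter adapter_;
};
```

For example, `adapted_backend<mock_queue>` uses values directly, while passing a dereferencing lambda such as `[](mock_queue* p) -> mock_queue& { return *p; }` lets the same template serve `mock_queue*`, mirroring the `sycl::queue` vs. `sycl::queue*` case.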

vossmjp requested review from danhoeflinger and egfefey

May 1, 2025 21:16

egfefey reviewed

rfcs/proposed/dynamic_selection_customization/README.md (outdated, resolved)

egfefey reviewed

rfcs/proposed/dynamic_selection_customization/README.md (outdated, resolved)

egfefey reviewed

rfcs/proposed/dynamic_selection_customization/README.md (outdated, resolved)

egfefey reviewed

rfcs/proposed/dynamic_selection_customization/README.md (outdated, resolved)

egfefey reviewed

rfcs/proposed/dynamic_selection_customization/README.md (outdated)

The `get_submission_group` function is not easily implemented in a meaningful way. It must return a type that defines a member function `wait`. But since there is no way to know how to wait on all previous submissions for an arbitrary backend resource, this function will likely need to return a dummy type, or `get_submission_group` will need to be left undefined for the default backend.

Contributor

egfefey May 2, 2025

If we could provide a way to have the user give us the wait type, do we have a shot? Or maybe it's better to leave this out to make the default case simple.

Contributor Author

vossmjp May 2, 2025

Let's consider whether there's a meaningful default or not. I'm not sure that there is.

Contributor Author

vossmjp May 5, 2025

I think a meaningful default is to call wait on all resources, if they have a wait function.
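The default proposed here (call `wait` on every resource that has one) can be sketched with the C++17 detection idiom. This is a hypothetical illustration; `has_wait`, `default_wait_group`, `waitable`, and `not_waitable` are invented names. For resources without `wait()` the sketch's `wait()` does nothing, which amounts to the "dummy type" option from the quoted text: it compiles, but it does not guarantee that submissions have completed.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Detects whether T has a member function callable as t.wait().
template <typename T, typename = void>
struct has_wait : std::false_type {};
template <typename T>
struct has_wait<T, std::void_t<decltype(std::declval<T&>().wait())>> : std::true_type {};

// Hypothetical default submission group: waits on every resource when the
// resource type provides wait(); otherwise wait() is a no-op.
template <typename Resource>
struct default_wait_group {
    std::vector<Resource>* resources;  // non-owning; must outlive the group

    void wait() {
        if constexpr (has_wait<Resource>::value) {
            for (auto& r : *resources)
                r.wait();
        }
    }
};

// Test-only resource types (hypothetical).
struct waitable {
    int waits = 0;
    void wait() { ++waits; }
};
struct not_waitable {
    int id = 0;
};

static_assert(has_wait<waitable>::value, "waitable exposes wait()");
static_assert(!has_wait<not_waitable>::value, "not_waitable has no wait()");
```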

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/README.md: 10 review threads (outdated, resolved)

akukanov added the RFC label

danhoeflinger reviewed

Contributor

danhoeflinger left a comment

I think this looks pretty good. Mostly just quite minor changes for now.

rfcs/proposed/dynamic_selection_customization/README.md: 6 review threads (outdated, resolved)

vossmjp requested review from danhoeflinger and egfefey

August 15, 2025 21:00

danhoeflinger reviewed

Contributor

danhoeflinger left a comment

first pass at feedback

rfcs/proposed/dynamic_selection_customization/custom_backends.md: 5 review threads (outdated, resolved)

rfcs/proposed/dynamic_selection_customization/custom_policies.md (outdated, resolved)

vossmjp changed the title from "WIP: Custom backends for Dynamic Selection" to "Custom backends and policies for Dynamic Selection"

vossmjp changed the title from "Custom backends and policies for Dynamic Selection" to "RFC: Custom backends and policies for Dynamic Selection"

vossmjp marked this pull request as ready for review

August 18, 2025 21:37

vossmjp requested review from akukanov, danhoeflinger and rarutyun

August 18, 2025 21:37

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/custom_backends.md (outdated, resolved)

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/custom_backends.md (outdated, resolved)

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/custom_backends.md (outdated, resolved)

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/custom_backends.md (outdated, resolved)

danhoeflinger reviewed

rfcs/proposed/dynamic_selection_customization/custom_backends.md (outdated, resolved)

danhoeflinger reviewed

rfcs/experimental/dynamic_selection/customization/custom_backends.md (resolved)

danhoeflinger reviewed

rfcs/experimental/dynamic_selection/customization/custom_policies.md (resolved)

danhoeflinger force-pushed the dev/vossmjp/ds_custom_backends branch from 99152b4 to 80ad6b8

October 29, 2025 19:39

danhoeflinger mentioned this pull request

[Dynamic Selection] Customization of Backends and Policies #2508

Merged

danhoeflinger added this to the 2022.11.0 milestone

danhoeflinger mentioned this pull request

Removal of Selection API from Dynamic Selection #2489

Merged

danhoeflinger added 13 commits

December 16, 2025 10:02

- remove extra whitespace (eacea1e)
- matching conventions (4eb36ea)
- minor fixes (8d78da5)
- clarifying where cor_resource_backend default is useful (9842a01)
- diagram fixes (2c88567)
- conventions (69d6ee1)
- removing redundancy for wait type (2d632d4)
- reducing redundancy, making language more concise (a73fa5f)
- reducing redundancy for custom policies (dd7f27d)
- further reduction in redundancy (94aee86)
- removal of repetition (c0ea0f7)
- replace copied code with snippets from impl (b26c3f7)
- fix code spans (d92efe8)

Each commit: Signed-off-by: Dan Hoeflinger

akukanov reviewed

rfcs/experimental/dynamic_selection/README.md: 3 review threads (outdated, resolved)

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated, resolved)

rfcs/experimental/dynamic_selection/customization/custom_policies.md (resolved)

danhoeflinger added 4 commits

December 18, 2025 14:32

- apply suggestion (bed7afe)
- fixing default implementation (f38db7f)
- separating notes from named requirements (003564a)
- separate out info from implementer actions (0b68df0)

Each commit: Signed-off-by: Dan Hoeflinger

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
              - `b` an arbitrary identifier of type `T`

              - `args` an arbitrary parameter pack of types `typename… Args`

              - `s` is of type `S` and satisfies *Selection* and `is_same_v<resource_t<S>, resource_t<T>>` is `true`

              - `f` a function object with signature `/*ret_type*/ fun(resource_t<T>, Args…);`

Contributor Author

vossmjp Dec 19, 2025

Should "fun" be "f"?

Contributor

danhoeflinger Dec 19, 2025

Fixed here and in another place, also fixed ret_type.
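To make the identifiers in the quoted requirements concrete, here is a toy sketch of a backend whose `submit` takes a selection `s`, a function object `f`, and `args`, checks that the selection and backend share a resource type, and invokes `f` with the selected resource followed by `args`. The member `submit(s, f, args...)`, the selection's `unwrap()`, and the `resource_type` member alias behind `resource_t` are assumptions for illustration; `toy_backend`, `toy_selection`, and `toy_resource` are hypothetical.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical: resource_t<T> reads a member alias named resource_type.
template <typename T>
using resource_t = typename T::resource_type;

struct toy_resource {
    int scale;
};

// A toy "selection" that carries the chosen resource.
struct toy_selection {
    using resource_type = toy_resource;
    toy_resource r;
    toy_resource unwrap() const { return r; }
};

struct toy_backend {
    using resource_type = toy_resource;

    // Invokes f(resource_t<T>, Args...) on the resource held by the selection.
    template <typename Selection, typename Function, typename... Args>
    auto submit(Selection s, Function&& f, Args&&... args) {
        static_assert(std::is_same_v<resource_t<Selection>, resource_t<toy_backend>>,
                      "selection and backend must share a resource type");
        return std::forward<Function>(f)(s.unwrap(), std::forward<Args>(args)...);
    }
};
```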

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
              ## Proposed Design to Enable Easier Customization of Backends

              This proposal presents a flexible backend system based on a `backend_base` template class and a `core_resource_backend` template that can be used for most resource types. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` should exist for that resource. This is not possible to be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible.

Contributor Author

vossmjp Dec 19, 2025

Suggested change

Original: This proposal presents a flexible backend system based on a `backend_base` template class and a `core_resource_backend` template that can be used for most resource types. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` should exist for that resource. This is not possible to be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible.

Suggested: This proposal presents a flexible backend system that can be used for most resource types. It is based on two template classes: `backend_base` and `core_resource_backend`. For resource backends supporting reporting requirements, explicit specialization of the `core_resource_backend` is necessary since reporting cannot be done generically. For simple types serving policies without reporting requirements, use of the generically written `core_resource_backend` may be possible.

Contributor

danhoeflinger Dec 19, 2025

taken
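A minimal sketch of the split described in this suggestion: a `backend_base` holding common functionality, a generic `core_resource_backend` that rejects reporting requirements at compile time, and a specialization for one resource type that can support them. The template parameter lists, `timed_resource`, and the `reports_time` flag are hypothetical; the RFC's templates may take different parameters (for example a core resource type and an adapter). The specialization shown is technically a partial specialization on the resource type, since the requirement pack stays open.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for an execution_info reporting tag.
struct task_time_t {};

// Common functionality shared by every backend (hypothetical shape).
template <typename ResourceType>
class backend_base {
public:
    explicit backend_base(std::vector<ResourceType> rs) : resources_(std::move(rs)) {}
    const std::vector<ResourceType>& get_resources() const { return resources_; }

protected:
    std::vector<ResourceType> resources_;
};

// Generic template: only usable without reporting requirements, because
// reporting cannot be implemented generically.
template <typename ResourceType, typename... ReportingReqs>
class core_resource_backend : public backend_base<ResourceType> {
    static_assert(sizeof...(ReportingReqs) == 0,
                  "generic backend supports no reporting requirements; "
                  "specialize core_resource_backend for this resource");

public:
    using backend_base<ResourceType>::backend_base;
};

// Hypothetical resource type that knows how to time tasks.
struct timed_resource {
    double last_elapsed = 0.0;
};

// Specialization for a resource that can honor reporting requirements.
template <typename... ReportingReqs>
class core_resource_backend<timed_resource, ReportingReqs...>
    : public backend_base<timed_resource> {
public:
    using backend_base<timed_resource>::backend_base;
    static constexpr bool reports_time = (std::is_same_v<ReportingReqs, task_time_t> || ...);
};
```

Instantiating the generic template with any reporting requirement, for example `core_resource_backend<int, task_time_t>`, fails at compile time, which matches the intent that such backends be specialized explicitly.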

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
              ### Key Components

              1. **`backend_base<ResourceType>`**: A proposed base class template that implements core backend functionality to inherit from.

Contributor Author

vossmjp Dec 19, 2025

Suggested change

Original: 1. **`backend_base<ResourceType>`**: A proposed base class template that implements core backend functionality to inherit from.

Suggested: 1. **`backend_base<ResourceType>`**: A proposed base class template that implements common backend functionality to inherit from.

Contributor Author

vossmjp Dec 19, 2025

Since we have a `core_resource_backend`, let's not overload the meaning of "core".

Contributor

danhoeflinger Dec 19, 2025

Fixed this here and in other places for backends.

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
              ### Reporting Requirements and Scratch Space Contract

              Backends must now explicitly accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`).

Contributor Author

vossmjp Dec 19, 2025

Suggested change

Original: Backends must now explicitly accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`).

Suggested: Backends must accept a (possibly empty) variadic list of reporting requirements describing the execution information the Policy will need. These reporting requirements are the same `execution_info` tag types used elsewhere in the Dynamic Selection API (for example `execution_info::task_time_t`, `execution_info::task_submission_t`, `execution_info::task_completion_t`).

Contributor Author

vossmjp Dec 19, 2025

"now" and "explicitly" seem unnecessary.

Contributor

danhoeflinger Dec 19, 2025

taken

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
              - Compile-time checks: The backend implementation should `static_assert` if any type in `ReportingReqs...` is not a supported reporting requirement for that backend. This makes unsupported combinations a hard error at compile time.

              - Resource filtering: Some reporting requirements imply properties of the underlying resource or device (for example, timing via `task_time_t` may require device support for profiling tags and queues created with profiling enabled). Backends must examine the provided resources (or query devices when default-initializing resources) and filter out any resources that do not support all requested reporting requirements. Any special resource properties required to implement a reporting requirement must be checked here (for instance, checking `device.has(sycl::aspect::ext_oneapi_queue_profiling_tag)` in addition to creating queues with `sycl::property::queue::enable_profiling()` when `task_time_t` is requested). If after filtering the set of candidate resources there are no resources left that satisfy all requested reporting requirements, the backend must throw a `std::runtime_error` documenting that the requested reporting requirements cannot be satisfied on the available resources.

Contributor Author

vossmjp Dec 19, 2025 (edited)

"may require device support for profiling tags and queues created with profiling enabled": this is very SYCL-specific. "Profiling tags" and "profiling enabled" have no generic meaning. Can you be more generic, for example "may limit to only devices that support profiling"?

Contributor

danhoeflinger Dec 19, 2025

rewritten to be more generic
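The compile-time check and resource filtering discussed in this thread might look roughly like the sketch below. It is hedged: the tag structs stand in for the `execution_info` tags, and `device_info`, `supports_profiling`, and `filter_resources` are hypothetical; a real backend would query its own devices rather than read a flag.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Hypothetical stand-ins for execution_info tags.
struct task_time_t {};
struct task_submission_t {};
struct task_completion_t {};

// Hypothetical device description with one capability flag.
struct device_info {
    int id;
    bool supports_profiling;
};

template <typename Req>
inline constexpr bool is_supported_req_v =
    std::is_same_v<Req, task_time_t> || std::is_same_v<Req, task_submission_t> ||
    std::is_same_v<Req, task_completion_t>;

template <typename... ReportingReqs>
std::vector<device_info> filter_resources(const std::vector<device_info>& candidates) {
    // Compile-time check: an unsupported requirement type is a hard error.
    static_assert((is_supported_req_v<ReportingReqs> && ...),
                  "unsupported reporting requirement for this backend");
    constexpr bool needs_timing = (std::is_same_v<ReportingReqs, task_time_t> || ...);

    // Keep only devices able to satisfy every requested requirement.
    std::vector<device_info> kept;
    for (const auto& d : candidates)
        if (!needs_timing || d.supports_profiling)
            kept.push_back(d);

    if (kept.empty())
        throw std::runtime_error("requested reporting requirements cannot be satisfied "
                                 "on the available resources");
    return kept;
}
```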

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md (outdated)

    
};

              When a policy tracks execution information (like task timing or completion), the backend needs to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead.

Contributor Author

vossmjp Dec 19, 2025

Suggested change

Original: When a policy tracks execution information (like task timing or completion), the backend needs to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead.

Suggested: When a policy tracks execution information (like task timing or completion), the backend may need to store temporary data with each selection. For example, the SYCL backend's `scratch_space_t<execution_info::task_time_t>` includes an extra `sycl::event` to store the "start" profiling tag needed to measure elapsed time. For policies without reporting requirements, `scratch_space_t<>` should be empty (or inherit from `no_scratch_t<>`), adding no overhead.

Contributor

danhoeflinger Dec 19, 2025

taken
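One way to keep `scratch_space_t<>` empty while giving the timing case an extra "start" slot is to pick a base class from the requested tags. In this sketch `mock_event` stands in for `sycl::event` and `timing_scratch` is a hypothetical helper; the actual SYCL backend's layout may differ.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for execution_info tags.
struct task_time_t {};
struct task_completion_t {};

// Hypothetical stand-in for sycl::event.
struct mock_event {
    long timestamp = 0;
};

// Nothing stored unless timing is requested.
template <bool NeedsStartTag>
struct timing_scratch {};

template <>
struct timing_scratch<true> {
    mock_event start_tag;  // "start" marker used to measure elapsed time
};

// Scratch space chosen from the requested reporting requirements.
template <typename... Reqs>
struct scratch_space_t
    : timing_scratch<(std::is_same_v<Reqs, task_time_t> || ...)> {};
```

An empty member still occupies at least one byte unless the containing type uses it as a base class (empty base optimization) or, in C++20, declares it `[[no_unique_address]]`; one of those is what makes the no-requirements case truly free.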

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_backends.md

    
              Returns the vector of resources stored during construction. Resources must be copyable/movable into a `std::vector` and remain valid throughout the backend's lifetime.

              #### `get_submission_group()` Implementation

              Returns a group object that waits on all resources. The `default_submission_group` attempts to call `wait()` on the `CoreResourceType` by applying `adapter` to the `ResourceType` object. The `CoreResourceType` must provide a `wait()` method that blocks until all work on that resource is complete. Note that the default implementation waits on each resource, not each submission (works for SYCL queues, oneTBB `task_group` objects, etc.). Adapters can enable types without a `wait()` method; for example, `[](auto pointer){ return *pointer; }` allows `sycl::queue*` to work by adapting to `sycl::queue`.

Contributor Author

vossmjp Dec 19, 2025 (edited)

This is not completely accurate. get_submission_group should wait for all outstanding submissions made via this backend instance. One way to accomplish that is to wait on all resources, such as all queues. But we shouldn't require that it waits on the resources, only that all submissions are done.

Contributor Author

vossmjp Dec 19, 2025

Ignore my previous comment; I was reacting to the first sentence as if it described the purpose of `get_submission_group`. But in context, this falls under the default implementation section, and given the text at the end of the paragraph, it's OK.
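The default `get_submission_group` behavior in the quoted text can be sketched as: apply the adapter to each stored resource, then call `wait()` on the result. `mock_queue` and this `default_submission_group` are hypothetical stand-ins (the RFC's own `default_submission_group` may differ). `mock_queue` shares state across copies, as copies of `sycl::queue` refer to the same queue, so the by-value adapter `[](auto pointer){ return *pointer; }` still waits on the original resource.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical handle-like resource: copies share one wait counter.
struct mock_queue {
    std::shared_ptr<int> waits = std::make_shared<int>(0);
    void wait() const { ++*waits; }  // records that a wait happened
};

// Hypothetical default submission group: adapts each stored resource to the
// core resource and waits on it. Waits per resource, not per submission.
template <typename ResourceType, typename Adapter>
class default_submission_group {
public:
    default_submission_group(const std::vector<ResourceType>& rs, Adapter a)
        : resources_(rs), adapter_(a) {}

    void wait() const {
        for (const auto& r : resources_)
            adapter_(r).wait();
    }

private:
    const std::vector<ResourceType>& resources_;  // non-owning; must outlive the group
    Adapter adapter_;
};
```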

vossmjp commented

rfcs/experimental/dynamic_selection/customization/custom_policies.md

    
              Policies specify reporting requirements (`task_time_t`, `task_submission_t`, `task_completion_t` from `oneapi::dpl::experimental::execution_info`) via the `policy_base` constructor. These are passed to the backend constructor, which filters devices based on feature availability. Examples: `auto_tune_policy` requires `task_time_t`; `dynamic_load_policy` requires `task_submission_t` and `task_completion_t`.

              Selection handles must include a `scratch_space` member of type `backend_traits<Backend>::template selection_scratch_t<reqs...>` to provide backend storage for instrumentation.

Contributor Author

vossmjp Dec 19, 2025

Are selection handles described anywhere? For a policy that needs `task_time_t`, `task_submission_t`, or `task_completion_t` reported, it's not clear how those values are received by the policy via the handle. I suppose the example demonstrates it, though.

Contributor

danhoeflinger Dec 19, 2025

It's a good point. I think it's a term that is under-described; it's been on the edge of "implementation details" where "selection" is actually defined, but with the addition of `selection_scratch_t`, I think we need to define it more clearly.

Contributor

danhoeflinger Dec 19, 2025

Added a selection handle section to custom_policies.md and then linked to it from custom_backends.md.
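A selection handle carrying backend scratch storage could look like the sketch below, where the handle's `scratch_space` member type is looked up through `backend_traits<Backend>::template selection_scratch_t<Reqs...>` as the quoted text requires. `toy_backend`, `toy_scratch`, `selection_handle`, and this simplified `backend_traits` are hypothetical illustrations, not oneDPL definitions.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for an execution_info tag and sycl::event.
struct task_time_t {};
struct mock_event {
    long ticks = 0;
};

// Scratch storage: empty unless timing is requested.
template <bool Timed>
struct toy_scratch {};
template <>
struct toy_scratch<true> {
    mock_event start;
};

// Hypothetical backend exposing a scratch-space alias template.
struct toy_backend {
    using resource_type = int;
    template <typename... Reqs>
    using selection_scratch_t = toy_scratch<(std::is_same_v<Reqs, task_time_t> || ...)>;
};

// Simplified traits that forward to the backend's alias template.
template <typename Backend>
struct backend_traits {
    template <typename... Reqs>
    using selection_scratch_t = typename Backend::template selection_scratch_t<Reqs...>;
};

// Hypothetical selection handle returned by a policy that needs Reqs....
template <typename Backend, typename... Reqs>
struct selection_handle {
    typename Backend::resource_type resource;
    typename backend_traits<Backend>::template selection_scratch_t<Reqs...> scratch_space;
    typename Backend::resource_type unwrap() const { return resource; }
};
```

The backend can write into `scratch_space` when it starts instrumenting a submission and read it back when reporting, while a policy without requirements gets an empty member.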

danhoeflinger added 3 commits

December 19, 2025 11:42

- fix fun and ret_type (8ab9071)
- address minor feedback (d32f8cd)
- add section on selection handle (2157da0)

Each commit: Signed-off-by: Dan Hoeflinger

vossmjp commented

Contributor Author

vossmjp left a comment

LGTM. I can't click approve since I'm one of the authors, but I approve!
