[cudax] Add support for generic thread groups within warp and cluster #8792
davebayer wants to merge 1 commit into
b20f402 to 37cc75e
🥳 CI Workflow Results 🟩 Finished in 32m 24s: Pass: 100%/54 | Total: 5h 14m | Max: 32m 17s | Hits: 97%/31859
using Level = typename Group::level_type;

if (!Unit{}.is_part_of(group))
if constexpr (cuda::std::is_same_v<Level, cuda::warp_level>)
Question: Does this also handle the cluster level correctly, or was the comment outdated?
// todo(dabayer): Implement fallback for cc < 80.
T result;
NV_IF_TARGET(NV_PROVIDES_SM_80,
             ({ result = __reduce_add_sync(group.__synchronizer_instance().__lane_mask(), result_unit.value()); }))
return (cuda::gpu_thread.is_root_rank(group)) ? cuda::std::optional{result} : cuda::std::nullopt;
Question: That comment suggests this code path is not valid for SM < 80. We should at least assert that, so we don't forget it once this moves out of experimental.
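One common fallback for targets below SM_80 is an xor-shuffle "butterfly" reduction. The sketch below is a host-side C++ model, not device code, of why five `__shfl_xor_sync(0xffffffff, v, offset)` steps leave every lane holding the full-warp total; `butterfly_sum` and `WarpValues` are hypothetical names. It assumes all 32 lanes participate. A generic group with a partial lane mask, like the one returned by `__lane_mask()` in the hunk above, would need a different partner scheme, because an xor partner can fall outside the mask.

```cpp
#include <array>
#include <cassert>

// Hypothetical: one value per lane of a 32-thread warp, modeled on the host.
using WarpValues = std::array<int, 32>;

// Host-side model of a full-warp xor-shuffle ("butterfly") sum. On the device,
// each iteration would be `v += __shfl_xor_sync(0xffffffff, v, offset);`.
// Assumes a full mask: all 32 lanes take part in every step.
inline WarpValues butterfly_sum(WarpValues lanes)
{
  for (int offset = 16; offset > 0; offset /= 2)
  {
    WarpValues next{};
    for (int lane = 0; lane < 32; ++lane)
    {
      // Read the partner's value from the previous step, as a shuffle would.
      next[lane] = lanes[lane] + lanes[lane ^ offset];
    }
    lanes = next;
  }
  // After log2(32) = 5 steps, every lane holds the same full-warp total.
  for (int lane = 1; lane < 32; ++lane)
  {
    assert(lanes[lane] == lanes[0]);
  }
  return lanes;
}
```

Every lane ends up with the result, so the root-rank check in the reviewed code would work unchanged on top of such a fallback for full warps.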
// todo(dabayer): Implement fallback for cc < 80.
T result;
NV_IF_TARGET(NV_PROVIDES_SM_80,
             ({ result = __reduce_add_sync(group.__synchronizer_instance().__lane_mask(), result_unit.value()); }))
Question: Can't we just use ThreadReduce here? It should use the __reduce_add_sync optimization when applicable.
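Whichever helper ends up being used, it helps to pin down what the SM_80 path in the hunk computes. The sketch below is a host-side C++ model, not device code: `__reduce_add_sync(mask, v)` gives every lane in `mask` the sum over exactly those lanes (the real intrinsic only accepts 32-bit integer operands), and only the root rank gets a value back. Treating the lowest lane in the mask as the root is an assumption for illustration; `masked_warp_sum`, `root_lane` and `group_sum` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical: one value per lane of a 32-thread warp, modeled on the host.
using WarpValues = std::array<int, 32>;

// Models __reduce_add_sync(mask, v): the sum over all lanes whose bit is set.
inline int masked_warp_sum(std::uint32_t mask, const WarpValues& values)
{
  int sum = 0;
  for (int lane = 0; lane < 32; ++lane)
  {
    if (mask & (1u << lane))
    {
      sum += values[lane];
    }
  }
  return sum;
}

// Assumption for illustration: the group's root is the lowest lane in `mask`.
inline int root_lane(std::uint32_t mask)
{
  assert(mask != 0 && "an empty lane mask has no root");
  int lane = 0;
  while ((mask & (1u << lane)) == 0)
  {
    ++lane;
  }
  return lane;
}

// Mirrors the reviewed code: every lane computes the sum, only the root returns it.
inline std::optional<int> group_sum(std::uint32_t mask, const WarpValues& values, int lane)
{
  const int result = masked_warp_sum(mask, values);
  return lane == root_lane(mask) ? std::optional<int>{result} : std::nullopt;
}
```

For example, with lane `i` holding `i` and a mask covering lanes 8 to 15, lane 8 receives 92 and every other lane receives an empty optional.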
{
  group_sums[group_rank] = 0;
}
__shared__ T group_sums[ngroups];
Hmm, I was wondering whether shared memory should come from the user in the actual interface instead. For example, here, if only one group calls this, we waste shared memory.
That's correct, but I'd say it's outside this epic's scope, so I just went with static shared memory allocations inside the algorithms.
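The trade-off discussed here can be sketched on the host. Instead of a fixed `__shared__ T group_sums[ngroups]` sized for every possible group, the caller passes scratch storage sized for the groups that actually participate, similar in spirit to CUB's `TempStorage` idiom (e.g. `cub::WarpReduce<T>::TempStorage`, which the caller declares in `__shared__` memory). This is a plain C++ model, not device code; `sum_groups_with_scratch` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side model of caller-provided scratch storage.
// `values` holds one value per thread; threads are split into groups of
// `group_size`, and only groups [first_group, first_group + num_groups) take part.
template <class T>
T sum_groups_with_scratch(const std::vector<T>& values,
                          std::size_t group_size,
                          std::size_t first_group,
                          std::size_t num_groups,
                          T* scratch,
                          std::size_t scratch_len)
{
  // The caller only needs one slot per participating group, not per possible group.
  assert(scratch_len >= num_groups);
  assert((first_group + num_groups) * group_size <= values.size());

  // Phase 1: each participating group writes its partial sum into its slot.
  for (std::size_t g = 0; g < num_groups; ++g)
  {
    scratch[g] = T{0};
    const std::size_t begin = (first_group + g) * group_size;
    for (std::size_t i = begin; i < begin + group_size; ++i)
    {
      scratch[g] += values[i];
    }
  }

  // Phase 2: combine the per-group partial sums.
  T total{0};
  for (std::size_t g = 0; g < num_groups; ++g)
  {
    total += scratch[g];
  }
  return total;
}
```

When only one group runs the algorithm, a single slot of scratch is enough, whereas a static `group_sums[ngroups]` always reserves space for every group.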