@akukanov
Contributor

@akukanov akukanov commented Dec 4, 2025

On top of #2499, this patch unifies the copy_if implementation for parallel and vector policies, combining much of the code that was previously separate for the "bounded" ranges::copy_if and the "regular" C++17 copy_if.

  • Support bounded output in __simd_copy_by_mask
  • Simplify __brick_bounded_copy_by_mask and make it stop according to the ranges::copy_if semantics
  • Subsume __brick_copy_by_mask into __brick_bounded_copy_by_mask
  • Support bounded output in __parallel_selective_copy
  • Make __pattern_bounded_copy_if "redirecting" to __parallel_selective_copy
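
To illustrate what "bounded output" means for a copy-by-mask step, here is a minimal serial sketch. It is not the oneDPL code: the name `bounded_copy_by_mask`, its signature, and the return type are hypothetical, and the stopping rule (stop at the first selected element that no longer fits in the output) is an assumption about the intended semantics.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical sketch, not the oneDPL implementation.
// Copies in[i] to the output for every i in [0, in_len) with mask[i] == true,
// but never writes more than out_len elements.
// Returns {number of inputs consumed, number of outputs written}.
template <class InIt, class OutIt>
std::pair<std::ptrdiff_t, std::ptrdiff_t>
bounded_copy_by_mask(InIt in, std::ptrdiff_t in_len, OutIt out, std::ptrdiff_t out_len, const bool* mask)
{
    std::ptrdiff_t i = 0;
    std::ptrdiff_t written = 0;
    for (; i < in_len; ++i)
    {
        if (!mask[i])
            continue; // element not selected, keep scanning
        if (written == out_len)
            break; // assumed rule: a selected element does not fit, stop here
        out[written++] = in[i];
    }
    return {i, written};
}
```

With an output at least as large as the number of selected elements, the bound never triggers and the function behaves like an unbounded copy-by-mask, which is why a single implementation could serve both cases.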

@akukanov akukanov force-pushed the dev/improve-copy-if-akukanov branch 10 times, most recently from 90501f7 to 09de756 on December 5, 2025 18:04
@akukanov akukanov changed the title from 'WIP refactoring around copy_if' to 'Unify the "regular" and "bounded" implementations for copy_if' on Dec 5, 2025
@akukanov akukanov added this to the 2022.12.0 milestone Dec 5, 2025
@akukanov akukanov force-pushed the dev/improve-copy-if-akukanov branch from caec240 to 2640691 on December 17, 2025 19:20
@akukanov akukanov marked this pull request as ready for review December 17, 2025 19:21
@SergeyKopienko
Contributor

I think the declaration of

template <bool, class _RandomAccessIterator1, class _RandomAccessIterator2, class _Bound, class _Assigner>
_Bound
__brick_copy_by_mask(_RandomAccessIterator1, _Bound, _RandomAccessIterator2, _Bound, bool*, _Assigner,
                     /*vector=*/std::false_type) noexcept;

template <bool, class _RandomAccessIterator1, class _RandomAccessIterator2, class _Bound, class _Assigner>
_Bound
__brick_copy_by_mask(_RandomAccessIterator1, _Bound, _RandomAccessIterator2, _Bound, bool*, _Assigner,
                     /*vector=*/std::true_type) noexcept;

is not aligned with the implementations:

template <bool __Bounded, class _RandomAccessIterator1, class _RandomAccessIterator2, class _Bound, class _Assigner>
_Bound
__brick_copy_by_mask(_RandomAccessIterator1 __first, _Bound __in_len, _RandomAccessIterator2 __result, _Bound __out_len,
                     bool* __mask, _Assigner __assigner, /*vector=*/std::false_type) noexcept
{
    // ...
}

template <bool __Bounded, class _RandomAccessIterator1, class _RandomAccessIterator2, class _Bound, class _Assigner>
_Bound
__brick_copy_by_mask(_RandomAccessIterator1 __first, _Bound __in_len, _RandomAccessIterator2 __result, _Bound __out_len,
                     bool* __mask, _Assigner __assigner, /*vector=*/std::true_type) noexcept
{
    return __unseq_backend::__simd_copy_by_mask<__Bounded>(__first, __in_len, __result, __out_len, __mask, __assigner);
}

@SergeyKopienko
Contributor

One more question.
Why do we pass an iterator and a size to

template <class _IsVector, class _ExecutionPolicy, class _RandomAccessIterator1, class _DifferenceType,
          class _RandomAccessIterator2, class _UnaryPredicate>
std::pair<_RandomAccessIterator1, _RandomAccessIterator2>
__pattern_bounded_copy_if(__parallel_tag<_IsVector>, _ExecutionPolicy&&, _RandomAccessIterator1, _DifferenceType,
                          _RandomAccessIterator2, _DifferenceType, _UnaryPredicate);

Why can't we use two iterators (begin and end)?
As far as I could find, this is the first example of this approach at our pattern level.

@akukanov
Contributor Author

akukanov commented Dec 19, 2025

I think the declaration of ... __brick_copy_by_mask ... is not aligned with the implementations

I have checked and have not found any misalignment, aside from omitted names in the forward declarations.
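
As a small illustration of why omitted names do not cause a mismatch, the sketch below forward-declares two tag-dispatched overloads without naming the template or function parameters and defines them later with names. The function `brick`, the helper `call_before_definition`, and the returned values are hypothetical and exist only for this example.

```cpp
#include <type_traits>

// Forward declarations: neither the bool template parameter nor the
// function parameters are named. Only the types form the signature.
template <bool, class T>
T brick(T, /*vector=*/std::false_type) noexcept;

template <bool, class T>
T brick(T, /*vector=*/std::true_type) noexcept;

// A caller that sees only the declarations above.
inline int call_before_definition(int x)
{
    return brick<true>(x, std::true_type{});
}

// Definitions: identical signatures, now with parameter names.
template <bool Bounded, class T>
T brick(T value, /*vector=*/std::false_type) noexcept
{
    return Bounded ? value : -value; // "serial" variant
}

template <bool Bounded, class T>
T brick(T value, /*vector=*/std::true_type) noexcept
{
    return Bounded ? value + 1000 : -value - 1000; // "vector" variant
}
```

If a declaration and its definition really differed in parameter types or template parameter kinds, the definition would introduce a separate overload and calls would typically fail with an ambiguity or an unresolved symbol rather than compile silently.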

Why do we pass an iterator and a size to ... __pattern_bounded_copy_if ... Why can't we use two iterators (begin and end)?

The only caller of __pattern_bounded_copy_if has already computed the sizes, which the callee needs. Why would we recompute them?

As far as I could find, this is the first example of this approach at our pattern level.

Most (if not all) other __pattern_ functions have two overloads - for serial and parallel implementations. The serial variant does not require random access iterators and therefore takes iterator pairs; the parallel variant has the same API and computes sizes internally. However, __pattern_bounded_copy_if does not need a serial overload, and so can have a different API.

Contributor

@SergeyKopienko SergeyKopienko left a comment

LGTM

Contributor

@MikeDvorskiy MikeDvorskiy left a comment

LGTM
