[libcu++] Add shared versions of memory pool owning objects by pciolkosz · Pull Request #8802 · NVIDIA/cccl

pciolkosz · 2026-05-04T18:13:56Z

This PR adds shared_*_memory_pool types, which are memory pools with shared ownership semantics. Currently, shared_resource can wrap a memory pool type to provide shared ownership of a pool, but that approach has some downsides:

- Users need to know that the shared_resource wrapper exists; if they just reach for the memory pool type, it won't work out of the box with any_resource, buffer, etc.
- Member functions of the pool need to be accessed with .res.get.member(), whereas the new types expose the members directly.
- A shared_resource is a bit more difficult to construct.

This PR also removes the release member function from the *_memory_pool_ref types; it was added there by mistake and didn't make much sense.
It also adds no_init constructors to the *_memory_pool types to align with their shared counterparts and to provide a way to create an object that a pool can be moved into at some later point.

The implementation moves the ref-counting bits of shared_resource into an internal class so they can be reused for the new shared pool types. This introduces a bit of duplication, because the deleter holds another copy of the memory pool pointer, but it keeps the implementation simpler.
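To illustrate the no_init use case described above, here is a minimal sketch of the "construct empty, move a pool in later" pattern. It does not use the real CCCL types: `toy_pool`, `fake_pool_create`, `fake_pool_destroy`, and `live_pools` are hypothetical stand-ins so the example runs without a CUDA toolkit, and the actual `*_memory_pool` classes may differ in detail.

```cpp
#include <cassert>
#include <utility>

// Hypothetical stand-ins for cudaMemPool_t and its create/destroy calls.
using fake_pool_handle = int*;
static int live_pools = 0; // handles created but not yet destroyed

static fake_pool_handle fake_pool_create() { ++live_pools; return new int(0); }
static void fake_pool_destroy(fake_pool_handle h) { --live_pools; delete h; }

// Tag type selecting the "do not create a pool" constructor.
struct no_init_t {};
constexpr no_init_t no_init{};

class toy_pool {
public:
  toy_pool() : handle_(fake_pool_create()) {}                 // owns a fresh pool
  explicit toy_pool(no_init_t) noexcept : handle_(nullptr) {} // empty, owns nothing
  toy_pool(toy_pool&& other) noexcept : handle_(std::exchange(other.handle_, nullptr)) {}
  toy_pool& operator=(toy_pool&& other) noexcept {
    if (this != &other) {
      reset();
      handle_ = std::exchange(other.handle_, nullptr);
    }
    return *this;
  }
  toy_pool(const toy_pool&) = delete;
  toy_pool& operator=(const toy_pool&) = delete;
  ~toy_pool() { reset(); }

  fake_pool_handle get() const noexcept { return handle_; }

private:
  void reset() noexcept {
    if (handle_ != nullptr) {
      fake_pool_destroy(handle_);
      handle_ = nullptr;
    }
  }
  fake_pool_handle handle_;
};

// Creates an empty slot first and fills it later by move assignment.
static bool no_init_demo() {
  const int before = live_pools;
  toy_pool slot{no_init};
  const bool starts_empty = slot.get() == nullptr && live_pools == before;
  slot = toy_pool{}; // the pool is created only now
  const bool filled_later = slot.get() != nullptr && live_pools == before + 1;
  return starts_empty && filled_later;
}
```

The empty state is useful for members or containers that must exist before the device or pool properties are known.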

Summary by CodeRabbit

Release Notes

New Features
- Added shared-ownership memory pool variants: shared_device_memory_pool, shared_managed_memory_pool, and shared_pinned_memory_pool enabling copyable shared ownership semantics
- Added support for creating empty memory pools via no_init constructor
- Added release() method to existing pool types for explicit ownership transfer
Tests
- Added comprehensive test coverage for new shared memory pool functionality

Jacobfaib · 2026-05-06T14:19:33Z

+  operator!=(const __shared_block_ptr& __lhs, const __shared_block_ptr& __rhs) noexcept
+  {
+    return __lhs.__block_ != __rhs.__block_;
+  }


Important: shared_ptr and friends also implement the other relational operators. We probably don't need all of them, but at the very least we should:

Implement operators against nullptr_t, so we can say __shared_block_ptr == nullptr

Implement operator< so we can put this in std::map

Implement std::hash so we can put this in a std::unordered_map

This is an internal type; I would rather add that stuff on a case-by-case basis when it's needed.
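For reference, the three additions suggested in this thread could look roughly like the sketch below. `block_ptr` is a hypothetical, non-owning stand-in written for this example; it is not the real `__shared_block_ptr`, and the PR did not necessarily adopt any of these operators.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <unordered_map>

// Hypothetical pointer-like wrapper; ownership is omitted to keep the focus on comparisons.
template <class T>
class block_ptr {
public:
  block_ptr() noexcept = default;
  explicit block_ptr(T* p) noexcept : block_(p) {}
  T* get() const noexcept { return block_; }

  friend bool operator==(const block_ptr& a, const block_ptr& b) noexcept { return a.block_ == b.block_; }
  friend bool operator!=(const block_ptr& a, const block_ptr& b) noexcept { return a.block_ != b.block_; }

  // Comparisons against nullptr, in both argument orders.
  friend bool operator==(const block_ptr& a, std::nullptr_t) noexcept { return a.block_ == nullptr; }
  friend bool operator==(std::nullptr_t, const block_ptr& a) noexcept { return a.block_ == nullptr; }
  friend bool operator!=(const block_ptr& a, std::nullptr_t) noexcept { return a.block_ != nullptr; }
  friend bool operator!=(std::nullptr_t, const block_ptr& a) noexcept { return a.block_ != nullptr; }

  // std::less gives a total order over pointers, which ordered containers require.
  friend bool operator<(const block_ptr& a, const block_ptr& b) noexcept {
    return std::less<T*>{}(a.block_, b.block_);
  }

private:
  T* block_ = nullptr;
};

// Hash support so the wrapper can be a key in unordered containers.
namespace std {
template <class T>
struct hash<block_ptr<T>> {
  size_t operator()(const block_ptr<T>& p) const noexcept { return hash<T*>{}(p.get()); }
};
} // namespace std
```

With these in place the wrapper can serve as a key in both `std::map` and `std::unordered_map`, at the cost of more surface area on an internal type.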

Jacobfaib · 2026-05-06T14:20:23Z

+  _CCCL_HOST_API shared_resource(const shared_resource& __other) noexcept
+      : __block_(__other.__block_)
+  {}


Important: can be = default since __block_ implements copy constructors.

If I do that, nvcc 12 gets confused in its execution space checks.

Jacobfaib · 2026-05-06T14:20:48Z

+  _CCCL_HOST_API shared_resource(shared_resource&& __other) noexcept
+      : __block_(::cuda::std::move(__other.__block_))
  {}


Important: can be = default as well.

If I do that, nvcc 12 gets confused in its execution space checks.
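In standard C++ the two spellings discussed in these threads behave the same way, which is why the reviewer suggested `= default`. The sketch below shows that equivalence with hypothetical types (`block_ref`, `explicit_ctors`, `defaulted_ctors`); it is compiled as plain host C++ and does not attempt to reproduce the nvcc 12 execution-space behavior reported by the author.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Hypothetical member type with noexcept copy and move, standing in for __block_.
struct block_ref {
  int* block = nullptr;
  block_ref() noexcept = default;
  block_ref(const block_ref&) noexcept = default;
  block_ref(block_ref&& other) noexcept : block(std::exchange(other.block, nullptr)) {}
};

// Constructors written out member by member, as in the PR.
struct explicit_ctors {
  block_ref ref;
  explicit_ctors() noexcept = default;
  explicit_ctors(const explicit_ctors& other) noexcept : ref(other.ref) {}
  explicit_ctors(explicit_ctors&& other) noexcept : ref(std::move(other.ref)) {}
};

// The same constructors left to the compiler, as the reviewer suggested.
struct defaulted_ctors {
  block_ref ref;
  defaulted_ctors() noexcept = default;
  defaulted_ctors(const defaulted_ctors&) = default;
  defaulted_ctors(defaulted_ctors&&) = default;
};

// Both forms end up nothrow copy- and move-constructible.
static_assert(std::is_nothrow_copy_constructible<explicit_ctors>::value, "explicit copy is noexcept");
static_assert(std::is_nothrow_move_constructible<explicit_ctors>::value, "explicit move is noexcept");
static_assert(std::is_nothrow_copy_constructible<defaulted_ctors>::value, "defaulted copy is noexcept");
static_assert(std::is_nothrow_move_constructible<defaulted_ctors>::value, "defaulted move is noexcept");
```

Keeping the hand-written form is therefore a toolchain workaround rather than a semantic difference.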

bdice

Generally looks good to me. I had one question about the supported CUDA version for pinned pools. I am okay with requiring 12.9 -- we already require 12.9 for all CUDA 12 compilation in RAPIDS -- just wanted to verify the intent and make sure I wasn't missing something.

pciolkosz · 2026-05-12T16:05:46Z

@coderabbitai full review

coderabbitai · 2026-05-12T16:06:50Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-05-12T16:17:33Z

Walkthrough

This PR introduces shared-ownership memory pools by implementing a ref-counted smart-pointer infrastructure, updating the existing shared_resource type to use it, extending standalone pools with empty-construction and ownership-transfer APIs, and adding three new shared pool variants for device, managed, and pinned memory with automatic pool destruction on scope exit.

Changes

Shared-ownership memory pool infrastructure

Layer / File(s)	Summary
Shared block pointer ref-counted ownership mechanism `libcudacxx/include/cuda/__memory_resource/shared_block_ptr.h`	Introduces `__shared_control_block<_Payload>` holding payload and atomic reference count (initialized to 1), and `__shared_block_ptr<_Payload>` managing control-block lifetime with noexcept copy/move, atomic increment/decrement, acquire/release fence ordering, payload accessors, and equality operators.
CRTP shared memory pool base with pool destruction `libcudacxx/include/cuda/__memory_pool/shared_memory_pool_base.h`	Introduces `__pool_destroyer` RAII wrapper destroying `cudaMemPool_t` on scope exit via `__mempoolDestroy`, and `__shared_memory_pool_base<_Derived>` CRTP base inheriting from `__memory_pool_base` and `memory_resource_base`, holding `__shared_block_ptr<__pool_destroyer>`, providing noexcept copy/move constructors, defaulted assignment, and explicitly deleted `release()`.
Refactor shared_resource to use shared block pointer `libcudacxx/include/cuda/__memory_resource/shared_resource.h`	Replaces manual control-block ref-counting with `__shared_block_ptr<_Resource>` storage; makes copy/move/assignment operators noexcept with `_CCCL_HOST_API` annotations; removes user-declared destructor; delegates all operations to `__block_.__payload()`; simplifies equality/comparison via payload pointer comparison.
Standalone pool no_init constructors and release methods `libcudacxx/include/cuda/__memory_pool/{device,managed,pinned}_memory_pool.h`, `libcudacxx/include/cuda/__memory_pool/memory_pool_base.h`	Adds `no_init_t` constructor (empty pool with null handle) and `release()` method (returns/clears handle via `std::exchange`) to `device_memory_pool`, `managed_memory_pool`, `pinned_memory_pool`; removes `release()` from `__memory_pool_base` base class to restrict ownership transfer to concrete types only.
Shared device memory pool with device_ref construction `libcudacxx/include/cuda/__memory_pool/shared_device_memory_pool.h`	Introduces `shared_device_memory_pool` inheriting from `__shared_memory_pool_base<shared_device_memory_pool>` with no_init and device constructors, `from_native_handle()` factory, friend `get_property` for `device_accessible`, `default_queries` alias, and `static_assert` validating `resource_with` concept.
Shared managed memory pool with fallback properties (CTK>=13.0) `libcudacxx/include/cuda/__memory_pool/shared_managed_memory_pool.h`	Introduces `shared_managed_memory_pool` (CTK >= 13.0 only) inheriting from `__shared_memory_pool_base<shared_managed_memory_pool>` with no_init and optional-properties constructors, `from_native_handle()` factory, friend `get_property` for device and host accessibility, `default_queries` alias, and dual `static_assert` validating both accessibility modes.
Shared pinned memory pool with NUMA and properties variants `libcudacxx/include/cuda/__memory_pool/shared_pinned_memory_pool.h`	Introduces `shared_pinned_memory_pool` (CTK available only) inheriting from `__shared_memory_pool_base<shared_pinned_memory_pool>` with no_init, conditional-properties, and NUMA-node constructors, `from_native_handle()` factory, friend `get_property` for device/host accessibility, `default_queries` alias, and dual `static_assert` validating both accessibility modes.
Public memory pool interface aggregation `libcudacxx/include/cuda/memory_pool`	Added includes for `shared_device_memory_pool`, `shared_managed_memory_pool`, `shared_pinned_memory_pool` alongside existing standalone pool includes.
Standalone pool test updates and no_init validation `libcudacxx/test/libcudacxx/cuda/memory_resource/resources/memory_pools.cu`	Refactored `construct_pool` helper to remove device_id parameter; updated 11 existing test cases to use new signature; added `no_init` constructor tests for `device_memory_pool`, `pinned_memory_pool`, `managed_memory_pool` (feature-gated) asserting null handle.
Comprehensive shared pool test suite `libcudacxx/test/libcudacxx/cuda/memory_resource/resources/shared_memory_pools.cu`	New test file covering construction, copy/move semantics (ownership sharing, ownership transfer, lifetime preservation through copies), equality, pool operations (async/sync allocation, trim, attribute access), and resource concept validation for all three shared pool variants across toolkit versions and platform conditions.
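The ref-counting mechanism summarized in the first row of the table can be sketched as follows. This is a simplified illustration of the general technique (count initialized to 1, relaxed increment, release decrement followed by an acquire fence before destruction), not the code from `shared_block_ptr.h`; `control_block`, `shared_block`, `make`, `use_count`, and `tracked_payload` are names invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <utility>

// Payload plus an atomic reference count that starts at 1 for the creating owner.
template <class Payload>
struct control_block {
  Payload payload;
  std::atomic<int> ref_count{1};

  template <class... Args>
  explicit control_block(Args&&... args) : payload(std::forward<Args>(args)...) {}
};

template <class Payload>
class shared_block {
public:
  shared_block() noexcept = default;

  template <class... Args>
  static shared_block make(Args&&... args) {
    shared_block result;
    result.block_ = new control_block<Payload>(std::forward<Args>(args)...);
    return result;
  }

  shared_block(const shared_block& other) noexcept : block_(other.block_) {
    // A relaxed increment suffices: the copier already holds a live reference.
    if (block_ != nullptr) block_->ref_count.fetch_add(1, std::memory_order_relaxed);
  }
  shared_block(shared_block&& other) noexcept : block_(std::exchange(other.block_, nullptr)) {}
  shared_block& operator=(shared_block other) noexcept {
    std::swap(block_, other.block_);
    return *this;
  }
  ~shared_block() { release(); }

  Payload* get() const noexcept { return block_ != nullptr ? &block_->payload : nullptr; }
  int use_count() const noexcept {
    return block_ != nullptr ? block_->ref_count.load(std::memory_order_relaxed) : 0;
  }

private:
  void release() noexcept {
    // The release decrement publishes this owner's writes; the acquire fence makes
    // every owner's writes visible to the thread that destroys the block.
    if (block_ != nullptr && block_->ref_count.fetch_sub(1, std::memory_order_release) == 1) {
      std::atomic_thread_fence(std::memory_order_acquire);
      delete block_;
    }
    block_ = nullptr;
  }

  control_block<Payload>* block_ = nullptr;
};

// Test payload that reports its destruction through a caller-provided counter.
struct tracked_payload {
  int* destroyed;
  explicit tracked_payload(int* counter) noexcept : destroyed(counter) {}
  ~tracked_payload() { ++*destroyed; }
};
```

The payload is destroyed exactly once, when the last owner releases the block, regardless of how many copies existed along the way.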

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.03% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely describes the main feature: adding shared versions of memory pool types with shared ownership semantics.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

libcudacxx/include/cuda/__memory_resource/shared_block_ptr.h (1)

67-68: 💤 Low value

Add noexcept to the default constructor.

The default constructor only default-initializes __block_ to nullptr, which cannot throw.

Suggested fix

   //! `@brief` Constructs a null ``__shared_block_ptr`` with no control block.
-  __shared_block_ptr() = default;
+  __shared_block_ptr() noexcept = default;
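As background for the "low value" rating: in standard C++, a default constructor defaulted on its first declaration gets an implicit exception specification, so it is already non-throwing when its members cannot throw. The sketch below checks this with two hypothetical types, `ptr_plain` and `ptr_marked`; it makes no claim about how nvcc treats the annotation.

```cpp
#include <cassert>
#include <type_traits>

// Defaulted constructor without an explicit noexcept.
struct ptr_plain {
  int* block_ = nullptr;
  ptr_plain() = default;
};

// The same type with the suggested explicit noexcept.
struct ptr_marked {
  int* block_ = nullptr;
  ptr_marked() noexcept = default;
};

// Both are reported as nothrow default constructible by the type trait.
static_assert(std::is_nothrow_default_constructible<ptr_plain>::value, "implicitly noexcept");
static_assert(std::is_nothrow_default_constructible<ptr_marked>::value, "explicitly noexcept");
```

The explicit `noexcept` therefore mainly documents intent and guards against a later change that adds a potentially throwing member.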

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libcudacxx/include/cuda/__memory_resource/shared_block_ptr.h` around lines 67
- 68, The default constructor __shared_block_ptr() should be marked noexcept
because it only initializes __block_ to nullptr; update the declaration of
__shared_block_ptr() to be __shared_block_ptr() noexcept = default; so the
constructor is noexcept-qualified (no other code changes needed).

📥 Commits

Reviewing files that changed from the base of the PR and between b60f063 and 8801e88.

📒 Files selected for processing (13)

libcudacxx/include/cuda/__memory_pool/device_memory_pool.h
libcudacxx/include/cuda/__memory_pool/managed_memory_pool.h
libcudacxx/include/cuda/__memory_pool/memory_pool_base.h
libcudacxx/include/cuda/__memory_pool/pinned_memory_pool.h
libcudacxx/include/cuda/__memory_pool/shared_device_memory_pool.h
libcudacxx/include/cuda/__memory_pool/shared_managed_memory_pool.h
libcudacxx/include/cuda/__memory_pool/shared_memory_pool_base.h
libcudacxx/include/cuda/__memory_pool/shared_pinned_memory_pool.h
libcudacxx/include/cuda/__memory_resource/shared_block_ptr.h
libcudacxx/include/cuda/__memory_resource/shared_resource.h
libcudacxx/include/cuda/memory_pool
libcudacxx/test/libcudacxx/cuda/memory_resource/resources/memory_pools.cu
libcudacxx/test/libcudacxx/cuda/memory_resource/resources/shared_memory_pools.cu

💤 Files with no reviewable changes (1)

libcudacxx/include/cuda/__memory_pool/memory_pool_base.h

coderabbitai · 2026-05-12T16:17:37Z

+  //! @brief Constructs a shared managed memory pool from an existing native handle.
+  //! @param __pool The ``cudaMemPool_t`` to take shared ownership of.
+  _CCCL_HOST_API static shared_managed_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
+  {
+    return shared_managed_memory_pool(__pool);
+  }


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add [[nodiscard]] and complete Doxygen documentation.

The from_native_handle static factory has two issues:

Missing [[nodiscard]] attribute: the function returns a value and has no side effects, so discarding the result is likely a mistake.

Incomplete Doxygen documentation: the `//! @return` line describing the returned object is missing.

As per coding guidelines: "Most functions with a non-void return type shall use [[nodiscard]]" and "When a function is documented with Doxygen, it must include... `//! @return` for non-void functions".

Proposed fix

   //! `@brief` Constructs a shared managed memory pool from an existing native handle.
   //! `@param` __pool The ``cudaMemPool_t`` to take shared ownership of.
-  _CCCL_HOST_API static shared_managed_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
+  //! `@return` A shared_managed_memory_pool that shares ownership of the provided handle.
+  _CCCL_NODISCARD _CCCL_HOST_API static shared_managed_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
   {
     return shared_managed_memory_pool(__pool);
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libcudacxx/include/cuda/__memory_pool/shared_managed_memory_pool.h` around lines 73 - 78, Add the [[nodiscard]] attribute to the static factory shared_managed_memory_pool::from_native_handle and extend its Doxygen comment to include a //! `@return` describing that it returns a shared_managed_memory_pool which takes shared ownership of the provided ::cudaMemPool_t handle; keep the function signature otherwise unchanged and ensure the documentation clearly states ownership semantics.

coderabbitai · 2026-05-12T16:17:37Z

+  _CCCL_EXEC_CHECK_DISABLE
+  _CCCL_HOST_API __shared_memory_pool_base(__shared_memory_pool_base&& __other) noexcept
+      : __memory_pool_base(__other.__pool_)
+      , __ref_(::cuda::std::move(__other.__ref_))
+  {}


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Move constructor leaves __other.__pool_ unchanged.

After moving, __other.__ref_ no longer owns the control block, but __other.__pool_ still holds the raw pool handle. While this doesn't cause a double-free (the __pool_destroyer in the moved-from __ref_ won't destroy anything), it leaves __other in a potentially confusing state where __other.get() returns a valid-looking handle that the object no longer owns.

Consider nulling out __other.__pool_ for consistency with the moved-from state of the reference:

Suggested fix

   _CCCL_EXEC_CHECK_DISABLE
   _CCCL_HOST_API __shared_memory_pool_base(__shared_memory_pool_base&& __other) noexcept
-      : __memory_pool_base(__other.__pool_)
+      : __memory_pool_base(::cuda::std::exchange(__other.__pool_, nullptr))
       , __ref_(::cuda::std::move(__other.__ref_))
   {}
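The effect of the suggested `exchange` can be seen in a small host-only sketch. The types here (`handle_base`, `shared_handle`, `fake_handle`) are hypothetical, and `std::shared_ptr` stands in for the internal ref-counted block, so this only illustrates the moved-from state and is not the PR's implementation.

```cpp
#include <cassert>
#include <memory>
#include <utility>

using fake_handle = int*; // hypothetical stand-in for cudaMemPool_t

// Base class that keeps a raw copy of the handle for cheap access.
class handle_base {
public:
  explicit handle_base(fake_handle h) noexcept : pool_(h) {}
  fake_handle get() const noexcept { return pool_; }

protected:
  fake_handle pool_;
};

class shared_handle : public handle_base {
public:
  explicit shared_handle(fake_handle h)
      : handle_base(h), ref_(h, [](fake_handle p) { delete p; }) {}

  shared_handle(const shared_handle&) = default;

  // Null out the source's raw handle so that get() on the moved-from object
  // agrees with its (now empty) shared reference.
  shared_handle(shared_handle&& other) noexcept
      : handle_base(std::exchange(other.pool_, nullptr)), ref_(std::move(other.ref_)) {}

  long use_count() const noexcept { return ref_.use_count(); }

private:
  std::shared_ptr<int> ref_; // stand-in for the ref-counted block
};
```

Without the `exchange`, the moved-from object would still return the old handle from `get()` even though it no longer keeps the pool alive.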


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libcudacxx/include/cuda/__memory_pool/shared_memory_pool_base.h` around lines 97 - 101, The move constructor __shared_memory_pool_base(__shared_memory_pool_base&& __other) leaves __other.__pool_ unchanged which is misleading because __other.__ref_ no longer owns the control block; update the move ctor (the function in the __shared_memory_pool_base class that delegates to __memory_pool_base and moves __ref_) to explicitly reset __other.__pool_ (e.g., set it to nullptr or equivalent empty handle) after moving __ref_ so the moved-from object's state is consistent and __other.get() no longer returns a valid-looking handle.

coderabbitai · 2026-05-12T16:17:37Z

+  //! @brief Constructs a shared pinned memory pool from an existing native handle.
+  //! @param __pool The ``cudaMemPool_t`` to take shared ownership of.
+  _CCCL_HOST_API static shared_pinned_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
+  {
+    return shared_pinned_memory_pool(__pool);
+  }


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add [[nodiscard]] and complete Doxygen documentation.

The from_native_handle static factory has two issues:

Missing [[nodiscard]] attribute: the function returns a value and has no side effects, so discarding the result is likely a mistake.

Incomplete Doxygen documentation: the `//! @return` line describing the returned object is missing.

As per coding guidelines: "Most functions with a non-void return type shall use [[nodiscard]]" and "When a function is documented with Doxygen, it must include... `//! @return` for non-void functions".

Proposed fix

   //! `@brief` Constructs a shared pinned memory pool from an existing native handle.
   //! `@param` __pool The ``cudaMemPool_t`` to take shared ownership of.
-  _CCCL_HOST_API static shared_pinned_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
+  //! `@return` A shared_pinned_memory_pool that shares ownership of the provided handle.
+  _CCCL_NODISCARD _CCCL_HOST_API static shared_pinned_memory_pool from_native_handle(::cudaMemPool_t __pool) noexcept
   {
     return shared_pinned_memory_pool(__pool);
   }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@libcudacxx/include/cuda/__memory_pool/shared_pinned_memory_pool.h` around lines 84 - 89, Add the [[nodiscard]] attribute to the static factory shared_pinned_memory_pool::from_native_handle(::cudaMemPool_t __pool) and update its Doxygen to include a //! `@return` describing that it returns a shared_pinned_memory_pool that takes shared ownership of the provided ::cudaMemPool_t; keep the function noexcept and implementation returning shared_pinned_memory_pool(__pool).

davebayer · 2026-05-13T16:28:57Z

+      : __memory_pool_base(__pool)
+      , __ref_(__pool)


We duplicate the stored handle? We store the cudaMemPool_t once in __memory_pool_base and a second time in __shared_block_ptr. Can't we do better?

I first implemented it without the duplication, but that made the implementation much messier, and I don't think it's worth it. You either use only the copy in the shared block, in which case you have to reimplement half of __memory_pool_base, or you use only the copy in the pool base, in which case you need a weird destruction pattern with an explicit destroy call for the shared block. Both are workable solutions, but the code becomes either redundant or difficult to follow.
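The trade-off described in this reply can be pictured with a host-only sketch of the "two copies" design: the base keeps a raw handle for its member functions, while a ref-counted RAII destroyer keeps its own copy and frees the pool once. All names (`fake_handle`, `fake_destroy`, `destroy_calls`, `pool_destroyer`, `pool_base`, `shared_pool`) are hypothetical, and `std::shared_ptr` replaces the internal ref-counted block.

```cpp
#include <cassert>
#include <memory>

using fake_handle = int*; // hypothetical stand-in for cudaMemPool_t
static int destroy_calls = 0;

static void fake_destroy(fake_handle h) {
  ++destroy_calls;
  delete h;
}

// RAII payload: holds its own copy of the handle and destroys it exactly once.
struct pool_destroyer {
  fake_handle pool;
  explicit pool_destroyer(fake_handle p) noexcept : pool(p) {}
  pool_destroyer(const pool_destroyer&) = delete;
  pool_destroyer& operator=(const pool_destroyer&) = delete;
  ~pool_destroyer() {
    if (pool != nullptr) fake_destroy(pool);
  }
};

// Existing base: every member function works on the raw handle it stores.
class pool_base {
public:
  explicit pool_base(fake_handle h) noexcept : pool_(h) {}
  fake_handle get() const noexcept { return pool_; }
  int value() const noexcept { return *pool_; } // stands in for pool queries

protected:
  fake_handle pool_;
};

// Shared variant: reuses pool_base unchanged and adds shared ownership on top.
// (A real implementation would also handle allocation failure of the shared block.)
class shared_pool : public pool_base {
public:
  explicit shared_pool(fake_handle h)
      : pool_base(h), ref_(std::make_shared<pool_destroyer>(h)) {}
  long use_count() const noexcept { return ref_.use_count(); }

private:
  std::shared_ptr<pool_destroyer> ref_;
};
```

The cost is one extra pointer per object; in exchange, the base class needs no changes and destruction needs no special ordering.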

github-actions · 2026-05-13T21:44:57Z

🥳 CI Workflow Results

🟩 Finished in 2h 12m: Pass: 100%/113 | Total: 3d 03h | Max: 1h 22m | Hits: 58%/627564

See results here.

Add shared versions of memory pool owning objects

3fb9b5d

pciolkosz requested a review from a team as a code owner May 4, 2026 18:13

pciolkosz requested a review from Jacobfaib May 4, 2026 18:13
