Skip to content

cudax: align HLL with cuco implementation#8855

Open
sleeepyjack wants to merge 7 commits intoNVIDIA:mainfrom
sleeepyjack:cudax-hll-spark-align
Open

cudax: align HLL with cuco implementation#8855
sleeepyjack wants to merge 7 commits intoNVIDIA:mainfrom
sleeepyjack:cudax-hll-spark-align

Conversation

@sleeepyjack
Copy link
Copy Markdown
Contributor

Description

Closes #8854

Port NVIDIA/cuCollections#792 into the cudax HLL: bit slicing in
__add, finalizer interpolation-anchor selection, and removal of
the now-dead __register_mask helper.

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@sleeepyjack sleeepyjack self-assigned this May 6, 2026
@sleeepyjack sleeepyjack requested a review from a team as a code owner May 6, 2026 22:20
@sleeepyjack sleeepyjack added the cudax Feature intended for the cudax experimental library label May 6, 2026
@github-project-automation github-project-automation Bot moved this to Todo in CCCL May 6, 2026
@cccl-authenticator-app cccl-authenticator-app Bot moved this from Todo to In Review in CCCL May 6, 2026
@sleeepyjack
Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 6, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Bug Fixes

    • Optimized HyperLogLog++ finalizer's interpolation-anchor computation for improved efficiency.
    • Refined bit-slicing and leading-zero computation in the HyperLogLog++ algorithm.
  • Chores

    • Added compile-time type validation for HyperLogLog++ hash implementations.

Walkthrough

Two HyperLogLog++ algorithm files are updated to align cudax with cuCollections: finalizer.cuh simplifies binary-search anchor selection by returning the insertion point directly, and hyperloglog_impl.cuh reworks bit slicing in the __add method and adds a compile-time unsigned-type constraint on the hash function.

Changes

HyperLogLog++ Algorithm Synchronization with cuCollections

Layer / File(s) Summary
Compile-time Validation
cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (lines 77–78)
Add static_assert requiring the hash function's return type to be unsigned, enforcing a precondition for the revised bit-slicing logic.
Core Algorithm Updates
cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (lines 164–175), cudax/include/cuda/experimental/__cuco/__hyperloglog/finalizer.cuh (lines 155–178)
Rework __add to extract register index from top __precision bits and compute rho from one-bit-padded shifted bits; simplify __interpolation_anchor_index to return binary-search insertion point directly instead of selecting the closer neighboring table entry.
Dead Code and Include Cleanup
cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh (line 569), cudax/include/cuda/experimental/__cuco/__hyperloglog/finalizer.cuh (lines 27–29)
Remove unused __register_mask() helper; replace includes of <cuda/std/__cmath/abs.h> and <cuda/std/__limits/numeric_limits.h> with <cuda/std/__cmath/logarithms.h> and <cuda/std/__cmath/rounding_functions.h>.

Assessment against linked issues

Objective Addressed Explanation
Correct bit slicing in __hyperloglog_impl::__add [#8854]
Simplify finalizer interpolation-anchor selection in _Finalizer::__interpolation_anchor_index [#8854]
Remove now-dead __register_mask helper [#8854]

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

  • Generate code and open pull requests
  • Plan features and break down work
  • Investigate incidents and troubleshoot customer tickets together
  • Automate recurring tasks and respond to alerts with triggers
  • Summarize progress and report instantly

Built for teams:

  • Shared memory across your entire org—no repeating context
  • Per-thread sandboxes to safely plan and execute work
  • Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4799ccf8-d277-4288-8c34-d459437ba23d

📥 Commits

Reviewing files that changed from the base of the PR and between 69d3293 and 445b384.

📒 Files selected for processing (2)
  • cudax/include/cuda/experimental/__cuco/__hyperloglog/finalizer.cuh
  • cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh

using __fp_type = double; ///< Floating point type used for reduction
using __hash_value_type = decltype(::cuda::std::declval<_Hash>()(::cuda::std::declval<_Tp>())); ///< Hash value type

static_assert(::cuda::std::is_unsigned_v<__hash_value_type>, "HyperLogLog requires an unsigned hash value type");
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify hash return widths used with HLL and whether narrow unsigned types are instantiated.
# Expected: HLL hash functors should resolve to widths safely larger than runtime precision.

set -euo pipefail

echo "== HLL type declarations/usages =="
rg -nP --type=cpp --type=cu --type=h --type=hpp --type=cuh '\b(__hyperloglog_impl|hyperloglog)\b'

echo
echo "== Potential hash functor operator() return types (look for uint8_t/uint16_t etc.) =="
rg -nP --type=cpp --type=cu --type=h --type=hpp --type=cuh 'operator\s*\(\s*[^)]*\)\s*(const)?\s*(noexcept)?\s*(->\s*[^ {;]+)?'

echo
echo "== Explicit narrow unsigned type mentions near hash code =="
rg -nP --type=cpp --type=cu --type=h --type=hpp --type=cuh -C2 '\b(uint8_t|uint16_t|unsigned char|unsigned short)\b'

Repository: NVIDIA/cccl

Length of output: 135


🏁 Script executed:

#!/bin/bash
# Corrected: use only recognized file types; check HLL instantiations and narrow types

set -euo pipefail

echo "=== Examining hyperloglog_impl.cuh (lines 70-180) ==="
head -n 180 cudax/include/cuda/experimental/__cuco/__hyperloglog/hyperloglog_impl.cuh | tail -n +70

echo
echo "=== HLL type declarations/usages (all files) ==="
rg -nP --type-list | grep -i cpp || echo "Available file types:"
rg --type-list | head -20

echo
echo "=== Search for hyperloglog instantiations ==="
rg -n 'hyperloglog' cudax/include/cuda/experimental/__cuco/ --type-list

echo
echo "=== Search in .cuh, .hpp, .h files for narrow unsigned types near hash ==="
find cudax/include/cuda/experimental/__cuco/ -type f \( -name "*.cuh" -o -name "*.hpp" -o -name "*.h" \) | head -20

echo
echo "=== Check __cuco directory structure ==="
ls -la cudax/include/cuda/experimental/__cuco/__hyperloglog/ 2>/dev/null || echo "Directory not accessible"

Repository: NVIDIA/cccl

Length of output: 6113


🏁 Script executed:

#!/bin/bash
# Search for HLL instantiations, tests, and usage of narrow unsigned types with HLL

set -euo pipefail

echo "=== Find test files for hyperloglog ==="
find cudax -name "*hyperloglog*" -type f | head -20

echo
echo "=== Find __cuco test files ==="
find cudax -path "*__cuco*" -name "*.cu" -o -name "*.cpp" 2>/dev/null | grep -i test | head -20

echo
echo "=== Search for HLL constructor/instantiation calls ==="
rg -n '__hyperloglog_impl|hyperloglog' cudax --type=cpp --max-count=30

echo
echo "=== Check what hash types are used (search for declval in __hyperloglog context) ==="
rg -A5 -B5 '__hyperloglog_impl' cudax/include/cuda/experimental/__cuco/__hyperloglog/ --type=cpp

echo
echo "=== Look for example/test usage patterns ==="
find . -path "*/test*" -name "*.cu" -o -name "*.cpp" | xargs grep -l "hyperloglog" 2>/dev/null | head -5

Repository: NVIDIA/cccl

Length of output: 564


🏁 Script executed:

#!/bin/bash
# Examine test and public API files to understand HLL instantiations

echo "=== Test file: test_hyperloglog.cu (first 200 lines) ==="
head -n 200 cudax/test/cuco/hyperloglog/test_hyperloglog.cu

echo
echo "=== Public API wrapper: hyperloglog.cuh ==="
cat cudax/include/cuda/experimental/__cuco/hyperloglog.cuh

echo
echo "=== Search for any hash functor returning uint8_t, uint16_t in test or examples ==="
rg -n 'uint8_t|uint16_t' cudax/test/cuco/hyperloglog/ 2>/dev/null || echo "No matches in test dir"

Repository: NVIDIA/cccl

Length of output: 27560


important: Line 77's static_assert enforces unsignedness but not minimum width. Lines 173 and 175 invoke undefined behavior if __precision >= numeric_limits<__hash_value_type>::digits. Although no production instantiation uses narrow hash types (all default to 64-bit), add a compile-time precondition to constrain the hash type width relative to the runtime precision range [4, 18], or document the unsupported instantiation cases.

Copy link
Copy Markdown
Contributor Author

@sleeepyjack sleeepyjack May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be addressed in a PR that follows shortly and adds a Policy abstraction to allow for various HLL flavors. The hash type check will be tied to the specific policy.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sleeepyjack Understood. Would you like me to open a GitHub issue to track the follow-up (adding a compile-time width check ensuring numeric_limits<__hash_value_type>::digits > 18 or equivalent)?

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@sleeepyjack sleeepyjack requested a review from PointKernel May 7, 2026 17:05
Comment on lines 156 to -159
int __left = 0;
int __right = static_cast<int>(__n) - 1;
int __mid = -1;
int __candidate_index = 0; // Index of the closest element found
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure how we missed this in the first place, could we use sized integer instead? e.g. int32_t as opposed to int?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🥳 CI Workflow Results

🟩 Finished in 44m 37s: Pass: 100%/55 | Total: 15h 22m | Max: 44m 35s | Hits: 34%/110407

See results here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cudax Feature intended for the cudax experimental library

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

cudax HLL has drifted from the cuCollections implementation

2 participants