GPU Parallel Primitives Library

A GPU-accelerated library of parallel primitives implemented in CUDA, with C++ host code and Python bindings via pybind11. Use it from Python with NumPy arrays for reduce, scan, histogram, and radix sort—without writing CUDA yourself.

What is this?

gpuprims provides a small set of high-performance, single-GPU primitives commonly used as building blocks in parallel algorithms:

| Primitive | Description | Supported types |
| --- | --- | --- |
| `reduce_sum` | Sum of a 1D array → scalar | `int32`, `float32` |
| `exclusive_scan` | Exclusive prefix sum → 1D array | `int32`, `float32` |
| `histogram` | Fixed-bin counts → 1D array of counts | `uint32`, `int32` (non-negative) |
| `radix_sort` | Stable ascending sort → 1D array | `uint32` |

All operations run on the GPU. Inputs are 1D contiguous NumPy arrays; the library copies data to the device, runs the kernel, and returns results to Python.
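
To make the semantics in the table concrete, the following is a CPU-only reference sketch in pure Python. It is not the library's implementation: the `ref_*` names are hypothetical, the sort uses Python's built-in stable sort rather than a radix sort, and skipping out-of-range histogram values is an assumption made for this sketch only.

```python
from itertools import accumulate


def ref_reduce_sum(xs):
    """Sum of all elements."""
    return sum(xs)


def ref_exclusive_scan(xs):
    """Exclusive prefix sum: out[i] is the sum of xs[:i], so out[0] is 0."""
    # accumulate(..., initial=0) yields the inclusive sums shifted right by one;
    # dropping the final total leaves the exclusive scan.
    return list(accumulate(xs, initial=0))[:-1]


def ref_histogram(xs, bins):
    """Count occurrences of each integer value in [0, bins)."""
    counts = [0] * bins
    for v in xs:
        # Assumption for this sketch: values outside [0, bins) are skipped.
        if 0 <= v < bins:
            counts[v] += 1
    return counts


def ref_sort(xs):
    """Stable ascending sort (Python's built-in sort, not a radix sort)."""
    return sorted(xs)
```

Functions like these can serve as a baseline when checking GPU results on small inputs. Python integers do not overflow, so results for very large `int32` sums may differ from a fixed-width implementation.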

Requirements

CUDA Toolkit 11.x or 12.x
C++17-capable compiler (e.g. GCC 9+, Clang 10+, or MSVC with CUDA on Windows)
Python 3.9 or newer
NumPy 1.20+
CMake 3.18+ (used when building the Python extension)

Installation

Clone the repository and enter the project directory:

```
git clone <repository-url>
cd gpu-parallel-primitives-library
```

Ensure CUDA is available. Set CUDA_PATH if your toolkit is not in the default location.
Install the package (builds the CUDA extension via CMake):
```
pip install .
```
For editable/development installs (recommended if you change code):
```
pip install -e .
```
Optional development dependencies (tests, formatters):
```
pip install -e ".[dev]"
```

Quick Start

```python
import gpuprims
import numpy as np

# Reduce: sum of array → scalar
x = np.array([1, 2, 3, 4, 5], dtype=np.int32)
print(gpuprims.reduce_sum(x))   # 15

# Exclusive scan: prefix sum (first element is 0)
y = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
print(gpuprims.exclusive_scan(y))   # [0. 1. 3. 6.]

# Histogram: count values in [0, bins)
z = np.array([0, 1, 1, 2, 2, 2], dtype=np.uint32)
print(gpuprims.histogram(z, 3))   # [1 2 3]

# Radix sort: stable ascending sort (uint32 only)
w = np.array([3, 1, 4, 1, 5], dtype=np.uint32)
print(gpuprims.radix_sort(w))   # [1 1 3 4 5]
```

Run the full example script from the repo root:

```
python examples/example_usage.py
```

API Summary

gpuprims.reduce_sum(x) — Returns the sum of x. x: 1D int32 or float32 array.
gpuprims.exclusive_scan(x) — Returns exclusive prefix sum; same shape and dtype as x. x: 1D int32 or float32 array.
gpuprims.histogram(x, bins) — Returns a 1D array of length bins with counts for values in [0, bins). x: 1D uint32 or non-negative int32 array.
gpuprims.radix_sort(x) — Returns a new 1D array with values sorted in ascending order (stable). x: 1D uint32 array.

Input arrays must be 1D, contiguous, and of a supported dtype. For any other input, the library may raise an exception, or its behavior is undefined.
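
Because unsupported inputs are not guaranteed to be rejected, it can help to validate arrays on the Python side before calling into the library. The helper below is a minimal sketch using only NumPy; `prepare_input` is a hypothetical name and is not part of gpuprims.

```python
import numpy as np


def prepare_input(x, allowed_dtypes):
    """Return x as a 1D, C-contiguous array of an allowed dtype, or raise."""
    arr = np.asarray(x)
    if arr.ndim != 1:
        raise ValueError(f"expected a 1D array, got {arr.ndim}D")
    allowed = tuple(np.dtype(d) for d in allowed_dtypes)
    if arr.dtype not in allowed:
        names = ", ".join(d.name for d in allowed)
        raise TypeError(f"dtype {arr.dtype.name} not supported; expected one of: {names}")
    # Copies only when the array is not already C-contiguous (e.g. a strided slice).
    return np.ascontiguousarray(arr)
```

For example, `prepare_input(data[::2], (np.int32, np.float32))` returns a contiguous copy of the strided slice that could then be passed to `gpuprims.reduce_sum`. The helper deliberately does not cast dtypes, so a `float64` array raises a `TypeError` instead of silently losing precision.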

Project layout

Python API → pybind11 bindings → C++ wrappers → CUDA kernels
Build: CMake (reduce, scan, histogram, radix_sort, wrappers, bindings). The Python wheel is built with scikit-build-core.

Running tests

From the project root:

```
pytest tests/
```

Requires the package to be installed (e.g. `pip install -e ".[dev]"`).
