Add more sparse math Ops in numba by tomicapretto · Pull Request #1918 · pymc-devs/pytensor

tomicapretto · 2026-02-26T14:08:19Z

Description

This PR implements the following sparse ops in numba:

SparseSparseMultiply
AddSS
AddSD
AddSSData
StructuredAddSV
Usmm
SamplingDot

Most of the initial implementations were generated by Codex; I checked and adapted them as needed.

Checklist

Checked that the pre-commit linting/style checks pass
Included tests that prove the fix is effective or that the new feature works
Added necessary documentation (docstrings and/or example notebooks)
If you are a pro: each commit corresponds to a relevant logical change

Type of change


tomicapretto · 2026-02-26T16:18:12Z

Some Usmm tests fail with the numba backend because the numerical differences from the expected values exceed the allowed tolerance.

Details

FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csc-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[dense-csr-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-dense-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csc-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csc-csr-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-dense-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csc-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-float32-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-float32-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-float32-complex64] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-complex64-float32] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-complex64-int16] - tests.unittest_tools.WrongValue: WrongValue
FAILED tests/sparse/test_math.py::TestUsmm::test_basic[csr-csr-False-complex64-complex64] - tests.unittest_tools.WrongValue: WrongValue

I'm investigating whether this happens because some inputs need to be upcast before the math operations in the numba implementation.

tomicapretto · 2026-02-26T16:24:53Z

The Python implementation of Usmm is a more or less direct translation of the steps. From my understanding, Usmm becomes faster only when the local_usmm_csc_dense_inplace rewrite and/or the local_usmm_csx rewrite are applied.

Something similar occurs with SamplingDot: the Python implementation even computes the entire dense dot product, but the local_sampling_dot_csr rewrite then substitutes a special case implemented in C.

As far as I understand, the rewrites will not be triggered for the numba backend. What's more, in the numba backend, the implementations are already in their specialized forms, so I don't think they'll be needed.

Useful links to mentioned parts:

Usmm

pytensor/pytensor/sparse/math.py

Lines 2045 to 2066 in 03afa5b

    
    def perform(self, node, inputs, outputs):
        (alpha, x, y, z) = inputs
        (out,) = outputs
        x_is_sparse = psb._is_sparse(x)
        y_is_sparse = psb._is_sparse(y)
        if not x_is_sparse and not y_is_sparse:
            raise TypeError(x)
        rval = x * y
        if isinstance(rval, scipy_sparse.spmatrix):
            rval = rval.toarray()
        if rval.dtype == alpha.dtype:
            rval *= alpha  # Faster because operation is inplace
        else:
            rval = rval * alpha
        if rval.dtype == z.dtype:
            rval += z  # Faster because operation is inplace
        else:
            rval = rval + z
        out[0] = rval

pytensor/pytensor/sparse/rewriting.py

Lines 925 to 964 in 03afa5b

    
    @node_rewriter([usmm_csc_dense])
    def local_usmm_csc_dense_inplace(fgraph, node):
        if node.op == usmm_csc_dense:
            return [usmm_csc_dense_inplace(*node.inputs)]

    register_specialize(local_usmm_csc_dense_inplace, "cxx_only", "inplace")

    # This is tested in tests/test_basic.py:UsmmTests
    @node_rewriter([spm.usmm])
    def local_usmm_csx(fgraph, node):
        """
        usmm -> usmm_csc_dense
        """
        if node.op == usmm:
            alpha, x, y, z = node.inputs
            x_is_sparse_variable = _is_sparse_variable(x)
            y_is_sparse_variable = _is_sparse_variable(y)
            if x_is_sparse_variable and not y_is_sparse_variable:
                if x.type.format == "csc":
                    x_val, x_ind, x_ptr, x_shape = csm_properties(x)
                    x_nsparse = x_shape[0]
                    dtype_out = ps.upcast(
                        alpha.type.dtype, x.type.dtype, y.type.dtype, z.type.dtype
                    )
                    if dtype_out not in ("float32", "float64"):
                        return False
                    # Sparse cast is not implemented.
                    if y.type.dtype != dtype_out:
                        return False
                    return [usmm_csc_dense(alpha, x_val, x_ind, x_ptr, x_nsparse, y, z)]
        return False

    register_specialize(local_usmm_csx, "cxx_only")
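
For comparison with the generic perform method above, here is a minimal sketch of what a "specialized form" of Usmm can look like when x is sparse and y, z are dense. It is not the code from this PR: the function name usmm_csr_dense_sketch, the choice of CSR for x, and the use of plain NumPy loops instead of numba are all assumptions made for illustration. It computes z + alpha * (x @ y) by visiting only the stored entries of x.

```python
import numpy as np


def usmm_csr_dense_sketch(alpha, x_data, x_indices, x_indptr, y, z):
    """Return z + alpha * (x @ y) for x given as CSR arrays and dense y, z.

    Hypothetical helper for illustration only; not part of pytensor.
    """
    out_dtype = np.result_type(alpha, x_data.dtype, y.dtype, z.dtype)
    out = z.astype(out_dtype)  # astype copies, so z itself is left untouched
    n_rows = x_indptr.shape[0] - 1
    for i in range(n_rows):
        for k in range(x_indptr[i], x_indptr[i + 1]):
            j = x_indices[k]
            # Row i of the result accumulates alpha * x[i, j] * y[j, :]
            out[i, :] += alpha * x_data[k] * y[j, :]
    return out


# x = [[1, 0, 2], [0, 3, 0]] stored in CSR form
x_dense = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]])
x_data = np.array([1.0, 2.0, 3.0])
x_indices = np.array([0, 2, 1])
x_indptr = np.array([0, 2, 3])

rng = np.random.default_rng(0)
y = rng.normal(size=(3, 2))
z = rng.normal(size=(2, 2))
alpha = np.float64(0.5)

result = usmm_csr_dense_sketch(alpha, x_data, x_indices, x_indptr, y, z)
reference = z + alpha * (x_dense @ y)
```

The loop never materializes the product x @ y as a separate array, which is the kind of saving the C rewrites provide on the default backend.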

SamplingDot

pytensor/pytensor/sparse/math.py

Lines 1809 to 1821 in 03afa5b

    
    def perform(self, node, inputs, outputs):
        (x, y, p) = inputs
        (out,) = outputs
        if psb._is_sparse(x):
            raise TypeError(x)
        if psb._is_sparse(y):
            raise TypeError(y)
        if not psb._is_sparse(p):
            raise TypeError(p)
        out[0] = p.__class__(p.multiply(np.dot(x, y.T)))

pytensor/pytensor/sparse/rewriting.py

Lines 2042 to 2067 in 03afa5b

    
    def local_sampling_dot_csr(fgraph, node):
        if not config.blas__ldflags:
            # The C implementation of SamplingDotCsr relies on BLAS routines
            return
        if node.op == spm.sampling_dot:
            x, y, p = node.inputs
            if p.type.format == "csr":
                p_data, p_ind, p_ptr, p_shape = sparse.csm_properties(p)
                z_data, z_ind, z_ptr = sampling_dot_csr(
                    x, y, p_data, p_ind, p_ptr, p_shape[1]
                )
                # This is a hack that works around some missing `Type`-related
                # static shape narrowing.  More specifically,
                # `TensorType.convert_variable` currently won't combine the static
                # shape information from `old_out.type` and `new_out.type`, only
                # the broadcast patterns, and, since `CSR.make_node` doesn't do
                # that either, we use `specify_shape` to produce an output `Type`
                # with the same level of static shape information as the original
                # `old_out`.
                old_out = node.outputs[0]
                new_out = specify_shape(
                    sparse.CSR(z_data, z_ind, z_ptr, p_shape), shape(old_out)
                )
                return [new_out]
        return False
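
The difference between the dense route in SamplingDot.perform and a sampled computation can be sketched as follows. This is an illustration under stated assumptions, not the numba implementation from this PR: the name sampling_dot_csr_sketch and the example arrays are hypothetical, and the loops are plain NumPy. The idea is to compute one dot product per stored entry of p instead of the full x @ y.T.

```python
import numpy as np


def sampling_dot_csr_sketch(x, y, p_data, p_indices, p_indptr):
    """Return the data array of p * (x @ y.T) for p given as CSR arrays.

    Hypothetical helper for illustration only; not part of pytensor.
    The output shares the sparsity pattern (indices, indptr) of p.
    """
    out_dtype = np.result_type(x.dtype, y.dtype, p_data.dtype)
    z_data = np.empty(p_data.shape[0], dtype=out_dtype)
    n_rows = p_indptr.shape[0] - 1
    for i in range(n_rows):
        for k in range(p_indptr[i], p_indptr[i + 1]):
            j = p_indices[k]
            # One dot product per stored entry, no dense (n_rows, n_cols) product
            z_data[k] = p_data[k] * np.dot(x[i], y[j])
    return z_data


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
y = rng.normal(size=(3, 5))

# p = [[1, 0, 2], [0, 0, 3], [4, 0, 0]] stored in CSR form
p_dense = np.array([[1.0, 0.0, 2.0], [0.0, 0.0, 3.0], [4.0, 0.0, 0.0]])
p_data = np.array([1.0, 2.0, 3.0, 4.0])
p_indices = np.array([0, 2, 2, 0])
p_indptr = np.array([0, 2, 3, 4])

z_data = sampling_dot_csr_sketch(x, y, p_data, p_indices, p_indptr)

# Dense route, as in SamplingDot.perform: full product first, then masking by p
dense = p_dense * np.dot(x, y.T)
expected = dense[p_dense != 0]  # row-major order matches the CSR data order
```

The sampled version does work proportional to the number of stored entries of p, whereas the dense route always pays for the full matrix product.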

tomicapretto · 2026-02-26T18:31:08Z

@ricardoV94, now the same test fails with the default backend. I'm not sure which approach to follow here. Two ideas come to mind:

Implement a numba-specific result-computation function that casts the values to the output dtype before doing the math, like this:

    def f_b(z, a, x, y):
        # Make sure operations are done with the precision of the output dtype
        x = x.astype(out_dtype)
        y = y.astype(out_dtype)
        z = z.astype(out_dtype)
        a = a.astype(out_dtype)
        return z - a * (x * y)

Use the previous function in all cases, but modify Usmm.perform to also upcast x and y before computing x @ y in z + a * (x @ y).

Given how Usmm is implemented in the numba backend, it is difficult, and possibly impossible, to match the default backend’s behavior. In the default backend, if x and y are float32, the x @ y operation is performed in float32, and upcasting only happens afterward, before multiplying by a or adding to z, not before the matrix multiplication itself.
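
The effect of where the upcast happens can be reproduced with plain NumPy. The snippet below is only a sketch with made-up dense inputs (shapes, seed, and variable names are assumptions, and no pytensor code is involved): it compares multiplying in float32 and upcasting afterward with casting x and y to the output dtype first.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 40)).astype(np.float32)
y = rng.normal(size=(40, 30)).astype(np.float32)
z = rng.normal(size=(50, 30))  # float64
a = np.float64(1.5)

out_dtype = np.result_type(a, x, y, z)  # float64 for these inputs

# Product computed in float32, upcast only when combined with a and z
late_upcast = z + a * (x @ y)

# x and y cast to the output dtype before the product
early_upcast = z + a * (x.astype(out_dtype) @ y.astype(out_dtype))

# Both results are float64, but they were rounded at different points
max_abs_diff = np.max(np.abs(late_upcast - early_upcast))
print(late_upcast.dtype, early_upcast.dtype, max_abs_diff)
```

Both results have the output dtype, yet they generally differ by an amount on the order of float32 rounding error, which can be enough to exceed a tolerance chosen for the output dtype.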

tomicapretto force-pushed the add-numba-sparse-math branch from b836c1f to 01b40cf Compare February 26, 2026 14:17

Add SparseSparseMultiply in numba, expanding tests for multiply and a…

7dca73d

…ll of its dispatches

tomicapretto force-pushed the add-numba-sparse-math branch 2 times, most recently from 622c700 to 93adb10 Compare February 26, 2026 15:35

tomicapretto added 4 commits February 26, 2026 12:51

Implement AddSS, AddSData, and AddSD in numba

605cc3c

Implement StructuredAddSV in numba

af9a1e4

Add Usmm in numba

0519440

Add SamplingDot in numba

810b3b7

tomicapretto force-pushed the add-numba-sparse-math branch from 13f26ab to 810b3b7 Compare February 26, 2026 15:54

tomicapretto added numba sparse variables labels Feb 26, 2026

tomicapretto marked this pull request as ready for review February 26, 2026 16:16

tomicapretto requested a review from ricardoV94 February 26, 2026 16:52

Cast x and y to output dtype in Usmm numba implementation and tests

26316e2

tomicapretto wants to merge 6 commits into pymc-devs:main from tomicapretto:add-numba-sparse-math
