autotuneHaloBackend aggressively rejects valid decompositions for flat 3D grids (e.g. NxNx1) #101

@tategotoazarasi

Description

When autotuneHaloBackend runs on a "flat" 3D grid (where gdims[2] == 1) with more than one rank, the autotuner rejects every candidate process grid configuration, causing cudecompGridDescCreate to fail with CUDECOMP_RESULT_NOT_SUPPORTED.

This prevents the use of cuDecomp for 2D stencils or simulations mapped to a 3D grid with thickness 1, even when using pencil configurations that should not split the unit dimension (e.g., Z-Pencils).

In src/autotune.cc, the loop for testing grid factors contains the following check (around line 590):

    // Skip any decompositions with empty pencils
    if (std::max(grid_desc->config.pdims[0], grid_desc->config.pdims[1]) >
        std::min(grid_desc->config.gdims[1], grid_desc->config.gdims[2])) {
      continue;
    }

If the user provides a grid with dimensions [N, N, 1] (a 2D plane):

  1. grid_desc->config.gdims[2] is 1.
  2. std::min(..., 1) results in 1.
  3. The check becomes if (std::max(pdims[0], pdims[1]) > 1) continue;.

For any parallel run with nranks > 1, std::max(pdims[0], pdims[1]) will always be greater than 1 (e.g., for 4 ranks: 4x1, 1x4, or 2x2). Consequently, all configurations are skipped.
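The arithmetic above can be reproduced outside cuDecomp with a small standalone sketch. It is not library code: processGrids, skippedByCurrentCheck, and countAccepted are hypothetical names, the factor enumeration is a simplification of whatever loop the autotuner actually uses, and only the comparison itself is copied from the snippet quoted earlier.

```cpp
#include <algorithm>
#include <array>
#include <utility>
#include <vector>

// Hypothetical helper: enumerate every 2D process grid (p0 x p1)
// with p0 * p1 == nranks.
std::vector<std::pair<int, int>> processGrids(int nranks) {
  std::vector<std::pair<int, int>> grids;
  for (int p0 = 1; p0 <= nranks; ++p0) {
    if (nranks % p0 == 0) grids.emplace_back(p0, nranks / p0);
  }
  return grids;
}

// Mirrors the condition quoted from src/autotune.cc.
// Returns true when the autotuner would skip this process grid.
bool skippedByCurrentCheck(const std::array<int, 3>& gdims, int p0, int p1) {
  return std::max(p0, p1) > std::min(gdims[1], gdims[2]);
}

// Hypothetical helper: count the process grids that survive the check.
int countAccepted(const std::array<int, 3>& gdims, int nranks) {
  int accepted = 0;
  for (const auto& grid : processGrids(nranks)) {
    if (!skippedByCurrentCheck(gdims, grid.first, grid.second)) ++accepted;
  }
  return accepted;
}
```

Under these assumptions, countAccepted for a 256x256x1 grid on 4 ranks is 0, while the same call for a 256x256x256 grid accepts all three factorizations (1x4, 2x2, 4x1).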

This check appears to be overly restrictive or assumes a specific pencil orientation (likely X-Pencil) where gdims[1] and gdims[2] are the split dimensions. However, for a Z-Pencil configuration (halo_axis = 2), the decomposition occurs along gdims[0] and gdims[1], and gdims[2] remains contiguous/unsplit. Therefore, the small size of gdims[2] should not invalidate the decomposition.
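One possible relaxation, sketched here only to illustrate the point and not as the maintainers' intended fix, is to compare each process grid dimension against the global dimension it would actually split for the pencil axis being tuned. The helper name hasEmptyPencil is hypothetical, and the mapping of pdims[0] and pdims[1] onto the two non-pencil dimensions (in ascending order) is an assumption that should be checked against the cuDecomp documentation.

```cpp
#include <array>

// Hypothetical helper, not part of cuDecomp.
// Returns true when a pdims[0] x pdims[1] process grid would leave at least
// one rank with an empty pencil for the given pencil axis (0 = X, 1 = Y, 2 = Z).
bool hasEmptyPencil(const std::array<int, 3>& gdims,
                    const std::array<int, 2>& pdims, int axis) {
  // Assumption: the two dimensions other than `axis`, taken in ascending
  // order, are split by pdims[0] and pdims[1] respectively.
  const int d0 = (axis == 0) ? 1 : 0;  // dimension split by pdims[0]
  const int d1 = (axis == 2) ? 1 : 2;  // dimension split by pdims[1]
  return pdims[0] > gdims[d0] || pdims[1] > gdims[d1];
}
```

With this check, a 256x256x1 grid on a 2x2 process grid is accepted for Z-pencils (axis 2), since only gdims[0] and gdims[1] are split, and is still rejected for X-pencils (axis 0), where pdims[1] would have to split the unit dimension.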
