Description
When running autotuneHaloBackend on a "flat" 3D grid (where gdims[2] == 1), the autotuner rejects all possible parallel process grid configurations, causing cudecompGridDescCreate to fail with CUDECOMP_RESULT_NOT_SUPPORTED.
This prevents the use of cuDecomp for 2D stencils or simulations mapped to a 3D grid with thickness 1, even when using pencil configurations that should not split the unit dimension (e.g., Z-Pencils).
In src/autotune.cc, the loop for testing grid factors contains the following check (around line 590):
    // Skip any decompositions with empty pencils
    if (std::max(grid_desc->config.pdims[0], grid_desc->config.pdims[1]) >
        std::min(grid_desc->config.gdims[1], grid_desc->config.gdims[2])) {
      continue;
    }

If the user provides a grid with dimensions [N, N, 1] (a 2D plane):
- grid_desc->config.gdims[2] is 1.
- std::min(..., 1) results in 1.
- The check becomes if (std::max(pdims[0], pdims[1]) > 1) continue;.
For any parallel run with nranks > 1, std::max(pdims[0], pdims[1]) is always greater than 1, because pdims[0] * pdims[1] must equal nranks (e.g., for 4 ranks: 4x1, 1x4, or 2x2). Consequently, every configuration is skipped.
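The effect can be reproduced outside the library with a small standalone sketch. It copies only the comparison from the quoted check; the helper names (factorPairs, skippedByCurrentCheck, countAccepted) are hypothetical and do not exist in cuDecomp, and the enumeration of factor pairs is an assumption about how the autotuner walks candidate process grids.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: enumerate every (p0, p1) with p0 * p1 == nranks.
std::vector<std::pair<int, int>> factorPairs(int nranks) {
  assert(nranks > 0);
  std::vector<std::pair<int, int>> pairs;
  for (int p0 = 1; p0 <= nranks; ++p0) {
    if (nranks % p0 == 0) pairs.emplace_back(p0, nranks / p0);
  }
  return pairs;
}

// Same comparison as the check quoted from src/autotune.cc.
bool skippedByCurrentCheck(const std::array<int, 3>& gdims, int p0, int p1) {
  return std::max(p0, p1) > std::min(gdims[1], gdims[2]);
}

// Count the candidate process grids that survive the check.
int countAccepted(const std::array<int, 3>& gdims, int nranks) {
  int accepted = 0;
  for (const auto& pd : factorPairs(nranks)) {
    if (!skippedByCurrentCheck(gdims, pd.first, pd.second)) ++accepted;
  }
  return accepted;
}
```

Under these assumptions, countAccepted({256, 256, 1}, 4) returns 0, while the same call with a cubic grid {256, 256, 256} accepts all three factorizations.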
This check appears to be overly restrictive or assumes a specific pencil orientation (likely X-Pencil) where gdims[1] and gdims[2] are the split dimensions. However, for a Z-Pencil configuration (halo_axis = 2), the decomposition occurs along gdims[0] and gdims[1], and gdims[2] remains contiguous/unsplit. Therefore, the small size of gdims[2] should not invalidate the decomposition.
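One possible direction, sketched here only as an illustration and not as the library's actual fix, is to compare the process grid against the two dimensions that are split for the selected pencil axis. The function name skipForPencilAxis is hypothetical, and the sketch assumes that the split dimensions are simply the two axes other than the pencil axis. Other code paths, such as transposes between pencil orientations, may have stricter requirements that this does not cover.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Hypothetical helper, not part of cuDecomp: returns true when a process grid
// should be skipped for pencils contiguous along `axis` (0 = X, 1 = Y, 2 = Z).
bool skipForPencilAxis(const std::array<int, 3>& gdims,
                       const std::array<int, 2>& pdims, int axis) {
  assert(axis >= 0 && axis < 3);
  // Assumption: the split dimensions are the two axes other than `axis`.
  const int d0 = (axis + 1) % 3;
  const int d1 = (axis + 2) % 3;
  // Same shape as the existing check, but applied to the split dimensions.
  return std::max(pdims[0], pdims[1]) > std::min(gdims[d0], gdims[d1]);
}
```

With axis = 0 this reduces to the existing comparison against gdims[1] and gdims[2]. With axis = 2 and a grid of [N, N, 1], the comparison uses gdims[0] and gdims[1], so a 2x2 process grid is no longer skipped.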