Feat/gpu augmentations #15

Open
etienne87 wants to merge 4 commits into MIC-DKFZ:master from etienne87:feat/gpu_augmentations
Conversation

@etienne87

In an effort toward GPU augmentation (see issue), I fixed a few transforms to take the device of the image argument into account.
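The dispatch idea can be sketched roughly as follows. This is a hypothetical illustration, not the actual batchgenerators API: `mirror_transform` is an invented name, numpy arrays keep the existing CPU path, and torch tensors are processed on whatever device they already live on.

```python
import numpy as np


def mirror_transform(img, axis=-1):
    """Mirror an image along one axis, staying on the input's device.

    Hypothetical sketch: numpy input takes the unchanged CPU/numpy path;
    a torch.Tensor is flipped via torch.flip, which runs on img.device,
    so a CUDA tensor is augmented on the GPU without a host round-trip.
    """
    if isinstance(img, np.ndarray):
        return np.flip(img, axis=axis)  # existing CPU path, untouched
    import torch  # only needed for the tensor branch
    return torch.flip(img, dims=(axis,))
```

The key point is that nothing is hard-coded to CPU: the torch branch never calls `.cpu()` or `.numpy()`, so the transform is device-agnostic by construction.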

@FabianIsensee
Member

Thanks for the PR. I am wondering a bit when you might want to use GPU augmentation. We have so far not found it useful and prefer using the GPU for the actual training workload.
Changing just a couple of transforms to be GPU compatible might be of limited usefulness. Are you planning to extend this work to cover all transforms? And how would you then treat transforms that have dedicated numpy/scipy code paths? Those turned out to be faster in my CPU-focused testing.

@etienne87
Author

etienne87 commented Mar 9, 2026

Hello Fabian, thanks for the feedback!

GPU augmentation can be useful when server CPU resources are limited and volumes are large — the throughput gain depends heavily on the hardware setup. In my own testing, when the server is lightly loaded, CPU augmentation is equivalent or faster. The benefit appears when the server is under load — GPU augmentation remains fast while CPU throughput degrades due to contention.
Some users also want to do data augmentation on the fly once the batch is already loaded on the GPU, for instance when training for equivariance.

I focused initially on the transforms used in the classic 3d_fullres configuration as a test case. Extending to all transforms is a natural next step.

Regarding the numpy/scipy paths — could you point me to specific cases where those outperformed the GPU? For transforms like grid_sample, the CPU PyTorch path is serial across voxels (see GridSample.cpp).
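To make the grid_sample point concrete, here is a minimal sketch (not code from the PR) of a resampling step built from `affine_grid` + `grid_sample`. The sampling grid is created on `img.device`, so the same function runs entirely on the GPU when given a CUDA tensor; spatial transforms like rotation or scaling would modify `theta`.

```python
import torch
import torch.nn.functional as F


def resample_identity(img):
    """Resample a (N, C, H, W) batch through an identity sampling grid.

    Sketch only: the grid is allocated on img.device, so no CPU<->GPU
    transfer happens. A real spatial augmentation would perturb theta
    (rotation, scaling, shearing) instead of using the identity.
    """
    n, c, h, w = img.shape
    # identity affine matrix, one per batch element, on the input's device
    theta = torch.eye(2, 3, device=img.device, dtype=img.dtype).expand(n, 2, 3)
    grid = F.affine_grid(theta, size=(n, c, h, w), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)
```

With the identity grid and matching `align_corners` settings, bilinear sampling lands exactly on pixel centers, so the output reproduces the input; the same call parallelizes across voxels on the GPU, whereas the CPU kernel loops over them.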

@FabianIsensee
Member

Hey etienne, when I mentioned that scipy/numpy paths are faster, that was in the context of CPU data augmentation, not GPU. I agree that GPU augmentation is always faster, but I have yet to see a convincing application of it. In my understanding, it is better to configure servers properly and make adjustments to the data augmentation pipeline (such as switching to nearest-neighbor resampling for segmentations) where needed, rather than spending precious GPU time on data augmentation. CPU is cheap in comparison.
If we allow for GPU augmentation in batchgenerators (and I don't see why we shouldn't make this possible), then I would like to make sure we are not changing any of the CPU compute paths and keep everything backwards compatible. The current PR also has a bug, see my comments above.
