copy/ascend: add h2d ffts pipeline cases #2

Open
NaganooMei wants to merge 14 commits into mag1c-h:main from NaganooMei:upstream-yuanrong-pipeline

NaganooMei commented May 26, 2026

Summary

  • Add optional Ascend H2D FFTS pipeline copy cases:
    • host_to_device_ffts_pipeline
    • one_host_to_all_device_ffts_pipeline
    • all_host_to_all_device_ffts_pipeline
  • Add shared-host fan-out cases for multi-device reads:
    • one_share_host_to_all_device_ce
    • one_share_host_to_all_device_ce_multi_stream
    • one_share_host_to_all_device_ffts_pipeline
  • Change Ascend multi-device all-host cases to fork one child per device for concurrent submit.
  • Add forked result collection that merges child timing samples by per-iteration max cost and reports aggregate count/bandwidth.
  • Add FragmentedDeviceCopyBuffer, an isolated ascend/h2d_ffts_pipeline implementation, and an FFTS SDMA dispatcher.
  • Update readme.md with shared-host, multi-stream, fork-submit, and FFTS pipeline cases.
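
The per-iteration max merge of child timing samples can be sketched as follows. This is illustrative Python, not the PR's code: the function names, the list-of-lists sample layout, and the requirement that every child reports the same number of iterations are assumptions made for the example. The idea is that children run concurrently, so an iteration is treated as costing as much as its slowest child, while copy counts are summed across children.

```python
from typing import List, Sequence, Tuple


def merge_by_iteration_max(per_child_us: Sequence[Sequence[float]]) -> List[float]:
    """Merge child timing samples: iteration i costs as much as its slowest child."""
    if not per_child_us:
        raise ValueError("no child samples to merge")
    iterations = len(per_child_us[0])
    if any(len(samples) != iterations for samples in per_child_us):
        raise ValueError("children reported different iteration counts")
    # zip(*...) groups the i-th sample of every child into one column.
    return [max(column) for column in zip(*per_child_us)]


def aggregate(
    per_child_us: Sequence[Sequence[float]],
    copies_per_child: int,
    bytes_per_copy: int,
) -> Tuple[int, float, float]:
    """Return (total copy count, average merged cost in us, bandwidth in bytes/s)."""
    merged = merge_by_iteration_max(per_child_us)
    # Assumption: every child issues the same number of copies per iteration.
    total_copies = copies_per_child * len(per_child_us)
    avg_us = sum(merged) / len(merged)
    bandwidth = total_copies * bytes_per_copy / (avg_us * 1e-6)
    return total_copies, avg_us, bandwidth


if __name__ == "__main__":
    # Two hypothetical children, three iterations each (microseconds).
    samples = [[100.0, 140.0, 120.0], [110.0, 130.0, 125.0]]
    print(merge_by_iteration_max(samples))
    print(aggregate(samples, copies_per_child=4, bytes_per_copy=1024))
```

Taking the maximum rather than the mean keeps the aggregate bandwidth from being overstated when one device lags behind the others.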

Scope

  • This PR is limited to Ascend copy benchmark cases and the common runtime/result plumbing needed by those cases.
  • one_host_to_all_device_* remains the ordinary aclrtMallocHost host0 path.
  • one_share_host_to_all_device_* is the POSIX shared-memory path.
  • Fork-submit cases skip parent ACL runtime initialization and create runtime state inside each child process.
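
The shared-host and fork-submit behaviour in the list above can be sketched roughly as below. This is a Python stand-in rather than the PR's implementation: `child_main`, `fan_out`, the `runtime_state` dictionary, and the CRC check are hypothetical, and Python's `multiprocessing.shared_memory` is used here only as a convenient wrapper around POSIX shared memory. The parent creates one named shared buffer and never sets up per-device state; each forked child attaches to the buffer by name and creates its own state.

```python
import os
import zlib
from multiprocessing import shared_memory


def child_main(device_id: int, shm_name: str, size: int, write_fd: int) -> None:
    """Runs inside the forked child; all per-process state is created here."""
    # Stand-in for per-child runtime setup (device selection, streams, ...).
    runtime_state = {"device_id": device_id, "pid": os.getpid()}
    host = shared_memory.SharedMemory(name=shm_name)  # attach, do not create
    try:
        checksum = zlib.crc32(host.buf[:size])  # stand-in for the device read
    finally:
        host.close()
    os.write(write_fd, f"{runtime_state['device_id']} {checksum}\n".encode())


def fan_out(num_devices: int, payload: bytes) -> dict:
    """Fork one child per device; every child reads the same shared host buffer."""
    host = shared_memory.SharedMemory(create=True, size=len(payload))
    host.buf[: len(payload)] = payload
    children = []
    try:
        for device_id in range(num_devices):
            read_fd, write_fd = os.pipe()
            pid = os.fork()
            if pid == 0:  # child process
                os.close(read_fd)
                status = 1
                try:
                    child_main(device_id, host.name, len(payload), write_fd)
                    status = 0
                finally:
                    os._exit(status)  # never fall back into the parent's code
            os.close(write_fd)
            children.append((pid, read_fd))
        results = {}
        for pid, read_fd in children:
            with os.fdopen(read_fd) as pipe:
                line = pipe.read()
            _, status = os.waitpid(pid, 0)
            if not os.WIFEXITED(status) or os.WEXITSTATUS(status) != 0:
                raise RuntimeError(f"child {pid} failed")
            device_id, checksum = line.split()
            results[int(device_id)] = int(checksum)
        return results
    finally:
        host.close()
        host.unlink()  # the creator removes the shared segment


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    print(fan_out(4, data))
```

Forking before any per-device setup means each child starts from a clean process image, which is presumably the reason the fork-submit cases skip runtime initialization in the parent.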

Validation

COPY_FFTS_VALIDATE=0 COPY_FFTS_PIPELINE_OBJECT_FRAGS=64 ./build/module/copy/copy -t all_host_to_all_device_ffts_pipeline -s 32K -n 1024 -i 128 -d 8
COPY_FFTS_VALIDATE=0 COPY_FFTS_PIPELINE_OBJECT_FRAGS=64 ./build/module/copy/copy -t one_host_to_all_device_ffts_pipeline -s 32K -n 1024 -i 128 -d 8
COPY_FFTS_VALIDATE=0 COPY_FFTS_PIPELINE_OBJECT_FRAGS=64 ./build/module/copy/copy -t host_to_device_ffts_pipeline -s 32K -n 1024 -i 128 -d 8

Observed:

| case | Copy avg | BW |
| --- | --- | --- |
| all_host_to_all_device_ffts_pipeline | 1912 us | 130.753 GB/s |
| one_host_to_all_device_ffts_pipeline | 2929 us | 85.353 GB/s |
| host_to_device_ffts_pipeline | 1395-1575 us per device | 19.841-22.401 GB/s per device |
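
As a cross-check, the observed bandwidth figures can be reproduced from the command-line parameters. The sketch below assumes that `-s 32K` is the size of one copy in bytes (32 × 1024), that `-n 1024` is the number of copies per iteration, and that `-d 8` is the device count; these readings of the flags are assumptions, and `bandwidth_gib_s` is a hypothetical helper. Under those assumptions the reported values match total bytes divided by the average copy time, expressed in units of 2^30 bytes per second.

```python
GIB = 2**30  # the reported figures match binary gigabytes (2^30 bytes)


def bandwidth_gib_s(size_bytes: int, count: int, devices: int, copy_avg_us: float) -> float:
    """Total bytes moved per iteration divided by the average iteration time."""
    total_bytes = size_bytes * count * devices
    return total_bytes / GIB / (copy_avg_us * 1e-6)


if __name__ == "__main__":
    size, count = 32 * 1024, 1024  # assumed meaning of -s 32K and -n 1024

    # Aggregate cases: all 8 devices contribute to one measurement.
    print(round(bandwidth_gib_s(size, count, 8, 1912), 3))  # 130.753
    print(round(bandwidth_gib_s(size, count, 8, 2929), 3))  # 85.353

    # Per-device case: each device is measured on its own.
    print(round(bandwidth_gib_s(size, count, 1, 1395), 3))  # 22.401
    print(round(bandwidth_gib_s(size, count, 1, 1575), 3))  # 19.841
```

Each iteration therefore moves 32 MiB per device, or 256 MiB across eight devices, and the fastest per-device time (1395 us) corresponds to the highest per-device bandwidth (22.401).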

NaganooMei force-pushed the upstream-yuanrong-pipeline branch from 9b2eb80 to ba46530 on May 28, 2026 05:07