Memory Access Streaming Fusion Pass #268
Merged
ShangkunLi merged 3 commits into coredac:main on Feb 12, 2026
tancheng approved these changes on Feb 11, 2026
guosran reviewed on Feb 12, 2026
    // shape).
    if (writer->write_memrefs.size() == 1 && reader->read_memrefs.size() == 1) {
      benefit += 50;
    }
Collaborator
This is a simple calculation. Will it result in a lot of ties?
Collaborator
Author
Yes, this may introduce some ties.
We use greedy fusion here, so we actually fuse all the tasks that meet the constraints.
I cannot tell yet what effect the ties will have. Maybe we can test this by applying the pass to more benchmarks.
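To make the tie question concrete, here is a minimal sketch of how ties could be counted and ordered deterministically. Only the `+50` single-memref bonus comes from the reviewed diff; `TaskInfo`, `Candidate`, the `base` score, the `order` field, and the helper names are hypothetical stand-ins, not the pass's actual data structures.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical, simplified task summary (memrefs modeled as integer ids).
struct TaskInfo {
  std::vector<int> read_memrefs;
  std::vector<int> write_memrefs;
};

// Hypothetical fusion candidate.
struct Candidate {
  const TaskInfo *writer;
  const TaskInfo *reader;
  std::size_t order; // assumed: position of the pair in program order
  int benefit;
};

// Only the +50 bonus mirrors the reviewed diff; `base` is a placeholder
// for whatever other terms the real score contains.
int scoreCandidate(const TaskInfo &writer, const TaskInfo &reader, int base) {
  int benefit = base;
  if (writer.write_memrefs.size() == 1 && reader.read_memrefs.size() == 1) {
    benefit += 50;
  }
  return benefit;
}

// Highest benefit first; equal scores fall back to program order so the
// greedy visiting order does not depend on container iteration order.
void sortCandidates(std::vector<Candidate> &cands) {
  std::sort(cands.begin(), cands.end(),
            [](const Candidate &a, const Candidate &b) {
              if (a.benefit != b.benefit) return a.benefit > b.benefit;
              return a.order < b.order;
            });
}

// Number of candidates whose score is shared with at least one other.
std::size_t countTied(const std::vector<Candidate> &cands) {
  std::map<int, std::size_t> freq;
  for (const Candidate &c : cands) ++freq[c.benefit];
  std::size_t tied = 0;
  for (const auto &kv : freq) {
    if (kv.second > 1) tied += kv.second;
  }
  return tied;
}
```

Running `countTied` over the candidate lists of more benchmarks would be one way to measure how often ties actually occur. If every valid candidate is fused anyway, ties can only affect the order of fusion, not whether a pair is fused.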
--memory-access-streaming-fusion: Memory Access Streaming Fusion Pass
Summary
This PR adds the MemoryAccessStreamingFusion pass (--memory-access-streaming-fusion), which identifies and fuses taskflow.task operations connected by intermediate memory buffers. When one task writes to a memref and another reads from it, the pass merges them into a single fused task, eliminating the intermediate memref.alloc (if present) and converting the memory-based data transfer into direct SSA value flow (streaming).
Motivation
After convert-affine-to-taskflow, each serialized loop nest becomes an independent task that communicates with other tasks through shared memrefs. Many of these intermediate buffers exist solely to pass data between producer and consumer tasks. Fusing these tasks:
How It Works
The pass operates in iterative rounds to handle fusion chains (e.g., A→B→C: first round fuses A+B, second round fuses (A+B)+C):
Dependency Analysis: Traces SSA value flow through write_outputs → read/write_memrefs to build a memory dependency graph capturing RAW, WAW, and WAR dependencies. Uses original_read/write_memrefs to identify the physical intermediate %alloc buffers.
Candidate Identification: Finds fusable (writer, reader) pairs that satisfy:
value_outputs (simplified constraint for correctness).
Fusion Transformation: For each valid candidate:
memref.alloc.
Example
Before (3 tasks, 2 intermediate buffers):
After (1 fused task, intermediate %alloc_A eliminated):
Iterative Chaining Example (ResNet)
On the SimpleResNet benchmark, the pass performs iterative fusion across multiple rounds:
Task_4 (transpose) + Task_5 (clamp) → Task_4_Task_5_fused
Task_10 (transpose) + Task_11 (add) → Task_10_Task_11_fused
Task_10_Task_11_fused + Task_12 (clamp) → Task_10_Task_11_Task_12_fused_fused
This reduces the ResNet task graph from 13 tasks to 10 tasks, eliminating 3 intermediate buffers.
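The round structure can be illustrated with a small model. This is a sketch under stated assumptions, not the pass's implementation: tasks are plain integer ids, every edge is assumed to have already passed the fusion criteria, and a pair whose writer or reader was already fused in the current round is assumed to wait for the next round. `FusionResult` and `fuseIteratively` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical result summary for the model below.
struct FusionResult {
  int rounds;          // rounds that performed at least one fusion
  int remaining_tasks; // tasks left once no candidate remains
};

// Each edge (writer, reader) stands for one intermediate buffer that is
// assumed to satisfy every fusion criterion.
FusionResult fuseIteratively(int num_tasks,
                             const std::vector<std::pair<int, int>> &edges) {
  // rep[t] is the id of the (possibly fused) task that now contains t.
  std::vector<int> rep(num_tasks);
  for (int t = 0; t < num_tasks; ++t) rep[t] = t;

  int rounds = 0;
  int remaining = num_tasks;
  while (true) {
    std::set<int> touched; // tasks already part of a fusion this round
    std::vector<std::pair<int, int>> accepted;
    for (const auto &e : edges) {
      int w = rep[e.first];
      int r = rep[e.second];
      if (w == r) continue; // already inside the same fused task
      if (touched.count(w) || touched.count(r)) continue; // next round
      touched.insert(w);
      touched.insert(r);
      accepted.push_back(std::make_pair(w, r));
    }
    if (accepted.empty()) break; // fixpoint: nothing left to fuse

    // Apply this round's fusions: the reader is absorbed into the writer.
    for (const auto &p : accepted) {
      for (int &x : rep) {
        if (x == p.second) x = p.first;
      }
      --remaining;
    }
    ++rounds;
  }
  return FusionResult{rounds, remaining};
}
```

Feeding the model 13 tasks with the edges (4, 5), (10, 11), and (11, 12) fuses the first two pairs in round one and the chained pair in round two, leaving 10 tasks, which matches the counts reported above.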
Fusion Criteria
value_outputs
affine.for loop bounds
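For readers unfamiliar with the RAW, WAW, and WAR terms used in the Dependency Analysis step, the following sketch classifies dependencies between tasks from their read and write sets. It is a conservative all-pairs model under assumptions: `Task`, `Dep`, and `buildDeps` are hypothetical names, memrefs are identified by strings, and tasks are assumed to be listed in program order. The actual pass traces SSA values instead.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

enum class DepKind { RAW, WAW, WAR };

// Hypothetical per-task summary; memrefs are identified by name.
struct Task {
  std::set<std::string> reads;
  std::set<std::string> writes;
};

struct Dep {
  std::size_t from; // earlier task (index in program order)
  std::size_t to;   // later task
  std::string memref;
  DepKind kind;
};

// Conservative all-pairs classification: every earlier/later task pair
// that touches the same memref gets an edge, even if an intervening
// write would make the edge redundant.
std::vector<Dep> buildDeps(const std::vector<Task> &tasks) {
  std::vector<Dep> deps;
  for (std::size_t i = 0; i < tasks.size(); ++i) {
    for (std::size_t j = i + 1; j < tasks.size(); ++j) {
      for (const std::string &m : tasks[i].writes) {
        if (tasks[j].reads.count(m)) // later task reads what i wrote
          deps.push_back(Dep{i, j, m, DepKind::RAW});
        if (tasks[j].writes.count(m)) // later task overwrites i's output
          deps.push_back(Dep{i, j, m, DepKind::WAW});
      }
      for (const std::string &m : tasks[i].reads) {
        if (tasks[j].writes.count(m)) // later task overwrites i's input
          deps.push_back(Dep{i, j, m, DepKind::WAR});
      }
    }
  }
  return deps;
}
```

In this model a RAW edge whose memref is an intermediate buffer marks a possible (writer, reader) fusion pair, while WAW and WAR edges are ordering constraints a fusion must not violate.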