[MIPS] Add MIPS dialect with RVV kernel, static/dynamic dispatch, and QEMU workflows #1
Open
Shukla-Gaurav wants to merge 11 commits into main
Introduces a new MIPS dialect that acts as a semantic abstraction layer for hardware-specific matrix multiply operations inside the IREE compiler.

Key components:
- IR/MIPSBase.td / MIPSDialect.h/.cpp: dialect definition, namespace ::mlir::iree_compiler::IREE::MIPS, dependent on func/memref/tensor.
- IR/MIPSOps.td / MIPSOps.h/.cpp: mips.matmul op, tensor-only, Destination-Passing-Style (DPS). Verifier checks 2-D tensor shapes (lhs[MxK], rhs[KxN], init[MxN]) and element-type consistency. Implements ReifyRankedShapedTypeOpInterface and MemoryEffectsOpInterface.
- IR/MIPSBufferizableOpInterface.cpp: eliminates mips.matmul entirely during One-Shot Bufferize by emitting func.call @my_matmul_kernel directly with the memref (base_ptr, offset, stride0, stride1) ABI. Uses memref.memory_space_cast to strip IREE HAL memory spaces from the base pointers so the function declaration stays stable across eraseHALDescriptorTypeFromMemRef.
- Transforms/Passes.td/.h/.cpp: pass registry for LowerMIPSToFuncCallPass (now a no-op; kept for pipeline compatibility since bufferization handles the lowering directly).
- Transforms/LowerMIPSToFuncCall.cpp: no-op pass stub.
- DispatchCreation/FormDispatchRegions.cpp: teach IREE's dispatch formation to treat mips.matmul as a compute-heavy op eligible for outlining into flow.dispatch.workgroups.
- Tools/init_iree_dialects.h: register MIPSDialect and its BufferizableOpInterface external models.
- Tools/init_iree_passes.h: register MIPS passes with the global pass registry.
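To make the (base_ptr, offset, stride0, stride1) memref ABI concrete, here is a minimal scalar sketch of a kernel that consumes three 2-D operands in that form. The function name, the parameter order, the placement of M/N/K, the f32 element type, and the accumulate-into-output behaviour are all assumptions for illustration; the PR text does not spell out the full signature of my_matmul_kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference for a strided 2-D matmul: out[MxN] += lhs[MxK] * rhs[KxN].
 * Each operand arrives as (base_ptr, offset, stride0, stride1), with offset
 * and strides counted in elements. Name, parameter order and f32 element type
 * are assumptions of this sketch, not the PR's actual kernel signature. */
static void matmul_strided_ref(
    const float *lhs, int64_t lhs_off, int64_t lhs_s0, int64_t lhs_s1,
    const float *rhs, int64_t rhs_off, int64_t rhs_s0, int64_t rhs_s1,
    float *out, int64_t out_off, int64_t out_s0, int64_t out_s1,
    int64_t M, int64_t N, int64_t K) {
  assert(M >= 0 && N >= 0 && K >= 0);
  for (int64_t i = 0; i < M; ++i) {
    for (int64_t j = 0; j < N; ++j) {
      /* Start from the current output value (destination-passing style). */
      float acc = out[out_off + i * out_s0 + j * out_s1];
      for (int64_t k = 0; k < K; ++k) {
        acc += lhs[lhs_off + i * lhs_s0 + k * lhs_s1] *
               rhs[rhs_off + k * rhs_s0 + j * rhs_s1];
      }
      out[out_off + i * out_s0 + j * out_s1] = acc;
    }
  }
}

/* Multiplies a 4x4 row-major matrix by the identity into a zeroed output and
 * returns how many result elements differ from the input (0 if A * I == A). */
static int identity_mismatches(void) {
  float a[16], id[16], c[16];
  for (int i = 0; i < 16; ++i) {
    a[i] = (float)(i + 1);
    id[i] = (i / 4 == i % 4) ? 1.0f : 0.0f;
    c[i] = 0.0f;
  }
  matmul_strided_ref(a, 0, 4, 1, id, 0, 4, 1, c, 0, 4, 1, 4, 4, 4);
  int bad = 0;
  for (int i = 0; i < 16; ++i) bad += (c[i] != a[i]);
  return bad;
}
```

Passing strides explicitly means the same routine handles contiguous buffers and non-contiguous views (for example, swapping stride0 and stride1 reads an operand as its transpose) without copying.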
Adds the frontend conversion from torch.aten.mm to mips.matmul and wires the MIPS dialect into the IREE LLVMCPU codegen pipeline.

Torch InputConversion changes:
- ConvertTorchToMIPS.cpp: ConvertAtenMmToMIPSMatmul pattern rewrites torch.aten.mm to mips.matmul with a zero-initialized init tensor (bufferization.alloc_tensor). The pass runs before the standard ConvertTorchToLinalgPass so mips.matmul takes precedence over the generic linalg.matmul path.
- Passes.td / Passes.h / Passes.cpp: declare ConvertTorchToMIPSPass and add it to the torch-to-iree pipeline under the use-mips-matmul option.
- test/convert_torch_to_mips.mlir: FileCheck test verifying that torch.aten.mm is replaced by mips.matmul after the pass.

LLVMCPU codegen pipeline changes:
- Passes.cpp: insert LowerMIPSToFuncCallPass (no-op stub) in the post-bufferize section of buildLLVMCPUCodegenPassPipeline. The actual lowering to func.call is performed during One-Shot Bufferize by MIPSBufferizableOpInterface; this stub ensures the pass slot is reserved for future use and keeps the pipeline definition explicit.
- CMakeLists.txt: add iree_compiler_Dialect_MIPS_Transforms_Transforms dependency to the LLVMCPU codegen target.
Provides the runtime kernel that backs mips.matmul dispatches.

The kernel is packaged as an IREE executable plugin (implements iree_hal_executable_plugin_query) rather than a plain shared library because IREE's LLVMCPU system-dylib dispatch format resolves external function references through an internal import table (not standard ELF dynamic linking). The plugin's resolve() callback maps the symbol name "my_matmul_kernel" to the import-ABI wrapper.

Usage: iree-run-module --executable_plugin=libmy_matmul_kernel.dylib ...
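The name-to-wrapper mapping that the resolve() callback performs can be pictured as a small lookup table. The sketch below is deliberately plain C with no IREE headers: the types import_fn_t and import_entry_t, the function resolve_import, and the wrapper's signature are hypothetical stand-ins, not the IREE executable plugin API. Only the symbol name "my_matmul_kernel" comes from the commit message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical import wrapper signature; the real plugin uses IREE's own
 * import-ABI types, which are not reproduced here. */
typedef int (*import_fn_t)(void *params);

/* Placeholder wrapper: a real one would unpack params and call the kernel. */
static int my_matmul_kernel_import(void *params) {
  (void)params;
  return 0;
}

/* One table row per symbol the plugin is able to provide. */
typedef struct {
  const char *name;
  import_fn_t fn;
} import_entry_t;

static const import_entry_t kImports[] = {
    {"my_matmul_kernel", my_matmul_kernel_import},
};

/* Returns the wrapper registered for symbol_name, or NULL if the plugin does
 * not provide that symbol. */
static import_fn_t resolve_import(const char *symbol_name) {
  for (size_t i = 0; i < sizeof(kImports) / sizeof(kImports[0]); ++i) {
    if (strcmp(kImports[i].name, symbol_name) == 0) return kImports[i].fn;
  }
  return NULL;
}
```

Returning NULL for unknown names lets the caller decide whether a missing import is an error or can be satisfied elsewhere.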
Provides two convenience scripts for exercising the full
torch.aten.mm → mips.matmul → func.call → vmfb → kernel plugin pipeline.
compile_mips.zsh:
Step 1 — verifies torch.aten.mm is converted to mips.matmul by the
ConvertTorchToMIPSPass (iree-opt smoke check).
Step 2 — runs the full torch-to-iree pipeline with use-mips-matmul=true,
producing the IREE input IR (/tmp/mm_iree.mlir).
Step 3 — compiles the IREE input IR to a vmfb (/tmp/mm_mips.vmfb)
with --mlir-print-ir-after-all, writing a per-pass IR dump
to /tmp/mm_mips_ir_dump.mlir for debugging.
run_mips.zsh:
Runs iree-run-module with --executable_plugin pointing at the
built libmy_matmul_kernel.dylib. Tests A * I = A (matrix multiplied
by identity) and prints the result for visual verification.
Add a global llvm::cl::opt<bool> flag `--iree-mips-static-embedding` to
iree-compile. When set, MIPSBufferizableOpInterface stamps the emitted
@my_matmul_kernel declaration with {hal.import.static}, causing
ConvertToLLVM to emit a direct linker-resolved call instead of a dynamic
HAL import table entry. Without the flag the call goes through the HAL
import table, allowing runtime resolution via --executable_plugin.
- matmul_kernel.h: public API declaration (my_matmul_kernel)
- matmul_kernel.c: RVV + scalar compute; no IREE headers
- matmul_plugin.c: IREE HAL executable plugin interface only

Also update MIPS_Dialect description in MIPSBase.td to reflect the semantic role of the dialect as an abstraction layer for dispatching to highly-optimized kernels, with op fusion as a key use case.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
mips.matmul is eliminated entirely during One-Shot Bufferize: MIPSBufferizableOpInterface emits func.call @my_matmul_kernel directly, so LowerMIPSToFuncCallPass was already a no-op placeholder.

Also remove compile_mips.zsh and run_mips.zsh.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- build_tools/riscv/setup_qemu_workflow.sh: One-time setup for toolchain (conda clang-18/lld-18), RISC-V sysroot, QEMU 8.2.2 (riscv64-linux-user from source), and both IREE builds (host x86 + RISC-V cross). Supports --step=N to run individual steps.
- build_tools/riscv/rvv_qemu_workflow_static.sh: Clean 6-step static embedding pipeline: torch→flow IR, compile .o, create lld_wrapper, iree-compile with --iree-mips-static-embedding, ELF verification, and QEMU run with VLEN sweep. Supports --host and --vlen N.
- build_tools/riscv/rvv_qemu_workflow_dynamic.sh: Parallel dynamic plugin pipeline: compile matmul_kernel.c + matmul_plugin.c into librvv_matmul.so, iree-compile without static embedding flag, and QEMU run with --executable_plugin. Supports --host and --vlen N.
- runtime/src/iree/builtins/mips/README.md: Documents the kernel library layout, ABI, build commands for standalone/static/dynamic targets, and integration with the IREE MIPS dialect.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- setup_qemu_workflow.sh step 4: add CMAKE_INSTALL_PREFIX, build
iree-tblgen alongside the other tools, and run install/fast so all
binaries land in iree-build/install/bin/.
- step 5: point IREE_HOST_BIN_DIR to the install tree
(iree-build/install/bin) instead of the raw build dir.
- rvv_qemu_workflow_{static,dynamic}.sh: align IREE_OPT, IREE_COMPILE,
and HOST_RUN to iree-build/install/bin/ accordingly.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
This PR introduces a full end-to-end pipeline for dispatching torch.aten.mm to a hand-tuned RVV-vectorized matmul kernel on RISC-V, using a new semantic MIPS dialect as the abstraction layer.
Compiler Changes
- mips dialect: New semantic dialect with mips.matmul living entirely in the tensor domain. Enables higher-level transformations (op fusion, tiling, kernel variant selection) before lowering.
- ConvertTorchToMIPSPass: Intercepts torch.aten.mm before the standard torch→linalg path and rewrites it as mips.matmul.
- MIPSBufferizableOpInterface: Eliminates mips.matmul during One-Shot Bufferize by decomposing 2-D memrefs via memref.extract_strided_metadata and emitting a direct func.call @my_matmul_kernel. No memref form of mips.matmul is ever produced.
- --iree-mips-static-embedding (global cl::opt): When set, stamps {hal.import.static} on the kernel declaration so the LLVMCPU backend emits a direct linker-resolved call instead of routing through the HAL import table.
Runtime Kernel Library (runtime/src/iree/builtins/mips/)
- matmul_kernel.{h,c}: RVV-vectorized compute kernel with scalar fallback; no IREE headers, usable for static .o, dynamic .so, and standalone test.
- matmul_plugin.c: IREE HAL executable plugin interface wrapping the kernel.
- rvv_standalone_test.c: Standalone QEMU smoke-test (no IREE dependency).
End-to-End Workflow Scripts (build_tools/riscv/)
- setup_qemu_workflow.sh: One-time setup: conda toolchain (clang-18/lld-18), RISC-V sysroot, QEMU 8.2.2 from source, IREE host build with install, IREE RISC-V cross-build.
- rvv_qemu_workflow_static.sh: Static embedding pipeline: compiles kernel to .o, injects it into the dispatch ELF at iree-compile time via a custom lld wrapper. No --executable_plugin needed at runtime.
- rvv_qemu_workflow_dynamic.sh: Dynamic plugin pipeline: builds librvv_matmul.so, runs with --executable_plugin under QEMU.
- mips_matmul_test.mlir: Test functions (4×4 identity, 2×3×2 non-square, 8×8 multi-register tiling) with expected outputs.
Key Design Notes
- -Bsymbolic in the lld wrapper prevents R_RISCV_JUMP_SLOT entries in .rela.plt. IREE's embedded ELF loader only processes .rela.dyn (DT_RELA), not .rela.plt (DT_JMPREL), so any relocation left in .rela.plt would go unapplied and cause a segfault on the first call.