[MIPS] Add MIPS dialect with RVV kernel, static/dynamic dispatch, and QEMU workflows by Shukla-Gaurav · Pull Request #1 · MIPS/iree

Shukla-Gaurav · 2026-03-03T09:54:41Z

Summary

This PR introduces a full end-to-end pipeline for dispatching torch.aten.mm
to a hand-tuned RVV-vectorized matmul kernel on RISC-V, using a new semantic
MIPS dialect as the abstraction layer.

Compiler Changes

- mips dialect: New semantic dialect with mips.matmul living entirely in
  the tensor domain. Enables higher-level transformations (op fusion, tiling,
  kernel variant selection) before lowering.
- ConvertTorchToMIPSPass: Intercepts torch.aten.mm before the standard
  torch→linalg path and rewrites it as mips.matmul.
- MIPSBufferizableOpInterface: Eliminates mips.matmul during One-Shot
  Bufferize by decomposing 2-D memrefs via memref.extract_strided_metadata
  and emitting a direct func.call @my_matmul_kernel. No memref form of
  mips.matmul is ever produced.
- --iree-mips-static-embedding (global cl::opt): When set, stamps
  {hal.import.static} on the kernel declaration so the LLVMCPU backend emits
  a direct linker-resolved call instead of routing through the HAL import table.
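
To make the decomposed-memref calling convention concrete, here is a minimal scalar sketch of a matmul over 2-D strided views. It is an illustration only: the PR does not show the signature of my_matmul_kernel, so the `view2d_f32` struct, the function name `sketch_matmul_strided`, the f32 element type, and the accumulate-into-destination behavior are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 2-D strided view: the (base_ptr, offset, stride0, stride1)
 * tuple named in the PR, plus explicit sizes. Offsets and strides are
 * counted in elements, not bytes. */
typedef struct {
  float *base;     /* base pointer of the underlying buffer */
  int64_t offset;  /* element offset of view[0][0] from base */
  int64_t size0;   /* number of rows */
  int64_t size1;   /* number of columns */
  int64_t stride0; /* elements between consecutive rows */
  int64_t stride1; /* elements between consecutive columns */
} view2d_f32;

/* Address of element (i, j) in a strided view. */
float *view2d_at(view2d_f32 v, int64_t i, int64_t j) {
  return v.base + v.offset + i * v.stride0 + j * v.stride1;
}

/* Scalar reference: out += lhs * rhs. Accumulating into the destination is
 * an assumption; with a zero-initialized init it equals a plain matmul. */
void sketch_matmul_strided(view2d_f32 lhs, view2d_f32 rhs, view2d_f32 out) {
  assert(lhs.size1 == rhs.size0);
  assert(out.size0 == lhs.size0 && out.size1 == rhs.size1);
  for (int64_t i = 0; i < out.size0; ++i) {
    for (int64_t j = 0; j < out.size1; ++j) {
      float acc = *view2d_at(out, i, j);
      for (int64_t k = 0; k < lhs.size1; ++k) {
        acc += *view2d_at(lhs, i, k) * *view2d_at(rhs, k, j);
      }
      *view2d_at(out, i, j) = acc;
    }
  }
}
```

Because the strides travel with the pointer, a transposed or sliced operand can be described without copying: swapping stride0 and stride1 of a square row-major view, for example, yields its transpose.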

Runtime Kernel Library (`runtime/src/iree/builtins/mips/`)

- matmul_kernel.{h,c}: RVV-vectorized compute kernel with scalar fallback;
  no IREE headers, usable for static .o, dynamic .so, and standalone test.
- matmul_plugin.c: IREE HAL executable plugin interface wrapping the kernel.
- rvv_standalone_test.c: Standalone QEMU smoke-test (no IREE dependency).
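
The "RVV with scalar fallback" structure can be sketched as below. This is not the code in matmul_kernel.c: the function names, the f32 element type, and the contiguous row-major layout are assumptions made for illustration. The vector path uses the standard RVV C intrinsics from `<riscv_vector.h>` and is compiled only when the compiler defines `__riscv_vector`; on any other target the scalar loop is used.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* c[0..n) += a * b[0..n): the inner row update of a row-major matmul. */
void sketch_axpy_f32(float a, const float *b, float *c, size_t n) {
#if defined(__riscv_vector)
  /* Vector-length-agnostic strip-mining: vsetvl returns how many elements
   * this iteration may process, so the loop is correct for any VLEN. */
  for (size_t j = 0; j < n;) {
    size_t vl = __riscv_vsetvl_e32m1(n - j);
    vfloat32m1_t vb = __riscv_vle32_v_f32m1(b + j, vl);
    vfloat32m1_t vc = __riscv_vle32_v_f32m1(c + j, vl);
    vc = __riscv_vfmacc_vf_f32m1(vc, a, vb, vl); /* vc += a * vb */
    __riscv_vse32_v_f32m1(c + j, vc, vl);
    j += vl;
  }
#else
  /* Scalar fallback for targets without the V extension. */
  for (size_t j = 0; j < n; ++j) {
    c[j] += a * b[j];
  }
#endif
}

/* Contiguous row-major C[MxN] += A[MxK] * B[KxN] (layout is an assumption). */
void sketch_matmul_rowmajor_f32(const float *A, const float *B, float *C,
                                size_t M, size_t K, size_t N) {
  assert(A != NULL && B != NULL && C != NULL);
  for (size_t i = 0; i < M; ++i) {
    for (size_t k = 0; k < K; ++k) {
      sketch_axpy_f32(A[i * K + k], B + k * N, C + i * N, N);
    }
  }
}
```

Writing the vector loop against the length returned by vsetvl, rather than a fixed lane count, is what lets one binary run unchanged under different vlen settings.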

End-to-End Workflow Scripts (`build_tools/riscv/`)

- setup_qemu_workflow.sh: One-time setup (conda toolchain with
  clang-18/lld-18, RISC-V sysroot, QEMU 8.2.2 from source, IREE host build
  with install, IREE RISC-V cross-build).
- rvv_qemu_workflow_static.sh: Static embedding pipeline. Compiles kernel
  to .o, injects it into the dispatch ELF at iree-compile time via a custom
  lld wrapper. No --executable_plugin needed at runtime.
- rvv_qemu_workflow_dynamic.sh: Dynamic plugin pipeline. Builds
  librvv_matmul.so, runs with --executable_plugin under QEMU.
- mips_matmul_test.mlir: Test functions (4×4 identity, 2×3×2 non-square,
  8×8 multi-register tiling) with expected outputs.

Key Design Notes

- -Bsymbolic in the lld wrapper prevents R_RISCV_JUMP_SLOT entries in
  .rela.plt. IREE's embedded ELF loader only processes .rela.dyn (DT_RELA),
  not .rela.plt (DT_JMPREL), so any relocation left in .rela.plt would go
  unapplied and cause a segfault on the first call.
- Both workflows verified under QEMU with vlen=256 and vlen=512.

Introduces a new MIPS dialect that acts as a semantic abstraction layer for hardware-specific matrix multiply operations inside the IREE compiler.

Key components:
- IR/MIPSBase.td / MIPSDialect.h/.cpp: dialect definition, namespace ::mlir::iree_compiler::IREE::MIPS, dependent on func/memref/tensor.
- IR/MIPSOps.td / MIPSOps.h/.cpp: mips.matmul op, tensor-only, Destination-Passing-Style (DPS). Verifier checks 2-D tensor shapes (lhs[MxK], rhs[KxN], init[MxN]) and element-type consistency. Implements ReifyRankedShapedTypeOpInterface and MemoryEffectsOpInterface.
- IR/MIPSBufferizableOpInterface.cpp: eliminates mips.matmul entirely during One-Shot Bufferize by emitting func.call @my_matmul_kernel directly with the memref (base_ptr, offset, stride0, stride1) ABI. Uses memref.memory_space_cast to strip IREE HAL memory spaces from the base pointers so the function declaration stays stable across eraseHALDescriptorTypeFromMemRef.
- Transforms/Passes.td/.h/.cpp: pass registry for LowerMIPSToFuncCallPass (now a no-op; kept for pipeline compatibility since bufferization handles the lowering directly).
- Transforms/LowerMIPSToFuncCall.cpp: no-op pass stub.
- DispatchCreation/FormDispatchRegions.cpp: teach IREE's dispatch formation to treat mips.matmul as a compute-heavy op eligible for outlining into flow.dispatch.workgroups.
- Tools/init_iree_dialects.h: register MIPSDialect and its BufferizableOpInterface external models.
- Tools/init_iree_passes.h: register MIPS passes with the global pass registry.
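
The shape and element-type rule that the mips.matmul verifier enforces can be modeled in a few lines. The real verifier is C++ inside the dialect and is not shown in this PR; the `tensor_type` struct, the `elem_type` enum, and `sketch_verify_matmul` below are hypothetical stand-ins, and the sketch handles only static shapes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element-type tag; the PR does not list supported types. */
typedef enum { ELEM_F32, ELEM_F16, ELEM_I32 } elem_type;

/* Hypothetical stand-in for a ranked tensor type with static dimensions. */
typedef struct {
  int rank;
  int64_t dims[2];
  elem_type elem;
} tensor_type;

/* Returns true when lhs[MxK], rhs[KxN], init[MxN] are all 2-D, agree on
 * M, K, N, and share a single element type. */
bool sketch_verify_matmul(const tensor_type *lhs, const tensor_type *rhs,
                          const tensor_type *init) {
  assert(lhs != NULL && rhs != NULL && init != NULL);
  if (lhs->rank != 2 || rhs->rank != 2 || init->rank != 2) return false;
  if (lhs->elem != rhs->elem || lhs->elem != init->elem) return false;
  int64_t M = lhs->dims[0];
  int64_t K = lhs->dims[1];
  int64_t N = rhs->dims[1];
  return rhs->dims[0] == K && init->dims[0] == M && init->dims[1] == N;
}
```

For the 2×3×2 case, lhs is 2×3, rhs is 3×2, and init must be 2×2; a 2×3 init or a mismatched element type is rejected.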

Adds the frontend conversion from torch.aten.mm to mips.matmul and wires the MIPS dialect into the IREE LLVMCPU codegen pipeline.

Torch InputConversion changes:
- ConvertTorchToMIPS.cpp: ConvertAtenMmToMIPSMatmul pattern rewrites torch.aten.mm to mips.matmul with a zero-initialized init tensor (bufferization.alloc_tensor). The pass runs before the standard ConvertTorchToLinalgPass so mips.matmul takes precedence over the generic linalg.matmul path.
- Passes.td / Passes.h / Passes.cpp: declare ConvertTorchToMIPSPass and add it to the torch-to-iree pipeline under the use-mips-matmul option.
- test/convert_torch_to_mips.mlir: FileCheck test verifying that torch.aten.mm is replaced by mips.matmul after the pass.

LLVMCPU codegen pipeline changes:
- Passes.cpp: insert LowerMIPSToFuncCallPass (no-op stub) in the post-bufferize section of buildLLVMCPUCodegenPassPipeline. The actual lowering to func.call is performed during One-Shot Bufferize by MIPSBufferizableOpInterface; this stub ensures the pass slot is reserved for future use and keeps the pipeline definition explicit.
- CMakeLists.txt: add iree_compiler_Dialect_MIPS_Transforms_Transforms dependency to the LLVMCPU codegen target.

Provides the runtime kernel that backs mips.matmul dispatches. The kernel is packaged as an IREE executable plugin (implements iree_hal_executable_plugin_query) rather than a plain shared library because IREE's LLVMCPU system-dylib dispatch format resolves external function references through an internal import table (not standard ELF dynamic linking). The plugin's resolve() callback maps the symbol name "my_matmul_kernel" to the import-ABI wrapper.

Usage: iree-run-module --executable_plugin=libmy_matmul_kernel.dylib ...
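
The name-to-wrapper mapping that resolve() performs can be pictured as a small table lookup. The sketch below deliberately does not use IREE's plugin types: `sketch_import_fn`, `sketch_import_entry`, `sketch_matmul_import_wrapper`, and `sketch_resolve` are hypothetical names, and the wrapper body is a placeholder. Only the symbol string "my_matmul_kernel" comes from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical import signature: one opaque parameter block, status return.
 * The real import ABI is defined by IREE and is not reproduced here. */
typedef int (*sketch_import_fn)(void *params);

/* Placeholder wrapper; a real one would unpack params and call the kernel. */
int sketch_matmul_import_wrapper(void *params) {
  (void)params;
  return 0;
}

/* One row per symbol the plugin can provide. */
typedef struct {
  const char *name;
  sketch_import_fn fn;
} sketch_import_entry;

static const sketch_import_entry kSketchImports[] = {
    {"my_matmul_kernel", sketch_matmul_import_wrapper},
};

/* Returns the wrapper registered for symbol_name, or NULL if this plugin
 * does not provide that symbol. */
sketch_import_fn sketch_resolve(const char *symbol_name) {
  assert(symbol_name != NULL);
  size_t count = sizeof(kSketchImports) / sizeof(kSketchImports[0]);
  for (size_t i = 0; i < count; ++i) {
    if (strcmp(kSketchImports[i].name, symbol_name) == 0) {
      return kSketchImports[i].fn;
    }
  }
  return NULL;
}
```

Returning NULL for unknown names leaves room for another provider to supply the symbol, which is one reasonable way to structure a lookup of this kind.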

Provides two convenience scripts for exercising the full torch.aten.mm → mips.matmul → func.call → vmfb → kernel plugin pipeline.

compile_mips.zsh:
- Step 1: verifies torch.aten.mm is converted to mips.matmul by the ConvertTorchToMIPSPass (iree-opt smoke check).
- Step 2: runs the full torch-to-iree pipeline with use-mips-matmul=true, producing the IREE input IR (/tmp/mm_iree.mlir).
- Step 3: compiles the IREE input IR to a vmfb (/tmp/mm_mips.vmfb) with --mlir-print-ir-after-all, writing a per-pass IR dump to /tmp/mm_mips_ir_dump.mlir for debugging.

run_mips.zsh: Runs iree-run-module with --executable_plugin pointing at the built libmy_matmul_kernel.dylib. Tests A * I = A (matrix multiplied by identity) and prints the result for visual verification.

Add a global llvm::cl::opt<bool> flag `--iree-mips-static-embedding` to iree-compile. When set, MIPSBufferizableOpInterface stamps the emitted @my_matmul_kernel declaration with {hal.import.static}, causing ConvertToLLVM to emit a direct linker-resolved call instead of a dynamic HAL import table entry. Without the flag the call goes through the HAL import table, allowing runtime resolution via --executable_plugin.

- matmul_kernel.h: public API declaration (my_matmul_kernel)
- matmul_kernel.c: RVV + scalar compute; no IREE headers
- matmul_plugin.c: IREE HAL executable plugin interface only

Also update MIPS_Dialect description in MIPSBase.td to reflect the semantic role of the dialect as an abstraction layer for dispatching to highly-optimized kernels, with op fusion as a key use case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mips.matmul is eliminated entirely during One-Shot Bufferize: MIPSBufferizableOpInterface emits func.call @my_matmul_kernel directly, so LowerMIPSToFuncCallPass was already a no-op placeholder. Also remove compile_mips.zsh and run_mips.zsh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- build_tools/riscv/setup_qemu_workflow.sh: One-time setup for toolchain (conda clang-18/lld-18), RISC-V sysroot, QEMU 8.2.2 (riscv64-linux-user from source), and both IREE builds (host x86 + RISC-V cross). Supports --step=N to run individual steps.
- build_tools/riscv/rvv_qemu_workflow_static.sh: Clean 6-step static embedding pipeline: torch→flow IR, compile .o, create lld_wrapper, iree-compile with --iree-mips-static-embedding, ELF verification, and QEMU run with VLEN sweep. Supports --host and --vlen N.
- build_tools/riscv/rvv_qemu_workflow_dynamic.sh: Parallel dynamic plugin pipeline: compile matmul_kernel.c + matmul_plugin.c into librvv_matmul.so, iree-compile without static embedding flag, and QEMU run with --executable_plugin. Supports --host and --vlen N.
- runtime/src/iree/builtins/mips/README.md: Documents the kernel library layout, ABI, build commands for standalone/static/dynamic targets, and integration with the IREE MIPS dialect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- setup_qemu_workflow.sh step 4: add CMAKE_INSTALL_PREFIX, build iree-tblgen alongside the other tools, and run install/fast so all binaries land in iree-build/install/bin/.
- step 5: point IREE_HOST_BIN_DIR to the install tree (iree-build/install/bin) instead of the raw build dir.
- rvv_qemu_workflow_{static,dynamic}.sh: align IREE_OPT, IREE_COMPILE, and HOST_RUN to iree-build/install/bin/ accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Gaurav Shukla and others added 11 commits March 3, 2026 15:09

[MIPS] Add iree-build.zsh build configuration script

c14a0d2

[MIPS] Add test MLIR and fix dynamic workflow message

31153e2

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Shukla-Gaurav changed the title ~~Add a custom path for matrix multiplication to leverage fast kernel library~~ [MIPS] Add MIPS dialect with RVV kernel, static/dynamic dispatch, and QEMU workflows Mar 9, 2026

Shukla-Gaurav wants to merge 11 commits into main from gaurav/mips_matmul_flow
