[MIPS] Add MIPS dialect with RVV kernel, static/dynamic dispatch, and QEMU workflows #1

Open
Shukla-Gaurav wants to merge 11 commits into main from gaurav/mips_matmul_flow

@Shukla-Gaurav Shukla-Gaurav commented Mar 3, 2026

Summary

This PR introduces a full end-to-end pipeline for dispatching torch.aten.mm
to a hand-tuned RVV-vectorized matmul kernel on RISC-V, using a new semantic
MIPS dialect as the abstraction layer.

Compiler Changes

  • mips dialect: New semantic dialect with mips.matmul living entirely in
    the tensor domain. Enables higher-level transformations (op fusion, tiling,
    kernel variant selection) before lowering.

  • ConvertTorchToMIPSPass: Intercepts torch.aten.mm before the standard
    torch→linalg path and rewrites it as mips.matmul.

  • MIPSBufferizableOpInterface: Eliminates mips.matmul during One-Shot
    Bufferize by decomposing 2-D memrefs via memref.extract_strided_metadata
    and emitting a direct func.call @my_matmul_kernel. No memref form of
    mips.matmul is ever produced.

  • --iree-mips-static-embedding (global cl::opt): When set, stamps
    {hal.import.static} on the kernel declaration so the LLVMCPU backend emits
    a direct linker-resolved call instead of routing through the HAL import table.
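
The strided calling convention that results from decomposing a 2-D memref with memref.extract_strided_metadata can be illustrated with a scalar reference. This is only a sketch: the PR text does not show the real signature of my_matmul_kernel, so the name `ref_matmul_strided`, the `strided_f32_t` struct, the f32 element type, the explicit M/N/K arguments, and the overwrite (rather than accumulate) behavior are all assumptions.

```c
#include <stdint.h>

/* Hypothetical operand descriptor: base pointer, element offset, and the two
 * strides of a 2-D memref. Sizes are passed separately as M, N, K. */
typedef struct {
  float *base;     /* base pointer of the underlying allocation */
  int64_t offset;  /* element offset of entry (0, 0) */
  int64_t stride0; /* elements between consecutive rows */
  int64_t stride1; /* elements between consecutive columns */
} strided_f32_t;

/* Address of entry (i, j) in a strided 2-D view. */
static inline float *at(strided_f32_t m, int64_t i, int64_t j) {
  return m.base + m.offset + i * m.stride0 + j * m.stride1;
}

/* out[MxN] = lhs[MxK] * rhs[KxN]; overwrites out (assumed semantics). */
void ref_matmul_strided(strided_f32_t lhs, strided_f32_t rhs,
                        strided_f32_t out, int64_t M, int64_t N, int64_t K) {
  for (int64_t i = 0; i < M; ++i) {
    for (int64_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (int64_t k = 0; k < K; ++k) {
        acc += *at(lhs, i, k) * *at(rhs, k, j);
      }
      *at(out, i, j) = acc;
    }
  }
}
```

Because every access goes through the offset and strides, the same routine handles subviews and transposed views without copying, which is presumably why the ABI carries them instead of assuming contiguous row-major data.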

Runtime Kernel Library (runtime/src/iree/builtins/mips/)

  • matmul_kernel.{h,c} — RVV-vectorized compute kernel with scalar fallback;
    no IREE headers, usable for static .o, dynamic .so, and standalone test.
  • matmul_plugin.c — IREE HAL executable plugin interface wrapping the kernel.
  • rvv_standalone_test.c — Standalone QEMU smoke-test (no IREE dependency).
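
The "RVV with scalar fallback" structure described for matmul_kernel.c might look roughly like the following. It is a minimal sketch, not the PR's kernel: the function name `matmul_f32`, the contiguous row-major layout, and the simple one-row-at-a-time vectorization (no register tiling) are assumptions. The intrinsics are the `__riscv_`-prefixed ones from the RISC-V vector intrinsics specification.

```c
#include <stddef.h>
#if defined(__riscv_vector)
#include <riscv_vector.h>
#endif

/* c[MxN] = a[MxK] * b[KxN]; all three are contiguous row-major (assumed). */
void matmul_f32(const float *a, const float *b, float *c,
                size_t M, size_t N, size_t K) {
  for (size_t i = 0; i < M; ++i) {
#if defined(__riscv_vector)
    /* Vectorize along a row of c; vl adapts to the hardware VLEN. */
    for (size_t j = 0; j < N;) {
      size_t vl = __riscv_vsetvl_e32m1(N - j);
      vfloat32m1_t acc = __riscv_vfmv_v_f_f32m1(0.0f, vl);
      for (size_t k = 0; k < K; ++k) {
        /* acc += a[i][k] * b[k][j .. j+vl) */
        vfloat32m1_t brow = __riscv_vle32_v_f32m1(&b[k * N + j], vl);
        acc = __riscv_vfmacc_vf_f32m1(acc, a[i * K + k], brow, vl);
      }
      __riscv_vse32_v_f32m1(&c[i * N + j], acc, vl);
      j += vl;
    }
#else
    /* Scalar fallback for targets without the V extension. */
    for (size_t j = 0; j < N; ++j) {
      float acc = 0.0f;
      for (size_t k = 0; k < K; ++k) {
        acc += a[i * K + k] * b[k * N + j];
      }
      c[i * N + j] = acc;
    }
#endif
  }
}
```

When built for a vector-enabled target (for example with -march=rv64gcv), the compiler defines `__riscv_vector` and the RVV path is compiled; on any other target the scalar loop is used, so the same source serves a host-side check and a QEMU run.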

End-to-End Workflow Scripts (build_tools/riscv/)

  • setup_qemu_workflow.sh — One-time setup: conda toolchain (clang-18/lld-18),
    RISC-V sysroot, QEMU 8.2.2 from source, IREE host build with install, IREE
    RISC-V cross-build.
  • rvv_qemu_workflow_static.sh — Static embedding pipeline: compiles kernel
    to .o, injects it into the dispatch ELF at iree-compile time via a custom
    lld wrapper. No --executable_plugin needed at runtime.
  • rvv_qemu_workflow_dynamic.sh — Dynamic plugin pipeline: builds
    librvv_matmul.so, runs with --executable_plugin under QEMU.
  • mips_matmul_test.mlir — Test functions (4×4 identity, 2×3×2 non-square,
    8×8 multi-register tiling) with expected outputs.

Key Design Notes

  • -Bsymbolic in the lld wrapper prevents R_RISCV_JUMP_SLOT entries in
    .rela.plt. IREE's embedded ELF loader only processes .rela.dyn (DT_RELA),
    not .rela.plt (DT_JMPREL), so a call routed through an unrelocated PLT
    slot would segfault on first use.
  • Both workflows verified under QEMU with vlen=256 and vlen=512.
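
One way to guard against the PLT problem described above is to check a dispatch ELF's dynamic section for a DT_JMPREL entry before running it. The following is a hedged sketch, not code from the PR: the function name `has_plt_relocations` is hypothetical, and the struct is a local mirror of `Elf64_Dyn` so the example does not depend on `<elf.h>`. The tag values (DT_NULL = 0, DT_RELA = 7, DT_JMPREL = 23) come from the ELF specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of Elf64_Dyn: a tag plus a value/pointer. */
typedef struct {
  int64_t d_tag;
  uint64_t d_val;
} dyn_entry_t;

/* Dynamic-section tags, values as defined by the ELF specification. */
enum { TAG_NULL = 0, TAG_RELA = 7, TAG_JMPREL = 23 };

/* Returns true if the DT_NULL-terminated dynamic array contains DT_JMPREL,
 * i.e. the image expects a loader to process PLT relocations. */
bool has_plt_relocations(const dyn_entry_t *dyn, size_t max_entries) {
  for (size_t i = 0; i < max_entries && dyn[i].d_tag != TAG_NULL; ++i) {
    if (dyn[i].d_tag == TAG_JMPREL) {
      return true;
    }
  }
  return false;
}
```

A verification step could fail the build when this returns true, since such an image would need relocations the embedded loader does not apply.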

Gaurav Shukla and others added 11 commits March 3, 2026 15:09
Introduces a new MIPS dialect that acts as a semantic abstraction layer
for hardware-specific matrix multiply operations inside the IREE compiler.

Key components:

- IR/MIPSBase.td / MIPSDialect.h/.cpp: dialect definition, namespace
  ::mlir::iree_compiler::IREE::MIPS, dependent on func/memref/tensor.

- IR/MIPSOps.td / MIPSOps.h/.cpp: mips.matmul op, tensor-only,
  Destination-Passing-Style (DPS). Verifier checks 2-D tensor shapes
  (lhs[MxK], rhs[KxN], init[MxN]) and element-type consistency.
  Implements ReifyRankedShapedTypeOpInterface and MemoryEffectsOpInterface.

- IR/MIPSBufferizableOpInterface.cpp: eliminates mips.matmul entirely
  during One-Shot Bufferize by emitting func.call @my_matmul_kernel
  directly with the memref (base_ptr, offset, stride0, stride1) ABI.
  Uses memref.memory_space_cast to strip IREE HAL memory spaces from
  the base pointers so the function declaration stays stable across
  eraseHALDescriptorTypeFromMemRef.

- Transforms/Passes.td/.h/.cpp: pass registry for
  LowerMIPSToFuncCallPass (now a no-op; kept for pipeline compatibility
  since bufferization handles the lowering directly).

- Transforms/LowerMIPSToFuncCall.cpp: no-op pass stub.

- DispatchCreation/FormDispatchRegions.cpp: teach IREE's dispatch
  formation to treat mips.matmul as a compute-heavy op eligible for
  outlining into flow.dispatch.workgroups.

- Tools/init_iree_dialects.h: register MIPSDialect and its
  BufferizableOpInterface external models.

- Tools/init_iree_passes.h: register MIPS passes with the global
  pass registry.
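
The shape rule enforced by the mips.matmul verifier (lhs[MxK], rhs[KxN], init[MxN]) can be written down compactly. The sketch below is in C for illustration only; the actual verifier lives in the dialect's C++ sources and is not shown in this PR text. The names `shape2d_t` and `matmul_shapes_ok`, and the use of -1 as a stand-in for a dynamic dimension, are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for a dimension unknown at compile time (assumed encoding). */
#define DIM_DYNAMIC (-1)

typedef struct {
  int64_t rows;
  int64_t cols;
} shape2d_t;

/* Two dimensions agree if either is dynamic or both are equal. */
static bool dims_compatible(int64_t a, int64_t b) {
  return a == DIM_DYNAMIC || b == DIM_DYNAMIC || a == b;
}

/* Checks lhs[MxK], rhs[KxN], init[MxN]. */
bool matmul_shapes_ok(shape2d_t lhs, shape2d_t rhs, shape2d_t init) {
  return dims_compatible(lhs.cols, rhs.rows) &&   /* K matches */
         dims_compatible(lhs.rows, init.rows) &&  /* M matches */
         dims_compatible(rhs.cols, init.cols);    /* N matches */
}
```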

Adds the frontend conversion from torch.aten.mm to mips.matmul and
wires the MIPS dialect into the IREE LLVMCPU codegen pipeline.

Torch InputConversion changes:

- ConvertTorchToMIPS.cpp: ConvertAtenMmToMIPSMatmul pattern rewrites
  torch.aten.mm to mips.matmul with a zero-initialized init tensor
  (bufferization.alloc_tensor). The pass runs before the standard
  ConvertTorchToLinalgPass so mips.matmul takes precedence over the
  generic linalg.matmul path.

- Passes.td / Passes.h / Passes.cpp: declare ConvertTorchToMIPSPass and
  add it to the torch-to-iree pipeline under the use-mips-matmul option.

- test/convert_torch_to_mips.mlir: FileCheck test verifying that
  torch.aten.mm is replaced by mips.matmul after the pass.

LLVMCPU codegen pipeline changes:

- Passes.cpp: insert LowerMIPSToFuncCallPass (no-op stub) in the
  post-bufferize section of buildLLVMCPUCodegenPassPipeline. The actual
  lowering to func.call is performed during One-Shot Bufferize by
  MIPSBufferizableOpInterface; this stub ensures the pass slot is
  reserved for future use and keeps the pipeline definition explicit.

- CMakeLists.txt: add iree_compiler_Dialect_MIPS_Transforms_Transforms
  dependency to the LLVMCPU codegen target.

Provides the runtime kernel that backs mips.matmul dispatches.

The kernel is packaged as an IREE executable plugin (implements
iree_hal_executable_plugin_query) rather than a plain shared library
because IREE's LLVMCPU system-dylib dispatch format resolves external
function references through an internal import table (not standard ELF
dynamic linking). The plugin's resolve() callback maps the symbol name
"my_matmul_kernel" to the import-ABI wrapper.
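
The name-to-wrapper mapping performed by resolve() can be sketched without any IREE headers. Everything below is hypothetical: the real plugin uses the iree_hal_executable_plugin types, which are not reproduced here, and `import_fn_t`, `import_entry_t`, `resolve_import`, and the wrapper body are stand-ins chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for an import-ABI function pointer (not the real IREE type). */
typedef int (*import_fn_t)(void *params, void *context);

/* Stand-in wrapper; a real one would unpack params and call the kernel. */
static int my_matmul_kernel_import(void *params, void *context) {
  (void)params;
  (void)context;
  return 0;
}

typedef struct {
  const char *name;
  import_fn_t fn;
} import_entry_t;

/* Table of symbols this plugin can provide. */
static const import_entry_t kImports[] = {
    {"my_matmul_kernel", my_matmul_kernel_import},
};

/* Returns the wrapper for a symbol name, or NULL if it is not provided. */
import_fn_t resolve_import(const char *name) {
  for (size_t i = 0; i < sizeof(kImports) / sizeof(kImports[0]); ++i) {
    if (strcmp(kImports[i].name, name) == 0) {
      return kImports[i].fn;
    }
  }
  return NULL;
}
```

Returning NULL for unknown names lets a caller fall through to other providers or report the import as unresolved.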

Usage:
  iree-run-module --executable_plugin=libmy_matmul_kernel.dylib ...

Provides two convenience scripts for exercising the full
torch.aten.mm → mips.matmul → func.call → vmfb → kernel plugin pipeline.

compile_mips.zsh:
  Step 1 — verifies torch.aten.mm is converted to mips.matmul by the
            ConvertTorchToMIPSPass (iree-opt smoke check).
  Step 2 — runs the full torch-to-iree pipeline with use-mips-matmul=true,
            producing the IREE input IR (/tmp/mm_iree.mlir).
  Step 3 — compiles the IREE input IR to a vmfb (/tmp/mm_mips.vmfb)
            with --mlir-print-ir-after-all, writing a per-pass IR dump
            to /tmp/mm_mips_ir_dump.mlir for debugging.

run_mips.zsh:
  Runs iree-run-module with --executable_plugin pointing at the
  built libmy_matmul_kernel.dylib. Tests A * I = A (matrix multiplied
  by identity) and prints the result for visual verification.

Add a global llvm::cl::opt<bool> flag `--iree-mips-static-embedding` to
iree-compile. When set, MIPSBufferizableOpInterface stamps the emitted
@my_matmul_kernel declaration with {hal.import.static}, causing
ConvertToLLVM to emit a direct linker-resolved call instead of a dynamic
HAL import table entry. Without the flag the call goes through the HAL
import table, allowing runtime resolution via --executable_plugin.

  matmul_kernel.h   — public API declaration (my_matmul_kernel)
  matmul_kernel.c   — RVV + scalar compute; no IREE headers
  matmul_plugin.c   — IREE HAL executable plugin interface only

Also update MIPS_Dialect description in MIPSBase.td to reflect the
semantic role of the dialect as an abstraction layer for dispatching
to highly-optimized kernels, with op fusion as a key use case.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mips.matmul is eliminated entirely during One-Shot Bufferize:
MIPSBufferizableOpInterface emits func.call @my_matmul_kernel directly,
so LowerMIPSToFuncCallPass was already a no-op placeholder.

Also remove compile_mips.zsh and run_mips.zsh.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- build_tools/riscv/setup_qemu_workflow.sh: One-time setup for
  toolchain (conda clang-18/lld-18), RISC-V sysroot, QEMU 8.2.2
  (riscv64-linux-user from source), and both IREE builds (host x86 +
  RISC-V cross).  Supports --step=N to run individual steps.

- build_tools/riscv/rvv_qemu_workflow_static.sh: Clean 6-step static
  embedding pipeline — torch→flow IR, compile .o, create lld_wrapper,
  iree-compile with --iree-mips-static-embedding, ELF verification, and
  QEMU run with VLEN sweep.  Supports --host and --vlen N.

- build_tools/riscv/rvv_qemu_workflow_dynamic.sh: Parallel dynamic
  plugin pipeline — compile matmul_kernel.c + matmul_plugin.c into
  librvv_matmul.so, iree-compile without static embedding flag, and
  QEMU run with --executable_plugin.  Supports --host and --vlen N.

- runtime/src/iree/builtins/mips/README.md: Documents the kernel
  library layout, ABI, build commands for standalone/static/dynamic
  targets, and integration with the IREE MIPS dialect.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- setup_qemu_workflow.sh step 4: add CMAKE_INSTALL_PREFIX, build
  iree-tblgen alongside the other tools, and run install/fast so all
  binaries land in iree-build/install/bin/.
- step 5: point IREE_HOST_BIN_DIR to the install tree
  (iree-build/install/bin) instead of the raw build dir.
- rvv_qemu_workflow_{static,dynamic}.sh: align IREE_OPT, IREE_COMPILE,
  and HOST_RUN to iree-build/install/bin/ accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@Shukla-Gaurav Shukla-Gaurav changed the title from "Add a custom path for matrix multiplication to leverage fast kernel library" to "[MIPS] Add MIPS dialect with RVV kernel, static/dynamic dispatch, and QEMU workflows" on Mar 9, 2026