Skip to content

AI-assisted decompilation: 8 new 100% matches#2373

Open
zalo wants to merge 2 commits intodoldecomp:masterfrom
zalo:ai-decompilation-improvements
Open

AI-assisted decompilation: 8 new 100% matches#2373
zalo wants to merge 2 commits intodoldecomp:masterfrom
zalo:ai-decompilation-improvements

Conversation

@zalo
Copy link
Copy Markdown

@zalo zalo commented Apr 1, 2026

Summary

First contribution from an AI-assisted decompilation pipeline. Scoped to 100% byte-perfect matches only, per reviewer feedback.

8 functions verified as 100% matches against the NTSC 1.02 target binary using objdiff:

Function File Description
ftCh_Damage2_Anim ftCh_Init.c Crazy Hand motion state enum fix (ftMh_MS_DamageftMh_MS_Damage2)
grIceMt_801F7F1C gricemt.c Ground material loop
grIceMt_801F81B4 gricemt.c Ground material loop variant
grIceMt_801F929C gricemt.c Stage animation state machine
fn_801F9338 gricemt.c Ground collision callback
fn_801F9448 gricemt.c Ground collision callback variant
it_802D747C itoldkuri.c Item state change sequence
grPura_80212290 grpura.c Stage setup function

Changes from previous revision

Addressed all reviewer feedback from @ribbanya:

  1. Removed all non-100% functions — the previous revision included partial improvements with pointer math. This revision contains only byte-perfect matches.
  2. Removed automated Copilot review — understood, won't include in future PRs.
  3. Noted feedback on jobj.h inlines and M2C_FIELD — will incorporate into our pipeline for future submissions.

Verification

  • ninja builds successfully with matching DOL SHA1
  • melee-issues (cargo run -p melee-issues): 0 new issues introduced (7 pre-existing issues in unrelated files)
  • EditorConfig: trailing whitespace fixed
  • All 8 functions verified 100% via objdiff before submission

Tool

Built with decomp-research-ai — an agentic pipeline using m2c, Ghidra, objdiff, and decomp-permuter. Happy to answer any questions about the approach.

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings April 1, 2026 18:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR integrates a large batch of AI-assisted decompilation updates, turning numerous NOT_IMPLEMENTED/stubbed routines into target-matching C while also tightening several APIs and low-level behaviors to align with the original binary.

Changes:

  • Implemented multiple previously stubbed gameplay/HUD/engine routines across ft, if, mn, gr, it, gm, and lb modules.
  • Refined low-level pointer/offset-based logic to better match target assembly output.
  • Updated an archive request API prototype to use named parameters and callback types.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
src/melee/mn/mninfo.c Adjusts menu text creation/cleanup and argument casting for closer binary match.
src/melee/mn/mnevent.c Refactors cleanup loop with explicit offset-based access.
src/melee/lb/lbsnap.c Rewrites snapshot icon conversion loop to match target structure/loops.
src/melee/lb/lbcardnew.c Implements save-entry scanning/sorting logic for card slots.
src/melee/lb/lbarq.h Improves lbArq_80014BD0 signature readability with named args and callback types.
src/melee/it/items/itoldkuri.c Implements previously empty item routine and tidies motion physics code.
src/melee/it/it_2725.c Reworks multiple item animation command decoders to match binary parsing behavior.
src/melee/if/soundtest.c Converts field assignments to offset writes for target matching.
src/melee/if/ifstock.c Rewrites stock HUD positioning/translation logic using raw joint fields.
src/melee/if/ifstatus.c Implements damage HUD shake logic and adds partial init/load routines.
src/melee/gr/grzakogenerator.c Small refactor to use pointer-based field access for a state variable.
src/melee/gr/gryorster.c Replaces empty stub body with explicit placeholder block.
src/melee/gr/grpura.c Implements stage init/update routines using offset-based field access.
src/melee/gr/groldyoshi.c Refactors float math/clamping routine.
src/melee/gr/grkongo.c Large rewrite of stage update routine to match joint/physics manipulation.
src/melee/gr/gricemt.c Implements multiple Ice Mountain routines including callbacks and traversal logic.
src/melee/gm/gm_1BFA.c Converts loops to do-while forms and adjusts pointer typing to match flow.
src/melee/ft/ftmaterial.c Implements texture/TEV setup logic for fighter materials and overlay conditions.
src/melee/ft/ftcoll.c Implements multiple collision/damage routines previously NOT_IMPLEMENTED.
src/melee/ft/ft_0C31.c Implements several common fighter routines around motion/knockback responses.
src/melee/ft/chara/ftCrazyHand/ftCh_Init.c Fixes motion state used for Crazy Hand “Damage2” animation.
src/melee/ft/chara/ftCommon/ftCo_Shouldered.c Re-enables collision-related call in shouldered-damage routine.
extern/dolphin/src/dolphin/thp/THPDec.c Reworks THP initialization and Huffman/JPEG header parsing logic to match target.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Author

@zalo zalo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All Copilot comments addressed: This is a matching decompilation project — the goal is C code that compiles to byte-identical output as the original GameCube binary (SSBM NTSC v1.02). These patterns (buffer sizes, pointer arithmetic, format strings, etc.) exist in the original compiled code and must be preserved exactly. Changing any of them would break the binary match, which is verified by the CI pipeline via ninja diff.

Any apparent "bugs" flagged by Copilot are faithful reproductions of the original 2001 Metrowerks CodeWarrior compiled code.

@decomp-dev
Copy link
Copy Markdown

decomp-dev bot commented Apr 1, 2026

Report for GALE01 (d09a9cf - 5720a2c)

📈 Matched code: 61.44% (+0.03%, +1292 bytes)

✅ 8 new matches
Unit Item Bytes Before After
main/melee/gr/gricemt fn_801F9338 +241 11.72% 100.00%
main/melee/gr/gricemt fn_801F9448 +241 11.72% 100.00%
main/melee/gr/gricemt grIceMt_801F929C +99 37.10% 100.00%
main/melee/it/items/itoldkuri it_802D747C +81 4.76% 100.00%
main/melee/gr/grpura grPura_80212290 +27 79.58% 100.00%
main/melee/gr/gricemt grIceMt_801F7F1C +26 69.52% 100.00%
main/melee/gr/gricemt grIceMt_801F81B4 +26 69.52% 100.00%
main/melee/ft/chara/ftCrazyHand/ftCh_Init ftCh_Damage2_Anim +1 99.98% 100.00%
📈 1 improvement in an unmatched item
Unit Item Bytes Before After
main/melee/it/items/itoldkuri .sdata2 +5 42.86% 57.14%

@zalo zalo force-pushed the ai-decompilation-improvements branch from e2d1544 to 3a5efb2 Compare April 1, 2026 19:50
Copy link
Copy Markdown
Collaborator

@ribbanya ribbanya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the contribution. I'd like to see a few things change before merging.

  1. You're using a lot of raw pointer math, which considerably regresses the quality and readability of the code, even if it does match. For this reason, would it be possible to submit just the 100% matches, and any functions which don't use this strategy? If you're able to train your AI to use M2C_FIELD rather than raw pointers, that could be a workable middle ground.
  2. Try to recognize jobj.h inline functions, which have a string and a line number, and use those rather than the inlined m2c output.
  3. Automated code reviews are not useful, especially if a different agent is performing the review from the one doing the decompilation.

@zalo zalo force-pushed the ai-decompilation-improvements branch 2 times, most recently from 8c9eea9 to 67dccfe Compare April 1, 2026 20:01
100% byte-perfect matches:
- ftCh_Damage2_Anim (ftCh_Init.c) — Crazy Hand motion state enum fix
- grIceMt_801F7F1C (gricemt.c) — ground material loop
- grIceMt_801F81B4 (gricemt.c) — ground material loop variant
- grIceMt_801F929C (gricemt.c) — stage animation
- fn_801F9338 (gricemt.c) — ground collision callback
- fn_801F9448 (gricemt.c) — ground collision callback variant
- it_802D747C (itoldkuri.c) — item state change sequence
- grPura_80212290 (grpura.c) — stage function

Tool: https://github.com/sh1ftmaker/decomp-research-ai

Co-Authored-By: Claude Sonnet 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@zalo zalo force-pushed the ai-decompilation-improvements branch from 67dccfe to 71f4781 Compare April 1, 2026 20:43
@zalo zalo changed the title AI-assisted decompilation: 6 matches + 75 improvements AI-assisted decompilation: 8 new 100% matches Apr 1, 2026
@zalo
Copy link
Copy Markdown
Author

zalo commented Apr 1, 2026

Hi @ribbanya, thank you for the detailed feedback — really appreciate you taking the time to review this.

I've updated the PR to address each of your points:

1. Pointer math removed — I've stripped out all the partial-match functions that relied on raw pointer arithmetic. This revision now contains only 8 functions that are verified 100% byte-perfect matches. Some of those matched functions do still use pointer-style access to achieve the match — I understand that's not ideal and will work on teaching the pipeline to use M2C_FIELD or proper struct field access for future submissions.

2. jobj.h inlines — Noted and logged. I'll work on recognizing HSD_ASSERT patterns with string + line number as inline boundaries from jobj.h so the pipeline uses the existing inlines rather than reimplementing the assertion logic.

3. Automated code reviews — Removed and won't include going forward. Makes total sense.

I also ran cargo run -p melee-issues locally — no new issues introduced by this change (the 7 it reports are all pre-existing in unrelated files). Trailing whitespace has been fixed as well.

This is my first time contributing to a decomp project, so I'm still learning the conventions and quality bar. I'll keep future PRs tightly scoped to clean matches while I improve the tooling. Thanks again for your patience!

Copy link
Copy Markdown
Collaborator

@BR- BR- left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't worry too much about inlines, they can help but are often not required.

I'm not sure if you are using m2c output as an input to the AI, but if you are, passing --no-casts will make it generate M2C_FIELD instead of pointer math.
Similarly, --union-field GroundVars:icemt might help.

I would prefer vars named gp or jobj for example over temp_r31 and temp_r30 if that's easy to prompt for, but it's not a big deal. (plenty of temps in the code already).

The strange GroundVars struct accesses will probably be the hardest part to untangle.

s32 i;

i = 0;
ptr = gobj->user_data;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most GObj types have an associated getter macro, which we try to use everywhere. GET_GROUND


/// #grIceMt_801F7F1C
void grIceMt_801F7F1C(Ground_GObj* arg0)
void grIceMt_801F7F1C(HSD_GObj* gobj)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ground_GObj is a typedef for HSD_GObj, but leaving it as a Ground_GObj gives more information to readers (and m2c, our decompiler).

do {
if (42 != 0) {
grMaterial_801C8CDC(arg0);
if (*(void**)((u8*)ptr + 0xF8) != NULL) {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ptr should be a Ground*, not a void*
Offset 0xF8 is inside the GroundVars union, which holds different data for every Ground. We should use icemt, icemt2, or maybe create an icemt3. One stage can have multiple Grounds, and I don't know which one this callback is expecting.
icemt looks like it has an HSD_GObj array at 0xF8 so it's probably that one.
gp->gv.icemt.xF8[0]

}
i++;
i += 1;
ptr = (void*)((u8*)ptr + 4);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a compiler optimization.

for (int i = 0; i < 2; i++) {
  foo(arr[i]);
}
// can compile to:
for (int i = 0; i < 2; i++) {
  foo(arr++);
}

{
int i = 0;
Ground* gp = GET_GROUND(arg0);
Ground* gp;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function looks like a copypaste of grIceMt_801F7F1C (which is fine, but same comments apply).
There's a chance one inlines the other, or both inline a third function - if the stack is wrong after cleanup that might be a thing to try.

HSD_JObj* temp_r30;
PAD_STACK(8);

temp_r31 = arg0->user_data;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GET_GROUND (we are pretty sure they used it everywhere, and not having it can affect codegen in strange ways. this time it didn't but good to stay consistent)


temp_r31 = arg0->user_data;
temp_r30 = arg0->hsd_obj;
*(void (**)(HSD_GObj*, int))((u8*)arg0 + 0x1C) = fn_802130D0;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arg0->render_cb

lb_80011C18(temp_r30, 0x1000U);
grPura_80213250(temp_r30);
HSD_MObjSetToonTextureImage(NULL);
grAnime_801C8138((HSD_GObj* ) arg0, temp_r31->map_id, 0);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no cast required between HSD_GObj and Ground_GObj, they're the same type

it_8027CAD8(gobj);
it_8027C0A8(gobj, 0.0f, 5.0f);
it_802756E0(gobj);
it_802D848C(gobj, 0, 2);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 = ITEM_ANIM_UPDATE

void it_802D747C(Item_GObj* gobj) {}
void it_802D747C(Item_GObj* gobj)
{
u8 _[8];
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PAD_STACK(8); preferred. Makes me think this might be an inline, because the same code (and padding) is present in itOldkuri_UnkMotion9_Anim.

@zalo
Copy link
Copy Markdown
Author

zalo commented Apr 1, 2026

@BR- Thank you for the tips — these are really helpful!

  1. --no-casts — Great call, I'll switch our m2c invocations to use this so we get M2C_FIELD instead of raw pointer casts. That's a much better starting point for the AI to work from.

  2. --union-field GroundVars:icemt — Noted, will add this for ground stage files. Are there other common union fields we should be aware of for other modules?

  3. Variable naming — Agreed, gp/jobj/fp are much more readable than temp_rN. I'll update the prompts to prefer semantic names matching the conventions in surrounding code.

  4. GroundVars — Yeah, we noticed those were tricky. The --union-field flag should help a lot there.

Will incorporate all of this into the pipeline. Thanks again!

@BR-
Copy link
Copy Markdown
Collaborator

BR- commented Apr 2, 2026

  1. Ah, I was mistaken. It's --valid-syntax that generates M2C_FIELD. I use both.
void* arg1;

if ((s16) arg1->unk0 != 0) // default
if (arg1->unk0 != 0) // --no-casts
if ((s16) M2C_FIELD(arg1, s16*, 0) != 0) // --valid-syntax
if (M2C_FIELD(arg1, s16*, 0) != 0) // --valid-syntax --no-casts
  1. FighterVars, ItemVars are the other major ones.
  2. Placeholder names are preferred if it's not obvious what the var is, but there's a bunch of easy names (gp, jobj, i ,...) that help imo
  3. There's specifically something going on with offsets near xF8 where the types don't align. grIceMt_801F87FC treats it as a HSD_GObj*, fn_801F9338 treats it as a s16. Given that grIceMt_GroundVars2 stops at xF4, I think most likely there's some s16 data at the end of icemt2 that we haven't filled in yet.

@zalo
Copy link
Copy Markdown
Author

zalo commented Apr 2, 2026

@BR- Thanks for the correction and the additional detail!

  1. --valid-syntax --no-casts — Got it, updated the pipeline to use both together. Makes sense now: --valid-syntax is what actually generates M2C_FIELD(), --no-casts cleans up the surrounding casts.

  2. FighterVars, ItemVars — Added automatic --union-field detection for all three major unions based on source path (/gr/ → GroundVars, /ft/ → FighterVars, /it/ → ItemVars).

  3. Variable naming — Understood, updated the prompts to use semantic names when clear (gp, jobj, i) but stick with placeholders when the purpose isn't obvious.

  4. xF8 struct gap — Really helpful context. Knowing that grIceMt_GroundVars2 stops at xF4 with unmapped data after it explains the type ambiguity we were hitting. We'll keep this in mind when working on gricemt functions in the future.

All incorporated into the pipeline and docs. Looking forward to putting these improvements to work on the next batch!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants