Add a demangler for the OxCaml structured mangling scheme #35
tmcgilchrist wants to merge 8 commits
spiessimon left a comment
I made a first pass over this. Most of the points below are nits, and all of them should be straightforward to fix.
// name and let the OxCaml demangler produce the source-level name.
// InlineFunctionInfo::GetName() prefers the Mangled object, so build it
// explicitly rather than letting the char-pointer overload demangle the
// linkage name even when DW_AT_name is present.
Why does the previous version not suffice here?
We need the special case here to retain the fully unique name of the symbol. A DW_AT_linkage_name of __CamlU4MainF9process_1_3_code is shown as Main.process_1, with the _3_code suffix stripped. If we remove the OCaml special case here, lldb also strips the extra _1 and shows just Main.process.
// absent. The plain char-pointer overload cannot do this -- it stores
// only the linkage name, so it would always demangle and ignore
// DW_AT_name.
// CR sspies: Check inlined frames have their suffixes stripped.
@tmcgilchrist, I think this check can be done now. Could you please check this? There should be examples of inlined frames in the OxCaml testsuite.
While you're editing this comment anyway, I would trim it a little. I don't think we need the explanation about the other overload, for example.
I've removed this comment. Now, for the structured scheme, a DW_AT_linkage_name of __CamlU4MainF9process_1_3_code is shown as Main.process_1, with the _3_code suffix stripped. For the flat scheme, it uses the DW_AT_name.
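To make the example above concrete, here is a toy decoder for that one symbol. It is only a sketch: the layout it assumes (an uppercase tag letter, a decimal length, then that many characters of name) is guessed from the single symbol quoted in this thread and is not taken from the OxCaml mangling specification or from the code in this PR. The names demangleSketch and selfCheckDemangle are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Toy decoder, NOT the PR's demangler. Assumed layout (a guess from the one
// example in this thread): "__Caml" or "_Caml", then repeated
// <uppercase tag><decimal length><name>, then an optional trailing suffix
// such as "_3_code" that is dropped.
std::optional<std::string> demangleSketch(std::string_view sym) {
  if (sym.substr(0, 6) == "__Caml")
    sym.remove_prefix(6);
  else if (sym.substr(0, 5) == "_Caml")
    sym.remove_prefix(5);
  else
    return std::nullopt;

  std::string out;
  while (!sym.empty() &&
         std::isupper(static_cast<unsigned char>(sym.front()))) {
    sym.remove_prefix(1); // skip the tag letter (e.g. 'U' or 'F')

    // Read the decimal length that precedes the name.
    std::size_t len = 0;
    bool sawDigit = false;
    while (!sym.empty() &&
           std::isdigit(static_cast<unsigned char>(sym.front()))) {
      len = len * 10 + static_cast<std::size_t>(sym.front() - '0');
      sawDigit = true;
      sym.remove_prefix(1);
      if (len > sym.size())
        return std::nullopt; // length runs past the end of the symbol
    }
    if (!sawDigit || len == 0)
      return std::nullopt;

    if (!out.empty())
      out += '.';
    out.append(sym.substr(0, len));
    sym.remove_prefix(len);
  }
  // Whatever remains (here "_3_code") is treated as a suffix and ignored.
  if (out.empty())
    return std::nullopt;
  return out;
}

// Small usage check on the symbol quoted in the thread.
inline void selfCheckDemangle() {
  auto name = demangleSketch("__CamlU4MainF9process_1_3_code");
  assert(name.has_value() && *name == "Main.process_1");
}
```

Because the length prefix says the function name is 9 characters long, the _1 stays part of the name and only the trailing _3_code is dropped, which matches the behaviour described above.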
The demangler replicates the behaviour of the ocamlfilt demangler in oxcaml#5100 with a minor change to macOS symbol prefixes which lldb trims before they are passed to demangling. We retain the suffixes for symbols as they ensure uniqueness of symbols.
Co-authored-by: Samuel Hym <samuel@tarides.com>
Partial revert back to llvm#31 to support both mangling schemes in LLDB.
Add __Caml support for macOS symbols. Drop Flat1 scheme as OxCaml doesn't use $/. separators and make the suffix stripping more precise.
Replace the kParseError sentinel returned by ConsumeUnsignedDecimal and ConsumeUnsigned26 with std::optional<unsigned>.
The sentinel was UINT_MAX, which collides with a legitimately parsed value and uses the Google-style "k" naming rather than LLVM style; std::optional removes both problems and matches the convention used elsewhere in the Demangle library (MicrosoftDemangle, Demangle.h) and pervasively in lldb. The overflow bound now uses std::numeric_limits<unsigned>::max(), and the two C-style casts become static_cast.
Also guard demangleStructured against malloc(0): an empty path such as "_Caml"/"__Caml" allocated zero bytes, and a nullptr result from malloc(0) would be treated as OOM and call std::terminate(). Allocate size + 1 so the request is never zero; OutputBuffer still grows on demand for longer paths.
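For readers who have not seen the diff, the std::optional change described above could look roughly like the following. This is a minimal sketch and not the code from this PR: the real signature of ConsumeUnsignedDecimal is not shown in this thread, so the function name, the std::string_view parameter, and the selfCheckParse helper are all assumptions made for illustration.

```cpp
#include <cassert>
#include <limits>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the PR's ConsumeUnsignedDecimal. Returns the
// parsed value and advances `input` past the digits, or std::nullopt when
// there are no digits or the value would not fit in an unsigned.
std::optional<unsigned> consumeUnsignedDecimal(std::string_view &input) {
  constexpr unsigned Max = std::numeric_limits<unsigned>::max();
  auto isDigit = [](char c) { return c >= '0' && c <= '9'; };

  if (input.empty() || !isDigit(input.front()))
    return std::nullopt; // nothing to parse

  unsigned value = 0;
  while (!input.empty() && isDigit(input.front())) {
    unsigned digit = static_cast<unsigned>(input.front() - '0');
    // value * 10 + digit <= Max, rearranged so the check cannot overflow.
    if (value > (Max - digit) / 10)
      return std::nullopt; // overflow; input is left partially consumed
    value = value * 10 + digit;
    input.remove_prefix(1);
  }
  return value;
}

// Small usage check: UINT_MAX itself is now a valid result, which a
// UINT_MAX sentinel could not distinguish from an error.
inline void selfCheckParse() {
  std::string maxText =
      std::to_string(std::numeric_limits<unsigned>::max());
  std::string_view maxView = maxText;
  auto parsed = consumeUnsignedDecimal(maxView);
  assert(parsed.has_value() &&
         *parsed == std::numeric_limits<unsigned>::max());
}
```

With this shape a caller tests the optional instead of comparing against a magic value, so the largest representable number parses successfully and only genuine failures yield std::nullopt.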
The demangler replicates the behaviour of the ocamlfilt demangler in oxcaml#5100 with a minor change to macOS symbol prefixes which lldb trims before they are passed to demangling. We retain the suffixes for symbols as they ensure uniqueness of symbols.
This is based off LLVM 21, the earlier PR #27 was based on LLVM 16.
Companion OxCaml PR