Add a demangler for the OxCaml structured mangling scheme#35

Open
tmcgilchrist wants to merge 8 commits into ocaml-flambda:oxcaml-llvm-plugin from tmcgilchrist:mangle-21

tmcgilchrist commented Jun 12, 2026

The demangler replicates the behaviour of the ocamlfilt demangler in oxcaml#5100, with a minor change to the handling of macOS symbol prefixes, which lldb trims before the symbols are passed to the demangler. We retain the suffixes of symbols, as they ensure the symbols remain unique.

This is based on LLVM 21; the earlier PR #27 was based on LLVM 16.

Companion OxCaml PR

spiessimon left a comment

I made a first pass over this. Most of the points below are nits, and all of them should be straightforward to fix.

Comment thread lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserOxCaml.cpp Outdated
// name and let the OxCaml demangler produce the source-level name.
// InlineFunctionInfo::GetName() prefers the Mangled object, so build it
// explicitly rather than letting the char-pointer overload demangle the
// linkage name even when DW_AT_name is present.

Why does the previous version not suffice here?

Author

We need the special case here to retain the fully unique name of the symbol. For example, the DW_AT_linkage_name __CamlU4MainF9process_1_3_code is shown as Main.process_1, with the _3_code suffix stripped. If we remove the OCaml special case here, lldb also strips the extra _1 and shows just Main.process.
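
To make the example above concrete, here is a rough sketch of how such a name could be split into path components. It is not the code in this PR: the function name `sketchDemangle` is hypothetical, and the grammar (a tag letter, a decimal length, then that many identifier characters, with everything after the last component treated as a suffix) is an assumption inferred only from the single symbol quoted in this thread.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical, simplified reading of the structured scheme. Assumption:
// after the "_Caml"/"__Caml" prefix, each path component is a tag letter,
// a decimal length, and that many identifier characters. Anything left
// after the last component is treated as a suffix and not displayed.
std::optional<std::string> sketchDemangle(std::string_view Sym) {
  // Accept both the "__Caml" and "_Caml" spellings of the prefix.
  if (Sym.substr(0, 2) == "__")
    Sym.remove_prefix(1);
  constexpr std::string_view Prefix = "_Caml";
  if (Sym.substr(0, Prefix.size()) != Prefix)
    return std::nullopt;
  Sym.remove_prefix(Prefix.size());

  std::string Out;
  while (Sym.size() >= 2 &&
         std::isupper(static_cast<unsigned char>(Sym[0])) &&
         std::isdigit(static_cast<unsigned char>(Sym[1]))) {
    Sym.remove_prefix(1); // skip the tag letter (e.g. 'U' or 'F')
    std::size_t Len = 0;
    while (!Sym.empty() && std::isdigit(static_cast<unsigned char>(Sym[0]))) {
      Len = Len * 10 + static_cast<std::size_t>(Sym[0] - '0');
      if (Len > Sym.size())
        return std::nullopt; // length runs past the end of the symbol
      Sym.remove_prefix(1);
    }
    if (Len == 0 || Len > Sym.size())
      return std::nullopt;
    if (!Out.empty())
      Out += '.';
    Out.append(Sym.substr(0, Len));
    Sym.remove_prefix(Len);
  }
  if (Out.empty())
    return std::nullopt;
  return Out; // the remaining suffix (e.g. "_3_code") is dropped
}
```

Under these assumptions, the `_1` survives because it lies inside the length-prefixed component `F9process_1`, while `_3_code` falls outside every component and is dropped.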

Comment thread lldb/source/Plugins/SymbolFile/DWARF/ManualDWARFIndex.cpp
// absent. The plain char-pointer overload cannot do this -- it stores
// only the linkage name, so it would always demangle and ignore
// DW_AT_name.
// CR sspies: Check inlined frames have their suffixes stripped.

@tmcgilchrist, I think this check can be done now. Could you please check this? There should be examples of inlined frames in the OxCaml testsuite.

While you're editing this comment anyway, I would trim it a little. I don't think we need the explanation about the other overload, for example.

Author

I've removed this comment. Now, for the structured scheme, a DW_AT_linkage_name such as __CamlU4MainF9process_1_3_code is shown as Main.process_1, with the _3_code suffix stripped. For the flat scheme, it uses the DW_AT_name.

The demangler replicates the behaviour of the ocamlfilt demangler in
oxcaml#5100 with a minor change to macOS symbol prefixes which lldb
trims before they are passed to demangling. We retain the suffixes for
symbols as they ensure uniqueness of symbols.

Co-authored-by: Samuel Hym <samuel@tarides.com>

Add __Caml support for macOS symbols. Drop Flat1 scheme as OxCaml
doesn't use $/. separators and make the suffix stripping more precise.

Replace the kParseError sentinel returned by ConsumeUnsignedDecimal and
ConsumeUnsigned26 with std::optional<unsigned>. The sentinel was UINT_MAX,
which collides with a legitimately parsed value and uses the Google-style
"k" naming rather than LLVM style; std::optional removes both problems and
matches the convention used elsewhere in the Demangle library
(MicrosoftDemangle, Demangle.h) and pervasively in lldb. The overflow bound
now uses std::numeric_limits<unsigned>::max(), and the two C-style casts
become static_cast.
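
As an illustration of the sentinel-to-optional change described above, a parser of this shape might look as follows. This is a sketch only: the function name mirrors the commit message, but the signature, the `std::string_view` cursor, and the choice to leave the input untouched on failure are assumptions, not the PR's actual code.

```cpp
#include <limits>
#include <optional>
#include <string_view>

// Sketch only. Parses a run of decimal digits at the front of Input and
// advances past them. Returns std::nullopt when there are no digits or
// the value would overflow, so every unsigned value (including the old
// UINT_MAX sentinel) stays available as a legitimate result.
std::optional<unsigned> ConsumeUnsignedDecimal(std::string_view &Input) {
  constexpr unsigned Max = std::numeric_limits<unsigned>::max();
  if (Input.empty() || Input.front() < '0' || Input.front() > '9')
    return std::nullopt;
  unsigned Value = 0;
  std::string_view Rest = Input;
  while (!Rest.empty() && Rest.front() >= '0' && Rest.front() <= '9') {
    unsigned Digit = static_cast<unsigned>(Rest.front() - '0');
    // Value * 10 + Digit <= Max  <=>  Value <= (Max - Digit) / 10
    if (Value > (Max - Digit) / 10)
      return std::nullopt; // overflow; Input is left untouched
    Value = Value * 10 + Digit;
    Rest.remove_prefix(1);
  }
  Input = Rest;
  return Value;
}
```

A caller can then distinguish failure from any parsed value with `if (auto N = ConsumeUnsignedDecimal(Cursor))`, without reserving a magic number.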

Also guard demangleStructured against malloc(0): an empty path such as
"_Caml"/"__Caml" allocated zero bytes, and a nullptr result from malloc(0)
would be treated as OOM and call std::terminate(). Allocate size + 1 so the
request is never zero; OutputBuffer still grows on demand for longer paths.
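
The malloc(0) guard can be pictured with a small helper like the one below. The helper name `allocInitialBuffer` is hypothetical and the real change lives inside demangleStructured; this only shows the idea of never requesting zero bytes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <exception>

// Hypothetical helper, not the PR's code: allocate the initial output
// buffer for a path of PathSize bytes.
char *allocInitialBuffer(std::size_t PathSize) {
  // malloc(0) may legally return nullptr, which would be indistinguishable
  // from an out-of-memory failure. Requesting PathSize + 1 keeps the
  // request non-zero even for an empty path.
  char *Buf = static_cast<char *>(std::malloc(PathSize + 1));
  if (Buf == nullptr)
    std::terminate(); // treated as genuine OOM
  return Buf;
}
```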