Replace utf8 with modified utf8. by xymb-endcrystalme · Pull Request #197 · vberlier/nbtlib

xymb-endcrystalme · 2025-06-25T01:12:59Z

nbtlib is sadly broken. Minecraft uses modified UTF-8, while nbtlib uses normal UTF-8. Reading an NBT file with UTF-8 characters and saving it again screws up texts.

I admit, I did just ask o3 to write this code. However, a test with 70 region files started passing, so I suspect it's at least somewhat correct.

My test was "open a region file, open and save each chunk with nbtlib, verify that the contents (chunk bytes) are the same". nbtlib was making errors in some chunks that contained books with funny characters; after this patch it stopped. So I'll start using this internally for now.
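
For illustration, here is a minimal sketch of that kind of byte-for-byte round-trip check, reduced to raw string payloads instead of whole chunks. The names `lossy_roundtrip` and `find_mismatches` are hypothetical, and the assumption that a standard-UTF-8 reader decodes with `errors="replace"` is only an assumption for the demo, not a confirmed description of nbtlib's internals.

```python
def lossy_roundtrip(raw: bytes) -> bytes:
    """Decode and re-encode with Python's standard UTF-8 codec.

    Assumption: invalid sequences are replaced with U+FFFD on read.
    """
    return raw.decode("utf-8", "replace").encode("utf-8")


def find_mismatches(payloads, roundtrip):
    """Return the indexes of payloads whose bytes change after a round trip."""
    return [i for i, raw in enumerate(payloads) if roundtrip(raw) != raw]


payloads = [
    b"Name",                      # plain ASCII, identical in both encodings
    b"\xed\xa1\x86\xed\xb7\x9b",  # a surrogate pair as written by modified UTF-8
]

print(find_mismatches(payloads, lossy_roundtrip))  # prints [1]
```

Only the payload holding a surrogate pair changes, which matches the observation that only chunks with unusual characters failed.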

Happy2018new · 2025-06-25T04:42:19Z

Maybe this is a problem. However, this project may also be used by some Bedrock projects, and unlike Minecraft Java Edition, Bedrock seems to use standard UTF-8 encoding.

So the question is: how would projects based on Minecraft Bedrock keep working? (This PR looks like it would break Bedrock support.)

Happy2018new · 2025-06-25T04:58:06Z

I renamed an item to 𡧛 (its ord is 137691) using the anvil in Minecraft Bedrock Edition, then exported it as a .mcstructure file with a Structure Block. The result shows that Minecraft Bedrock Edition uses standard UTF-8.

test_mcstructure.zip

b'\x08\x04\x00\x4e\x61\x6d\x65\x04\x00\xf0\xa1\xa7\x9b'


\x08	The ID of TAG_String (8).
\x04\x00	The length (4) of the key name of the TAG_String (the key is 'Name'), encoded in little endian.
\x4e\x61\x6d\x65	The key name of the TAG_String, which is `b'\x4e\x61\x6d\x65'.decode() = 'Name'`.
\x04\x00	The length (4) of the value of this key ('Name'), encoded in little endian.
\xf0\xa1\xa7\x9b	The value of this key ('Name'), which is `b'\xf0\xa1\xa7\x9b'.decode() = '𡧛'`.
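
The breakdown above can be checked with a few lines of Python. This is only a sketch for this one 13-byte sample; `parse_named_string_le` is a hypothetical helper, not part of nbtlib, and it assumes the little-endian layout described above.

```python
import struct

data = b"\x08\x04\x00\x4e\x61\x6d\x65\x04\x00\xf0\xa1\xa7\x9b"


def parse_named_string_le(buf: bytes):
    """Parse one named TAG_String with little-endian length prefixes."""
    tag_id = buf[0]
    # Unsigned 16-bit little-endian length of the key name.
    (name_len,) = struct.unpack_from("<H", buf, 1)
    name = buf[3:3 + name_len].decode("utf-8")
    pos = 3 + name_len
    # Unsigned 16-bit little-endian length of the string value.
    (value_len,) = struct.unpack_from("<H", buf, pos)
    value = buf[pos + 2:pos + 2 + value_len].decode("utf-8")
    return tag_id, name, value


tag_id, name, value = parse_named_string_le(data)
print(tag_id, name, value, ord(value))  # prints 8 Name 𡧛 137691
```

Python's strict standard UTF-8 decoder accepts the value bytes, so this sample is stored as standard UTF-8.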

However, in your code, the encoded result is not \xf0\xa1\xa7\x9b but _modified_utf8_encode('𡧛') = b'\xed\xa1\x86\xed\xb7\x9b'.
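
To show where those six bytes come from, here is a minimal sketch of a modified UTF-8 encoder. It is not the PR's `_modified_utf8_encode`; the name `modified_utf8_encode_sketch` and the helper `_three_bytes` are made up for this example. Modified UTF-8 differs from standard UTF-8 in two ways: NUL is written as two bytes, and code points above U+FFFF are split into a UTF-16 surrogate pair with each half written as three bytes.

```python
def _three_bytes(unit: int) -> bytes:
    """Encode a 16-bit value with the 3-byte UTF-8 pattern."""
    return bytes((
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ))


def modified_utf8_encode_sketch(text: str) -> bytes:
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            out += b"\xc0\x80"  # NUL is never written as a single zero byte
        elif cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += _three_bytes(cp)
        else:
            # Split into a UTF-16 surrogate pair, then encode each half.
            cp -= 0x10000
            out += _three_bytes(0xD800 | (cp >> 10))
            out += _three_bytes(0xDC00 | (cp & 0x3FF))
    return bytes(out)


print(modified_utf8_encode_sketch("𡧛"))  # prints b'\xed\xa1\x86\xed\xb7\x9b'
print("𡧛".encode("utf-8"))               # prints b'\xf0\xa1\xa7\x9b'
```

For 𡧛 (U+219DB) the surrogate pair is U+D846 and U+DDDB, which gives the two 3-byte groups `ed a1 86` and `ed b7 9b`.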

xymb-endcrystalme · 2025-06-25T13:13:46Z

I wasn't aware that NBT isn't a "standard" and that Bedrock used normal UTF-8. 🤣

Yea, my patch would 100% break that. To do it properly, nbtlib will probably need some kind of switch that tells it whether it's reading Bedrock or Java NBT.
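
A rough sketch of what such a switch could look like on the read side. Everything here is hypothetical (`Edition`, `decode_modified_utf8`, `read_string`) and none of it is nbtlib's API. It assumes Java NBT uses big-endian length prefixes with modified UTF-8, and Bedrock uses little-endian prefixes with standard UTF-8, as in the sample above.

```python
import struct
from enum import Enum


class Edition(Enum):  # hypothetical switch, not an nbtlib type
    JAVA = "java"
    BEDROCK = "bedrock"


def decode_modified_utf8(raw: bytes) -> str:
    """Decode modified UTF-8 by leaning on Python's built-in codecs."""
    # Map the 2-byte NUL back, then let surrogate halves through as-is.
    text = raw.replace(b"\xc0\x80", b"\x00").decode("utf-8", "surrogatepass")
    # A trip through UTF-16 joins each surrogate pair into one code point.
    return text.encode("utf-16-be", "surrogatepass").decode(
        "utf-16-be", "surrogatepass"
    )


def read_string(buf: bytes, edition: Edition) -> str:
    """Read one length-prefixed string payload for the given edition."""
    if edition is Edition.JAVA:
        (length,) = struct.unpack_from(">H", buf, 0)
        return decode_modified_utf8(buf[2:2 + length])
    (length,) = struct.unpack_from("<H", buf, 0)
    return buf[2:2 + length].decode("utf-8")


java_bytes = b"\x00\x06\xed\xa1\x86\xed\xb7\x9b"
bedrock_bytes = b"\x04\x00\xf0\xa1\xa7\x9b"
print(read_string(java_bytes, Edition.JAVA) ==
      read_string(bedrock_bytes, Edition.BEDROCK))  # prints True
```

Tying the string encoding to the same flag as the byte order is just one option; a separate encoding parameter would also work if some tool mixes the two.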

Replace utf8 with modified utf8.

bcc9533

xymb-endcrystalme mentioned this pull request Jun 25, 2025

Use modified utf-8 for encoding and decoding strings #192

Open

xymb-endcrystalme wants to merge 1 commit into vberlier:main from xymb-endcrystalme:main
