fix: force UTF-8 for Windows shell subprocesses to fix CJK mojibake #1418
On non-UTF-8 Windows locales (e.g. zh-CN, ACP 936/GBK), shell subprocesses emit output in the legacy code page, which we decode as UTF-8, producing mojibake for CJK filenames and output. Fix both execution paths:

- bash tool (src/tool/bash.ts): prepend an $OutputEncoding/[Console]::OutputEncoding UTF-8 statement for PowerShell, chcp 65001 for cmd.
- TUI shell mode (src/session/prompt.ts shellImpl): same prefixes for powershell/pwsh/cmd invocations.
- Add PYTHONIOENCODING=utf-8 to the shell env on Windows so piped Python output is UTF-8 (Python ignores the code page when stdout is a pipe).

Based on upstream anomalyco/opencode#31658.
Force-pushed from f1237a3 to c8a3598.
Document the system-wide UTF-8 (Beta) toggle as a workaround for garbled CJK shell output on non-UTF-8 Windows locales, for users on older versions or tools not yet special-cased. Added to both English and Chinese README.
Reviewed with a focus on command-concatenation safety. TL;DR: no new injection surface, and the most dangerous concatenation pitfall (exit-code clobbering) is handled correctly. One minor robustness nit is worth fixing.

No new injection risk.

Exit-code semantics are correct (the easy thing to get wrong here):

Minor:

Notes (not blocking):
Overall LGTM on the concat form; only the |
…stderr

Address PR review: extract POWERSHELL_UTF8_PREFIX and CMD_UTF8_PREFIX into src/shell/shell.ts so the bash tool and TUI shell-mode paths can't drift, and redirect both stdout and stderr (>nul 2>nul) so a chcp failure in a restricted shell never pollutes command output.
Thanks for the review! Both addressed in b2dd9fc:
The global
…iaomiMiMo#1418)

* fix: force UTF-8 for Windows shell subprocesses to fix CJK mojibake

  On non-UTF-8 Windows locales (e.g. zh-CN, ACP 936/GBK), shell subprocesses emit output in the legacy code page, which we decode as UTF-8, producing mojibake for CJK filenames and output. Fix both execution paths:

  - bash tool (src/tool/bash.ts): prepend an $OutputEncoding/[Console]::OutputEncoding UTF-8 statement for PowerShell, chcp 65001 for cmd.
  - TUI shell mode (src/session/prompt.ts shellImpl): same prefixes for powershell/pwsh/cmd invocations.
  - Add PYTHONIOENCODING=utf-8 to the shell env on Windows so piped Python output is UTF-8 (Python ignores the code page when stdout is a pipe).

  Based on upstream anomalyco/opencode#31658.

* docs: add Windows CJK mojibake workaround to README

  Document the system-wide UTF-8 (Beta) toggle as a workaround for garbled CJK shell output on non-UTF-8 Windows locales, for users on older versions or tools not yet special-cased. Added to both English and Chinese README.

* docs: clarify MiMoCode version and tone in Windows CJK note

* docs: refine wording of Windows CJK note

* refactor: extract shared Windows UTF-8 shell prefixes; redirect chcp stderr

  Address PR review: extract POWERSHELL_UTF8_PREFIX and CMD_UTF8_PREFIX into src/shell/shell.ts so the bash tool and TUI shell-mode paths can't drift, and redirect both stdout and stderr (>nul 2>nul) so a chcp failure in a restricted shell never pollutes command output.
Summary
Fixes CJK mojibake (garbled Chinese/Japanese/Korean text) in shell command output on Windows with non-UTF-8 locales (e.g. zh-CN, where the active code page is 936/GBK). Spawned PowerShell/cmd subprocesses emit output in the legacy code page, which we then decode as UTF-8, producing garbled characters.
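To make the failure mode concrete, here is a minimal sketch of the decode mismatch. It is not code from this PR: the byte arrays are hand-written for the sample string "中文", and it assumes a runtime that provides the standard `TextDecoder` global (Node or Bun).

```typescript
// "中文" as a subprocess would emit it under code page 936 (GBK).
const gbkBytes = new Uint8Array([0xd6, 0xd0, 0xce, 0xc4]);
// The same two characters encoded as UTF-8.
const utf8Bytes = new Uint8Array([0xe4, 0xb8, 0xad, 0xe6, 0x96, 0x87]);

const utf8Decoder = new TextDecoder("utf-8");

// The GBK bytes are not valid UTF-8 sequences, so the decoder
// substitutes U+FFFD replacement characters: this is the mojibake.
const garbled = utf8Decoder.decode(gbkBytes);

// Bytes that really are UTF-8 decode back to the original text.
const clean = utf8Decoder.decode(utf8Bytes);

console.log(JSON.stringify(garbled));
console.log(clean);
```

The reader side stays fixed at UTF-8 in this sketch, which is why the fix described here works on the producer side: it makes the subprocess emit UTF-8 rather than teaching the reader to guess the code page.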
Background
Based on upstream anomalyco/opencode#31658 ("set default UTF-8 encoding for spawned subprocess on Windows"), which prepends an encoding statement to the command and adds `PYTHONIOENCODING`.

We investigated two earlier upstream attempts first and rejected them:

- (`SetConsoleCP(CP_UTF8)` via FFI): targets TUI keyboard input, not command output; verified ineffective for our mojibake case on the test machine.
- (`[Console]::OutputEncoding` only): partial; doesn't cover cmd or piped Python output.

What we changed (beyond upstream #31658)
Upstream only patched the single subprocess spawn path. Our codebase has two independent shell execution paths, and we fixed both:
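- bash tool (`src/tool/bash.ts`): used by the model/LLM tool calls.
- TUI shell mode (`src/session/prompt.ts` `shellImpl`): the interactive path when the user presses `tab` to switch into shell mode and types commands directly. This path is completely separate from the bash tool and was missed by a naive single-location port; it has its own `invocations` table for picking shell args.

In both paths, on Windows:

- PowerShell (powershell/pwsh): prepend a statement that sets `$OutputEncoding` and `[Console]::OutputEncoding` to UTF-8.
- cmd: prepend `chcp 65001`.
- Add `PYTHONIOENCODING=utf-8` to the shell env so piped Python output is UTF-8.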
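The prefixing applied in both paths can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `src/shell/shell.ts`: the constant names `POWERSHELL_UTF8_PREFIX` and `CMD_UTF8_PREFIX` come from this PR, but their exact string values are not shown here and are guessed, and the helpers `withUtf8Prefix` and `withUtf8Env` are hypothetical.

```typescript
// Assumed values: the PR names these constants but their contents are a guess.
const POWERSHELL_UTF8_PREFIX =
  "[Console]::OutputEncoding = [System.Text.Encoding]::UTF8; " +
  "$OutputEncoding = [System.Text.Encoding]::UTF8; ";
// stdout and stderr of chcp are discarded; "&" runs the next command
// unconditionally, so the user's command still determines the exit code.
const CMD_UTF8_PREFIX = "chcp 65001 >nul 2>nul & ";

// Hypothetical helper: prepend the matching prefix for the given shell.
function withUtf8Prefix(shell: string, command: string, platform: string): string {
  if (platform !== "win32") return command;
  // Reduce a full path such as C:\Windows\System32\cmd.exe to "cmd".
  const name = shell.split(/[\\/]/).pop()!.toLowerCase().replace(/\.exe$/, "");
  if (name === "powershell" || name === "pwsh") return POWERSHELL_UTF8_PREFIX + command;
  if (name === "cmd") return CMD_UTF8_PREFIX + command;
  return command; // other shells (e.g. Git Bash) are left untouched
}

// Hypothetical helper: add PYTHONIOENCODING to the child environment on Windows.
function withUtf8Env(
  env: Record<string, string | undefined>,
  platform: string,
): Record<string, string | undefined> {
  if (platform !== "win32") return env;
  return { ...env, PYTHONIOENCODING: "utf-8" };
}

console.log(withUtf8Prefix("C:\\Windows\\System32\\cmd.exe", "dir", "win32"));
console.log(withUtf8Prefix("pwsh", "Get-ChildItem", "win32"));
```

Keeping both helpers in one shared module and calling them from the bash tool and from `shellImpl` is what prevents the two paths from drifting apart.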
Recommended workaround before this lands
Until this fix is merged and released, the most reliable workaround is to enable Windows' system-wide UTF-8 support, which sets the active code page (ACP) to 65001 for all programs:
Settings → Time & language → Language & region → Administrative language settings (Change system locale) → check "Beta: Use Unicode UTF-8 for worldwide language support" → reboot.
This makes the ACP UTF-8 globally, so subprocesses no longer inherit GBK and the mojibake disappears without any build change. Note it is a system-wide Beta toggle and may affect other legacy (non-Unicode) apps, so treat it as a workaround rather than a permanent requirement.
This PR also adds a "Windows: garbled CJK output" note documenting this workaround to both the English and Chinese README.
Verification
Built a `windows-x64` binary and tested on Windows 11 + Windows Terminal (zh-CN, code page 936):
Before
After