feat: add built-in help system with fuzzy search for INPUT parameters #6935
zhubonan wants to merge 12 commits into deepmodeling:develop
- Add -h/--help flag for parameter documentation
- Add -s/--search flag for keyword search
- Implement fuzzy matching for typo suggestions (Levenshtein distance)
- Case-insensitive parameter lookup
- Auto-generate help data from markdown at build time
- 458 parameters documented with comprehensive metadata

Usage: abacus -h <param>, abacus -s <keyword>

refactor(help): improve help system with critical fixes and performance optimizations

docs(help): integrate help system documentation into quick start guide

Add concise help system section to docs/quick_start/input.md covering:
- Basic commands (-h, -s flags)
- Example output for parameter lookup
- Fuzzy matching for typo suggestions
- Case-insensitive lookup behavior

Remove standalone documentation files that are now redundant:
- docs/HELP_SYSTEM.md
- docs/FUZZY_SEARCH.md
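
For illustration, case-insensitive parameter lookup can be sketched as below. This is a minimal sketch, not the code in this PR: `to_lower`, `lookup`, and the map-based registry are hypothetical names, and it assumes parameter names are ASCII and stored in lower case.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: lower-case an ASCII string so names compare
// independently of how the user typed them.
std::string to_lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical registry: lower-cased parameter name -> help text.
// Returns a pointer to the help text, or nullptr if the name is unknown.
const std::string* lookup(const std::map<std::string, std::string>& registry,
                          const std::string& name)
{
    const auto it = registry.find(to_lower(name));
    return it == registry.end() ? nullptr : &it->second;
}
```

With a registry keyed this way, `lookup(registry, "ECUTWFC")` and `lookup(registry, "ecutwfc")` resolve to the same entry.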
- Add input_help.cpp to MODULE_IO_parse_args test sources to fix linker error
- Fix input_help_test.cpp include path and use value syntax instead of pointer
- Fix parse_args_test.cpp death tests to use EXPECT_EXIT regex matching instead of CaptureStderr, which doesn't work properly with fork()
This commit addresses all code review comments and fixes a critical UX issue in the help system's fuzzy matching algorithm.

## Documentation Fixes
- Fix incorrect "case-sensitive" documentation (should be "case-insensitive")
- Update API documentation for find_similar_parameters() with new algorithm details
- Add inline comments explaining 3-tier matching strategy

## Architecture Improvements
- Add std::ostream& parameter to show_general_help() and show_parameter_help()
- Enable flexible stream redirection (stdout for success, stderr for errors)
- Maintain backward compatibility with default parameters
- Eliminate code duplication by reusing show_general_help() in error handler

## UX Fix: Fuzzy Matching Algorithm

### Problem
Pure Levenshtein distance suggested semantically irrelevant parameters:
- "abacus -h relax" incorrectly suggested "dmax" and "nelec"

### Solution
Implement 3-tier semantic matching strategy:
1. Prefix matches (e.g., "relax" → "relax_new") - Priority 0
2. Substring matches (e.g., "cut" → "ecutwfc") - Priority 1
3. Levenshtein distance for typos - Priority 10+
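
The 3-tier strategy above could look roughly like the following. This is a hedged sketch rather than the actual find_similar_parameters() implementation: `levenshtein`, `match_score`, `suggest`, and the `max_distance = 2` cutoff are assumptions made for illustration; only the priority values (0, 1, 10+) come from the commit message.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Classic dynamic-programming Levenshtein distance, keeping two rows.
int levenshtein(const std::string& a, const std::string& b)
{
    std::vector<int> prev(b.size() + 1), cur(b.size() + 1);
    for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = static_cast<int>(i);
        for (std::size_t j = 1; j <= b.size(); ++j)
        {
            const int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost});
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}

// Priority of a candidate: 0 = prefix match, 1 = substring match,
// 10 + distance = typo candidate, -1 = too different to suggest.
// The max_distance cutoff is an assumption of this sketch.
int match_score(const std::string& query, const std::string& candidate, int max_distance = 2)
{
    if (candidate.compare(0, query.size(), query) == 0) return 0;
    if (candidate.find(query) != std::string::npos) return 1;
    const int d = levenshtein(query, candidate);
    return d <= max_distance ? 10 + d : -1;
}

// Return suggestions ordered by priority, then alphabetically.
std::vector<std::string> suggest(const std::string& query, const std::vector<std::string>& names)
{
    std::vector<std::pair<int, std::string>> scored;
    for (const auto& n : names)
    {
        const int s = match_score(query, n);
        if (s >= 0) scored.emplace_back(s, n);
    }
    std::sort(scored.begin(), scored.end());
    std::vector<std::string> out;
    for (const auto& p : scored) out.push_back(p.second);
    return out;
}
```

Because prefix and substring matches always score below 10, a query such as "relax" ranks "relax_new" ahead of any edit-distance candidate, and with this sketch's cutoff "dmax" and "nelec" (both at distance 3) are not suggested at all.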
ZhouXY-PKU
left a comment
Nice job! Thanks for your contribution!
I think there is still some room for improvement in the formatting of the printed help information. Please don't merge this yet.
…lp system

Migrate INPUT parameter documentation from a manually-maintained markdown file to auto-generation from C++ source. Documentation fields (category, type, description, default_value, unit, availability) are now embedded directly in Input_Item registrations in read_input_item_*.cpp files, serving as the single source of truth for both the built-in --help system and the Sphinx documentation.

Key changes:
- Add docs/generate_docs_from_source.py to parse C++ source and generate input-main.md (492 parameters across 34 categories)
- Hook generator into Sphinx build via builder-inited event in conf.py
- Add [NOTE] documentation to 36 parameters preserving notes from the previous manually-maintained docs (version compat, usage caveats)
- Fix generator regex to handle comments before Input_Item declarations
- Rename tmp_item to item in smearing_sigma_temp block for generator compatibility
- Remove old tools/generate_help_data.py and input_help_data.h pipeline
…/abacus-develop into abacus-develop-built-in-help

# Conflicts:
# docs/advanced/input_files/input-main.md
I am refactoring the code in this PR with a different approach. The input-main.md is now generated from the .cpp files that contain the parameter definitions and descriptions. This way, the list of parameters and their descriptions only needs to be maintained in a single place.
The old generate_docs_from_source.py parsed C++ source files with regex to extract Input_Item metadata, which was fragile and broke on code refactoring. Replace it with a pipeline that uses the binary's own parameter registry:

    abacus --generate-parameters-yaml > docs/parameters.yaml
    python docs/generate_input_main.py docs/parameters.yaml

- Add ParameterHelp::generate_yaml() with YAML serialization helpers
- Add --generate-parameters-yaml CLI flag to parse_args.cpp
- Add POST_BUILD command in CMakeLists.txt to auto-regenerate docs/parameters.yaml on every build
- Create docs/generate_input_main.py (YAML-to-markdown converter)
- Update docs/conf.py Sphinx hook to use YAML pipeline
- Track docs/parameters.yaml in the repository
- Delete docs/generate_docs_from_source.py
- Add 5 unit tests for YAML generation
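
To make the registry-to-YAML step concrete, here is a minimal sketch of how a parameter registry might be serialized. It is not ParameterHelp::generate_yaml(): the `ParamDoc` struct, `yaml_quoted`, `write_yaml`, and the `parameters:` list layout are all assumptions; only the field names are taken from the documentation fields listed earlier in this PR.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical metadata record; the field names mirror the documentation
// fields mentioned in this PR, the struct itself is an assumption.
struct ParamDoc
{
    std::string name, category, type, description, default_value, unit, availability;
};

// Emit a double-quoted YAML scalar, escaping backslashes, quotes and newlines.
std::string yaml_quoted(const std::string& s)
{
    std::string out = "\"";
    for (char c : s)
    {
        if (c == '\\' || c == '"') { out += '\\'; out += c; }
        else if (c == '\n') { out += "\\n"; }
        else { out += c; }
    }
    return out + "\"";
}

// Write all parameters as a YAML list under an assumed top-level key.
void write_yaml(std::ostream& os, const std::vector<ParamDoc>& params)
{
    os << "parameters:\n";
    for (const auto& p : params)
    {
        os << "  - name: " << yaml_quoted(p.name) << "\n"
           << "    category: " << yaml_quoted(p.category) << "\n"
           << "    type: " << yaml_quoted(p.type) << "\n"
           << "    description: " << yaml_quoted(p.description) << "\n"
           << "    default_value: " << yaml_quoted(p.default_value) << "\n"
           << "    unit: " << yaml_quoted(p.unit) << "\n"
           << "    availability: " << yaml_quoted(p.availability) << "\n";
    }
}
```

Quoting every scalar keeps the sketch short and means a consumer such as PyYAML reads every field back as a string; the real serializer may quote more selectively.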
…contributing guide

- Quote numeric-looking default values in yaml_quote_if_needed() so PyYAML parses them as strings (fixes 92 parameters losing defaults)
- Also quote .inf, -.inf, .nan YAML special values
- Fix Python generate_input_main.py to use != '' instead of truthiness checks, preventing falsy-but-valid values (e.g. 0) from being dropped
- Replace sys.exit(1) with FileNotFoundError in generate() for safe use as a library from conf.py
- Remove POST_BUILD command from CMakeLists.txt (not portable across CMake generators, breaks cross-compilation)
- Update CONTRIBUTING.md "Documenting INPUT Parameters" section to describe the YAML-based workflow and remind developers to regenerate docs/parameters.yaml when adding or modifying parameters
- Regenerate docs/parameters.yaml with all values properly quoted
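
The quoting rule in the first two bullets can be illustrated with a small helper. This is a sketch under assumptions, not the PR's yaml_quote_if_needed(): `looks_numeric` and `quote_if_needed` are hypothetical names, and "numeric-looking" is approximated here as "std::strtod consumes the whole string", which may quote slightly more than strictly necessary.

```cpp
#include <cstdlib>
#include <string>

// Treat a string as numeric-looking if strtod parses all of it.
// Over-matching (e.g. "inf") only causes harmless extra quoting.
bool looks_numeric(const std::string& s)
{
    if (s.empty()) return false;
    char* end = nullptr;
    std::strtod(s.c_str(), &end);
    return end == s.c_str() + s.size();
}

// Wrap numeric-looking values and YAML float specials in double quotes
// so a YAML parser keeps them as strings; leave other values untouched.
// Escaping of embedded quotes is out of scope for this sketch.
std::string quote_if_needed(const std::string& s)
{
    const bool special = (s == ".inf" || s == "-.inf" || s == ".nan");
    if (looks_numeric(s) || special) return "\"" + s + "\"";
    return s;
}
```

Without the quotes, a default such as 0 would be loaded as an integer, and a truthiness check on the Python side would then drop it, which is the failure mode described in the bullets above.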
… legacy doc order

The built-in help system generates input-main.md from C++ source, with parameter order determined by add_item() call order. After introducing the YAML-based doc pipeline, 19 of 32 documentation sections had different parameter ordering compared to the old hand-maintained input-main.md.

Reorder item blocks within each of the 13 read_input_item_*.cpp files to restore the original documentation order. This is a pure mechanical reordering with no logic changes. 28 of 32 sections now match; the 4 remaining differences are due to cross-file constraints (constructor call order) that cannot be resolved without moving items between files.

Add a comment to each item_*() function noting that add_item() call order determines generated documentation order.
Changes since the initial approach: the documentation pipeline is now C++ source → abacus --generate-parameters-yaml → parameters.yaml → generate_input_main.py → input-main.md. This means the documentation of each input parameter is written only once (in the .cpp files).
I have added a built-in help system for INPUT parameters.
Usage
Implementation
- source/source_io/input_help.{h,cpp} - Help system class with search and fuzzy matching
- source/source_io/input_help_data.h - Auto-generated parameter data (4295 lines, 458 parameters)
- source/source_io/parse_args.cpp - Command-line argument handling
- tools/generate_help_data.py - Build-time code generator from markdown
- docs/quick_start/input.md