fix(question-generator): support nested MinerU output in mimic mode #250

Open
2023Anita wants to merge 2 commits into HKUDS:dev from 2023Anita:fix/mimic-mineru-markdown

@2023Anita commented Apr 6, 2026

Summary

  • fix mimic exam question extraction when MinerU writes parsed artifacts into nested directories
  • keep the existing preferred directory handling, but fall back to other nested artifact directories before reporting that no markdown was found
  • add a regression test that covers the nested MinerU output layout reported in issue #182

Why

Issue #182 shows that MinerU can emit markdown inside a nested parsed-output directory, while the extractor only checked one preferred directory or the paper root. In that case the parse step succeeds, but question extraction still fails because the markdown is never discovered.
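
For readers without the diff at hand, the lookup order described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `find_markdown`, the default preferred directory name `"auto"`, and the "first match in sorted order" tie-break are all assumptions made for the example.

```python
from pathlib import Path
from typing import Optional


def find_markdown(paper_dir: Path, preferred: str = "auto") -> Optional[Path]:
    """Return the first markdown file found under paper_dir, or None.

    Hypothetical helper: the name, the default preferred directory, and
    the sorted tie-break are illustrative assumptions.
    """
    # 1. Existing behaviour: the preferred directory first, then the paper root.
    for candidate_dir in (paper_dir / preferred, paper_dir):
        if candidate_dir.is_dir():
            matches = sorted(candidate_dir.glob("*.md"))
            if matches:
                return matches[0]

    # 2. Fallback: search every nested directory below the paper root
    #    before concluding that no markdown exists.
    nested = sorted(paper_dir.rglob("*.md"))
    return nested[0] if nested else None
```

With a layout such as `paper/exam/hybrid_auto/exam.md` (an assumed example of a nested parsed-output directory), steps 1 finds nothing and step 2 returns the nested file, which is the case the extractor previously missed.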

Testing

  • added a regression test for nested parsed-output directories
  • verified the updated extractor and test file compile successfully

Closes #182

@2023Anita (Author)

I checked the failing CI jobs on April 6, 2026 (run 24019400631). The current red checks are caused by the existing Python 3.10 dependency matrix rather than this question-generator fix.

The actual failure is Import Check (Python 3.10), which stops during dependency installation with:

  • Could not find a version that satisfies the requirement oauth-cli-kit>=0.1.1
  • available oauth-cli-kit releases require Python >= 3.11

The Test Summary job is failing only because that import-check job fails.

I also checked a recent upstream dev run from April 2, 2026 (run 23902376382), and it shows the same Python 3.10 import-check failure before this PR. I will open a separate minimal fix for the dependency marker so this PR can be evaluated on its actual code changes.
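
As a rough illustration of what a dependency marker does, the sketch below pairs a requirement line carrying a PEP 508 environment marker with a small function that mirrors the marker's condition. The requirement string and the helper `needs_oauth_cli_kit` are assumptions for the example; the actual change proposed for the dependency files is not shown in this thread.

```python
import sys

# Hypothetical requirement line: the part after ";" is a PEP 508 environment
# marker, so installers skip the package on interpreters older than 3.11.
REQUIREMENT = 'oauth-cli-kit>=0.1.1; python_version >= "3.11"'


def needs_oauth_cli_kit(version_info=sys.version_info) -> bool:
    """Mirror the marker's condition for a given interpreter version.

    Illustrative helper only: real installers evaluate the marker themselves.
    """
    return tuple(version_info[:2]) >= (3, 11)
```

Under this assumption, a Python 3.10 job would skip `oauth-cli-kit` during installation instead of failing to resolve it, while Python 3.11 and newer would still install it.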

@2023Anita (Author)

Follow-up: I opened a separate minimal dependency fix here: #251. If that lands first, rerunning the checks on this PR should remove the unrelated Python 3.10 import-check failure.

@2023Anita (Author)

I pushed a follow-up commit to this PR that includes the same Python 3.10 dependency marker fix as #251, so the branch now contains both the question-generator fix and the CI/dependency unblock.

At the moment GitHub shows the new Tests run for this PR (run 24031098632, created on April 6, 2026) as action_required rather than failed, so the updated checks have not executed yet. Once a maintainer approves the workflow and the run starts, this PR should be evaluated against the latest branch contents instead of the earlier failing run.
