
fix(tika): disable external parser startup probes on Windows#1171

Open
ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:fix/RR-1158-tika-windows-external-parser

Conversation

ksaurabhAparavi (Contributor) commented Jun 8, 2026

Summary

  • Clean Windows installs log missing sox executable errors at startup because Tika's CompositeExternalParser probes external parsers.
  • Exclude CompositeExternalParser from the generated config; add a regression test keeping it out of the configured parser tree.
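
For readers unfamiliar with how a Tika configuration can leave a parser out, here is a minimal sketch using only the JDK's DOM API. It builds the XML form that Tika configs commonly use, where a `parser-exclude` element sits under the `DefaultParser` entry. The class name `ParserExcludeSketch` and its methods are hypothetical. Whether this project's `ConfigBuilder.removeParser(...)` produces this exact XML is an assumption, since the PR text does not show its implementation.

```java
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: a hypothetical helper, not the project's ConfigBuilder.
final class ParserExcludeSketch {
    static final String DEFAULT_PARSER = "org.apache.tika.parser.DefaultParser";
    static final String EXTERNAL_COMPOSITE =
            "org.apache.tika.parser.external.CompositeExternalParser";

    // Builds <properties><parsers><parser class="...DefaultParser"> with one
    // <parser-exclude class="..."/> child per excluded class name.
    static Document buildConfig(List<String> excluded) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element properties = doc.createElement("properties");
            Element parsers = doc.createElement("parsers");
            Element parser = doc.createElement("parser");
            parser.setAttribute("class", DEFAULT_PARSER);
            for (String className : excluded) {
                Element exclude = doc.createElement("parser-exclude");
                exclude.setAttribute("class", className);
                parser.appendChild(exclude);
            }
            parsers.appendChild(parser);
            properties.appendChild(parsers);
            doc.appendChild(properties);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("could not build config document", e);
        }
    }

    // Returns the class names listed in parser-exclude elements, in document order.
    static List<String> excludedClasses(Document doc) {
        NodeList nodes = doc.getElementsByTagName("parser-exclude");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(((Element) nodes.item(i)).getAttribute("class"));
        }
        return names;
    }

    // Serializes the document to an indented XML string without a declaration.
    static String toXml(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize config document", e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildConfig(List.of(EXTERNAL_COMPOSITE));
        System.out.println(toXml(doc));
    }
}
```

Running `main` prints the generated XML, which could then be handed to whatever loads the Tika configuration. Keeping the excluded class names in a list makes it straightforward to add further exclusions later.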

Testing

  • CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1158

coderabbitai Bot (Contributor) commented Jun 8, 2026

Walkthrough

ConfigBuilder now removes CompositeExternalParser from the loaded Tika configuration alongside the existing TesseractOCRParser removal. A new test verifies this exclusion by asserting that the configured composite parser does not contain CompositeExternalParser in its component parsers.

Changes

CompositeExternalParser exclusion from Tika config

| Layer / File(s) | Summary |
| --- | --- |
| Parser configuration exclusion: packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java | ConfigBuilder.getConfig() removes org.apache.tika.parser.external.CompositeExternalParser via an additional removeParser(...) call before constructing the TikaConfig, preventing external-probing parser initialization. |
| Test validation of parser exclusion: packages/tika/lib/tika/src/test/java/TestTikaConfig.java | New JUnit test testCompositeExternalParserExcluded loads TikaConfig, verifies the configured parser is a CompositeParser, scans component parser class names, and asserts CompositeExternalParser is not present. Test imports CompositeParser and Parser types to support the assertions. |
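
The scan that the regression test is described as performing can be sketched without Tika on the classpath. The `Node` type below is a hypothetical stand-in for a parser and its component parsers, not a Tika class, and the class names in `main` are used only as example strings. The actual test in TestTikaConfig.java is not shown in this conversation, so this is an approximation of the idea rather than its code. The sketch walks nested composites as well, which is an added assumption beyond the summary above.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models a parser tree with plain objects.
final class ParserTreeScan {
    /** Hypothetical stand-in for a parser and its component parsers. */
    static final class Node {
        final String className;
        final List<Node> children;

        Node(String className, Node... children) {
            this.className = className;
            this.children = List.of(children);
        }
    }

    // Collects every class name in the tree, parent before children.
    static List<String> classNames(Node root) {
        List<String> names = new ArrayList<>();
        collect(root, names);
        return names;
    }

    private static void collect(Node node, List<String> names) {
        names.add(node.className);
        for (Node child : node.children) {
            collect(child, names);
        }
    }

    // True if any node in the tree has exactly the given class name.
    static boolean contains(Node root, String className) {
        return classNames(root).contains(className);
    }

    public static void main(String[] args) {
        String external = "org.apache.tika.parser.external.CompositeExternalParser";
        Node configured = new Node("org.apache.tika.parser.DefaultParser",
                new Node("org.apache.tika.parser.pdf.PDFParser"),
                new Node("org.apache.tika.parser.txt.TXTParser"));
        // Mirrors the regression assertion: the external composite must be absent.
        if (contains(configured, external)) {
            throw new AssertionError("external parser composite should be excluded");
        }
        System.out.println("external composite absent");
    }
}
```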

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • jmaionchi
  • Rod-Christensen
  • stepmikhaylov

Poem

🐰 External probes no longer sing,
CompositeParser's quiet spring,
Windows startup sleeps so sound,
No sox complaints abound! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33%, which is below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Linked Issues check | ✅ Passed | The PR successfully addresses issue #1158 by removing CompositeExternalParser from the Tika config and adding a regression test to verify its exclusion. |
| Out of Scope Changes check | ✅ Passed | All changes are in scope: the ConfigBuilder modification excludes the problematic parser, and the TestTikaConfig test verifies the fix. Both directly support the objective of preventing external parser startup probes. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The PR title focuses on disabling external parser startup probes on Windows, which directly matches the main objective of excluding CompositeExternalParser to prevent startup errors. |


github-actions Bot added the module:server label (C++ engine and server components) Jun 8, 2026

Exclude Tika's CompositeExternalParser from the generated config so clean
Windows installs stop logging missing 'sox' executable errors during
parser startup. Adds a regression test keeping the external parser
composite out of the configured parser tree.

Fixes rocketride-org#1158
ksaurabhAparavi force-pushed the fix/RR-1158-tika-windows-external-parser branch from f79898b to bc8d191 June 8, 2026 11:51
ksaurabhAparavi changed the title from "fix(tika): ADS-243 disable external parser startup probes on Windows (#15)" to "fix(tika): disable external parser startup probes on Windows" Jun 8, 2026

Development

Successfully merging this pull request may close these issues.

Tika CompositeExternalParser logs missing 'sox' errors on Windows startup

2 participants