feat(tika): support RAR5 archives via 7-Zip-JBinding #1176

Open
ksaurabhAparavi wants to merge 1 commit into rocketride-org:develop from ksaurabhAparavi:feat/RR-1163-tika-rar5-support

Conversation

@ksaurabhAparavi ksaurabhAparavi commented Jun 8, 2026

Contributor

Summary

  • Tika's junrar-backed RarParser throws on RAR v5 archives (the format introduced with WinRAR 5.0 in 2013 and WinRAR's default since version 5.50), so modern RAR files fail to parse.
  • Add net.sf.sevenzipjbinding plus a RarSevenZipParser that auto-detects RAR4/RAR5 and routes entries through the configured EmbeddedDocumentExtractor; ConfigBuilder swaps the parser and TikaApi initializes the native binding once per JVM.
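
As a rough illustration of what distinguishing RAR4 from RAR5 involves, the two formats can be told apart by their leading signature bytes. The sketch below is not taken from the PR (RarSevenZipParser may simply let 7-Zip-JBinding detect the format); the class name `RarSignature` and its methods are hypothetical, and it assumes the signature sits at offset 0, which does not hold for self-extracting archives.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of the PR: tells RAR4 from RAR5 by magic bytes. */
final class RarSignature {
    enum Version { RAR4, RAR5, UNKNOWN }

    // RAR 1.5 to 4.x marker block: "Rar!" 0x1A 0x07 0x00
    private static final byte[] RAR4 = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};
    // RAR 5.0 signature: "Rar!" 0x1A 0x07 0x01 0x00
    private static final byte[] RAR5 = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00};

    /** Classifies a buffer holding the first bytes of a file. */
    static Version detect(byte[] header) {
        if (startsWith(header, RAR5)) return Version.RAR5;
        if (startsWith(header, RAR4)) return Version.RAR4;
        return Version.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data != null
                && data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    public static void main(String[] args) {
        byte[] sample = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x01, 0x00, 0x33};
        System.out.println(detect(sample));
    }
}
```

A check like this could serve as a cheap pre-filter or a log hint, but it is no substitute for letting the archive library validate the file.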

Testing

  • CI (./builder test): relying on GitHub Actions, because the suite is not runnable in the contributor's local shell (engine build, Maven, and torch are unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1163

coderabbitai Bot commented Jun 8, 2026

Contributor

Walkthrough

This PR replaces Tika's JUnRAR-based RAR parser with a new 7-Zip-JBinding implementation that supports both RAR4 and RAR5 archives. Changes add a RarSevenZipParser class, Maven dependencies, JVM-level library initialization, and configuration wiring to swap the parser in Tika's configuration.

Changes

RAR5 Archive Support via 7-Zip-JBinding

  • RAR5 parser implementation
    File(s): packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/parsers/rar/RarSevenZipParser.java
    Summary: RarSevenZipParser implements Parser to detect RAR4/RAR5 archives, enumerate and extract entries, handle encryption, build per-entry metadata, and delegate to EmbeddedDocumentExtractor for content extraction. Includes handleEntryMetadata helper and archive-level exception wrapping.
  • 7-Zip-JBinding dependencies
    File(s): packages/tika/lib/tika/pom-template.xml
    Summary: Maven POM adds sevenzipjbinding and sevenzipjbinding-all-platforms (v16.02-2.01) with explanatory comment on RAR4+RAR5 support and native library loading.
  • 7-Zip runtime initialization
    File(s): packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
    Summary: TikaApi imports SevenZip and guards JVM-level initialization in init(), logging a warning if the native library fails to load while allowing other parsers to continue functioning.
  • Parser configuration and registration
    File(s): packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
    Summary: ConfigBuilder.getConfig() removes Tika's org.apache.tika.parser.pkg.RarParser and ensures the custom com.rocketride.tika_api.parsers.rar.RarSevenZipParser is registered in the Tika configuration DOM.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Seven zips bound, RAR5 now read,
Where four and five in archives spread.
A parser new, both sleek and true,
Hops past the old, extracts what's due!
wiggles nose at compression

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning
    Explanation: Docstring coverage is 42.86%, which is insufficient. The required threshold is 80.00%.
    Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Linked Issues check: ✅ Passed
    Explanation: All code changes implement the linked issue #1163 requirements: adding 7-Zip-JBinding dependency, creating RarSevenZipParser to auto-detect RAR4/RAR5, integrating via ConfigBuilder and TikaApi initialization.
  • Out of Scope Changes check: ✅ Passed
    Explanation: All changes are directly scoped to RAR5 support implementation: dependencies, parser implementation, configuration replacement, and 7-Zip initialization. No unrelated modifications detected.
  • Description Check: ✅ Passed
    Explanation: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed
    Explanation: The title accurately summarizes the main change: adding RAR5 support via 7-Zip-JBinding, which is the core objective of the PR and reflected in all modified files.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java`:
- Around line 435-439: ConfigBuilder.getConfig() currently unconditionally
removes org.apache.tika.parser.pkg.RarParser and adds
com.rocketride.tika_api.parsers.rar.RarSevenZipParser even when 7-Zip failed to
initialize; add a readiness getter on TikaApi (e.g., TikaApi.isSevenZipReady()
set from SevenZip.isInitializedSuccessfully() in TikaApi.init() after calling
SevenZip.initSevenZipFromPlatformJAR()) and then change
ConfigBuilder.getConfig() to only call removeParser(doc,
"org.apache.tika.parser.pkg.RarParser") and findOrAddParser(doc,
"com.rocketride.tika_api.parsers.rar.RarSevenZipParser") when that readiness
getter returns true so the original RarParser remains as a fallback when 7-Zip
isn’t initialized.

In
`@packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/parsers/rar/RarSevenZipParser.java`:
- Around line 156-172: The RarSevenZipParser currently writes extracted data to
a temp file without any size limit; modify the extraction block around
tmp.createTemporaryFile()/new FileOutputStream(entryFile) and the
item.extractSlow(...) ISequentialOutStream.write implementation to enforce a
MAX_ENTRY_UNPACKED_BYTES guard (define a sensible constant), track cumulative
bytesWritten as chunks are written, and if the limit would be exceeded: stop
extraction by throwing a SevenZipException (or otherwise aborting), close and
delete entryFile, log a warning including name and bytesWritten, and return
early so the oversized entry is not persisted; ensure proper resource cleanup in
the try-with-resources and that the existing ExtractOperationResult check still
runs for normal failures.

In `@packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java`:
- Around line 321-327: The TikaApi.init() block currently catches Throwable when
initializing SevenZip which hides JVM-fatal Errors; change the catch to only
handle non-fatal exceptions by catching Exception (or specific exceptions) from
SevenZip.initSevenZipFromPlatformJAR(), and do not swallow Errors—either let
Errors propagate or rethrow them (e.g., catch Exception e and call
logger.log(..., e)); update the catch around
SevenZip.initSevenZipFromPlatformJAR() / SevenZip.isInitializedSuccessfully() to
reference SevenZip and logger so only recoverable failures are logged while JVM
Errors are not suppressed.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 535ef594-b1c1-4228-999e-6650b9f223e1

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and c3665c8.

📒 Files selected for processing (4)
  • packages/tika/lib/tika/pom-template.xml
  • packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
  • packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
  • packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/parsers/rar/RarSevenZipParser.java

Comment on lines +435 to +439
// Replace Tika's junrar-backed RarParser (RAR4-only; throws on RAR5) with our
// 7-Zip-JBinding-backed parser, which handles both RAR4 and RAR5.
removeParser(doc, "org.apache.tika.parser.pkg.RarParser");
findOrAddParser(doc, "com.rocketride.tika_api.parsers.rar.RarSevenZipParser");

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n "class ConfigBuilder|removeParser\\(doc, \"org\\.apache\\.tika\\.parser\\.pkg\\.RarParser\"|RarSevenZipParser|isSevenZipReady|SevenZip|7-zip" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
rg -n "TikaApi\\.isSevenZipReady|isSevenZipReady|SevenZip|native|init failure|RarSevenZipParser|RarParser" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java || true
ls -R packages/tika/lib/tika/src/main/java/com/rocketride/tika_api | head

Repository: rocketride-org/rocketride-server

Length of output: 6664


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '280,360p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
sed -n '400,470p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
fd -a "TikaApi.java" packages/tika/lib/tika/src/main/java -t f || true
rg -n "TikaApi" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api || true

Repository: rocketride-org/rocketride-server

Length of output: 9467


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find init / sevenzip init code region
sed -n '250,380p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java

# 2) Find ConfigBuilder.getConfig() call sites
rg -n "ConfigBuilder\\.getConfig\\(|new ConfigBuilder\\(|getConfig\\(\\)" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java packages/tika/lib/tika/src/main/java/com/rocketride/tika_api -S || true

# 3) Inspect relevant part of TikaApi where it wires Tika configuration / extraction startup
rg -n "getConfig\\(|TikaConfig\\b|tika-config\\.xml|ConfigBuilder" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
sed -n '580,760p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java

# 4) Inspect RarSevenZipParser to see how it depends on 7-zip binding init
sed -n '1,200p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/parsers/rar/RarSevenZipParser.java

Repository: rocketride-org/rocketride-server

Length of output: 19086


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "isSevenZipReady|sevenZipReady|SevenZip.*ready|initSevenZip|isInitializedSuccessfully\\(|initializedSuccessfully" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
rg -n "RarSevenZipParser|RarParser" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api -S
rg -n "catch \\(.*TikaException|catch \\(.*Exception\\)" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
rg -n "parse\\(|tikaConfig|tika.*config|new TikaConfig|tika.*parse" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java

Repository: rocketride-org/rocketride-server

Length of output: 1321


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '80,220p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/EmbeddedContentExtractor.java
rg -n "ConfigBuilder\\.getConfig\\(" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api -S
sed -n '300,380p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/EmbeddedContentExtractor.java 2>/dev/null || true

Repository: rocketride-org/rocketride-server

Length of output: 5973


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n "SevenZip\\b" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
sed -n '1,80p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java

Repository: rocketride-org/rocketride-server

Length of output: 58


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# locate helper methods + tika-config.xml usage
rg -n "removeParser\\(|findOrAddParser\\(|tika-config\\.xml" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
# inspect helper implementations
sed -n '360,520p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
# find shipped config file in repo (if present) or template
fd "tika-config.xml" . -t f || true
rg -n "RarParser|RarSevenZipParser|rar" -S . --glob "tika-config.xml" || true

Repository: rocketride-org/rocketride-server

Length of output: 4601


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Inspect helper methods in ConfigBuilder
sed -n '120,280p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java

# Inspect shipped tika-config.xml around RarParser / rar parser entries
rg -n "RarParser|RarSevenZipParser|rar" packages/tika/lib/tika/tika-config.xml
# show a narrower window around the matches
rg -n -o --context 5 "RarParser|RarSevenZipParser" packages/tika/lib/tika/tika-config.xml | head -n 200

Repository: rocketride-org/rocketride-server

Length of output: 5076


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Find defaultParserName and surrounding constants in ConfigBuilder
rg -n "defaultParserName|defaultParser\\b" packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java
sed -n '1,120p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java

# Show tika-config.xml sections near RarParser / RarSevenZipParser / default parser excludes
rg -n "RarParser|RarSevenZipParser|parser-exclude|AutoDetectParser|defaultParser" packages/tika/lib/tika/tika-config.xml
# Print a limited window around any RarParser mentions
rg -n "RarParser|RarSevenZipParser" packages/tika/lib/tika/tika-config.xml | head -n 50
# If large, show top 250 lines to locate defaults, then windows around matches
head -n 260 packages/tika/lib/tika/tika-config.xml
# Narrow windows around the matches (use line numbers from rg)
matches=$(rg -n "RarParser|RarSevenZipParser" packages/tika/lib/tika/tika-config.xml | cut -d: -f1 | head -n 20)
if [ -n "${matches:-}" ]; then
  for ln in $matches; do
    start=$((ln-40)); end=$((ln+40));
    echo "---- tika-config.xml:$ln ----";
    sed -n "${start},${end}p" packages/tika/lib/tika/tika-config.xml;
  done
fi

Repository: rocketride-org/rocketride-server

Length of output: 4253


🏁 Script executed:

#!/bin/bash
set -euo pipefail

XML="packages/tika/lib/tika/tika-config.xml"

# Search for any RAR-related parser names/classes or type hints
rg -n -i "rar" "$XML" || true

# Also search for "parser-exclude" blocks near the top to see what default parser excludes exist
# Print the first ~220 lines to include the <parsers> section and any excludes
head -n 260 "$XML"

# If there are rar-related matches, print a bit more around them
matches=$(rg -n -i "rar" "$XML" | cut -d: -f1 | head -n 20 || true)
if [ -n "${matches:-}" ]; then
  for ln in $matches; do
    start=$((ln-60)); end=$((ln+60));
    echo "---- $XML:$ln ----"
    sed -n "${start},${end}p" "$XML"
  done
fi

Repository: rocketride-org/rocketride-server

Length of output: 2216


Gate RAR parser replacement on successful 7-Zip-JBinding initialization

TikaApi.init() logs and continues when SevenZip.initSevenZipFromPlatformJAR() fails, but ConfigBuilder.getConfig() always removes org.apache.tika.parser.pkg.RarParser and installs com.rocketride.tika_api.parsers.rar.RarSevenZipParser. If the 7-Zip native init did not succeed, RAR parsing is left with no working fallback. See ConfigBuilder.java:437-438 and TikaApi.init().

Suggested direction
-        removeParser(doc, "org.apache.tika.parser.pkg.RarParser");
-        findOrAddParser(doc, "com.rocketride.tika_api.parsers.rar.RarSevenZipParser");
+        if (TikaApi.isSevenZipReady()) {
+            removeParser(doc, "org.apache.tika.parser.pkg.RarParser");
+            findOrAddParser(doc, "com.rocketride.tika_api.parsers.rar.RarSevenZipParser");
+        } else {
+            // keep upstream parser as fallback when native 7-Zip init is unavailable
+            removeParser(doc, "com.rocketride.tika_api.parsers.rar.RarSevenZipParser");
+            findOrAddParser(doc, "org.apache.tika.parser.pkg.RarParser");
+        }

(Requires adding a small readiness flag/getter in TikaApi based on SevenZip.isInitializedSuccessfully() after init attempt.)
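
The readiness flag mentioned above could look roughly like the following. This is a minimal sketch under stated assumptions, not code from the PR: `NativeReadiness`, `init`, `isReady`, and `rarParserClass` are hypothetical names, and the native initializer is passed in as a `Callable` so the sketch runs without 7-Zip-JBinding on the classpath. In the real code the callable would presumably wrap SevenZip.initSevenZipFromPlatformJAR() and return SevenZip.isInitializedSuccessfully().

```java
import java.util.concurrent.Callable;

/** Hypothetical sketch of the readiness-flag idea; names are illustrative only. */
final class NativeReadiness {
    static final String UPSTREAM_RAR = "org.apache.tika.parser.pkg.RarParser";
    static final String SEVENZIP_RAR = "com.rocketride.tika_api.parsers.rar.RarSevenZipParser";

    // volatile so a value set during startup is visible to config-building threads
    private static volatile boolean ready = false;

    /** Runs the supplied initializer and records whether it reported success. */
    static void init(Callable<Boolean> nativeInit) {
        try {
            ready = Boolean.TRUE.equals(nativeInit.call());
        } catch (Exception | LinkageError e) {
            // Missing or incompatible native library: stay on the fallback parser.
            ready = false;
        }
    }

    static boolean isReady() {
        return ready;
    }

    /** Chooses which RAR parser class the configuration should register. */
    static String rarParserClass() {
        return ready ? SEVENZIP_RAR : UPSTREAM_RAR;
    }
}
```

With a flag like this, the configuration step only needs to ask `rarParserClass()` instead of hard-coding the swap, so a failed native load degrades to RAR4-only parsing rather than to no RAR parsing at all.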

Comment on lines +156 to +172
File entryFile = tmp.createTemporaryFile();
try (OutputStream fos = new FileOutputStream(entryFile)) {
    ExtractOperationResult result = item.extractSlow(new ISequentialOutStream() {
        @Override
        public int write(byte[] data) throws SevenZipException {
            try {
                fos.write(data);
            } catch (IOException e) {
                throw new SevenZipException(e);
            }
            return data.length;
        }
    });
    if (result != ExtractOperationResult.OK) {
        logger.log(Level.WARNING, "RAR entry extraction returned " + result + " for " + name);
        return;
    }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add a decompressed-size guard before writing archive entries to disk.

Line 156 currently writes each entry fully to temp storage without a maximum bound. A crafted RAR can exhaust disk space and stall parsing workers.

Suggested fix
 public class RarSevenZipParser implements Parser {
+    private static final long MAX_ENTRY_BYTES = 512L * 1024 * 1024; // make configurable if possible
@@
-        File entryFile = tmp.createTemporaryFile();
+        if (size != null && size > MAX_ENTRY_BYTES) {
+            logger.log(Level.WARNING, "Skipping oversized RAR entry: " + name + " (" + size + " bytes)");
+            return;
+        }
+        File entryFile = tmp.createTemporaryFile();
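
The pre-check above relies on the size declared in the archive header, which may be absent or inaccurate. A complementary option is to count bytes as they are actually written and abort once a cap is exceeded. The sketch below is one hedged way to do that with a plain `java.io` wrapper; `BoundedOutputStream` and `bytesWritten` are hypothetical names, not part of the PR or of 7-Zip-JBinding.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical helper: refuses any write that would push the total past maxBytes. */
final class BoundedOutputStream extends FilterOutputStream {
    private final long maxBytes;
    private long written;

    BoundedOutputStream(OutputStream out, long maxBytes) {
        super(out);
        this.maxBytes = maxBytes;
    }

    @Override
    public void write(int b) throws IOException {
        ensureCapacity(1);
        out.write(b);
        written++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        ensureCapacity(len);
        out.write(b, off, len);
        written += len;
    }

    /** Number of bytes passed through to the underlying stream so far. */
    long bytesWritten() {
        return written;
    }

    private void ensureCapacity(int len) throws IOException {
        if (len > maxBytes - written) {
            throw new IOException("entry exceeds " + maxBytes + " bytes (" + written + " already written)");
        }
    }
}
```

If the FileOutputStream in the extraction block were wrapped this way, the existing `catch (IOException e)` in the write callback would turn the overflow into a SevenZipException; the caller would still need to delete the partial temp file and log the skipped entry.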

Comment on lines +321 to +327
try {
    if (!SevenZip.isInitializedSuccessfully()) {
        SevenZip.initSevenZipFromPlatformJAR();
        logger.log(Level.INFO, "7-Zip-JBinding initialized: " + SevenZip.getSevenZipVersion().version);
    }
} catch (Throwable t) {
    logger.log(Level.WARNING, "Failed to initialize 7-Zip-JBinding; RAR parsing will fail", t);

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify broad Throwable catches on startup paths and related parser wiring.
rg -n 'catch\s*\(\s*Throwable\s+' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
rg -n 'RarParser|RarSevenZipParser|initSevenZipFromPlatformJAR|isInitializedSuccessfully' \
  packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java \
  packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/ConfigBuilder.java

Repository: rocketride-org/rocketride-server

Length of output: 987


🏁 Script executed:

#!/bin/bash
# Inspect surrounding code for the Throwable catch to assess whether it's truly initialization-only and what exceptions SevenZip calls may throw.
sed -n '280,360p' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api/TikaApi.java
# Check if there are any other Throwable catches in this file/package.
rg -n 'catch\s*\(\s*Throwable\s+' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api

Repository: rocketride-org/rocketride-server

Length of output: 2786


🏁 Script executed:

#!/bin/bash
# Find any other Throwable catches in the same area.
rg -n 'catch\s*\(\s*Throwable\s+\w+\s*\)' packages/tika/lib/tika/src/main/java/com/rocketride/tika_api

# Locate the SevenZip implementation and inspect init method.
fd -a "SevenZip.java" packages/tika/lib/tika/src/main/java | head
rg -n "class SevenZip|initSevenZipFromPlatformJAR|isInitializedSuccessfully" packages/tika/lib/tika/src/main/java -S

Repository: rocketride-org/rocketride-server

Length of output: 428


🏁 Script executed:

#!/bin/bash
# Locate SevenZip implementation and inspect init method.
rg -n "initSevenZipFromPlatformJAR" packages/tika/lib/tika/src/main/java
rg -n "class SevenZip|isInitializedSuccessfully" packages/tika/lib/tika/src/main/java

Repository: rocketride-org/rocketride-server

Length of output: 324


🏁 Script executed:

#!/bin/bash
# Find where SevenZip is defined in the repository.
fd -a -t f "SevenZip.java" . || true
rg -n "class\s+SevenZip\b" . -S || true
rg -n "isInitializedSuccessfully" . -S || true
rg -n "initSevenZipFromPlatformJAR" . -S || true

Repository: rocketride-org/rocketride-server

Length of output: 328


Avoid catching Throwable during 7-Zip-JBinding initialization in TikaApi.init() (TikaApi.java:326)

Catching Throwable suppresses JVM-fatal Errors (e.g., OutOfMemoryError) and still allows initialization to proceed, leaving the subsystem potentially in an unsafe state.

Suggested fix
-        } catch (Throwable t) {
+        } catch (Exception | UnsatisfiedLinkError t) {
             logger.log(Level.WARNING, "Failed to initialize 7-Zip-JBinding; RAR parsing will fail", t);
         }
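
To make the effect of the narrower catch concrete, here is a small stand-alone sketch. It is not code from the PR; `NarrowCatchDemo` and `tryInit` are hypothetical, and a `Runnable` stands in for the SevenZip initialization call. The multi-catch handles ordinary exceptions and a missing native library, while other Errors such as OutOfMemoryError propagate to the caller.

```java
/** Hypothetical demo of the suggested multi-catch; not code from the PR. */
final class NarrowCatchDemo {
    /**
     * Runs the initializer. Returns true on success and false when a
     * recoverable failure was logged and ignored.
     */
    static boolean tryInit(Runnable nativeInit) {
        try {
            nativeInit.run();
            return true;
        } catch (Exception | UnsatisfiedLinkError e) {
            // Recoverable: log it and continue without the native library.
            System.err.println("native init failed, continuing without it: " + e);
            return false;
        }
        // Any other Error (for example OutOfMemoryError) is deliberately not caught.
    }

    public static void main(String[] args) {
        System.out.println(tryInit(() -> { throw new UnsatisfiedLinkError("no 7-Zip library"); }));
    }
}
```

Whether UnsatisfiedLinkError alone covers every native-load failure seen in practice is an open question; other LinkageError subclasses may also be worth handling.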

github-actions Bot commented Jun 8, 2026

🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

@github-actions github-actions Bot added the module:server (C++ engine and server components) label Jun 8, 2026

Tika's junrar-backed RarParser throws on RAR v5 (the default WinRAR format
since 2013). Add net.sf.sevenzipjbinding plus a RarSevenZipParser that
auto-detects RAR4/RAR5 and routes entries through the configured
EmbeddedDocumentExtractor; ConfigBuilder swaps the parser and TikaApi
initializes the native binding once per JVM.

Fixes rocketride-org#1163
@ksaurabhAparavi ksaurabhAparavi force-pushed the feat/RR-1163-tika-rar5-support branch from c3665c8 to 4caed72 Compare June 8, 2026 11:51
@ksaurabhAparavi ksaurabhAparavi changed the title from "ADS-529: replace junrar RarParser with 7-Zip-JBinding for RAR5 support (#30)" to "feat(tika): support RAR5 archives via 7-Zip-JBinding" Jun 8, 2026
Labels

module:server C++ engine and server components

Development

Successfully merging this pull request may close these issues.

Support RAR5 archives (junrar RarParser throws on RAR v5)

1 participant