Skip to content

fix(text-output): install smbprotocol at module level and fix anonymize crash#1178

Open
ksaurabhAparavi wants to merge 2 commits into
rocketride-org:developfrom
ksaurabhAparavi:fix/RR-1165-anonymize-heap-and-smbprotocol
Open

fix(text-output): install smbprotocol at module level and fix anonymize crash#1178
ksaurabhAparavi wants to merge 2 commits into
rocketride-org:developfrom
ksaurabhAparavi:fix/RR-1165-anonymize-heap-and-smbprotocol

Conversation

@ksaurabhAparavi

@ksaurabhAparavi ksaurabhAparavi commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Move depends() to module level in endpoint.py so smbprotocol installs regardless of OPEN_MODE.
  • Fix STATUS_HEAP_CORRUPTION in anonymize by removing the ThreadPoolExecutor from glinerRecognizer.predict() — GLiNER/PyTorch is not thread-safe and concurrent calls on the same model corrupt the native heap.

Testing

  • CI (./builder test) — relying on GitHub Actions; not runnable in the contributor's local shell (engine build / Maven / torch unavailable). Static checks (compile, no conflict markers) pass.

Linked Issue

Fixes #1165

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced connection validation with improved error logging
  • Refactoring

    • Optimized entity recognition processing for improved performance
    • Simplified initialization logic for better maintainability
  • Configuration

    • Extended configuration schema with SMB storage parameters and enhanced anonymization field visibility

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: fb4058b9-b4ca-40a3-bb74-915b30b75460

📥 Commits

Reviewing files that changed from the base of the PR and between b9bd389 and eb17f21.

📒 Files selected for processing (1)
  • nodes/src/nodes/text_output/services.json
💤 Files with no reviewable changes (1)
  • nodes/src/nodes/text_output/services.json

📝 Walkthrough

Walkthrough

This PR removes concurrent GLiNER calls by running predictions sequentially, moves smbprotocol dependency installation to module import in endpoint.py (simplifying IGlobal.beginGlobal to a no-op), updates Endpoint.validateConfig to call self.connect() and widen exception logging, and expands text_output services.json to list SMB and anonymize parameters and explicit transform properties.

Changes

Node stability and initialization fixes

Layer / File(s) Summary
GLiNER sequential prediction refactor
nodes/src/nodes/anonymize/glinerRecognizer.py
Removes concurrent.futures parallel processing and replaces with sequential iteration over non-empty overlapping text chunks and precomputed label batches. Calls to model.predict_entities run sequentially, results append directly to all_results with progress/debug logging, and final deduplication by (start, end, label) is retained.
smbprotocol dependency loading relocation
nodes/src/nodes/text_output/IGlobal.py, nodes/src/nodes/text_output/endpoint.py
Moves unconditional dependency installation (depends(...) pointing at requirements.txt) to module import in endpoint.py; IGlobal.beginGlobal is simplified to a no-op pass.
Endpoint validation and configuration schema
nodes/src/nodes/text_output/endpoint.py, nodes/src/nodes/text_output/services.json
Endpoint.validateConfig now calls self.connect() when syntaxOnly is False and logs ValueError as error while logging other exceptions as warnings. services.json enumerates SMB configuration fields (smb.storePath, smb.server, smb.username, smb.password) alongside anonymize in text_output.target.parameters and explicitly lists transform properties (type, name, text_output.target.parameters, target.mode).

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Threads were tangled, heaps were broken,
Concurrent calls had spoken.
Now GLiNER runs in sequence true,
And smbprotocol loads early too!
From chaos springs a stable dance.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly corresponds to the primary fixes in the changeset: installing smbprotocol at module level and fixing the anonymize crash caused by ThreadPoolExecutor.
Linked Issues check ✅ Passed The PR successfully addresses both objectives from issue #1165: removes ThreadPoolExecutor from glinerRecognizer.predict() to fix heap corruption, and moves depends() to module level in endpoint.py to install smbprotocol unconditionally.
Out of Scope Changes check ✅ Passed All changes directly support the linked objectives. The services.json modifications extend schema definitions for anonymize and SMB configuration fields, which align with improving the text-output module's configuration handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions Bot added the module:nodes Python pipeline nodes label Jun 8, 2026
@github-actions

github-actions Bot commented Jun 8, 2026

Copy link
Copy Markdown
🤖 Internal: Discord sync marker

Auto-managed by the Discord notification workflow. Stores the linked Discord message ID. Do not edit or delete.

…ze crash

- Move depends() to module level in endpoint.py so smbprotocol is always
  installed regardless of OPEN_MODE.
- Fix STATUS_HEAP_CORRUPTION in anonymize by removing the ThreadPoolExecutor
  from glinerRecognizer.predict(); GLiNER/PyTorch is not thread-safe and
  concurrent calls on the same model corrupt the native heap.

Fixes rocketride-org#1165
@ksaurabhAparavi ksaurabhAparavi force-pushed the fix/RR-1165-anonymize-heap-and-smbprotocol branch from a5e4e7c to b9bd389 Compare June 8, 2026 11:51
@ksaurabhAparavi ksaurabhAparavi changed the title fix(text-output): install smbprotocol at module level and fix anonymize crash (#37) fix(text-output): install smbprotocol at module level and fix anonymize crash Jun 8, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
nodes/src/nodes/text_output/endpoint.py (1)

25-37: ⚠️ Potential issue | 🔴 Critical

Fix import order so depends() runs before SMB imports in nodes/src/nodes/text_output/endpoint.py.

depends(.../requirements.txt) is currently called at line 37, after import smbclient (line 27) and from smbprotocol.exceptions import SMBOSError (line 28), so the module can still fail to import before dependency bootstrap runs—move the depends() call above the SMB imports.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/text_output/endpoint.py` around lines 25 - 37, The bootstrap
call depends(...) must run before importing SMB modules to ensure requirements
are installed first; move the
depends(os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt')
invocation so it appears before any imports of smbclient and from
smbprotocol.exceptions import SMBOSError (i.e., place the depends() call
immediately after importing os and any other stdlib modules it needs), then keep
the smbclient and SMBOSError imports after that change so that the smbclient
import happens only after dependencies have been bootstrapped.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/text_output/endpoint.py`:
- Around line 77-83: The generic Exception handler is swallowing connection
probe failures from self.connect() during validation, allowing invalid SMB
configs to pass; update the except Exception block (around self.connect() in the
validation flow / method that calls self.connect()) to treat connection errors
as failures by logging them with engLib.error (not engLib.warning) and
re-raising the exception (or raising a ValidationError) so validateConfig
actually fails on probe errors instead of continuing silently. Ensure ValueError
handling remains as-is and only broad exceptions are escalated.

---

Outside diff comments:
In `@nodes/src/nodes/text_output/endpoint.py`:
- Around line 25-37: The bootstrap call depends(...) must run before importing
SMB modules to ensure requirements are installed first; move the
depends(os.path.dirname(os.path.realpath(__file__)) + '/requirements.txt')
invocation so it appears before any imports of smbclient and from
smbprotocol.exceptions import SMBOSError (i.e., place the depends() call
immediately after importing os and any other stdlib modules it needs), then keep
the smbclient and SMBOSError imports after that change so that the smbclient
import happens only after dependencies have been bootstrapped.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 24a69f13-b4c2-418b-80e6-da0043042ecd

📥 Commits

Reviewing files that changed from the base of the PR and between efecb7e and b9bd389.

📒 Files selected for processing (4)
  • nodes/src/nodes/anonymize/glinerRecognizer.py
  • nodes/src/nodes/text_output/IGlobal.py
  • nodes/src/nodes/text_output/endpoint.py
  • nodes/src/nodes/text_output/services.json

Comment on lines 77 to +83
if not syntaxOnly:
# Do not try to connect as validation is run by Platform
# where SMB share may not be available
# self.connect()
pass
self.connect()

except ValueError as e:
engLib.error(e)
except Exception as e:
engLib.warning(e)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not swallow connection probe failures as warnings in validation.

Line 78 intentionally validates credentials/connectivity via self.connect(), but Line 82-83 catches all exceptions and logs only engLib.warning, which can let invalid SMB config pass validateConfig.

Suggested fix
         except ValueError as e:
             engLib.error(e)
         except Exception as e:
-            engLib.warning(e)
+            engLib.error(e)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not syntaxOnly:
# Do not try to connect as validation is run by Platform
# where SMB share may not be available
# self.connect()
pass
self.connect()
except ValueError as e:
engLib.error(e)
except Exception as e:
engLib.warning(e)
if not syntaxOnly:
self.connect()
except ValueError as e:
engLib.error(e)
except Exception as e:
engLib.error(e)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/text_output/endpoint.py` around lines 77 - 83, The generic
Exception handler is swallowing connection probe failures from self.connect()
during validation, allowing invalid SMB configs to pass; update the except
Exception block (around self.connect() in the validation flow / method that
calls self.connect()) to treat connection errors as failures by logging them
with engLib.error (not engLib.warning) and re-raising the exception (or raising
a ValidationError) so validateConfig actually fails on probe errors instead of
continuing silently. Ensure ValueError handling remains as-is and only broad
exceptions are escalated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:nodes Python pipeline nodes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Anonymize node heap corruption (GLiNER thread-safety) + smbprotocol install timing

1 participant