feat(vision): 7-node suite — RF-DETR, Mask2Former, DA-V2, BlazeFace + more by ryan-t-christensen · Pull Request #1081 · rocketride-org/rocketride-server

ryan-t-christensen · 2026-06-03T11:02:24Z

Summary

Ships 7 vision nodes: 3 refactors (detect, detect_segment, depth_estimate) and 4 new (caption, face_detection, pose_estimation, background_removal).
All defaults are Apache-2.0 or MIT — explicit choice for permissive downstream redistribution.
Adds shared dense_resize util + detection loader suite; pins transformers==4.53.3; ships a universal textarea helperText UX fix.

Type

feature (vision suite v1) + refactor + chore

Node Walkthroughs

All walkthroughs below (except face_detection) run against the same input image so outputs are directly comparable.

1. `detect` — Object Detection (refactor)

Run closed-set or open-vocabulary object detection over any input image. Default backend is RF-DETR (Apache-2.0) on COCO-80; flip to MM-Grounding-DINO (Apache-2.0 / BSD-3) when you need to detect arbitrary classes from a free-text prompt. Emits an annotated image plus JSON [{label, score, box, centroid}] on the text lane.

When to use: you need bounding boxes for known COCO classes (RF-DETR) or for novel categories described in natural language (Grounding-DINO).

Before	After

2. `detect_segment` — Segmentation (refactor)

Produce pixel-accurate masks via Mask2Former (MIT) in two profiles: instance (COCO 80-class, default) or semantic (ADE20K 150-class). Image lane returns per-instance / per-class colored mask overlays; text lane returns RLE-encoded masks for compact transport. Replaces the SAM3 backend (gated, required CUDA 12.6+) — single-frame only, uses dense_resize to bound peak memory.

When to use: you need masks (not just boxes) for compositing, measurement, or mask-conditioned downstream nodes.

Before	After

3. `depth_estimate` — Depth Estimation (refactor)

Monocular relative depth via Depth-Anything V2 Small (Apache-2.0). max_edge bounds inference resolution to trade fidelity for latency on large inputs. Image lane emits a colorized depth map; text lane emits {min, max, mean} stats. Simplified to image-only I/O; video lane and Doc emission removed (handled by upstream frame_grabber).

When to use: you need a depth cue for parallax, foreground/background separation, or as a conditioning signal for other nodes.

Before	After

4. `caption` — Image Captioning (new, replaces `describe`)

Generate a natural-language description with Florence-2 Base (MIT) at one of three granularities: caption, detailed_caption, more_detailed_caption. Replaces the prior multi-task describe node — specialized to captioning only, keeping the surface area tight. Emits a caption string on the text lane.

When to use: you need a textual summary of an image to drive search, prompts, alt-text, or downstream LLM reasoning.

(Florence-2 caption rendered as text)

5. `face_detection` — Face Detection (new)

Detect faces with MediaPipe BlazeFace (Apache-2.0), returning axis-aligned bounding boxes plus, optionally, six coarse alignment-grade keypoints per face (eyes, nose tip, mouth center, ear tragions). Image lane emits an annotated frame; text lane emits JSON [{label: 'face', score, box, centroid, landmarks?}]. Keypoints are toggleable via emit_landmarks.

When to use: you need face localization or alignment landmarks for cropping, blurring, or framing.

Before	After

6. `pose_estimation` — Pose Estimation (new)

Top-down 2D human pose estimation via RTMPose through rtmlib (Apache-2.0), with tiny / medium (default) / large profiles. Each detected person yields 17 COCO keypoints; max_persons caps the per-image work. Image lane emits a skeleton overlay; text lane emits [{box, keypoints}].

When to use: you need body keypoints for gesture, motion analysis, sports, or pose-conditioned generation.

Before	After

7. `background_removal` — Background Removal (new)

Cut subjects out of their background with BiRefNet (MIT) in either the default 1024px profile or the HR 2048px profile for fine edges and hair. Image lane returns an RGBA PNG with a soft alpha matte (straight, not premultiplied); text lane returns {mean_alpha, alpha_coverage_pct} so downstream logic can gate on how much subject was actually found.

When to use: you need a transparent subject for compositing, product shots, thumbnails, or as a mask source for other nodes.

Before	After

Cross-Cutting Changes

dense_resize util (packages/ai/.../image/dense_resize.py) — standalone helpers (resize_for_inference, restore_dense_output, restore_rle_mask) for memory-bounded dense inference. Consumed by depth_estimate, detect_segment, background_removal. Keep in mind for future dense-prediction nodes.
Detection model loaders (packages/ai/.../models/detection/detection.py) — RFDetrLoader, MmGDinoLoader, Mask2FormerInstanceLoader, Mask2FormerSemanticLoader. Five unused legacy loaders (Florence2, OWLv2, MobileSAM, SAM2, SAM3 grounded) removed in the same pass — loader registry is now 1:1 with shipped models.
transformers==4.53.3 pin — load-bearing. Newer versions regress on Mask2Former and Depth-Anything-V2. Do not bump without re-validating every vision node (I'm still reviewing this, not a fan)
TextareaWidget helperText + sticky label — universal UX fix. Affects all ~34 textareas across the suite (not just vision); pipes services.json description field into MUI's helperText and pins InputLabelProps.shrink=true so labels always float above the outlined border.
Icon cleanup — tool_python/python.svg minified (265 → 24 lines, no visual change); video_composer/video_composer.svg added (was missing from prior commit).

Testing

Tested locally
All 7 vision nodes exercised end-to-end via local pipeline at /Users/ryan/Desktop/skyrim/test.pipe during development; each loaded, processed frames, and emitted expected outputs.
Pre-commit hooks (ruff, gitleaks) pass on every commit in this branch.
Tests added or updated — no automated test coverage added; tracked as follow-up
./builder test passes — not run; reviewers please confirm

Reviewers, please probe

First-run model downloads — pulls are cache-miss on a fresh machine; confirm each node's loader handles cold cache without hanging or partial-write corruption.
GPU vs CPU device fallback — author tested on a single device class. Try a CPU-only box and a CUDA box to confirm device selection degrades gracefully.
max_edge memory bounds — push a 4K+ input and a very small input through depth_estimate, detect_segment, and background_removal to confirm resize/clamp holds at both ends.
face_detection output shape — run an image with at least one clear face; confirm the text lane emits [{label: 'face', score, box, centroid}] and the image lane returns an annotated frame. Toggle Emit 6 alignment keypoints off and confirm landmarks is omitted, on and confirm six named points per face. Frames with no detectable face should pass through without error.
UI sanity — rebuild the UI bundle, open the Add Node panel, confirm description text now renders below textarea fields and no console warnings about unknown props.

Checklist

Commit messages follow conventional commits
No secrets or credentials included
Wiki updated (if applicable) — N/A
Breaking changes documented (if applicable) — see below

Breaking Changes

describe node removed. Pipelines referencing describe must migrate to either caption (captioning only) or the appropriate per-task node (detect, detect_segment, etc.). No automatic migration path — node IDs in saved graphs will fail to resolve and need manual replacement.

Notes for Reviewers

Permissive licenses by default. Every default checkpoint is Apache-2.0 or MIT — deliberate choice for downstream redistribution. Non-permissive variants would be opt-in only (none currently shipped).
Refactors are behavior-preserving on the regression fixtures. detect, detect_segment, depth_estimate were restructured around the new loaders but produce the same outputs.
Review order suggestion: prereqs (dense_resize, loaders) → refactors → new nodes → chores. Loaders + util are tiny; review them first to establish the contract used by everything else.
describe deletion is intentional. Per-task work has its own dedicated node now; caption is the captioning-only successor.
face_detection "privacy guard" removed. An earlier draft of this PR described a structural guard (chains_to_embedding: false plus a runtime block on downstream embedding_* nodes). That mechanism was non-functional — nothing in the framework read the flag, and the runtime check probed downstream-node accessors that rocketlib's endpoint never exposes, so it never fired. It's been removed rather than left as dead code implying a guarantee we don't provide; face_detection now behaves like every other vision node.

Linked Issue

N/A — direct feature work; no tracking issue exists for this branch.

Summary by CodeRabbit

New Features
- Added many vision nodes: background removal, object detection, segmentation, face detection, pose estimation, depth estimation, captioning, and a Video Composer for MP4 output. Each node emits annotated images and/or JSON text outputs.
Improvements
- UI textarea helper text fallback; longer WebSocket ping timeout; client supports async context manager.
Tests
- New automated tests validating vision nodes and native-library handling.

coderabbitai · 2026-06-03T11:02:31Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds dense-output helpers, many vision model facades and nodes (background_removal, caption, depth_estimate, detect, detect_segment, face_detection, pose_estimation, video_composer), per-node globals/instances, test-framework shared-lib detection, and client/UI/runtime polish.

Changes

Vision nodes and shared ML infrastructure

Layer / File(s)	Summary
Shared runtime and exports `packages/ai/src/ai/common/image/dense_resize.py`, `packages/ai/src/ai/common/models/__init__.py`, `packages/ai/src/ai/common/models/base.py`, `packages/ai/src/ai/common/models/transformers/requirements_transformers.txt`	Adds dense-output helpers, expands vision model exports, widens BaseLoader to accept multiple requirements files, and pins `transformers` version.
Detection model stack `packages/ai/src/ai/common/models/vision/detection.py`	Adds RFDetr and MmGDino backends, DetectorLoader orchestration, and Detector facade with proxy/local execution.
Background removal node `packages/ai/src/ai/common/models/vision/background.py`, `nodes/src/nodes/background_removal/`	BiRefNet loader and BackgroundRemover facade; IGlobal lifecycle, IInstance per-frame processing, service definition and tests.
Caption node `packages/ai/src/ai/common/models/vision/caption.py`, `nodes/src/nodes/caption/`	Florence-2 caption loader/facade and node with per-frame handling, timeout watchdog, and service/tests.
Depth estimation node `packages/ai/src/ai/common/models/vision/depth.py`, `nodes/src/nodes/depth_estimate/`	HuggingFace depth pipeline loader/facade, dense restore, colorization, stats emission, and node service/tests.
Object detection node `packages/ai/src/ai/common/models/vision/detection.py`, `nodes/src/nodes/detect/`	Per-frame detection, annotated image output, configurable backends/prompts, and tests.
Segmentation node `nodes/src/nodes/detect_segment/`, `nodes/src/nodes/detect_segment/requirements.txt`	Mask2Former instance/semantic modes, RLE handling, class-map decoding, node lifecycle and service/tests; adds `pycocotools` requirement.
Face detection node `nodes/src/nodes/face_detection/`, `nodes/src/nodes/face_detection/face_detection.py`	MediaPipe BlazeFace wrapper with model caching, missing-library error translation, IGlobal/IInstance and unit tests.
Pose estimation node `nodes/src/nodes/pose_estimation/`	RTMPose integration with threshold filtering, skeleton rendering, and node service/tests.
Video composer & response node `nodes/src/nodes/video_composer/`, `nodes/src/nodes/response/IInstance.py`	FFmpeg image2pipe MP4 encoding from frames, chunked SSE/video output, and response node base64 video accumulation.
Client & UI polish `packages/client-python/src/rocketride/`, `packages/shared-ui/.../TextareaWidget.tsx`, `docker/Dockerfile.engine`	Async client context manager, increased WS ping timeout, textarea helper-text fallback and label shrink, and added libgles/libegl in runtime image.
Test framework enhancements `nodes/test/conftest.py`, `nodes/test/framework/discovery.py`, `nodes/test/face_detection/`	Platform-specific shared-library resolution, pytest.param skip marks, pipeline config merging, and face-detection unit tests.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

rocketride-org/rocketride-server#243: Related changes to vision model exports and packaging.

Suggested labels

module:nodes, module:ai, module:ui

Suggested reviewers

jmaionchi
stepmikhaylov
Rod-Christensen

Poem

🐰 Nine models hop through data streams,
Locks hold the GPU while the pipeline gleams.
Masks and captions, depths and pose,
Frames stitched to MP4 where the river flows.
A rabbit cheers: “New vision dreams!”

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/vision

github-actions · 2026-06-03T11:02:44Z

No description provided.

coderabbitai

Actionable comments posted: 17

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/background_removal/background_removal.py`:
- Around line 67-71: The parsed maxEdge value assigned to self._max_edge must be
clamped to allowed bounds to prevent negative or huge values; after parsing the
int from config.get('maxEdge', 1024) (and handling TypeError/ValueError as you
already do), clamp it to the service contract limits (e.g., min_edge = 16 and
max_edge = 4096) before assignment (self._max_edge = max(min_edge,
min(parsed_value, max_edge))); keep the existing fallback to 1024 on parse error
and optionally log when a value is out of bounds for visibility.

In `@nodes/src/nodes/caption/IInstance.py`:
- Around line 37-45: The decode call
ImageProcessor.load_image_from_bytes(self._image_data) is currently outside the
try/except, so malformed frame bytes can raise and crash the END handler; wrap
the image decoding together with the caption inference in the same try/except
that logs a warning (including self._chunk_id and the exception) and sets
caption_text = '' on failure; specifically move or include
ImageProcessor.load_image_from_bytes(...) inside the try block that uses
self.IGlobal.device_lock and self.IGlobal.captioner.caption so both decode and
caption errors are handled the same for AVI_ACTION.END.

In `@nodes/src/nodes/depth_estimate/depth_estimate.py`:
- Around line 48-51: The config value assigned to self._max_edge is currently
accepted as any int; update the initialization in depth_estimate.py to validate
and clamp the parsed integer to a safe range (e.g., min 1 and a sensible upper
bound such as 4096) before storing it so negative/zero/oversized values cannot
reach resize_for_inference; specifically, after parsing in the try/except that
sets self._max_edge, wrap the value with a clamp (e.g., self._max_edge =
max(MIN, min(parsed_value, MAX))) and keep the fallback to the default on
TypeError/ValueError, and ensure any later use in resize_for_inference
references this clamped self._max_edge.

In `@nodes/src/nodes/detect_segment/IInstance.py`:
- Around line 217-227: The END-path frame processing must be guarded so a single
bad frame doesn't crash the node: wrap the ImageProcessor.load_image_from_bytes
+ device-locked call to self.IGlobal.segmenter.segment and the subsequent
self._emit in a try/except that catches any Exception, logs a warning (use
self.logger if present, otherwise self.IGlobal.logger) including the chunk id
and error, then drop the frame by clearing self._image_data and incrementing
self._chunk_id before returning self.preventDefault(); only on success proceed
to emit and then clear/increment as now. Ensure exceptions are not re-raised so
the stream continues.

In `@nodes/src/nodes/detect/detect.py`:
- Around line 65-69: Remove the debug-only warning call that prints internal
config keys in detect.py: delete the warning(...) call (the one constructing
f'detect: __init__ engine={self.engine!r} prompt={self.prompt!r}
config_keys=...') inside the Detect class initializer (or module-level __init__
routine) so production logs no longer expose internal config; if you need
retainable debug visibility, replace it with a logger.debug guarded by a feature
flag or environment check (e.g., use logger.debug in Detect.__init__ behind an
explicit DEBUG flag) rather than emitting a warning.

In `@nodes/src/nodes/detect/services.json`:
- Around line 87-90: The test entry in services.json uses a 60s timeout which is
too short for first-run model downloads/warmups; update the "test" object's
"timeout" value (under the "test" key for the "rfdetr" profile/outputs) to a
larger value (e.g., 300 seconds) to avoid flaky cold-start failures and ensure
the built-in service test has enough time for initial model download and
initialization.

In `@nodes/src/nodes/face_detection/face_detection.py`:
- Around line 128-129: The download call using
urllib.request.urlretrieve(self.model_url, tmp_path) can hang because
urlretrieve has no timeout; replace it with an explicit timed HTTP read (e.g.
call urllib.request.urlopen(self.model_url, timeout=... ) and stream the
response to tmp_path, then os.replace(tmp_path, local_path) as before), and add
proper exception handling around the fetch so a timeout raises and is
logged/propagated. Update the download logic in the FaceDetection class / the
model download method where tmp_path and local_path are used.

In `@nodes/src/nodes/face_detection/IGlobal.py`:
- Around line 117-123: The accessor normalization currently does "if value:" and
then "list(value)", which treats empty containers as missing and splits
dicts/strings into keys/characters causing _looks_biometric(...) to miss
embedding nodes; change the presence check to "if getattr(endpoint, attr, None)
is not None" and normalize safely: if value is a dict or a str/bytes or not an
Iterable, return [value]; otherwise return list(value) (allowing empty lists to
propagate). Update the code around getattr(endpoint, attr, None) and the
normalization logic (referencing the value variable and the _looks_biometric use
sites) and import collections.abc.Iterable as needed.

In `@nodes/src/nodes/face_detection/IInstance.py`:
- Around line 76-86: Wrap the frame decoding/detection/emission logic in the
AVI_ACTION.END branch with a try/except so a single bad frame doesn't abort the
stream: call ImageProcessor.load_image_from_bytes, then inside the with
self.IGlobal.device_lock call self.IGlobal.detector.detect and self._emit within
the same try block, catch Exception, log a warning (include the exception and
the current self._chunk_id) and drop the frame; in both success and failure
paths ensure you clear self._image_data, increment self._chunk_id, and return
self.preventDefault() so the pipeline continues.

In `@nodes/src/nodes/pose_estimation/IInstance.py`:
- Around line 108-114: Wrap the END-frame processing in the AVI_ACTION.END
branch (the sequence calling ImageProcessor.load_image_from_bytes,
self._estimate and self._emit) in a try/except that catches any exception from
decode/inference/emit, logs a warning with the error and context (e.g., chunk
id), clears self._image_data and increments self._chunk_id to drop the bad
frame, then returns self.preventDefault() so the stream continues instead of
crashing; retain normal behavior on success (emit then clear/increment/return).

In `@nodes/src/nodes/pose_estimation/pose_estimation.py`:
- Around line 200-249: The code assumes scores_arr has >= n_kpts columns which
can cause IndexError when accessing scs[k_idx]; modify the logic in the
pose-building loop to clamp the number of score columns and provide a safe
default for missing scores: compute e.g. n_score_cols = min(n_kpts,
scores_arr.shape[1]), slice scs = scores_arr[idx, :n_score_cols], and when
iterating k_idx in range(n_kpts) use the sliced scs value if k_idx <
n_score_cols else a default (0.0 or float('nan')) for the 'score' and for
visibility checks (visible = scs >= self.threshold should use a padded array or
per-index conditional). Update uses of n_kpts (person_scores, keypoints_list,
visible) accordingly while keeping COCO_17_KEYPOINTS, keypoints_arr, scores_arr,
n_kpts, scs, keypoints_list, and self.threshold as the referenced symbols to
locate changes.

In `@nodes/src/nodes/video_composer/IGlobal.py`:
- Around line 41-43: The RuntimeError currently embeds the full connection
config (self.glb.connConfig) which may leak secrets; change the error to avoid
including raw connConfig—use only non-sensitive identifiers (e.g.,
self.glb.logicalType and optionally a sanitized list of keys or a
redacted/hashed version) or omit connConfig entirely in the message. Locate the
block that raises the error in IGlobal (the check for self.config is None that
references self.glb.logicalType and self.glb.connConfig) and replace the
exception text so it no longer prints the full connConfig but still provides
enough context for debugging (e.g., include logicalType and a redacted/keys-only
summary).

In `@nodes/src/nodes/video_composer/IInstance.py`:
- Around line 86-89: Normalize and sanitize numeric config fields before
validation: when reading cfg.get('fps', 1.0) and cfg.get('crf', 23) in the
IInstance initialization, coerce fps to a float and crf to an int using safe
parsing (handling ValueError/TypeError) and fall back to the default values if
parsing fails, then run the existing range checks against these normalized
self._fps and self._crf; ensure any parsing errors produce a clear configuration
error rather than letting TypeError/ValueError bubble up from the later range
checks.
- Around line 47-53: _plog currently performs unguarded filesystem writes which
can raise OSError and crash the node; wrap the file open/write inside a
try/except that catches OSError (or Exception) to prevent propagation, and on
failure fall back to a non-crashing alternative (e.g., write a minimal message
to sys.stderr or simply return) so logging failures don’t break request
handling; update the _plog function and reference the _PLOG variable and
_plog(msg: str) to implement this guarded write behavior.
- Around line 250-257: The sendSSE call can raise transport errors and currently
can abort the whole chunk loop; wrap the self.instance.sendSSE(...) invocation
in a try/except around the block that sends each chunk (the code that references
self.instance.sendSSE, self._filename, chunk_index, total_chunks, etc.), catch
broad exceptions, log the failure (include chunk_index and filename) via the
instance or node logger, optionally perform a small retry, and then continue the
loop so a single SSE failure does not stop streaming the remaining chunks.

In `@packages/client-python/src/rocketride/core/constants.py`:
- Line 84: CONST_WS_PING_TIMEOUT was increased to 600s which delays
dead-connection detection; make this value configurable rather than a hard
constant by adding a configurable parameter that can be passed into the
WebSocket/client constructor or pipeline setup (e.g., ws_ping_timeout kwarg) and
default to CONST_WS_PING_TIMEOUT; update places that currently import or
reference CONST_WS_PING_TIMEOUT to prefer the instance-level setting (falling
back to the constant) and document the new parameter so operators can tune it
per-deployment or per-pipeline and monitor server-side connection pool metrics
for leaks.

In
`@packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx`:
- Line 94: The helperText assignment in TextareaWidget.tsx uses an unsafe cast
(options?.description as string); change it to a safe runtime-safe expression by
either removing the assertion and relying on TypeScript inference or
wrapping/guarding the value (e.g., check typeof options?.description ===
"string" and use it, otherwise fall back to schema?.description or use
String(options?.description) if coercion is desired). Update the expression used
for helperText in the TextareaWidget component so it no longer uses the "as
string" assertion and instead performs a proper type guard or conversion.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: d9d9511f-c359-41f8-a156-8d22b26ad2f0

📥 Commits

Reviewing files that changed from the base of the PR and between c3b7720 and 93f51e3.

⛔ Files ignored due to path filters (9)

nodes/src/nodes/background_removal/background-removal.svg is excluded by !**/*.svg
nodes/src/nodes/caption/caption.svg is excluded by !**/*.svg
nodes/src/nodes/depth_estimate/depth-estimate.svg is excluded by !**/*.svg
nodes/src/nodes/detect/detect.svg is excluded by !**/*.svg
nodes/src/nodes/detect_segment/segmentation.svg is excluded by !**/*.svg
nodes/src/nodes/face_detection/face-detection.svg is excluded by !**/*.svg
nodes/src/nodes/pose_estimation/pose-estimation.svg is excluded by !**/*.svg
nodes/src/nodes/tool_python/python.svg is excluded by !**/*.svg
nodes/src/nodes/video_composer/video_composer.svg is excluded by !**/*.svg

📒 Files selected for processing (58)

nodes/src/nodes/background_removal/IGlobal.py
nodes/src/nodes/background_removal/IInstance.py
nodes/src/nodes/background_removal/__init__.py
nodes/src/nodes/background_removal/background_removal.py
nodes/src/nodes/background_removal/requirements.txt
nodes/src/nodes/background_removal/services.json
nodes/src/nodes/caption/IGlobal.py
nodes/src/nodes/caption/IInstance.py
nodes/src/nodes/caption/__init__.py
nodes/src/nodes/caption/caption.py
nodes/src/nodes/caption/requirements.txt
nodes/src/nodes/caption/services.json
nodes/src/nodes/depth_estimate/IGlobal.py
nodes/src/nodes/depth_estimate/IInstance.py
nodes/src/nodes/depth_estimate/__init__.py
nodes/src/nodes/depth_estimate/depth_estimate.py
nodes/src/nodes/depth_estimate/requirements.txt
nodes/src/nodes/depth_estimate/services.json
nodes/src/nodes/detect/IGlobal.py
nodes/src/nodes/detect/IInstance.py
nodes/src/nodes/detect/__init__.py
nodes/src/nodes/detect/detect.py
nodes/src/nodes/detect/requirements.txt
nodes/src/nodes/detect/services.json
nodes/src/nodes/detect_segment/IGlobal.py
nodes/src/nodes/detect_segment/IInstance.py
nodes/src/nodes/detect_segment/__init__.py
nodes/src/nodes/detect_segment/detect_segment.py
nodes/src/nodes/detect_segment/requirements.txt
nodes/src/nodes/detect_segment/services.json
nodes/src/nodes/face_detection/IGlobal.py
nodes/src/nodes/face_detection/IInstance.py
nodes/src/nodes/face_detection/__init__.py
nodes/src/nodes/face_detection/face_detection.py
nodes/src/nodes/face_detection/requirements.txt
nodes/src/nodes/face_detection/services.json
nodes/src/nodes/pose_estimation/IGlobal.py
nodes/src/nodes/pose_estimation/IInstance.py
nodes/src/nodes/pose_estimation/__init__.py
nodes/src/nodes/pose_estimation/pose_estimation.py
nodes/src/nodes/pose_estimation/requirements.txt
nodes/src/nodes/pose_estimation/services.json
nodes/src/nodes/response/IInstance.py
nodes/src/nodes/video_composer/IGlobal.py
nodes/src/nodes/video_composer/IInstance.py
nodes/src/nodes/video_composer/__init__.py
nodes/src/nodes/video_composer/services.json
package.json
packages/ai/src/ai/common/embedding.py
packages/ai/src/ai/common/image/dense_resize.py
packages/ai/src/ai/common/models/__init__.py
packages/ai/src/ai/common/models/detection/__init__.py
packages/ai/src/ai/common/models/detection/detection.py
packages/ai/src/ai/common/models/detection/requirements_detection.txt
packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
packages/client-python/src/rocketride/client.py
packages/client-python/src/rocketride/core/constants.py
packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx

coderabbitai · 2026-06-03T11:22:12Z

 # If no pong response is received within this period after a ping,
 # the connection is considered dead and will be closed
-CONST_WS_PING_TIMEOUT = 60
+CONST_WS_PING_TIMEOUT = 600


🧹 Nitpick | 🔵 Trivial | 💤 Low value

10-minute ping timeout trades faster dead-connection detection for long-running inference support.

The 10x increase from 60s to 600s accommodates the heavy vision processing added in this PR (depth estimation, segmentation, pose, etc.), which may legitimately take minutes per frame on CPU or with large images. However, truly dead connections will now linger for up to 10 minutes before being detected and closed.

Consider monitoring server-side connection pool metrics after deployment to ensure abandoned connections don't accumulate. If resource leaks become an issue, you might want to expose this as a configurable parameter (e.g., constructor kwarg or per-pipeline timeout override) rather than a single global constant.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/client-python/src/rocketride/core/constants.py` at line 84, CONST_WS_PING_TIMEOUT was increased to 600s which delays dead-connection detection; make this value configurable rather than a hard constant by adding a configurable parameter that can be passed into the WebSocket/client constructor or pipeline setup (e.g., ws_ping_timeout kwarg) and default to CONST_WS_PING_TIMEOUT; update places that currently import or reference CONST_WS_PING_TIMEOUT to prefer the instance-level setting (falling back to the constant) and document the new parameter so operators can tune it per-deployment or per-pipeline and monitor server-side connection pool metrics for leaks.

- Clamp maxEdge to 256-4096 in background_removal and depth_estimate - Drop bad frames with warning on END path (caption, detect_segment, face_detection, pose_estimation) instead of crashing the stream - Clamp pose keypoint count against score columns to avoid IndexError - Add download timeout to face_detection model fetch (urlopen vs urlretrieve) - Fix face_detection downstream normalization for dict/str values - Coerce video_composer fps/crf config types; guard _plog and sendSSE I/O - Drop connConfig from video_composer error text to avoid leaking PII - Remove debug config-key logging from detect - Bump detect service test timeout 60s -> 180s for cold-start pulls - Replace unsafe 'as string' assertion with typeof guard in TextareaWidget Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

coderabbitai

♻️ Duplicate comments (2)

nodes/src/nodes/video_composer/IInstance.py (1)

259-275: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard the final video_complete SSE as well.

The chunk sends are now best-effort, but Line 274 still does a bare sendSSE(...). A disconnected client can still raise there and fail close() after the MP4 has already been written downstream.

Suggested patch

-        self.instance.sendSSE('video_complete', filename=self._filename)
+        try:
+            self.instance.sendSSE('video_complete', filename=self._filename)
+        except Exception as e:
+            _log(f'_output_video: sendSSE failed for video_complete filename={self._filename}: {e}')
         _plog(f'output_video: done -- sent {total_chunks} SSE chunks filename={self._filename!r}')

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/video_composer/IInstance.py` around lines 259 - 275, Wrap the
final self.instance.sendSSE('video_complete', filename=self._filename) in the
same defensive try/except used for chunk sends so a disconnected client doesn't
raise and prevent cleanup; specifically, after calling
self.instance.writeVideo(AVI_ACTION.END, 'video/mp4', b'') call sendSSE inside a
try block, catch Exception as e and log the failure (use _log or _plog with
context including filename and the exception) but do not re-raise — keep
writeVideo and the final _plog that reports sent chunks unchanged so the MP4
close path always completes even if sendSSE fails.

nodes/src/nodes/pose_estimation/pose_estimation.py (1)

205-214: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don’t truncate the emitted COCO-17 keypoint list when score columns are short.

Clamping n_kpts against scores_arr.shape[1] avoids the IndexError, but it also makes a (N, 17, 2) keypoint result emit fewer than 17 keypoints whenever the score tensor is shorter. That breaks the node’s documented COCO-17 output contract instead of degrading scores only. Keep n_kpts based on the keypoint tensor / label list, slice the available score columns separately, and default any missing scores to 0.0.

Suggested patch

-        n_persons = keypoints_arr.shape[0]
-        # Clamp against score columns and the COCO label count so neither
-        # scs[k_idx] nor COCO_17_KEYPOINTS[k_idx] can raise IndexError.
-        n_kpts = min(keypoints_arr.shape[1], scores_arr.shape[1], len(COCO_17_KEYPOINTS))
-        if n_kpts == 0:
+        n_persons = keypoints_arr.shape[0]
+        n_kpts = min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS))
+        n_score_cols = min(scores_arr.shape[1], n_kpts)
+        if n_kpts == 0:
             return []
-        if keypoints_arr.shape[1] != len(COCO_17_KEYPOINTS):
+        if n_kpts != len(COCO_17_KEYPOINTS) or n_score_cols != n_kpts:
             warning(
-                f'pose_estimation: unexpected keypoint count {keypoints_arr.shape[1]}, expected {len(COCO_17_KEYPOINTS)}'
+                'pose_estimation: unexpected keypoint/score count '
+                f'({keypoints_arr.shape[1]} keypoints, {scores_arr.shape[1]} scores), '
+                f'expected {len(COCO_17_KEYPOINTS)}'
             )
@@
-        person_scores = scores_arr[:, :n_kpts].mean(axis=1)
+        if n_score_cols > 0:
+            person_scores = scores_arr[:, :n_score_cols].mean(axis=1)
+        else:
+            person_scores = np.zeros(n_persons, dtype=float)
@@
-            scs = scores_arr[idx, :n_kpts]
+            scs = np.zeros(n_kpts, dtype=float)
+            if n_score_cols > 0:
+                scs[:n_score_cols] = scores_arr[idx, :n_score_cols]

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@nodes/src/nodes/pose_estimation/pose_estimation.py` around lines 205 - 214,
The code currently sets n_kpts using scores_arr and truncates keypoints output;
instead compute n_kpts as min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS)) so
the emitted keypoint array preserves up to COCO_17_KEYPOINTS, then separately
slice scores_arr (e.g., available_scores = scores_arr[:, :n_score_cols]) and
when mapping scores into the COCO-17 slots fill any missing score columns with
0.0 to avoid IndexError; update references around n_kpts, keypoints_arr,
scores_arr and COCO_17_KEYPOINTS so keypoints are not dropped while scores are
safely bounded/padded.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@nodes/src/nodes/pose_estimation/pose_estimation.py`:
- Around line 205-214: The code currently sets n_kpts using scores_arr and
truncates keypoints output; instead compute n_kpts as
min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS)) so the emitted keypoint
array preserves up to COCO_17_KEYPOINTS, then separately slice scores_arr (e.g.,
available_scores = scores_arr[:, :n_score_cols]) and when mapping scores into
the COCO-17 slots fill any missing score columns with 0.0 to avoid IndexError;
update references around n_kpts, keypoints_arr, scores_arr and COCO_17_KEYPOINTS
so keypoints are not dropped while scores are safely bounded/padded.

In `@nodes/src/nodes/video_composer/IInstance.py`:
- Around line 259-275: Wrap the final self.instance.sendSSE('video_complete',
filename=self._filename) in the same defensive try/except used for chunk sends
so a disconnected client doesn't raise and prevent cleanup; specifically, after
calling self.instance.writeVideo(AVI_ACTION.END, 'video/mp4', b'') call sendSSE
inside a try block, catch Exception as e and log the failure (use _log or _plog
with context including filename and the exception) but do not re-raise — keep
writeVideo and the final _plog that reports sent chunks unchanged so the MP4
close path always completes even if sendSSE fails.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0338639a-1f6e-4f38-8f93-8f12253ab6c8

📥 Commits

Reviewing files that changed from the base of the PR and between 93f51e3 and e18d357.

📒 Files selected for processing (14)

nodes/src/nodes/background_removal/background_removal.py
nodes/src/nodes/caption/IInstance.py
nodes/src/nodes/depth_estimate/depth_estimate.py
nodes/src/nodes/detect/detect.py
nodes/src/nodes/detect/services.json
nodes/src/nodes/detect_segment/IInstance.py
nodes/src/nodes/face_detection/IGlobal.py
nodes/src/nodes/face_detection/IInstance.py
nodes/src/nodes/face_detection/face_detection.py
nodes/src/nodes/pose_estimation/IInstance.py
nodes/src/nodes/pose_estimation/pose_estimation.py
nodes/src/nodes/video_composer/IGlobal.py
nodes/src/nodes/video_composer/IInstance.py
packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx

💤 Files with no reviewable changes (1)

nodes/src/nodes/detect/detect.py

- Clamp maxEdge to 256-4096 in background_removal and depth_estimate - Drop bad frames with warning on END path (caption, detect_segment, face_detection, pose_estimation) instead of crashing the stream - Clamp pose keypoint count against score columns to avoid IndexError - Add download timeout to face_detection model fetch (urlopen vs urlretrieve) - Fix face_detection downstream normalization for dict/str values - Coerce video_composer fps/crf config types; guard _plog and sendSSE I/O - Drop connConfig from video_composer error text to avoid leaking PII - Remove debug config-key logging from detect - Bump detect service test timeout 60s -> 180s for cold-start pulls - Replace unsafe 'as string' assertion with typeof guard in TextareaWidget Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Standalone helpers (resize_for_inference, restore_dense_output, restore_rle_mask, default_max_edge) that bound large frames to a max edge before model inference and lift dense outputs back to the source resolution. Prereq for depth_estimate, detect_segment, and background_removal nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ormer) Loaders for RF-DETR (Apache-2.0), MM-Grounding-DINO, and Mask2Former instance/semantic via HuggingFace transformers, replacing the prior SAM3-only DetectionLoader. Prereq for the detect and detect_segment nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replace the legacy YOLO-World backend with RF-DETR (default, COCO 80-class, Apache-2.0) and MM-Grounding-DINO for open-vocabulary prompt-driven detection. Collapses I/O to a single image lane — drops the video lane and Doc emission since dedicated nodes now cover those flows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Drop the gated SAM3 backend (required CUDA 12.6+) in favor of Mask2Former with instance and semantic modes. Single-frame only — video lane removed — and uses dense_resize to bound peak memory on large inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Simplify the node to a single image lane on Depth-Anything V2 Small and add a max_edge param so inference is bounded by resolution rather than the raw frame size. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Remove the multi-task describe node (caption/detect/grounding/OCR) and replace with a caption-only Florence-2 Base node exposing the three granularity levels. Detection, grounding, and OCR are now owned by their dedicated nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

BlazeFace short-range via MediaPipe (Apache-2.0), boxes only — no embeddings emitted. Privacy guards baked in: chains_to_embedding=false in services.json plus a runtime block on any embedding_* downstream node so face crops cannot flow into identification pipelines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ONNX-backed RTMPose with tiny/medium/large profiles, returning the 17 COCO keypoints per person. Exposes max_persons and detection threshold as configurable fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

BiRefNet (MIT) with a default 1024px profile and an HR 2048px profile. Emits an RGBA cutout PNG plus alpha-coverage stats for downstream QA/thresholding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

TextareaWidget now pipes the services.json description field through to MUI's helperText, and pins InputLabelProps.shrink=true so the field title always floats above the outlined border instead of clipping through it on empty/multiline fields. Applies uniformly to all ~34 textareas across the suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Newer transformers releases regress the vision models used in this suite (RF-DETR, Mask2Former, Florence-2, Depth-Anything V2). Pin in the shared requirements_transformers.txt, document the why inline in embedding.py, and bump the package patch (3.2.0 -> 3.2.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Icon was omitted from the initial video_composer commit (53c595b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- Clamp maxEdge to 256-4096 in background_removal and depth_estimate - Drop bad frames with warning on END path (caption, detect_segment, face_detection, pose_estimation) instead of crashing the stream - Clamp pose keypoint count against score columns to avoid IndexError - Add download timeout to face_detection model fetch (urlopen vs urlretrieve) - Fix face_detection downstream normalization for dict/str values - Coerce video_composer fps/crf config types; guard _plog and sendSSE I/O - Drop connConfig from video_composer error text to avoid leaking PII - Remove debug config-key logging from detect - Bump detect service test timeout 60s -> 180s for cold-start pulls - Replace unsafe 'as string' assertion with typeof guard in TextareaWidget Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The chains_to_embedding flag was inert metadata (read by nothing in the framework) and the runtime _check_no_biometric_downstream guard never fired: it probed downstream-node accessors that the rocketlib endpoint never exposes, so it always fell through to a warning and allowed the pipeline. Remove the dead guard code, the chains_to_embedding flag, and the biometric/BIPA-GDPR framing so the node behaves like every other vision node — boxes and alignment landmarks, no special downstream restriction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ce test - remove `_pick_device()` (torch import) and the unused `self.device` field; mediapipe runs on CPU and the value was never read at inference - test: detect a real face in a public-domain portrait (`testdata/images/einstein.jpg`, ~0.80 confidence, deterministic) instead of feeding a text image (`ocr_test_text.png`) with no faces

- AI models: `DepthEstimatorLoader` + `DepthEstimator` facade (local/proxy); node uses the facade, no in-process pipeline - shared `ai.common.utils`: `image_utils` + `cuda_utils` - node-test: default `ROCKETRIDE_APIKEY` so dynamic tests run;

refactor(depends): add engine/model cache-dir helpers Replace `_get_cache_dir` and hardcoded `./cache/...` paths with `engine_cache_dir`, `model_cache_dir`, and combined/constraints path helpers. Adds unit tests. @

Move model logic into `ai.common.models.vision` loaders + facades (`DepthEstimator`/`Detector`/`Segmenter`); nodes keep config parsing and rendering only. Centralize backend/mode/threshold/max_edge constants in the loaders, share a base+extras requirements split via `_REQUIREMENTS_FILE`, and cache weights under `model_cache_dir`. Add detection/segmentation unit tests, services.json groups, and skip_nodes entries.

Move caption (Florence-2) and background_removal (BiRefNet) into `ai.common.models.vision` loaders + facades; nodes keep config + rendering only. Run BiRefNet in float32 (fixes the bf16 `deformable_im2col` crash). Add unit tests, services.json groups, and skip_nodes entries.

- per-frame try/except in depth/detect/background_removal IInstance (drop frame + warn), matching caption/detect_segment - restore caption local-mode inference watchdog (60s) dropped in conversion - pin model revisions: thread revision through detect/segment backends and pin SHAs in depth/detect/detect_segment/caption/background_removal profiles - fulltest covers every profile (test = default only); add face_detection fulltest; add per-group test `config` so mmgdino supplies its prompt instead of a baked profile default

Move rtmlib pose into `ai.common.models.vision.pose` — torch-free PoseEstimatorLoader + PoseEstimator (device via onnxruntime providers). Node keeps config + skeleton rendering only; `mode` is identity, threshold/max_persons per-request. services.json test=default, fulltest=all 3 profiles; pose moved to the heavy skip_nodes group. Add test_pose.py. Also fix caption fp16 crash on CUDA: cast pixel_values to the model dtype so Florence-2's conv doesn't get float input vs half weights.

The develop merge (#1196) added excludes.txt handling calling _get_cache_dir(), which feat/vision had renamed to engine_cache_dir() -> Ruff F821. Rename both call sites; fix the stale contract_checks comment.

Importing mediapipe pulls in matplotlib.pyplot, which aborts the embedded engine during global init (seen as a misleading "Task has already completed"). Fixing that exposes mediapipe's libGLESv2.so.2 dependency, absent on headless hosts. - stub matplotlib.pyplot before importing mediapipe; re-raise the libGLESv2 load failure with an install hint - add OS-keyed `requiresLibs` test config: skip with a reason when a system lib is missing instead of hard-failing - install libgles2 in compiler-unix.sh and Dockerfile.engine - add unit tests; enable pytest -ra for visible skip reasons

…check CI surfaced two follow-ups to the initial fix: - MediaPipe needs both libGLESv2.so.2 and libEGL.so.1; only the former was handled, so the node test failed on CI once libGLESv2 was installed. Add libegl1 to compiler-unix.sh and Dockerfile.engine, and to requiresLibs. - check-externals imports mediapipe directly to verify its contract, which pulls matplotlib and aborts the engine's FreeType (the node's pyplot stub doesn't apply there). Mark the mediapipe imports contract-check: ignore.

coderabbitai

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@nodes/src/nodes/depth_estimate/IGlobal.py`:
- Around line 48-60: The code always passes DEFAULT_MODEL to DepthEstimator,
ignoring any per-profile model configured in node_cfg; change the DepthEstimator
instantiation in IGlobal to use the configured model id (e.g. model_id =
(node_cfg.get('model') or DEFAULT_MODEL).strip() or DEFAULT_MODEL) instead of
DEFAULT_MODEL so profile settings take effect, and keep using the existing
revision variable when constructing DepthEstimator (DepthEstimator(...,
device=None, revision=revision)); ensure you handle missing or non-string values
with the same fallback logic used for max_edge.

In `@nodes/src/nodes/depth_estimate/services.json`:
- Line 10: Update the "description" field in services.json to correct the output
contract: state that depth statistics (min, max, mean) are emitted on the "text"
lane as JSON rather than being attached to the output document's metadata;
locate and edit the "description" property in
nodes/src/nodes/depth_estimate/services.json and replace the sentence
referencing metadata with one that clearly says the stats are returned as a JSON
string on the "text" lane (mention "colorized depth map" and supported runtimes
can remain unchanged).

In `@nodes/src/nodes/detect_segment/IGlobal.py`:
- Around line 63-69: After parsing threshold and max_edge in IGlobal (variables
threshold and max_edge, using DEFAULT_THRESHOLD and DEFAULT_MAX_EDGE), clamp
their values to supported bounds: force threshold into a valid range (e.g.,
between 0.0 and 1.0) and force max_edge to a sensible positive range (e.g., at
least 1 and at most DEFAULT_MAX_EDGE) using min/max or explicit checks; replace
the current blind casts with try/except followed by clamping so downstream
resize/inference cannot receive invalid values.
- Around line 80-83: In endGlobal(), calling self.segmenter.disconnect() can
raise and abort shutdown; make teardown best-effort by wrapping the
disconnect/close call for self.segmenter in a try/except (catch broad Exception)
so exceptions are logged/ignored and do not stop teardown, then always set
self.segmenter = None and self.device_lock = None (use finally or ensure
post-except cleanup) similar to other vision nodes' guarded close/disconnect
patterns; refer to the endGlobal method and
self.segmenter.disconnect()/self.segmenter to locate the change.

In `@nodes/src/nodes/detect_segment/IInstance.py`:
- Around line 138-159: Validate and bound-check the semantic dimensions and the
decoded class_map before any decompression or large allocation: ensure size, h
and w derived from semantic.get('size') are integers within safe limits
(positive and below a MAX_PIXELS or MAX_DIM constant) and fall back to
image.height/width only after validation; before calling
zlib.decompress/base64.b64decode check the class_map_b64 length and
estimate/uncompress size to avoid huge outputs and only proceed if the expected
raw.size equals h*w and h*w <= MAX_PIXELS; likewise validate the result of
_decode_rle_to_mask (and the presence/shape of semantic['semantic_map']) before
using it and before allocating color_layer so you never reshape or allocate
arrays using untrusted h/w values.

In `@nodes/src/nodes/detect/IGlobal.py`:
- Around line 57-60: After parsing threshold from config.get into threshold (and
catching TypeError/ValueError), validate that the parsed value is finite and
within [0,1]; if not, fall back to DEFAULT_THRESHOLD. Use
math.isfinite(threshold) (import math if needed) and check 0.0 <= threshold <=
1.0 after the float() conversion so values like NaN, inf, negative values, or >1
are rejected and DEFAULT_THRESHOLD is used instead.

In `@nodes/src/nodes/face_detection/face_detection.py`:
- Around line 57-63: MODEL_URLS currently uses a mutable "/latest/" URL which
makes downloads non-deterministic; change the URL in MODEL_URLS to an immutable,
versioned artifact (replace '.../latest/...' with the specific release path) and
add a corresponding pinned SHA-256 mapping (e.g., MODEL_SHA256S keyed by
'short'). In the model download/cache routine that calls os.replace (the
function that fetches and atomically swaps the downloaded .tflite), compute the
SHA-256 of the downloaded temp file and verify it matches the pinned hash before
calling os.replace; if verification fails, delete the temp file and raise/log an
error. Apply the same immutable URL + SHA-256 verification for the other model
entry referenced around lines 122-133.
- Line 123: The fallback filename literal in the fname assignment should use
single quotes to match project style: update the line that builds fname (the
expression using digest and os.path.basename(self.model_url) or "model.tflite")
to use 'model.tflite' instead of "model.tflite" so the fallback string uses
single quotes; locate the assignment to fname in the face detection node (where
self.model_url is referenced) and make this small quoting change.

In `@nodes/src/nodes/pose_estimation/IGlobal.py`:
- Around line 54-61: The parsed config values for self.threshold and max_persons
must be validated and clamped to safe ranges before any estimator construction:
after converting threshold and max_persons (keep the existing try/except parsing
logic) enforce threshold = min(max(parsed_threshold, 0.0), 1.0) (fall back to
DEFAULT_THRESHOLD if parsing failed) and enforce max_persons =
int(parsed_max_persons) if parsed_max_persons > 0 and <= SOME_REASONABLE_CAP (or
DEFAULT_MAX_PERSONS if out of range or <= 0); assign the validated values to
self.threshold and self.max_persons so the rest of IGlobal (estimator
construction) uses the clamped, safe values instead of raw config numbers.

In `@nodes/src/nodes/response/IInstance.py`:
- Line 227: Move the inline "import base64" at line 227 into the module-level
imports at the top of nodes/src/nodes/response/IInstance.py (alongside the other
top imports), remove the inline import statement, and ensure any references in
the IInstance class or the AVI_ACTION.END handling continue to use base64; this
keeps imports consistent and prevents repeated imports during the AVI_ACTION.END
code path.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 786d50cc-cd73-4700-95f8-fc25dd3904f1

📥 Commits

Reviewing files that changed from the base of the PR and between e18d357 and fa26c7e.

⛔ Files ignored due to path filters (8)

nodes/src/nodes/background_removal/background-removal.svg is excluded by !**/*.svg
nodes/src/nodes/caption/caption.svg is excluded by !**/*.svg
nodes/src/nodes/depth_estimate/depth-estimate.svg is excluded by !**/*.svg
nodes/src/nodes/detect/detect.svg is excluded by !**/*.svg
nodes/src/nodes/detect_segment/segmentation.svg is excluded by !**/*.svg
nodes/src/nodes/face_detection/face-detection.svg is excluded by !**/*.svg
nodes/src/nodes/pose_estimation/pose-estimation.svg is excluded by !**/*.svg
nodes/src/nodes/video_composer/video_composer.svg is excluded by !**/*.svg

📒 Files selected for processing (52)

docker/Dockerfile.engine
nodes/src/nodes/background_removal/IGlobal.py
nodes/src/nodes/background_removal/IInstance.py
nodes/src/nodes/background_removal/__init__.py
nodes/src/nodes/background_removal/services.json
nodes/src/nodes/caption/IGlobal.py
nodes/src/nodes/caption/IInstance.py
nodes/src/nodes/caption/__init__.py
nodes/src/nodes/caption/services.json
nodes/src/nodes/depth_estimate/IGlobal.py
nodes/src/nodes/depth_estimate/IInstance.py
nodes/src/nodes/depth_estimate/__init__.py
nodes/src/nodes/depth_estimate/services.json
nodes/src/nodes/detect/IGlobal.py
nodes/src/nodes/detect/IInstance.py
nodes/src/nodes/detect/__init__.py
nodes/src/nodes/detect/services.json
nodes/src/nodes/detect_segment/IGlobal.py
nodes/src/nodes/detect_segment/IInstance.py
nodes/src/nodes/detect_segment/__init__.py
nodes/src/nodes/detect_segment/requirements.txt
nodes/src/nodes/detect_segment/services.json
nodes/src/nodes/face_detection/IGlobal.py
nodes/src/nodes/face_detection/IInstance.py
nodes/src/nodes/face_detection/__init__.py
nodes/src/nodes/face_detection/face_detection.py
nodes/src/nodes/face_detection/requirements.txt
nodes/src/nodes/face_detection/services.json
nodes/src/nodes/pose_estimation/IGlobal.py
nodes/src/nodes/pose_estimation/IInstance.py
nodes/src/nodes/pose_estimation/__init__.py
nodes/src/nodes/pose_estimation/services.json
nodes/src/nodes/response/IInstance.py
nodes/src/nodes/video_composer/IGlobal.py
nodes/src/nodes/video_composer/IInstance.py
nodes/src/nodes/video_composer/__init__.py
nodes/src/nodes/video_composer/services.json
nodes/test/conftest.py
nodes/test/face_detection/__init__.py
nodes/test/face_detection/test_face_detection.py
nodes/test/framework/discovery.py
nodes/test/framework/pipeline.py
packages/ai/src/ai/common/embedding.py
packages/ai/src/ai/common/image/dense_resize.py
packages/ai/src/ai/common/models/__init__.py
packages/ai/src/ai/common/models/base.py
packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
packages/ai/src/ai/common/models/vision/__init__.py
packages/ai/src/ai/common/models/vision/background.py
packages/ai/src/ai/common/models/vision/caption.py
packages/ai/src/ai/common/models/vision/depth.py
packages/ai/src/ai/common/models/vision/detection.py

💤 Files with no reviewable changes (18)

packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
packages/ai/src/ai/common/models/vision/init.py
nodes/src/nodes/video_composer/IGlobal.py
nodes/test/framework/pipeline.py
nodes/src/nodes/video_composer/services.json
nodes/src/nodes/video_composer/init.py
packages/ai/src/ai/common/embedding.py
packages/ai/src/ai/common/models/base.py
nodes/test/conftest.py
nodes/test/face_detection/test_face_detection.py
packages/ai/src/ai/common/models/init.py
packages/ai/src/ai/common/models/vision/caption.py
nodes/src/nodes/video_composer/IInstance.py
nodes/test/framework/discovery.py
packages/ai/src/ai/common/models/vision/depth.py
packages/ai/src/ai/common/image/dense_resize.py
packages/ai/src/ai/common/models/vision/background.py
packages/ai/src/ai/common/models/vision/detection.py

github-actions Bot added module:nodes Python pipeline nodes module:client-python Python SDK and MCP client module:ai AI/ML modules module:ui Chat UI and Dropper UI labels Jun 3, 2026

ryan-t-christensen marked this pull request as ready for review June 3, 2026 11:12

ryan-t-christensen requested review from Rod-Christensen, jmaionchi and stepmikhaylov as code owners June 3, 2026 11:12

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

ryan-t-christensen marked this pull request as draft June 3, 2026 18:23

ryan-t-christensen force-pushed the feat/vision branch from c28d9ae to 6268a85 Compare June 3, 2026 19:26

ryan-t-christensen and others added 14 commits June 6, 2026 01:17

feat(vision): add vision node suite and detection models

8cb01d3

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

feat(vision): add background_removal node (BiRefNet)

7b31269

BiRefNet (MIT) with a default 1024px profile and an HR 2048px profile. Emits an RGBA cutout PNG plus alpha-coverage stats for downstream QA/thresholding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

chore(video_composer): add missing icon

6be9708

Icon was omitted from the initial video_composer commit (53c595b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ryan-t-christensen and others added 5 commits June 6, 2026 01:17

feat(depth_estimate): serve via model server

cf8e72c

- AI models: `DepthEstimatorLoader` + `DepthEstimator` facade (local/proxy); node uses the facade, no in-process pipeline - shared `ai.common.utils`: `image_utils` + `cuda_utils` - node-test: default `ROCKETRIDE_APIKEY` so dynamic tests run;

@

bd27820

refactor(depends): add engine/model cache-dir helpers Replace `_get_cache_dir` and hardcoded `./cache/...` paths with `engine_cache_dir`, `model_cache_dir`, and combined/constraints path helpers. Adds unit tests. @

asclearuc force-pushed the feat/vision branch from 23ca8c8 to 313e436 Compare June 7, 2026 11:39

github-actions Bot added the module:server C++ engine and server components label Jun 7, 2026

asclearuc added 6 commits June 9, 2026 21:08

Merge branch 'develop' into feat/vision

fe6b5ca

fix(depends): use engine_cache_dir for excludes.txt path

a1bcbf7

The develop merge (#1196) added excludes.txt handling calling _get_cache_dir(), which feat/vision had renamed to engine_cache_dir() -> Ruff F821. Rename both call sites; fix the stale contract_checks comment.

github-actions Bot added builder docker and removed module:server C++ engine and server components module:nodes Python pipeline nodes module:client-python Python SDK and MCP client module:ai AI/ML modules module:ui Chat UI and Dropper UI labels Jun 10, 2026

asclearuc force-pushed the feat/vision branch from 2f61ac0 to fa26c7e Compare June 11, 2026 02:28

asclearuc marked this pull request as ready for review June 11, 2026 17:25

coderabbitai Bot reviewed Jun 11, 2026

View reviewed changes

asclearuc added 3 commits June 11, 2026 23:26

fix(nodes): address CodeRabbit review findings

58d3581

fix(nodes): address CodeRabbit review findings

af7174b

fix(nodes): address CodeRabbit review findings

70b7690

Conversation

ryan-t-christensen commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Type

Node Walkthroughs

1. detect — Object Detection (refactor)

2. detect_segment — Segmentation (refactor)

3. depth_estimate — Depth Estimation (refactor)

4. caption — Image Captioning (new, replaces describe)

5. face_detection — Face Detection (new)

6. pose_estimation — Pose Estimation (new)

7. background_removal — Background Removal (new)

Cross-Cutting Changes

Testing

Reviewers, please probe

Checklist

Breaking Changes

Notes for Reviewers

Linked Issue

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions Bot commented Jun 3, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

ryan-t-christensen commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

1. `detect` — Object Detection (refactor)

2. `detect_segment` — Segmentation (refactor)

3. `depth_estimate` — Depth Estimation (refactor)

4. `caption` — Image Captioning (new, replaces `describe`)

5. `face_detection` — Face Detection (new)

6. `pose_estimation` — Pose Estimation (new)

7. `background_removal` — Background Removal (new)

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading