feat(vision): 7-node suite — RF-DETR, Mask2Former, DA-V2, BlazeFace + more#1081
ryan-t-christensen wants to merge 29 commits into
📝 Walkthrough
Adds dense-output helpers, many vision model facades and nodes (background_removal, caption, depth_estimate, detect, detect_segment, face_detection, pose_estimation, video_composer), per-node globals/instances, test-framework shared-lib detection, and client/UI/runtime polish.
Changes: Vision nodes and shared ML infrastructure
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
No description provided.
Actionable comments posted: 17
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/background_removal/background_removal.py`:
- Around line 67-71: The parsed maxEdge value assigned to self._max_edge must be
clamped to allowed bounds to prevent negative or huge values; after parsing the
int from config.get('maxEdge', 1024) (and handling TypeError/ValueError as you
already do), clamp it to the service contract limits (e.g., min_edge = 16 and
max_edge = 4096) before assignment (self._max_edge = max(min_edge,
min(parsed_value, max_edge))); keep the existing fallback to 1024 on parse error
and optionally log when a value is out of bounds for visibility.
In `@nodes/src/nodes/caption/IInstance.py`:
- Around line 37-45: The decode call
ImageProcessor.load_image_from_bytes(self._image_data) is currently outside the
try/except, so malformed frame bytes can raise and crash the END handler; wrap
the image decoding together with the caption inference in the same try/except
that logs a warning (including self._chunk_id and the exception) and sets
caption_text = '' on failure; specifically move or include
ImageProcessor.load_image_from_bytes(...) inside the try block that uses
self.IGlobal.device_lock and self.IGlobal.captioner.caption so both decode and
caption errors are handled the same for AVI_ACTION.END.
In `@nodes/src/nodes/depth_estimate/depth_estimate.py`:
- Around line 48-51: The config value assigned to self._max_edge is currently
accepted as any int; update the initialization in depth_estimate.py to validate
and clamp the parsed integer to a safe range (e.g., min 1 and a sensible upper
bound such as 4096) before storing it so negative/zero/oversized values cannot
reach resize_for_inference; specifically, after parsing in the try/except that
sets self._max_edge, wrap the value with a clamp (e.g., self._max_edge =
max(MIN, min(parsed_value, MAX))) and keep the fallback to the default on
TypeError/ValueError, and ensure any later use in resize_for_inference
references this clamped self._max_edge.
In `@nodes/src/nodes/detect_segment/IInstance.py`:
- Around line 217-227: The END-path frame processing must be guarded so a single
bad frame doesn't crash the node: wrap the ImageProcessor.load_image_from_bytes
+ device-locked call to self.IGlobal.segmenter.segment and the subsequent
self._emit in a try/except that catches any Exception, logs a warning (use
self.logger if present, otherwise self.IGlobal.logger) including the chunk id
and error, then drop the frame by clearing self._image_data and incrementing
self._chunk_id before returning self.preventDefault(); only on success proceed
to emit and then clear/increment as now. Ensure exceptions are not re-raised so
the stream continues.
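The drop-the-frame behavior requested here (and in the caption, face_detection, and pose_estimation findings) can be sketched as follows. `EndFrameHandler` is a hypothetical stand-in, not the node's real `IInstance`; the decode, inference, and emit callables are injected so the sketch stays self-contained.

```python
import logging
import threading

logger = logging.getLogger('end_frame_sketch')


class EndFrameHandler:
    """Hypothetical stand-in for an IInstance END handler."""

    def __init__(self, decode, infer, emit):
        self._decode, self._infer, self._emit = decode, infer, emit
        self._device_lock = threading.Lock()
        self._image_data = bytearray()
        self._chunk_id = 0

    def on_end(self) -> bool:
        """Process the buffered frame; return True if emitted, False if dropped."""
        ok = False
        try:
            image = self._decode(bytes(self._image_data))
            with self._device_lock:
                result = self._infer(image)
            self._emit(result)
            ok = True
        except Exception as exc:
            # Log and swallow: one bad frame must not end the stream.
            logger.warning('chunk %d dropped: %s', self._chunk_id, exc)
        finally:
            # Success and failure both clear the buffer and advance the chunk id.
            self._image_data = bytearray()
            self._chunk_id += 1
        return ok
```

Putting the cleanup in `finally` keeps the two paths from drifting apart; the real handler would return `self.preventDefault()` instead of a boolean.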
In `@nodes/src/nodes/detect/detect.py`:
- Around line 65-69: Remove the debug-only warning call that prints internal
config keys in detect.py: delete the warning(...) call (the one constructing
f'detect: __init__ engine={self.engine!r} prompt={self.prompt!r}
config_keys=...') inside the Detect class initializer (or module-level __init__
routine) so production logs no longer expose internal config; if you need
retainable debug visibility, replace it with a logger.debug guarded by a feature
flag or environment check (e.g., use logger.debug in Detect.__init__ behind an
explicit DEBUG flag) rather than emitting a warning.
In `@nodes/src/nodes/detect/services.json`:
- Around line 87-90: The test entry in services.json uses a 60s timeout which is
too short for first-run model downloads/warmups; update the "test" object's
"timeout" value (under the "test" key for the "rfdetr" profile/outputs) to a
larger value (e.g., 300 seconds) to avoid flaky cold-start failures and ensure
the built-in service test has enough time for initial model download and
initialization.
In `@nodes/src/nodes/face_detection/face_detection.py`:
- Around line 128-129: The download call using
urllib.request.urlretrieve(self.model_url, tmp_path) can hang because
urlretrieve has no timeout; replace it with an explicit timed HTTP read (e.g.
call urllib.request.urlopen(self.model_url, timeout=... ) and stream the
response to tmp_path, then os.replace(tmp_path, local_path) as before), and add
proper exception handling around the fetch so a timeout raises and is
logged/propagated. Update the download logic in the FaceDetection class / the
model download method where tmp_path and local_path are used.
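A sketch of the timed download described above, assuming a hypothetical helper `download_with_timeout` and a 30-second default. Note that the `timeout` passed to `urlopen` bounds each blocking socket operation, not the total transfer time.

```python
import os
import shutil
import urllib.request


def download_with_timeout(url: str, local_path: str, timeout: float = 30.0) -> None:
    """Stream url to a temp file, then atomically move it into place."""
    tmp_path = local_path + '.tmp'
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(tmp_path, 'wb') as out:
            shutil.copyfileobj(resp, out)
        os.replace(tmp_path, local_path)
    except Exception:
        # Remove the partial file and let the caller log or propagate the failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```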
In `@nodes/src/nodes/face_detection/IGlobal.py`:
- Around line 117-123: The accessor normalization currently does "if value:" and
then "list(value)", which treats empty containers as missing and splits
dicts/strings into keys/characters causing _looks_biometric(...) to miss
embedding nodes; change the presence check to "if getattr(endpoint, attr, None)
is not None" and normalize safely: if value is a dict or a str/bytes or not an
Iterable, return [value]; otherwise return list(value) (allowing empty lists to
propagate). Update the code around getattr(endpoint, attr, None) and the
normalization logic (referencing the value variable and the _looks_biometric use
sites) and import collections.abc.Iterable as needed.
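The normalization rule in this comment might look like the sketch below. `normalize_accessor` is a hypothetical name; the real code reads the value via `getattr(endpoint, attr, None)` before normalizing.

```python
from collections.abc import Iterable


def normalize_accessor(value):
    """Return a list view of an accessor value, or None when the attribute is absent."""
    if value is None:
        return None
    # Dicts and strings are iterable, but iterating them yields keys or characters,
    # so they are wrapped rather than expanded.
    if isinstance(value, (dict, str, bytes)) or not isinstance(value, Iterable):
        return [value]
    # Empty containers propagate as [] instead of being treated as missing.
    return list(value)
```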
In `@nodes/src/nodes/face_detection/IInstance.py`:
- Around line 76-86: Wrap the frame decoding/detection/emission logic in the
AVI_ACTION.END branch with a try/except so a single bad frame doesn't abort the
stream: call ImageProcessor.load_image_from_bytes, then inside the with
self.IGlobal.device_lock call self.IGlobal.detector.detect and self._emit within
the same try block, catch Exception, log a warning (include the exception and
the current self._chunk_id) and drop the frame; in both success and failure
paths ensure you clear self._image_data, increment self._chunk_id, and return
self.preventDefault() so the pipeline continues.
In `@nodes/src/nodes/pose_estimation/IInstance.py`:
- Around line 108-114: Wrap the END-frame processing in the AVI_ACTION.END
branch (the sequence calling ImageProcessor.load_image_from_bytes,
self._estimate and self._emit) in a try/except that catches any exception from
decode/inference/emit, logs a warning with the error and context (e.g., chunk
id), clears self._image_data and increments self._chunk_id to drop the bad
frame, then returns self.preventDefault() so the stream continues instead of
crashing; retain normal behavior on success (emit then clear/increment/return).
In `@nodes/src/nodes/pose_estimation/pose_estimation.py`:
- Around line 200-249: The code assumes scores_arr has >= n_kpts columns which
can cause IndexError when accessing scs[k_idx]; modify the logic in the
pose-building loop to clamp the number of score columns and provide a safe
default for missing scores: compute e.g. n_score_cols = min(n_kpts,
scores_arr.shape[1]), slice scs = scores_arr[idx, :n_score_cols], and when
iterating k_idx in range(n_kpts) use the sliced scs value if k_idx <
n_score_cols else a default (0.0 or float('nan')) for the 'score' and for
visibility checks (visible = scs >= self.threshold should use a padded array or
per-index conditional). Update uses of n_kpts (person_scores, keypoints_list,
visible) accordingly while keeping COCO_17_KEYPOINTS, keypoints_arr, scores_arr,
n_kpts, scs, keypoints_list, and self.threshold as the referenced symbols to
locate changes.
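A NumPy-free sketch of the score-padding idea, using plain lists so it runs anywhere. The helper names and the output dict shape are assumptions for illustration; the node itself works on `keypoints_arr` and `scores_arr` arrays.

```python
def pad_scores(score_row, n_kpts, default=0.0):
    """Return exactly n_kpts scores: available columns first, then the default."""
    n_score_cols = min(n_kpts, len(score_row))
    return list(score_row[:n_score_cols]) + [default] * (n_kpts - n_score_cols)


def build_keypoints(points, score_row, labels, threshold):
    """Pair each (x, y) point with a label, a padded score, and a visibility flag."""
    n_kpts = min(len(points), len(labels))
    scores = pad_scores(score_row, n_kpts)
    return [
        {'name': labels[k], 'x': points[k][0], 'y': points[k][1],
         'score': scores[k], 'visible': scores[k] >= threshold}
        for k in range(n_kpts)
    ]
```

Because the keypoint count comes from the points and labels only, a short score row lowers confidence for the missing columns without shrinking the emitted keypoint list.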
In `@nodes/src/nodes/video_composer/IGlobal.py`:
- Around line 41-43: The RuntimeError currently embeds the full connection
config (self.glb.connConfig) which may leak secrets; change the error to avoid
including raw connConfig—use only non-sensitive identifiers (e.g.,
self.glb.logicalType and optionally a sanitized list of keys or a
redacted/hashed version) or omit connConfig entirely in the message. Locate the
block that raises the error in IGlobal (the check for self.config is None that
references self.glb.logicalType and self.glb.connConfig) and replace the
exception text so it no longer prints the full connConfig but still provides
enough context for debugging (e.g., include logicalType and a redacted/keys-only
summary).
In `@nodes/src/nodes/video_composer/IInstance.py`:
- Around line 86-89: Normalize and sanitize numeric config fields before
validation: when reading cfg.get('fps', 1.0) and cfg.get('crf', 23) in the
IInstance initialization, coerce fps to a float and crf to an int using safe
parsing (handling ValueError/TypeError) and fall back to the default values if
parsing fails, then run the existing range checks against these normalized
self._fps and self._crf; ensure any parsing errors produce a clear configuration
error rather than letting TypeError/ValueError bubble up from the later range
checks.
- Around line 47-53: _plog currently performs unguarded filesystem writes which
can raise OSError and crash the node; wrap the file open/write inside a
try/except that catches OSError (or Exception) to prevent propagation, and on
failure fall back to a non-crashing alternative (e.g., write a minimal message
to sys.stderr or simply return) so logging failures don’t break request
handling; update the _plog function and reference the _PLOG variable and
_plog(msg: str) to implement this guarded write behavior.
- Around line 250-257: The sendSSE call can raise transport errors and currently
can abort the whole chunk loop; wrap the self.instance.sendSSE(...) invocation
in a try/except around the block that sends each chunk (the code that references
self.instance.sendSSE, self._filename, chunk_index, total_chunks, etc.), catch
broad exceptions, log the failure (include chunk_index and filename) via the
instance or node logger, optionally perform a small retry, and then continue the
loop so a single SSE failure does not stop streaming the remaining chunks.
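For the first item above (fps/crf normalization), a minimal sketch under stated assumptions: `coerce_number` and `read_video_settings` are hypothetical names, and the range limits are illustrative rather than the node's actual checks.

```python
def coerce_number(cfg: dict, key: str, default, cast):
    """Cast cfg[key] with `cast`; return `default` when the value is missing or unparsable."""
    raw = cfg.get(key, default)
    try:
        return cast(raw)
    except (TypeError, ValueError):
        return default


def read_video_settings(cfg: dict):
    """Normalize fps and crf first, then range-check the normalized values."""
    fps = coerce_number(cfg, 'fps', 1.0, float)
    crf = coerce_number(cfg, 'crf', 23, int)
    # Assumed limits for illustration only.
    if not fps > 0:
        raise ValueError(f'fps must be > 0, got {fps!r}')
    if not 0 <= crf <= 51:
        raise ValueError(f'crf must be in 0..51, got {crf!r}')
    return fps, crf
```

Writing the fps check as `not fps > 0` also rejects NaN, which a plain `fps <= 0` comparison would let through.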
In `@packages/client-python/src/rocketride/core/constants.py`:
- Line 84: CONST_WS_PING_TIMEOUT was increased to 600s which delays
dead-connection detection; make this value configurable rather than a hard
constant by adding a configurable parameter that can be passed into the
WebSocket/client constructor or pipeline setup (e.g., ws_ping_timeout kwarg) and
default to CONST_WS_PING_TIMEOUT; update places that currently import or
reference CONST_WS_PING_TIMEOUT to prefer the instance-level setting (falling
back to the constant) and document the new parameter so operators can tune it
per-deployment or per-pipeline and monitor server-side connection pool metrics
for leaks.
In
`@packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx`:
- Line 94: The helperText assignment in TextareaWidget.tsx uses an unsafe cast
(options?.description as string); change it to a safe runtime-safe expression by
either removing the assertion and relying on TypeScript inference or
wrapping/guarding the value (e.g., check typeof options?.description ===
"string" and use it, otherwise fall back to schema?.description or use
String(options?.description) if coercion is desired). Update the expression used
for helperText in the TextareaWidget component so it no longer uses the "as
string" assertion and instead performs a proper type guard or conversion.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: d9d9511f-c359-41f8-a156-8d22b26ad2f0
⛔ Files ignored due to path filters (9)
- nodes/src/nodes/background_removal/background-removal.svg is excluded by !**/*.svg
- nodes/src/nodes/caption/caption.svg is excluded by !**/*.svg
- nodes/src/nodes/depth_estimate/depth-estimate.svg is excluded by !**/*.svg
- nodes/src/nodes/detect/detect.svg is excluded by !**/*.svg
- nodes/src/nodes/detect_segment/segmentation.svg is excluded by !**/*.svg
- nodes/src/nodes/face_detection/face-detection.svg is excluded by !**/*.svg
- nodes/src/nodes/pose_estimation/pose-estimation.svg is excluded by !**/*.svg
- nodes/src/nodes/tool_python/python.svg is excluded by !**/*.svg
- nodes/src/nodes/video_composer/video_composer.svg is excluded by !**/*.svg
📒 Files selected for processing (58)
- nodes/src/nodes/background_removal/IGlobal.py
- nodes/src/nodes/background_removal/IInstance.py
- nodes/src/nodes/background_removal/__init__.py
- nodes/src/nodes/background_removal/background_removal.py
- nodes/src/nodes/background_removal/requirements.txt
- nodes/src/nodes/background_removal/services.json
- nodes/src/nodes/caption/IGlobal.py
- nodes/src/nodes/caption/IInstance.py
- nodes/src/nodes/caption/__init__.py
- nodes/src/nodes/caption/caption.py
- nodes/src/nodes/caption/requirements.txt
- nodes/src/nodes/caption/services.json
- nodes/src/nodes/depth_estimate/IGlobal.py
- nodes/src/nodes/depth_estimate/IInstance.py
- nodes/src/nodes/depth_estimate/__init__.py
- nodes/src/nodes/depth_estimate/depth_estimate.py
- nodes/src/nodes/depth_estimate/requirements.txt
- nodes/src/nodes/depth_estimate/services.json
- nodes/src/nodes/detect/IGlobal.py
- nodes/src/nodes/detect/IInstance.py
- nodes/src/nodes/detect/__init__.py
- nodes/src/nodes/detect/detect.py
- nodes/src/nodes/detect/requirements.txt
- nodes/src/nodes/detect/services.json
- nodes/src/nodes/detect_segment/IGlobal.py
- nodes/src/nodes/detect_segment/IInstance.py
- nodes/src/nodes/detect_segment/__init__.py
- nodes/src/nodes/detect_segment/detect_segment.py
- nodes/src/nodes/detect_segment/requirements.txt
- nodes/src/nodes/detect_segment/services.json
- nodes/src/nodes/face_detection/IGlobal.py
- nodes/src/nodes/face_detection/IInstance.py
- nodes/src/nodes/face_detection/__init__.py
- nodes/src/nodes/face_detection/face_detection.py
- nodes/src/nodes/face_detection/requirements.txt
- nodes/src/nodes/face_detection/services.json
- nodes/src/nodes/pose_estimation/IGlobal.py
- nodes/src/nodes/pose_estimation/IInstance.py
- nodes/src/nodes/pose_estimation/__init__.py
- nodes/src/nodes/pose_estimation/pose_estimation.py
- nodes/src/nodes/pose_estimation/requirements.txt
- nodes/src/nodes/pose_estimation/services.json
- nodes/src/nodes/response/IInstance.py
- nodes/src/nodes/video_composer/IGlobal.py
- nodes/src/nodes/video_composer/IInstance.py
- nodes/src/nodes/video_composer/__init__.py
- nodes/src/nodes/video_composer/services.json
- package.json
- packages/ai/src/ai/common/embedding.py
- packages/ai/src/ai/common/image/dense_resize.py
- packages/ai/src/ai/common/models/__init__.py
- packages/ai/src/ai/common/models/detection/__init__.py
- packages/ai/src/ai/common/models/detection/detection.py
- packages/ai/src/ai/common/models/detection/requirements_detection.txt
- packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
- packages/client-python/src/rocketride/client.py
- packages/client-python/src/rocketride/core/constants.py
- packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx
  # If no pong response is received within this period after a ping,
  # the connection is considered dead and will be closed
- CONST_WS_PING_TIMEOUT = 60
+ CONST_WS_PING_TIMEOUT = 600
🧹 Nitpick | 🔵 Trivial | 💤 Low value
10-minute ping timeout trades faster dead-connection detection for long-running inference support.
The 10x increase from 60s to 600s accommodates the heavy vision processing added in this PR (depth estimation, segmentation, pose, etc.), which may legitimately take minutes per frame on CPU or with large images. However, truly dead connections will now linger for up to 10 minutes before being detected and closed.
Consider monitoring server-side connection pool metrics after deployment to ensure abandoned connections don't accumulate. If resource leaks become an issue, you might want to expose this as a configurable parameter (e.g., constructor kwarg or per-pipeline timeout override) rather than a single global constant.
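One way to expose the timeout as a constructor kwarg, sketched under assumptions: `ClientSketch` and the `ws_ping_timeout` parameter are hypothetical, and the real rocketride client constructor may look different.

```python
from typing import Optional

CONST_WS_PING_TIMEOUT = 600  # seconds; the value introduced by this PR


class ClientSketch:
    """Hypothetical client showing an instance-level override of the global constant."""

    def __init__(self, ws_ping_timeout: Optional[float] = None):
        if ws_ping_timeout is not None and ws_ping_timeout <= 0:
            raise ValueError('ws_ping_timeout must be positive')
        # The instance-level setting wins; the module constant is the fallback.
        self.ws_ping_timeout = (
            CONST_WS_PING_TIMEOUT if ws_ping_timeout is None else ws_ping_timeout
        )
```

A deployment that only runs fast GPU inference could then pass `ws_ping_timeout=60` and keep quick dead-connection detection, while CPU-bound pipelines keep the 600-second default.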
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/client-python/src/rocketride/core/constants.py` at line 84,
CONST_WS_PING_TIMEOUT was increased to 600s which delays dead-connection
detection; make this value configurable rather than a hard constant by adding a
configurable parameter that can be passed into the WebSocket/client constructor
or pipeline setup (e.g., ws_ping_timeout kwarg) and default to
CONST_WS_PING_TIMEOUT; update places that currently import or reference
CONST_WS_PING_TIMEOUT to prefer the instance-level setting (falling back to the
constant) and document the new parameter so operators can tune it per-deployment
or per-pipeline and monitor server-side connection pool metrics for leaks.
- Clamp maxEdge to 256-4096 in background_removal and depth_estimate
- Drop bad frames with warning on END path (caption, detect_segment, face_detection, pose_estimation) instead of crashing the stream
- Clamp pose keypoint count against score columns to avoid IndexError
- Add download timeout to face_detection model fetch (urlopen vs urlretrieve)
- Fix face_detection downstream normalization for dict/str values
- Coerce video_composer fps/crf config types; guard _plog and sendSSE I/O
- Drop connConfig from video_composer error text to avoid leaking PII
- Remove debug config-key logging from detect
- Bump detect service test timeout 60s -> 180s for cold-start pulls
- Replace unsafe 'as string' assertion with typeof guard in TextareaWidget

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
♻️ Duplicate comments (2)
nodes/src/nodes/video_composer/IInstance.py (1)
259-275: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Guard the final `video_complete` SSE as well.
The chunk sends are now best-effort, but Line 274 still does a bare `sendSSE(...)`. A disconnected client can still raise there and fail `close()` after the MP4 has already been written downstream.
Suggested patch
- self.instance.sendSSE('video_complete', filename=self._filename)
+ try:
+     self.instance.sendSSE('video_complete', filename=self._filename)
+ except Exception as e:
+     _log(f'_output_video: sendSSE failed for video_complete filename={self._filename}: {e}')
  _plog(f'output_video: done -- sent {total_chunks} SSE chunks filename={self._filename!r}')
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/video_composer/IInstance.py` around lines 259 - 275, Wrap the final self.instance.sendSSE('video_complete', filename=self._filename) in the same defensive try/except used for chunk sends so a disconnected client doesn't raise and prevent cleanup; specifically, after calling self.instance.writeVideo(AVI_ACTION.END, 'video/mp4', b'') call sendSSE inside a try block, catch Exception as e and log the failure (use _log or _plog with context including filename and the exception) but do not re-raise — keep writeVideo and the final _plog that reports sent chunks unchanged so the MP4 close path always completes even if sendSSE fails.
nodes/src/nodes/pose_estimation/pose_estimation.py (1)
205-214: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win
Don’t truncate the emitted COCO-17 keypoint list when score columns are short.
Clamping `n_kpts` against `scores_arr.shape[1]` avoids the `IndexError`, but it also makes a `(N, 17, 2)` keypoint result emit fewer than 17 keypoints whenever the score tensor is shorter. That breaks the node’s documented COCO-17 output contract instead of degrading scores only. Keep `n_kpts` based on the keypoint tensor / label list, slice the available score columns separately, and default any missing scores to `0.0`.
Suggested patch
- n_persons = keypoints_arr.shape[0]
- # Clamp against score columns and the COCO label count so neither
- # scs[k_idx] nor COCO_17_KEYPOINTS[k_idx] can raise IndexError.
- n_kpts = min(keypoints_arr.shape[1], scores_arr.shape[1], len(COCO_17_KEYPOINTS))
- if n_kpts == 0:
+ n_persons = keypoints_arr.shape[0]
+ n_kpts = min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS))
+ n_score_cols = min(scores_arr.shape[1], n_kpts)
+ if n_kpts == 0:
      return []
- if keypoints_arr.shape[1] != len(COCO_17_KEYPOINTS):
+ if n_kpts != len(COCO_17_KEYPOINTS) or n_score_cols != n_kpts:
      warning(
-         f'pose_estimation: unexpected keypoint count {keypoints_arr.shape[1]}, expected {len(COCO_17_KEYPOINTS)}'
+         'pose_estimation: unexpected keypoint/score count '
+         f'({keypoints_arr.shape[1]} keypoints, {scores_arr.shape[1]} scores), '
+         f'expected {len(COCO_17_KEYPOINTS)}'
      )
@@
- person_scores = scores_arr[:, :n_kpts].mean(axis=1)
+ if n_score_cols > 0:
+     person_scores = scores_arr[:, :n_score_cols].mean(axis=1)
+ else:
+     person_scores = np.zeros(n_persons, dtype=float)
@@
- scs = scores_arr[idx, :n_kpts]
+ scs = np.zeros(n_kpts, dtype=float)
+ if n_score_cols > 0:
+     scs[:n_score_cols] = scores_arr[idx, :n_score_cols]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.
In `@nodes/src/nodes/pose_estimation/pose_estimation.py` around lines 205 - 214, The code currently sets n_kpts using scores_arr and truncates keypoints output; instead compute n_kpts as min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS)) so the emitted keypoint array preserves up to COCO_17_KEYPOINTS, then separately slice scores_arr (e.g., available_scores = scores_arr[:, :n_score_cols]) and when mapping scores into the COCO-17 slots fill any missing score columns with 0.0 to avoid IndexError; update references around n_kpts, keypoints_arr, scores_arr and COCO_17_KEYPOINTS so keypoints are not dropped while scores are safely bounded/padded.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@nodes/src/nodes/pose_estimation/pose_estimation.py`:
- Around line 205-214: The code currently sets n_kpts using scores_arr and
truncates keypoints output; instead compute n_kpts as
min(keypoints_arr.shape[1], len(COCO_17_KEYPOINTS)) so the emitted keypoint
array preserves up to COCO_17_KEYPOINTS, then separately slice scores_arr (e.g.,
available_scores = scores_arr[:, :n_score_cols]) and when mapping scores into
the COCO-17 slots fill any missing score columns with 0.0 to avoid IndexError;
update references around n_kpts, keypoints_arr, scores_arr and COCO_17_KEYPOINTS
so keypoints are not dropped while scores are safely bounded/padded.
In `@nodes/src/nodes/video_composer/IInstance.py`:
- Around line 259-275: Wrap the final self.instance.sendSSE('video_complete',
filename=self._filename) in the same defensive try/except used for chunk sends
so a disconnected client doesn't raise and prevent cleanup; specifically, after
calling self.instance.writeVideo(AVI_ACTION.END, 'video/mp4', b'') call sendSSE
inside a try block, catch Exception as e and log the failure (use _log or _plog
with context including filename and the exception) but do not re-raise — keep
writeVideo and the final _plog that reports sent chunks unchanged so the MP4
close path always completes even if sendSSE fails.
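The best-effort send pattern behind the `video_composer` finding can be sketched like this. The `send` callable stands in for `self.instance.sendSSE`; the `'video_chunk'` event name and both helper names are assumptions, while `'video_complete'` comes from the comment.

```python
import logging

logger = logging.getLogger('video_composer_sketch')


def send_sse_best_effort(send, event: str, **fields) -> bool:
    """Call send(event, **fields); log and swallow any failure. Returns True on success."""
    try:
        send(event, **fields)
        return True
    except Exception as exc:
        logger.warning('sendSSE failed for %s filename=%r: %s',
                       event, fields.get('filename'), exc)
        return False


def stream_chunks(send, filename: str, chunks) -> int:
    """Send every chunk plus the completion event; return how many chunk sends succeeded."""
    total = len(chunks)
    sent = sum(
        send_sse_best_effort(send, 'video_chunk', filename=filename,
                             chunk_index=i, total_chunks=total, data=chunk)
        for i, chunk in enumerate(chunks)
    )
    # The completion event goes through the same guard as the chunk sends.
    send_sse_best_effort(send, 'video_complete', filename=filename)
    return sent
```

Routing every send through one guarded helper removes the risk of a bare `sendSSE(...)` being left behind, which is what this duplicate comment flags.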
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0338639a-1f6e-4f38-8f93-8f12253ab6c8
📒 Files selected for processing (14)
- nodes/src/nodes/background_removal/background_removal.py
- nodes/src/nodes/caption/IInstance.py
- nodes/src/nodes/depth_estimate/depth_estimate.py
- nodes/src/nodes/detect/detect.py
- nodes/src/nodes/detect/services.json
- nodes/src/nodes/detect_segment/IInstance.py
- nodes/src/nodes/face_detection/IGlobal.py
- nodes/src/nodes/face_detection/IInstance.py
- nodes/src/nodes/face_detection/face_detection.py
- nodes/src/nodes/pose_estimation/IInstance.py
- nodes/src/nodes/pose_estimation/pose_estimation.py
- nodes/src/nodes/video_composer/IGlobal.py
- nodes/src/nodes/video_composer/IInstance.py
- packages/shared-ui/src/components/canvas/components/rjsf-widgets/textarea-widget/TextareaWidget.tsx
💤 Files with no reviewable changes (1)
- nodes/src/nodes/detect/detect.py
c28d9ae to 6268a85
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Standalone helpers (resize_for_inference, restore_dense_output, restore_rle_mask, default_max_edge) that bound large frames to a max edge before model inference and lift dense outputs back to the source resolution. Prereq for depth_estimate, detect_segment, and background_removal nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
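The size-bounding step these helpers perform can be illustrated with a small sketch. `bounded_size` is a hypothetical function, not the signature of `resize_for_inference`; it only shows the arithmetic of capping the longer edge while preserving aspect ratio.

```python
def bounded_size(width: int, height: int, max_edge: int):
    """Return (new_width, new_height, scale) with the longer edge capped at max_edge."""
    longest = max(width, height)
    if longest <= max_edge:
        return width, height, 1.0  # small frames are left alone, never upscaled
    scale = max_edge / longest
    # Round to whole pixels and keep each side at least 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale)), scale
```

Restoring a dense output is then the reverse resize back to the original `(width, height)`.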
…ormer) Loaders for RF-DETR (Apache-2.0), MM-Grounding-DINO, and Mask2Former instance/semantic via HuggingFace transformers, replacing the prior SAM3-only DetectionLoader. Prereq for the detect and detect_segment nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the legacy YOLO-World backend with RF-DETR (default, COCO 80-class, Apache-2.0) and MM-Grounding-DINO for open-vocabulary prompt-driven detection. Collapses I/O to a single image lane — drops the video lane and Doc emission since dedicated nodes now cover those flows. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the gated SAM3 backend (required CUDA 12.6+) in favor of Mask2Former with instance and semantic modes. Single-frame only — video lane removed — and uses dense_resize to bound peak memory on large inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simplify the node to a single image lane on Depth-Anything V2 Small and add a max_edge param so inference is bounded by resolution rather than the raw frame size. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the multi-task describe node (caption/detect/grounding/OCR) and replace with a caption-only Florence-2 Base node exposing the three granularity levels. Detection, grounding, and OCR are now owned by their dedicated nodes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BlazeFace short-range via MediaPipe (Apache-2.0), boxes only — no embeddings emitted. Privacy guards baked in: chains_to_embedding=false in services.json plus a runtime block on any embedding_* downstream node so face crops cannot flow into identification pipelines. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ONNX-backed RTMPose with tiny/medium/large profiles, returning the 17 COCO keypoints per person. Exposes max_persons and detection threshold as configurable fields. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BiRefNet (MIT) with a default 1024px profile and an HR 2048px profile. Emits an RGBA cutout PNG plus alpha-coverage stats for downstream QA/thresholding. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TextareaWidget now pipes the services.json description field through to MUI's helperText, and pins InputLabelProps.shrink=true so the field title always floats above the outlined border instead of clipping through it on empty/multiline fields. Applies uniformly to all ~34 textareas across the suite. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Newer transformers releases regress the vision models used in this suite (RF-DETR, Mask2Former, Florence-2, Depth-Anything V2). Pin in the shared requirements_transformers.txt, document the why inline in embedding.py, and bump the package patch (3.2.0 -> 3.2.1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Icon was omitted from the initial video_composer commit (53c595b). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Clamp maxEdge to 256-4096 in background_removal and depth_estimate
- Drop bad frames with warning on END path (caption, detect_segment, face_detection, pose_estimation) instead of crashing the stream
- Clamp pose keypoint count against score columns to avoid IndexError
- Add download timeout to face_detection model fetch (urlopen vs urlretrieve)
- Fix face_detection downstream normalization for dict/str values
- Coerce video_composer fps/crf config types; guard _plog and sendSSE I/O
- Drop connConfig from video_composer error text to avoid leaking PII
- Remove debug config-key logging from detect
- Bump detect service test timeout 60s -> 180s for cold-start pulls
- Replace unsafe 'as string' assertion with typeof guard in TextareaWidget

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The chains_to_embedding flag was inert metadata (read by nothing in the framework) and the runtime _check_no_biometric_downstream guard never fired: it probed downstream-node accessors that the rocketlib endpoint never exposes, so it always fell through to a warning and allowed the pipeline. Remove the dead guard code, the chains_to_embedding flag, and the biometric/BIPA-GDPR framing so the node behaves like every other vision node — boxes and alignment landmarks, no special downstream restriction. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ce test
- remove `_pick_device()` (torch import) and the unused `self.device` field; mediapipe runs on CPU and the value was never read at inference
- test: detect a real face in a public-domain portrait (`testdata/images/einstein.jpg`, ~0.80 confidence, deterministic) instead of feeding a text image (`ocr_test_text.png`) with no faces
- AI models: `DepthEstimatorLoader` + `DepthEstimator` facade (local/proxy); node uses the facade, no in-process pipeline
- shared `ai.common.utils`: `image_utils` + `cuda_utils`
- node-test: default `ROCKETRIDE_APIKEY` so dynamic tests run;
Move model logic into `ai.common.models.vision` loaders + facades (`DepthEstimator`/`Detector`/`Segmenter`); nodes keep config parsing and rendering only. Centralize backend/mode/threshold/max_edge constants in the loaders, share a base+extras requirements split via `_REQUIREMENTS_FILE`, and cache weights under `model_cache_dir`. Add detection/segmentation unit tests, services.json groups, and skip_nodes entries.
Move caption (Florence-2) and background_removal (BiRefNet) into `ai.common.models.vision` loaders + facades; nodes keep config + rendering only. Run BiRefNet in float32 (fixes the bf16 `deformable_im2col` crash). Add unit tests, services.json groups, and skip_nodes entries.
- per-frame try/except in depth/detect/background_removal IInstance (drop frame + warn), matching caption/detect_segment
- restore caption local-mode inference watchdog (60s) dropped in conversion
- pin model revisions: thread revision through detect/segment backends and pin SHAs in depth/detect/detect_segment/caption/background_removal profiles
- fulltest covers every profile (test = default only); add face_detection fulltest; add per-group test `config` so mmgdino supplies its prompt instead of a baked profile default
Move rtmlib pose into `ai.common.models.vision.pose` — torch-free PoseEstimatorLoader + PoseEstimator (device via onnxruntime providers). Node keeps config + skeleton rendering only; `mode` is identity, threshold/max_persons per-request. services.json test=default, fulltest=all 3 profiles; pose moved to the heavy skip_nodes group. Add test_pose.py. Also fix caption fp16 crash on CUDA: cast pixel_values to the model dtype so Florence-2's conv doesn't get float input vs half weights.
The develop merge (#1196) added excludes.txt handling calling _get_cache_dir(), which feat/vision had renamed to engine_cache_dir() -> Ruff F821. Rename both call sites; fix the stale contract_checks comment.
Importing mediapipe pulls in matplotlib.pyplot, which aborts the embedded engine during global init (seen as a misleading "Task has already completed"). Fixing that exposes mediapipe's libGLESv2.so.2 dependency, absent on headless hosts.
- stub matplotlib.pyplot before importing mediapipe; re-raise the libGLESv2 load failure with an install hint
- add OS-keyed `requiresLibs` test config: skip with a reason when a system lib is missing instead of hard-failing
- install libgles2 in compiler-unix.sh and Dockerfile.engine
- add unit tests; enable pytest -ra for visible skip reasons
…check
CI surfaced two follow-ups to the initial fix:
- MediaPipe needs both libGLESv2.so.2 and libEGL.so.1; only the former was handled, so the node test failed on CI once libGLESv2 was installed. Add libegl1 to compiler-unix.sh and Dockerfile.engine, and to requiresLibs.
- check-externals imports mediapipe directly to verify its contract, which pulls matplotlib and aborts the engine's FreeType (the node's pyplot stub doesn't apply there). Mark the mediapipe imports contract-check: ignore.
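The two mitigations described in these commits (stubbing `matplotlib.pyplot` before the heavy import, and skipping when a system library is absent) can be sketched roughly as follows. This is a minimal illustration, not the code in this branch: `stub_module` and `missing_shared_libs` are hypothetical helper names, and the real node and test framework may probe libraries differently.

```python
import ctypes.util
import sys
import types


def stub_module(name: str) -> types.ModuleType:
    """Register an empty placeholder module under `name` (hypothetical helper).

    If something is already registered in sys.modules, it is left untouched.
    """
    return sys.modules.setdefault(name, types.ModuleType(name))


def missing_shared_libs(names: list[str]) -> list[str]:
    """Return the library names that ctypes cannot locate on this host."""
    return [n for n in names if ctypes.util.find_library(n) is None]


# Stub pyplot before the heavy import would otherwise pull it in.
stub_module('matplotlib.pyplot')

# A test harness could skip with a reason instead of failing hard.
missing = missing_shared_libs(['GLESv2', 'EGL'])
if missing:
    print(f'skip: missing system libraries: {missing}')
```

The stub only helps code paths that import after it has been registered, which is consistent with the note above that the node's pyplot stub does not apply to check-externals.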
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@nodes/src/nodes/depth_estimate/IGlobal.py`:
- Around line 48-60: The code always passes DEFAULT_MODEL to DepthEstimator,
ignoring any per-profile model configured in node_cfg; change the DepthEstimator
instantiation in IGlobal to use the configured model id (e.g. model_id =
(node_cfg.get('model') or DEFAULT_MODEL).strip() or DEFAULT_MODEL) instead of
DEFAULT_MODEL so profile settings take effect, and keep using the existing
revision variable when constructing DepthEstimator (DepthEstimator(...,
device=None, revision=revision)); ensure you handle missing or non-string values
with the same fallback logic used for max_edge.
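A possible shape for that fallback, as a sketch only: `resolve_model_id` is a hypothetical helper and the `DEFAULT_MODEL` value below is a placeholder, not the node's real default. Note that the one-liner suggested above would raise `AttributeError` on a truthy non-string value (for example `123`), so this version checks the type first.

```python
DEFAULT_MODEL = 'example-org/example-depth-model'  # placeholder, not the real default


def resolve_model_id(node_cfg: dict, default: str = DEFAULT_MODEL) -> str:
    """Return the configured model id, falling back to the default.

    Missing, non-string, empty, and whitespace-only values all fall back.
    """
    value = node_cfg.get('model')
    if isinstance(value, str) and value.strip():
        return value.strip()
    return default
```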
In `@nodes/src/nodes/depth_estimate/services.json`:
- Line 10: Update the "description" field in services.json to correct the output
contract: state that depth statistics (min, max, mean) are emitted on the "text"
lane as JSON rather than being attached to the output document's metadata;
locate and edit the "description" property in
nodes/src/nodes/depth_estimate/services.json and replace the sentence
referencing metadata with one that clearly says the stats are returned as a JSON
string on the "text" lane (mention "colorized depth map" and supported runtimes
can remain unchanged).
In `@nodes/src/nodes/detect_segment/IGlobal.py`:
- Around line 63-69: After parsing threshold and max_edge in IGlobal (variables
threshold and max_edge, using DEFAULT_THRESHOLD and DEFAULT_MAX_EDGE), clamp
their values to supported bounds: force threshold into a valid range (e.g.,
between 0.0 and 1.0) and force max_edge to a sensible positive range (e.g., at
least 1 and at most DEFAULT_MAX_EDGE) using min/max or explicit checks; replace
the current blind casts with try/except followed by clamping so downstream
resize/inference cannot receive invalid values.
- Around line 80-83: In endGlobal(), calling self.segmenter.disconnect() can
raise and abort shutdown; make teardown best-effort by wrapping the
disconnect/close call for self.segmenter in a try/except (catch broad Exception)
so exceptions are logged/ignored and do not stop teardown, then always set
self.segmenter = None and self.device_lock = None (use finally or ensure
post-except cleanup) similar to other vision nodes' guarded close/disconnect
patterns; refer to the endGlobal method and
self.segmenter.disconnect()/self.segmenter to locate the change.
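The best-effort teardown described above might look like the sketch below. `GlobalSketch` is a stand-in class written for illustration; only the `endGlobal`, `segmenter`, `device_lock`, and `disconnect()` names come from the comment, and the logging call is an assumption about how the node reports warnings.

```python
import logging

logger = logging.getLogger(__name__)


class GlobalSketch:
    """Stand-in for the node's IGlobal; only the teardown path is shown."""

    def __init__(self, segmenter):
        self.segmenter = segmenter
        self.device_lock = object()

    def endGlobal(self) -> None:
        try:
            if self.segmenter is not None:
                self.segmenter.disconnect()
        except Exception:
            # Teardown is best-effort: record the failure and keep going.
            logger.warning('segmenter disconnect failed', exc_info=True)
        finally:
            # Always release references, even if disconnect raised.
            self.segmenter = None
            self.device_lock = None
```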
In `@nodes/src/nodes/detect_segment/IInstance.py`:
- Around line 138-159: Validate and bound-check the semantic dimensions and the
decoded class_map before any decompression or large allocation: ensure size, h
and w derived from semantic.get('size') are integers within safe limits
(positive and below a MAX_PIXELS or MAX_DIM constant) and fall back to
image.height/width only after validation; before calling
zlib.decompress/base64.b64decode check the class_map_b64 length and
estimate/uncompress size to avoid huge outputs and only proceed if the expected
raw.size equals h*w and h*w <= MAX_PIXELS; likewise validate the result of
_decode_rle_to_mask (and the presence/shape of semantic['semantic_map']) before
using it and before allocating color_layer so you never reshape or allocate
arrays using untrusted h/w values.
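One way to bound the decompression step, sketched under assumptions: the limits `MAX_DIM` and `MAX_PIXELS` are illustrative values, `decode_class_map` is a hypothetical helper, and the class map is assumed to hold one byte per pixel. The key idea is to cap the inflated size with `zlib.decompressobj` rather than calling `zlib.decompress` on untrusted input.

```python
import base64
import zlib

MAX_DIM = 8192           # assumed per-axis limit
MAX_PIXELS = 64_000_000  # assumed total-pixel limit


def decode_class_map(class_map_b64: str, h: int, w: int) -> bytes:
    """Decode a zlib+base64 class map, refusing oversized or mismatched data.

    Assumes one byte per pixel; adjust `expected` for wider dtypes.
    """
    if not (isinstance(h, int) and isinstance(w, int)):
        raise ValueError('dimensions must be integers')
    if not (0 < h <= MAX_DIM and 0 < w <= MAX_DIM) or h * w > MAX_PIXELS:
        raise ValueError(f'unsafe dimensions: {h}x{w}')

    expected = h * w
    compressed = base64.b64decode(class_map_b64, validate=True)

    # Cap the output at expected + 1 bytes so a malicious payload
    # cannot expand without limit.
    inflater = zlib.decompressobj()
    raw = inflater.decompress(compressed, expected + 1)
    if len(raw) != expected or inflater.unconsumed_tail or not inflater.eof:
        raise ValueError('class map size does not match declared dimensions')
    return raw
```

Only after this check passes would the caller reshape `raw` to `(h, w)` and allocate the color layer.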
In `@nodes/src/nodes/detect/IGlobal.py`:
- Around line 57-60: After parsing threshold from config.get into threshold (and
catching TypeError/ValueError), validate that the parsed value is finite and
within [0,1]; if not, fall back to DEFAULT_THRESHOLD. Use
math.isfinite(threshold) (import math if needed) and check 0.0 <= threshold <=
1.0 after the float() conversion so values like NaN, inf, negative values, or >1
are rejected and DEFAULT_THRESHOLD is used instead.
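A minimal sketch of that check, assuming a placeholder `DEFAULT_THRESHOLD` (the node's actual default is not shown here) and a hypothetical `parse_threshold` helper:

```python
import math

DEFAULT_THRESHOLD = 0.5  # placeholder value for illustration


def parse_threshold(value, default: float = DEFAULT_THRESHOLD) -> float:
    """Parse a score threshold, rejecting NaN, inf, and values outside [0, 1]."""
    try:
        threshold = float(value)
    except (TypeError, ValueError):
        return default
    if not math.isfinite(threshold) or not 0.0 <= threshold <= 1.0:
        return default
    return threshold
```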
In `@nodes/src/nodes/face_detection/face_detection.py`:
- Around line 57-63: MODEL_URLS currently uses a mutable "/latest/" URL which
makes downloads non-deterministic; change the URL in MODEL_URLS to an immutable,
versioned artifact (replace '.../latest/...' with the specific release path) and
add a corresponding pinned SHA-256 mapping (e.g., MODEL_SHA256S keyed by
'short'). In the model download/cache routine that calls os.replace (the
function that fetches and atomically swaps the downloaded .tflite), compute the
SHA-256 of the downloaded temp file and verify it matches the pinned hash before
calling os.replace; if verification fails, delete the temp file and raise/log an
error. Apply the same immutable URL + SHA-256 verification for the other model
entry referenced around lines 122-133.
- Line 123: The fallback filename literal in the fname assignment should use
single quotes to match project style: update the line that builds fname (the
expression using digest and os.path.basename(self.model_url) or "model.tflite")
to use 'model.tflite' instead of "model.tflite" so the fallback string uses
single quotes; locate the assignment to fname in the face detection node (where
self.model_url is referenced) and make this small quoting change.
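The verify-then-swap step could be sketched as below. `sha256_of` and `install_verified` are hypothetical names, and the pinned hash would come from a table such as the `MODEL_SHA256S` mapping proposed above; nothing here reflects the node's current download routine.

```python
import hashlib
import os


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def install_verified(tmp_path: str, final_path: str, expected_sha256: str) -> None:
    """Move a downloaded temp file into place only if its hash matches."""
    actual = sha256_of(tmp_path)
    if actual != expected_sha256.lower():
        # Never leave an unverified artifact behind.
        os.remove(tmp_path)
        raise ValueError(f'checksum mismatch: expected {expected_sha256}, got {actual}')
    os.replace(tmp_path, final_path)
```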
In `@nodes/src/nodes/pose_estimation/IGlobal.py`:
- Around line 54-61: The parsed config values for self.threshold and max_persons
must be validated and clamped to safe ranges before any estimator construction:
after converting threshold and max_persons (keep the existing try/except parsing
logic) enforce threshold = min(max(parsed_threshold, 0.0), 1.0) (fall back to
DEFAULT_THRESHOLD if parsing failed) and enforce max_persons =
int(parsed_max_persons) if parsed_max_persons > 0 and <= SOME_REASONABLE_CAP (or
DEFAULT_MAX_PERSONS if out of range or <= 0); assign the validated values to
self.threshold and self.max_persons so the rest of IGlobal (estimator
construction) uses the clamped, safe values instead of raw config numbers.
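A sketch of the clamping, with assumed constants: `DEFAULT_MAX_PERSONS` and `MAX_PERSONS_CAP` are placeholder values standing in for the node default and for `SOME_REASONABLE_CAP`. One detail worth noting is that `min(max(x, 0.0), 1.0)` passes NaN through unchanged, so NaN is handled explicitly here.

```python
import math

DEFAULT_MAX_PERSONS = 10  # placeholder default
MAX_PERSONS_CAP = 64      # hypothetical stand-in for SOME_REASONABLE_CAP


def clamp_unit(value: float, default: float) -> float:
    """Clamp to [0, 1]; NaN falls back because min/max would let it through."""
    if math.isnan(value):
        return default
    return min(max(value, 0.0), 1.0)


def parse_max_persons(value, default: int = DEFAULT_MAX_PERSONS,
                      cap: int = MAX_PERSONS_CAP) -> int:
    """Parse a positive person limit; out-of-range or unparsable values fall back."""
    try:
        parsed = int(value)
    except (TypeError, ValueError, OverflowError):
        return default
    return parsed if 0 < parsed <= cap else default
```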
In `@nodes/src/nodes/response/IInstance.py`:
- Line 227: Move the inline "import base64" at line 227 into the module-level
imports at the top of nodes/src/nodes/response/IInstance.py (alongside the other
top imports), remove the inline import statement, and ensure any references in
the IInstance class or the AVI_ACTION.END handling continue to use base64; this
keeps imports consistent and prevents repeated imports during the AVI_ACTION.END
code path.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 786d50cc-cd73-4700-95f8-fc25dd3904f1
⛔ Files ignored due to path filters (8)
- nodes/src/nodes/background_removal/background-removal.svg is excluded by !**/*.svg
- nodes/src/nodes/caption/caption.svg is excluded by !**/*.svg
- nodes/src/nodes/depth_estimate/depth-estimate.svg is excluded by !**/*.svg
- nodes/src/nodes/detect/detect.svg is excluded by !**/*.svg
- nodes/src/nodes/detect_segment/segmentation.svg is excluded by !**/*.svg
- nodes/src/nodes/face_detection/face-detection.svg is excluded by !**/*.svg
- nodes/src/nodes/pose_estimation/pose-estimation.svg is excluded by !**/*.svg
- nodes/src/nodes/video_composer/video_composer.svg is excluded by !**/*.svg
📒 Files selected for processing (52)
- docker/Dockerfile.engine
- nodes/src/nodes/background_removal/IGlobal.py
- nodes/src/nodes/background_removal/IInstance.py
- nodes/src/nodes/background_removal/__init__.py
- nodes/src/nodes/background_removal/services.json
- nodes/src/nodes/caption/IGlobal.py
- nodes/src/nodes/caption/IInstance.py
- nodes/src/nodes/caption/__init__.py
- nodes/src/nodes/caption/services.json
- nodes/src/nodes/depth_estimate/IGlobal.py
- nodes/src/nodes/depth_estimate/IInstance.py
- nodes/src/nodes/depth_estimate/__init__.py
- nodes/src/nodes/depth_estimate/services.json
- nodes/src/nodes/detect/IGlobal.py
- nodes/src/nodes/detect/IInstance.py
- nodes/src/nodes/detect/__init__.py
- nodes/src/nodes/detect/services.json
- nodes/src/nodes/detect_segment/IGlobal.py
- nodes/src/nodes/detect_segment/IInstance.py
- nodes/src/nodes/detect_segment/__init__.py
- nodes/src/nodes/detect_segment/requirements.txt
- nodes/src/nodes/detect_segment/services.json
- nodes/src/nodes/face_detection/IGlobal.py
- nodes/src/nodes/face_detection/IInstance.py
- nodes/src/nodes/face_detection/__init__.py
- nodes/src/nodes/face_detection/face_detection.py
- nodes/src/nodes/face_detection/requirements.txt
- nodes/src/nodes/face_detection/services.json
- nodes/src/nodes/pose_estimation/IGlobal.py
- nodes/src/nodes/pose_estimation/IInstance.py
- nodes/src/nodes/pose_estimation/__init__.py
- nodes/src/nodes/pose_estimation/services.json
- nodes/src/nodes/response/IInstance.py
- nodes/src/nodes/video_composer/IGlobal.py
- nodes/src/nodes/video_composer/IInstance.py
- nodes/src/nodes/video_composer/__init__.py
- nodes/src/nodes/video_composer/services.json
- nodes/test/conftest.py
- nodes/test/face_detection/__init__.py
- nodes/test/face_detection/test_face_detection.py
- nodes/test/framework/discovery.py
- nodes/test/framework/pipeline.py
- packages/ai/src/ai/common/embedding.py
- packages/ai/src/ai/common/image/dense_resize.py
- packages/ai/src/ai/common/models/__init__.py
- packages/ai/src/ai/common/models/base.py
- packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
- packages/ai/src/ai/common/models/vision/__init__.py
- packages/ai/src/ai/common/models/vision/background.py
- packages/ai/src/ai/common/models/vision/caption.py
- packages/ai/src/ai/common/models/vision/depth.py
- packages/ai/src/ai/common/models/vision/detection.py
💤 Files with no reviewable changes (18)
- packages/ai/src/ai/common/models/transformers/requirements_transformers.txt
- packages/ai/src/ai/common/models/vision/__init__.py
- nodes/src/nodes/video_composer/IGlobal.py
- nodes/test/framework/pipeline.py
- nodes/src/nodes/video_composer/services.json
- nodes/src/nodes/video_composer/__init__.py
- packages/ai/src/ai/common/embedding.py
- packages/ai/src/ai/common/models/base.py
- nodes/test/conftest.py
- nodes/test/face_detection/test_face_detection.py
- packages/ai/src/ai/common/models/__init__.py
- packages/ai/src/ai/common/models/vision/caption.py
- nodes/src/nodes/video_composer/IInstance.py
- nodes/test/framework/discovery.py
- packages/ai/src/ai/common/models/vision/depth.py
- packages/ai/src/ai/common/image/dense_resize.py
- packages/ai/src/ai/common/models/vision/background.py
- packages/ai/src/ai/common/models/vision/detection.py
Summary
- 3 refactored nodes (`detect`, `detect_segment`, `depth_estimate`) and 4 new (`caption`, `face_detection`, `pose_estimation`, `background_removal`).
- `dense_resize` util + detection loader suite; pins `transformers==4.53.3`; ships a universal textarea `helperText` UX fix.
Type
feature (vision suite v1) + refactor + chore
Node Walkthroughs
All walkthroughs below (except `face_detection`) run against the same input image so outputs are directly comparable.
1. `detect`: Object Detection (refactor)
Run closed-set or open-vocabulary object detection over any input image. Default backend is RF-DETR (Apache-2.0) on COCO-80; flip to MM-Grounding-DINO (Apache-2.0 / BSD-3) when you need to detect arbitrary classes from a free-text prompt. Emits an annotated image plus JSON `[{label, score, box, centroid}]` on the text lane.
When to use: you need bounding boxes for known COCO classes (RF-DETR) or for novel categories described in natural language (Grounding-DINO).
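For reviewers wiring `detect` into downstream logic, a consumer of the text lane might look like the sketch below. `filter_detections` is a hypothetical helper; the sample payload assumes `box` is `[x1, y1, x2, y2]` and `centroid` is `[x, y]`, which this description does not specify.

```python
import json


def filter_detections(text_payload: str, min_score: float = 0.5, labels=None) -> list[dict]:
    """Keep detections at or above `min_score`, optionally restricted to `labels`."""
    kept = []
    for det in json.loads(text_payload):
        if det.get('score', 0.0) < min_score:
            continue
        if labels is not None and det.get('label') not in labels:
            continue
        kept.append(det)
    return kept


# Illustrative payload only; the coordinate layout is an assumption.
sample = json.dumps([
    {'label': 'person', 'score': 0.91, 'box': [10, 20, 110, 220], 'centroid': [60, 120]},
    {'label': 'dog', 'score': 0.32, 'box': [200, 150, 260, 210], 'centroid': [230, 180]},
])
confident = filter_detections(sample)
```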
2. `detect_segment`: Segmentation (refactor)
Produce pixel-accurate masks via Mask2Former (MIT) in two profiles: instance (COCO 80-class, default) or semantic (ADE20K 150-class). Image lane returns per-instance / per-class colored mask overlays; text lane returns RLE-encoded masks for compact transport. Replaces the SAM3 backend (gated, required CUDA 12.6+); single-frame only, uses `dense_resize` to bound peak memory.
When to use: you need masks (not just boxes) for compositing, measurement, or mask-conditioned downstream nodes.
3. `depth_estimate`: Depth Estimation (refactor)
Monocular relative depth via Depth-Anything V2 Small (Apache-2.0). `max_edge` bounds inference resolution to trade fidelity for latency on large inputs. Image lane emits a colorized depth map; text lane emits `{min, max, mean}` stats. Simplified to image-only I/O; video lane and Doc emission removed (handled by upstream `frame_grabber`).
When to use: you need a depth cue for parallax, foreground/background separation, or as a conditioning signal for other nodes.
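To make the `max_edge` behavior concrete, here is a rough sketch of bounding the longest image edge while preserving aspect ratio. `bounded_size` is a hypothetical function written for this note; it is not the signature of `resize_for_inference`, and the real helper may round or align dimensions differently.

```python
def bounded_size(width: int, height: int, max_edge: int) -> tuple[int, int]:
    """Scale (width, height) down so the longest edge is at most `max_edge`.

    Inputs already within the bound are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Keep at least one pixel on the short edge for extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With `max_edge=1024`, a 3840x2160 frame would run inference at 1024x576, while a 640x480 frame would pass through at its native size.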
4. `caption`: Image Captioning (new, replaces `describe`)
Generate a natural-language description with Florence-2 Base (MIT) at one of three granularities: `caption`, `detailed_caption`, `more_detailed_caption`. Replaces the prior multi-task `describe` node; specialized to captioning only, keeping the surface area tight. Emits a caption string on the text lane.
When to use: you need a textual summary of an image to drive search, prompts, alt-text, or downstream LLM reasoning.
(Florence-2 caption rendered as text)
5. `face_detection`: Face Detection (new)
Detect faces with MediaPipe BlazeFace (Apache-2.0), returning axis-aligned bounding boxes plus, optionally, six coarse alignment-grade keypoints per face (eyes, nose tip, mouth center, ear tragions). Image lane emits an annotated frame; text lane emits JSON `[{label: 'face', score, box, centroid, landmarks?}]`. Keypoints are toggleable via `emit_landmarks`.
When to use: you need face localization or alignment landmarks for cropping, blurring, or framing.
6. `pose_estimation`: Pose Estimation (new)
Top-down 2D human pose estimation via RTMPose through `rtmlib` (Apache-2.0), with `tiny` / `medium` (default) / `large` profiles. Each detected person yields 17 COCO keypoints; `max_persons` caps the per-image work. Image lane emits a skeleton overlay; text lane emits `[{box, keypoints}]`.
When to use: you need body keypoints for gesture, motion analysis, sports, or pose-conditioned generation.
7. `background_removal`: Background Removal (new)
Cut subjects out of their background with BiRefNet (MIT) in either the default 1024px profile or the HR 2048px profile for fine edges and hair. Image lane returns an RGBA PNG with a soft alpha matte (straight, not premultiplied); text lane returns `{mean_alpha, alpha_coverage_pct}` so downstream logic can gate on how much subject was actually found.
When to use: you need a transparent subject for compositing, product shots, thumbnails, or as a mask source for other nodes.
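Two things a downstream consumer might do with this output, sketched under assumptions: gate on the text-lane stats, and composite a straight-alpha pixel over a background. `subject_found` and `composite_straight` are hypothetical helpers, the 1.0 percent cutoff is an arbitrary example, and `alpha_coverage_pct` is assumed to be a 0-100 percentage.

```python
import json


def subject_found(text_payload: str, min_coverage_pct: float = 1.0) -> bool:
    """Return True when the reported alpha coverage meets the cutoff."""
    stats = json.loads(text_payload)
    return float(stats.get('alpha_coverage_pct', 0.0)) >= min_coverage_pct


def composite_straight(fg_rgba: tuple, bg_rgb: tuple) -> tuple:
    """Blend one straight-alpha RGBA pixel over an opaque RGB background.

    Straight alpha means the color channels are NOT pre-scaled by alpha,
    so the blend is out = fg * a + bg * (1 - a).
    """
    r, g, b, a = fg_rgba
    alpha = a / 255.0
    return tuple(
        round(fg * alpha + bg * (1.0 - alpha))
        for fg, bg in zip((r, g, b), bg_rgb)
    )
```

If the matte were premultiplied instead, the foreground term would not be multiplied by alpha again, which is why the distinction in the description matters for compositing code.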
Cross-Cutting Changes
- `dense_resize` util (`packages/ai/.../image/dense_resize.py`): standalone helpers (`resize_for_inference`, `restore_dense_output`, `restore_rle_mask`) for memory-bounded dense inference. Consumed by `depth_estimate`, `detect_segment`, `background_removal`. Keep in mind for future dense-prediction nodes.
- Detection loader suite (`packages/ai/.../models/detection/detection.py`): `RFDetrLoader`, `MmGDinoLoader`, `Mask2FormerInstanceLoader`, `Mask2FormerSemanticLoader`. Five unused legacy loaders (Florence2, OWLv2, MobileSAM, SAM2, SAM3 grounded) removed in the same pass; the loader registry is now 1:1 with shipped models.
- `transformers==4.53.3` pin: load-bearing. Newer versions regress on Mask2Former and Depth-Anything-V2. Do not bump without re-validating every vision node (I'm still reviewing this, not a fan).
- `helperText` + sticky label: universal UX fix. Affects all ~34 textareas across the suite (not just vision); pipes the `services.json` `description` field into MUI's `helperText` and pins `InputLabelProps.shrink=true` so labels always float above the outlined border.
- `tool_python/python.svg` minified (265 → 24 lines, no visual change); `video_composer/video_composer.svg` added (was missing from prior commit).
Testing
- [...] `/Users/ryan/Desktop/skyrim/test.pipe` during development; each loaded, processed frames, and emitted expected outputs.
- [...] (`ruff`, `gitleaks`) pass on every commit in this branch.
- `./builder test` passes: not run; reviewers please confirm
Reviewers, please probe
- `max_edge` memory bounds: push a 4K+ input and a very small input through `depth_estimate`, `detect_segment`, and `background_removal` to confirm resize/clamp holds at both ends.
- `face_detection` output shape: run an image with at least one clear face; confirm the text lane emits `[{label: 'face', score, box, centroid}]` and the image lane returns an annotated frame. Toggle "Emit 6 alignment keypoints" off and confirm `landmarks` is omitted; toggle it on and confirm six named points per face. Frames with no detectable face should pass through without error.
Checklist
Breaking Changes
- `describe` node removed. Pipelines referencing `describe` must migrate to either `caption` (captioning only) or the appropriate per-task node (`detect`, `detect_segment`, etc.). There is no automatic migration path: node IDs in saved graphs will fail to resolve and need manual replacement.
Notes for Reviewers
- `detect`, `detect_segment`, `depth_estimate` were restructured around the new loaders but produce the same outputs.
- [...] (`dense_resize`, loaders) → refactors → new nodes → chores. Loaders + util are tiny; review them first to establish the contract used by everything else.
- `describe` deletion is intentional. Per-task work has its own dedicated node now; `caption` is the captioning-only successor.
- `face_detection` "privacy guard" removed. An earlier draft of this PR described a structural guard (`chains_to_embedding: false` plus a runtime block on downstream `embedding_*` nodes). That mechanism was non-functional: nothing in the framework read the flag, and the runtime check probed downstream-node accessors that rocketlib's endpoint never exposes, so it never fired. It's been removed rather than left as dead code implying a guarantee we don't provide; `face_detection` now behaves like every other vision node.
Linked Issue
N/A — direct feature work; no tracking issue exists for this branch.
Summary by CodeRabbit
New Features
Improvements
Tests