Conversation
Pull request overview
This PR extends the video segmentation pipeline by adding a new Meshroom node (VideoSegmentationSam3Boxes) that generates masks from tracked bounding boxes stored in a JSON file, and aligns parts of the existing SAM3 text-based video segmentation to use absolute frame IDs and updated ID mapping.
Changes:
- Added `VideoSegmentationSam3Boxes` node to segment video frames using per-frame bounding boxes (with multi-resolution inputs and mask inversion support).
- Added `segmentationRDS/bboxUtils.py` to parse/merge/expand boxes and split them into consecutive-frame chunks.
- Updated SAM3 utilities and the text node to use the new `mapIds` signature and to key box dictionaries by absolute frame IDs.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `segmentationRDS/sam3Utils.py` | Changed `mapIds` signature and scaled ROI using mask dimensions instead of passed-in w/h. |
| `segmentationRDS/bboxUtils.py` | New helper module for reading/merging/expanding boxes and creating tracking chunks. |
| `meshroom/imageSegmentation/VideoSegmentationSam3Text.py` | Updated `mapIds` calls and changed box dictionary indexing to absolute frame IDs; adjusted mask filling values. |
| `meshroom/imageSegmentation/VideoSegmentationSam3Boxes.py` | New node implementation for box-driven video segmentation with the SAM3 video predictor and multi-resolution crop handling. |
Comment on `segmentationRDS/bboxUtils.py`, `merge_boxes` docstring:

```python
def merge_boxes(box1: list, box2: list, iou_threshold: float = 0.5) -> tuple[list, str]:
    """
    Merge 2 boxes xyxy by taking the bounding boxe, if their IoU is higher than the threshold.
```

Suggested change (typo "boxe"):

```diff
-    Merge 2 boxes xyxy by taking the bounding boxe, if their IoU is higher than the threshold.
+    Merge 2 boxes xyxy by taking the bounding box, if their IoU is higher than the threshold.
```
Comment on the `merge_boxes` return path:

```python
        ]
        return merged, f"bounding (IoU={iou:.2f})"
    else:
        return box1, f"forward (IoU={iou:.2f} < seuil={iou_threshold})"
```

Suggested change (translate the French "seuil" to English):

```diff
-        return box1, f"forward (IoU={iou:.2f} < seuil={iou_threshold})"
+        return box1, f"forward (IoU={iou:.2f} < threshold={iou_threshold})"
```
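For context, the two return paths above suggest the following overall shape for `merge_boxes`. This is a minimal sketch, not the PR's actual implementation; the `iou` helper is an assumption inferred from the f-strings:

```python
def iou(box1: list, box2: list) -> float:
    # Intersection-over-union of two xyxy boxes.
    ix1, iy1 = max(box1[0], box2[0]), max(box1[1], box2[1])
    ix2, iy2 = min(box1[2], box2[2]), min(box1[3], box2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(box1: list, box2: list, iou_threshold: float = 0.5) -> tuple[list, str]:
    """Merge 2 boxes xyxy by taking the bounding box, if their IoU is higher than the threshold."""
    iou_val = iou(box1, box2)
    if iou_val >= iou_threshold:
        # Union box: smallest xyxy box containing both inputs.
        merged = [min(box1[0], box2[0]), min(box1[1], box2[1]),
                  max(box1[2], box2[2]), max(box1[3], box2[3])]
        return merged, f"bounding (IoU={iou_val:.2f})"
    # Below the threshold: carry the previous box forward unchanged.
    return box1, f"forward (IoU={iou_val:.2f} < threshold={iou_threshold})"
```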
Comment on the box expansion code:

```python
            expanded_display = [int(new_x1), int(new_y1), int(new_x2), int(new_y2)]
```

Suggested change (drop the stale step number from the comment):

```diff
-    # 3. Back conversion to source space
+    # Back conversion to source space
```
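The expand-then-convert-back flow being commented on can be sketched as follows. Both helper names are hypothetical, and the sketch assumes display x-coordinates are source x-coordinates multiplied by the pixel aspect ratio:

```python
def expand_box(box: list, margin: int, width: int, height: int) -> list:
    # Grow an xyxy box by a fixed pixel margin, clamped to the image bounds.
    x1, y1, x2, y2 = box
    return [max(0, x1 - margin), max(0, y1 - margin),
            min(width, x2 + margin), min(height, y2 + margin)]

def display_to_source(box: list, par: float) -> list:
    # Back conversion to source space: undo the pixel aspect ratio on the
    # horizontal axis (assumes display_x = source_x * par).
    x1, y1, x2, y2 = box
    return [x1 / par, y1, x2 / par, y2]
```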
Comment on the chunking helper's docstring:

```python
) -> dict:
    """
    Extract bounding boxes per object and organize them in chunck of consecutive frames.
```

Suggested change ("applicated" is not English):

```diff
-    Coordinates in the json file are supposed to be in the original source space, with the pixel aspect ratio not applicated.
+    Coordinates in the json file are supposed to be in the original source space, with the pixel aspect ratio not applied.
```
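The chunking behaviour the docstring describes — grouping per-object boxes into runs of consecutive frame ids — can be sketched as below. This is a hypothetical helper illustrating the idea, not the PR's code:

```python
def split_into_chunks(frames_to_box: dict) -> list:
    # frames_to_box maps an absolute frame id to its box; the result is a
    # list of dicts, one per run of consecutive frame ids.
    chunks = []
    current = {}
    prev = None
    for fid in sorted(frames_to_box):
        if prev is not None and fid != prev + 1:
            # Gap in the frame ids: close the current chunk.
            chunks.append(current)
            current = {}
        current[fid] = frames_to_box[fid]
        prev = fid
    if current:
        chunks.append(current)
    return chunks
```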
Comment on the size thresholds constant:

```python
import json
from dataclasses import dataclass, field

THRESHOLDS = [252, 504, 1008]
```

Suggested change (more descriptive name):

```diff
-THRESHOLDS = [252, 504, 1008]
+SIZE_THRESHOLDS = [252, 504, 1008]
```
Comment on the resolution fallback logic:

```python
if target_size < 504 and not x4_ok:
    target_size = 504
if target_size < 1008 and not x2_ok:
    target_size = 1008
```

Suggested change (reuse the thresholds list instead of magic numbers):

```diff
-if target_size < 504 and not x4_ok:
-    target_size = 504
-if target_size < 1008 and not x2_ok:
-    target_size = 1008
+if target_size < SIZE_THRESHOLDS[1] and not x4_ok:
+    target_size = SIZE_THRESHOLDS[1]
+if target_size < SIZE_THRESHOLDS[2] and not x2_ok:
+    target_size = SIZE_THRESHOLDS[2]
```
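For context, the fallback logic under review reads as: pick the smallest crop size that fits the box, then fall back to a larger crop when the required upscaled source (x4 for the smallest size, x2 for the middle one) is missing. A sketch under those assumptions (`select_target_size` is a hypothetical name wrapping the PR's snippet):

```python
SIZE_THRESHOLDS = [252, 504, 1008]

def select_target_size(box_size: int, x2_ok: bool, x4_ok: bool) -> int:
    # Smallest crop size that contains the box.
    target_size = SIZE_THRESHOLDS[-1]
    for threshold in SIZE_THRESHOLDS:
        if box_size <= threshold:
            target_size = threshold
            break
    # A small crop needs a higher-resolution source: without the x4
    # upscale, 252-pixel crops are unusable, so fall back to 504...
    if target_size < SIZE_THRESHOLDS[1] and not x4_ok:
        target_size = SIZE_THRESHOLDS[1]
    # ...and without the x2 upscale, fall back to 1008.
    if target_size < SIZE_THRESHOLDS[2] and not x2_ok:
        target_size = SIZE_THRESHOLDS[2]
    return target_size
```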
Comment on the x2 input attribute:

```python
desc.File(
    name="inputx2",
    label="Inputx2",
    description="Folder containing source images upscale by 2.",
```

Suggested change (grammar):

```diff
-    description="Folder containing source images upscale by 2.",
+    description="Folder containing source images upscaled by 2.",
```
Comment on the x4 input attribute:

```python
desc.File(
    name="inputx4",
    label="Inputx4",
    description="Folder containing source images upscale by 4.",
```

Suggested change (grammar):

```diff
-    description="Folder containing source images upscale by 4.",
+    description="Folder containing source images upscaled by 4.",
```
Comment on the input validation error:

```python
    image_paths.sort(key=lambda x: x[0])
else:
    raise ValueError(f"Input path '{input_path}' is not a valid path (folder or sfmData file).")
```

Suggested change (at this branch only the sfmData case remains, so the message should say so):

```diff
-    raise ValueError(f"Input path '{input_path}' is not a valid path (folder or sfmData file).")
+    raise ValueError(f"Input path '{input_path}' is not a valid sfmData file.")
```
Comment on the view iteration:

```python
for id, v in views.items():
    image_x1_path = Path(v.getImage().getImagePath())
    image_x1_name = image_x1_path.name
    image_x2_path = None
    if os.path.isfile(os.path.join(path_folder_x2, image_x1_name)):
        image_x2_path = os.path.join(path_folder_x2, image_x1_name)
    image_x4_path = None
    if os.path.isfile(os.path.join(path_folder_x4, image_x1_name)):
        image_x4_path = os.path.join(path_folder_x4, image_x1_name)
    intrinsic = dataAV.getIntrinsicSharedPtr(v.getIntrinsicId())
    pinhole = camera.Pinhole.cast(intrinsic)
    par = 1.0
    if pinhole is not None:
        par = pinhole.getPixelAspectRatio()
    image_paths.append((image_x1_path, str(id), v.getFrameId(), v.getImage().getWidth(),
                        v.getImage().getHeight(), par, image_x2_path, image_x4_path))
```

Suggested change (validate that all views share the same dimensions, pixel aspect ratio, and upscaled-image availability):

```python
commonParams = None
for id, v in views.items():
    image_x1_path = Path(v.getImage().getImagePath())
    image_x1_name = image_x1_path.name
    image_x2_path = None
    if os.path.isfile(os.path.join(path_folder_x2, image_x1_name)):
        image_x2_path = os.path.join(path_folder_x2, image_x1_name)
    image_x4_path = None
    if os.path.isfile(os.path.join(path_folder_x4, image_x1_name)):
        image_x4_path = os.path.join(path_folder_x4, image_x1_name)
    intrinsic = dataAV.getIntrinsicSharedPtr(v.getIntrinsicId())
    pinhole = camera.Pinhole.cast(intrinsic)
    par = 1.0
    if pinhole is not None:
        par = pinhole.getPixelAspectRatio()
    if commonParams is None:
        commonParams = [v.getImage().getWidth(), v.getImage().getHeight(), par,
                        image_x2_path is None, image_x4_path is None]
    if commonParams != [v.getImage().getWidth(), v.getImage().getHeight(), par,
                        image_x2_path is None, image_x4_path is None]:
        raise ValueError("All images do not have same dimensions or one image is missing its upscaled version.")
    image_paths.append((image_x1_path, str(id), v.getFrameId(), v.getImage().getWidth(),
                        v.getImage().getHeight(), par, image_x2_path, image_x4_path))
```
This pull request introduces a new segmentation node and makes several improvements and bug fixes to the video segmentation pipeline. The main addition is the new `VideoSegmentationSam3Boxes` node, which segments video frames using bounding boxes from a JSON file. Additionally, several changes in `VideoSegmentationSam3Text.py` improve the consistency of mask and bounding box handling.

New Node Addition:
- Added `VideoSegmentationSam3Boxes` for segmenting video frames based on bounding boxes from a JSON file, supporting multiple input resolutions, GPU usage, mask inversion, and flexible output options. This node integrates with the SAM3 video predictor and handles mask generation, file management, and metadata.

Improvements in VideoSegmentationSam3Text:
- Removed the passed-in width/height from `sam3Utils.mapIds`, as this information is not needed (the ROI is now scaled from the mask dimensions).

These changes collectively improve the flexibility, correctness, and usability of the video segmentation pipeline, especially for workflows involving bounding box-based segmentation and multi-resolution inputs.
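The absolute-frame-id keying mentioned above can be illustrated with a small sketch. All names here are hypothetical: `chunk_start` stands for the absolute id of a chunk's first frame, and `chunk_boxes` for that chunk's per-frame box list:

```python
def key_by_absolute_frame(chunk_boxes: list, chunk_start: int) -> dict:
    # Re-key chunk-local box indices (0, 1, ...) to absolute frame ids,
    # so lookups no longer depend on which chunk a frame belongs to.
    return {chunk_start + local_idx: box
            for local_idx, box in enumerate(chunk_boxes)}
```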