ValueError: Input boxes must be a list of list of list of floating points in SAM Fine-Tuning

Hello Python & PyTorch community,

I am currently fine-tuning Meta’s Segment Anything Model (SAM) to segment CHO cells using grayscale microscopy images and corresponding instance masks. However, I encountered the following error in my training loop:

ValueError: Input boxes must be a list of list of list of floating points.

:small_blue_diamond: Context

  • Each instance mask contains multiple labeled objects (cells).
  • Bounding boxes are extracted for each object using skimage.measure.regionprops.
  • The dataset structure is set up to associate each bounding box with its corresponding image and mask.
  • The training loop processes the bounding boxes using Hugging Face’s SamProcessor.

:small_blue_diamond: Error Location

The error is raised in SamProcessor._check_and_preprocess_points() while it validates input_boxes. The processor expects a list of lists of lists of floats (one list per image, one per box, four coordinates each), but my input apparently fails that check.
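To make the requirement concrete, here is my own approximation of what the nesting check seems to demand (this is a sketch for illustration, not the actual transformers source code):

```python
def looks_like_sam_boxes(boxes) -> bool:
    """Approximate the nesting SamProcessor appears to require:
    a list (batch of images) of lists (boxes per image) of 4-float lists.
    My own sketch, NOT the transformers implementation."""
    if not isinstance(boxes, list):
        return False
    for per_image in boxes:
        if not isinstance(per_image, list):
            return False
        for box in per_image:
            # Each box must itself be a list (not a tuple) of 4 floats
            if not (isinstance(box, list) and len(box) == 4):
                return False
            if not all(isinstance(c, float) for c in box):
                return False
    return True

print(looks_like_sam_boxes([[[10.0, 20.0, 30.0, 40.0]]]))  # True
print(looks_like_sam_boxes([[(10.0, 20.0, 30.0, 40.0)]]))  # False: tuple, not list
print(looks_like_sam_boxes([[[10, 20, 30, 40]]]))          # False: ints, not floats
```

If the real check behaves anything like this, a tuple at any level, or integer (including numpy integer) coordinates, would be enough to trip the error.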


:small_blue_diamond: Relevant Code

Bounding Box Extraction Function:

from skimage.measure import regionprops

def get_instance_bboxes(mask):
    """Extract bounding boxes for each instance in the mask."""
    props = regionprops(mask)
    bboxes = []
    for prop in props:
        y1, x1, y2, x2 = prop.bbox  # (min_row, min_col, max_row, max_col)
        bboxes.append((x1, y1, x2, y2))  # Convert to (x_min, y_min, x_max, y_max)
    return bboxes if bboxes else [(0, 0, 1, 1)]  # Default box if no instances
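For anyone trying to reproduce this without my dataset, here is a pure-numpy stand-in for the extraction step (my own sketch using np.nonzero instead of regionprops, assuming the same (x_min, y_min, x_max, y_max) convention with exclusive max, matching skimage's bbox):

```python
import numpy as np

def get_instance_bboxes_np(mask):
    """Pure-numpy equivalent of the regionprops-based extraction above.
    Returns one (x_min, y_min, x_max, y_max) tuple per nonzero label,
    with the max edges exclusive, as in skimage's bbox convention."""
    bboxes = []
    for label in np.unique(mask):
        if label == 0:  # skip background
            continue
        ys, xs = np.nonzero(mask == label)
        bboxes.append((int(xs.min()), int(ys.min()),
                       int(xs.max()) + 1, int(ys.max()) + 1))
    return bboxes if bboxes else [(0, 0, 1, 1)]  # default box if no instances

# Toy 5x5 mask with one instance (label 1) spanning rows 1-2, cols 2-4
mask = np.zeros((5, 5), dtype=int)
mask[1:3, 2:5] = 1
print(get_instance_bboxes_np(mask))  # [(2, 1, 5, 3)]
```

Note that with regionprops the coordinates come back as numpy integer scalars, so the tuples in my real code hold numpy ints rather than Python floats.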

Dataset Class (CellDataset):

import numpy as np
import torch
from torch.utils.data import Dataset

class CellDataset(Dataset):
    def __init__(self, images, masks, processor):
        self.processor = processor
        self.samples = []

        for image, mask in zip(images, masks):
            bboxes = get_instance_bboxes(mask)
            for bbox in bboxes:
                self.samples.append({
                    'image': image,
                    'mask': mask,
                    'bbox': bbox
                })

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        image, mask, bbox = sample['image'], sample['mask'], sample['bbox']

        # Convert grayscale to RGB
        if image.ndim == 2:
            image = np.stack([image] * 3, axis=-1)

        # **Error occurs here**
        inputs = self.processor(image, input_boxes=[[bbox]], return_tensors="pt")

        inputs = {k: v.squeeze(0) for k, v in inputs.items()}
        inputs["ground_truth_mask"] = torch.tensor(mask, dtype=torch.long)

        return inputs

:small_blue_diamond: What I’ve Tried

  1. Checked bounding box formatting:
  • Printed bbox values; they appear as tuples like (x1, y1, x2, y2).
  • Tried manually converting them to lists of floats:
bbox = [float(coord) for coord in bbox]

but the error persists.
  2. Verified processor input structure:
  • According to the Hugging Face documentation, input_boxes should be nested as:
[[[x1, y1, x2, y2]]]  # images → boxes per image → 4 float coordinates
  • My implementation already attempts this via [[bbox]], but I suspect the format is still incorrect.
  3. Explicitly converted bounding boxes to floats:
inputs = self.processor(image, input_boxes=[[[float(x) for x in bbox]]], return_tensors="pt")

Still no luck.
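For reference, this is the shape I believe I should be producing per image, built with a small helper (to_sam_input_boxes is a name I made up for this post, not a transformers API):

```python
def to_sam_input_boxes(bboxes):
    """Wrap one image's boxes into the triple-nested
    list-of-list-of-list-of-floats layout: [image][box][coordinate]."""
    return [[[float(c) for c in bbox] for bbox in bboxes]]

print(to_sam_input_boxes([(5, 10, 25, 40)]))
# [[[5.0, 10.0, 25.0, 40.0]]]
print(to_sam_input_boxes([(0, 0, 1, 1), (2, 3, 4, 5)]))
# [[[0.0, 0.0, 1.0, 1.0], [2.0, 3.0, 4.0, 5.0]]]
```

As far as I can tell this matches the documented format, yet the processor still rejects my input, which is why I suspect something else about my call is wrong.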


:small_blue_diamond: Question:

How can I correctly format input_boxes to avoid the “list of list of list of floating points” error when fine-tuning SAM? Are there any common pitfalls when using SamProcessor with bounding boxes?

Any guidance would be greatly appreciated! Thank you! :pray: