Hello Python & PyTorch community,
I am currently fine-tuning Meta’s Segment Anything Model (SAM) to segment CHO cells using grayscale microscopy images and corresponding instance masks. However, I encountered the following error in my training loop:
```
ValueError: Input boxes must be a list of list of list of floating points.
```
**Context**

- Each instance mask contains multiple labeled objects (cells).
- Bounding boxes are extracted for each object using `skimage.measure.regionprops`.
- The dataset structure associates each bounding box with its corresponding image and mask.
- The training loop processes the bounding boxes using Hugging Face's `SamProcessor`.
**Error Location**

The error is raised in `SamProcessor._check_and_preprocess_points()` while it processes `input_boxes`. The processor expects a list of list of list of float values, but something about my input apparently doesn't match.
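For reference, my reading of the expected structure is: an outer list over images in the batch, then a list of boxes per image, then the four box coordinates. With made-up numbers, a single image with a single box would look like this:

```python
# 1 image in the batch -> 1 box for that image -> 4 floats
# (x_min, y_min, x_max, y_max); the numbers are made up
input_boxes = [[[12.0, 30.0, 85.0, 110.0]]]
```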
**Relevant Code**

Bounding Box Extraction Function:

```python
from skimage.measure import regionprops

def get_instance_bboxes(mask):
    """Extract bounding boxes for each instance in the mask."""
    props = regionprops(mask)
    bboxes = []
    for prop in props:
        y1, x1, y2, x2 = prop.bbox  # (min_row, min_col, max_row, max_col)
        bboxes.append((x1, y1, x2, y2))  # Convert to (x_min, y_min, x_max, y_max)
    return bboxes if bboxes else [(0, 0, 1, 1)]  # Default box if no instances
```
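To make the return format concrete, here is a quick sanity check on a tiny hand-made label mask (the expected output is worked out by hand, so treat it as illustrative):

```python
import numpy as np

# Tiny labeled mask with two instances (labels 1 and 2)
toy = np.zeros((6, 6), dtype=np.int32)
toy[1:3, 1:4] = 1
toy[4:6, 2:5] = 2

print(get_instance_bboxes(toy))
# [(1, 1, 4, 3), (2, 4, 5, 6)] -- tuples of ints, not lists of floats
```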
Dataset Class (`CellDataset`):

```python
import numpy as np
import torch
from torch.utils.data import Dataset

class CellDataset(Dataset):
    def __init__(self, images, masks, processor):
        self.processor = processor
        self.samples = []
        # One sample per (image, bounding box) pair
        for image, mask in zip(images, masks):
            bboxes = get_instance_bboxes(mask)
            for bbox in bboxes:
                self.samples.append({
                    'image': image,
                    'mask': mask,
                    'bbox': bbox
                })

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        image, mask, bbox = sample['image'], sample['mask'], sample['bbox']
        # Convert grayscale to RGB
        if image.ndim == 2:
            image = np.stack([image] * 3, axis=-1)
        # Error occurs here
        inputs = self.processor(image, input_boxes=[[bbox]], return_tensors="pt")
        inputs = {k: v.squeeze(0) for k, v in inputs.items()}
        inputs["ground_truth_mask"] = torch.tensor(mask, dtype=torch.long)
        return inputs
```
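For completeness, this is roughly how the dataset is constructed and consumed; the checkpoint name is just the one used here for illustration, and `images`/`masks` stand in for my actual arrays:

```python
from torch.utils.data import DataLoader
from transformers import SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")  # illustrative checkpoint
dataset = CellDataset(images, masks, processor)  # images/masks: lists of numpy arrays
loader = DataLoader(dataset, batch_size=2, shuffle=True)

batch = next(iter(loader))  # the ValueError fires inside __getitem__
```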
**What I’ve Tried**

1. **Checked bounding box formatting:** Printed the `bbox` values, and they appear as tuples like `(x1, y1, x2, y2)`. Tried manually converting them to lists of floats with `bbox = [float(coord) for coord in bbox]`, but the error persists.
2. **Verified the processor input structure:** According to the Hugging Face documentation, `input_boxes` should be in the format `[[[x1, y1, x2, y2]]]` (a triple-nested list of float values). My implementation already attempts this via `[[bbox]]`, but I suspect the format is still incorrect.
3. **Explicitly converted bounding boxes to floats:**
   ```python
   inputs = self.processor(image, input_boxes=[[[float(x) for x in bbox]]], return_tensors="pt")
   ```
   Still no luck. (A standalone repro of both call variants follows below.)
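To rule out anything dataset-specific, here is the smallest standalone snippet I can think of that exercises the same call path (dummy image, hard-coded box; checkpoint name again illustrative):

```python
import numpy as np
from transformers import SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")
image = np.zeros((256, 256, 3), dtype=np.uint8)  # dummy RGB image
bbox = (10, 20, 30, 40)  # a tuple, exactly as get_instance_bboxes returns

# Variant 1: tuple nested in two lists (what the dataset does) -> raises for me
inputs = processor(image, input_boxes=[[bbox]], return_tensors="pt")

# Variant 2: explicit triple-nested list of floats (also tried)
inputs = processor(image, input_boxes=[[[float(x) for x in bbox]]], return_tensors="pt")
```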
**Question**

How can I correctly format `input_boxes` to avoid the “list of list of list of floating points” error when fine-tuning SAM? Are there any common pitfalls when using `SamProcessor` with bounding boxes?

Any guidance would be greatly appreciated! Thank you!