simota
AssignResult
AssignResult (num_ground_truth_boxes:int, ground_truth_box_indices:torch.LongTensor, max_iou_values:torch.FloatTensor, category_labels:torch.LongTensor=None)
*Stores assignments between predicted bounding boxes and ground truth bounding boxes.
Based on OpenMMLab’s implementation in the mmdetection library:
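As a rough illustration of the data layout, the sketch below mirrors the AssignResult fields with a plain Python dataclass, using lists in place of tensors. The encoding of ground_truth_box_indices is an assumption borrowed from mmdetection's AssignResult (0 means unassigned, a positive value i means the i-th ground truth box counting from 1, and -1 means ignored). AssignResultSketch and positive_indices are hypothetical names, not part of this library.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AssignResultSketch:
    """Hypothetical list-based stand-in for AssignResult; not the library's class."""

    num_ground_truth_boxes: int
    # Assumed mmdetection convention: 0 = unassigned, i > 0 = i-th gt box (1-based), -1 = ignored.
    ground_truth_box_indices: List[int]
    # IoU value per prediction (for positives, the IoU with the assigned gt box).
    max_iou_values: List[float]
    # Class label per prediction (assumed -1 where unassigned, as in mmdetection's SimOTA).
    category_labels: Optional[List[int]] = None

    def positive_indices(self) -> List[int]:
        """Return the indices of predictions assigned to some ground truth box."""
        return [i for i, g in enumerate(self.ground_truth_box_indices) if g > 0]


result = AssignResultSketch(
    num_ground_truth_boxes=2,
    ground_truth_box_indices=[0, 2, 1, 0],
    max_iou_values=[0.0, 0.71, 0.64, 0.0],
    category_labels=[-1, 3, 0, -1],
)
print(result.positive_indices())  # [1, 2]
```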
SimOTAAssigner
SimOTAAssigner (center_radius:float=2.5, candidate_topk:int=10, iou_weight:float=3.0, cls_weight:float=1.0)
*The SimOTAAssigner class assigns predicted bounding boxes to their corresponding ground truth boxes in object detection tasks. It uses SimOTA, which formulates the assignment task as an optimal transport problem and solves it approximately with a dynamic top-k strategy.
It calculates a cost matrix from classification and regression (Intersection over Union, IoU) costs. It then uses this cost matrix to dynamically assign each ground truth object to its best-matching bounding box predictions, resolving conflicts so that each prediction is paired with at most one ground truth.
Based on OpenMMLab’s implementation in the mmdetection library:
Parameter | Type | Default | Details |
---|---|---|---|
center_radius | float | 2.5 | Radius of the center region around each ground truth center, in multiples of the grid stride, used to judge whether an output grid box is in the center region. |
candidate_topk | int | 10 | Number of top IoUs per ground truth that are summed to compute its dynamic k. |
iou_weight | float | 3.0 | Scale factor for the regression IoU cost. |
cls_weight | float | 1.0 | Scale factor for the classification cost. |
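For reference, mmdetection's SimOTA combines these weights into the per-pair cost below and derives each ground truth's dynamic k from candidate_topk; that this implementation matches it term for term is an assumption. Here p_ic is the predicted score of valid output grid box i for class c, y_jc is the one-hot label of ground truth j, λ_cls = cls_weight, λ_iou = iou_weight, ε = eps, and N is the number of valid output grid boxes.

$$
C_{ij} = \lambda_{\mathrm{cls}} \sum_{c} \mathrm{BCE}\left(\sqrt{p_{ic}},\, y_{jc}\right) + \lambda_{\mathrm{iou}} \left(-\log\left(\mathrm{IoU}_{ij} + \varepsilon\right)\right) + 10^{5} \cdot \mathbb{1}\left[\neg\left(\text{in-box}_{ij} \wedge \text{in-center}_{ij}\right)\right]
$$

$$
k_j = \max\left(1,\ \left\lfloor \sum_{i \in \mathrm{top}_q(j)} \mathrm{IoU}_{ij} \right\rfloor\right), \qquad q = \min(\texttt{candidate\_topk},\ N)
$$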
SimOTAAssigner.assign
SimOTAAssigner.assign (pred_scores:torch.Tensor, output_grid_boxes:torch.Tensor, decoded_bboxes:torch.Tensor, gt_bboxes:torch.Tensor, gt_labels:torch.Tensor, gt_bboxes_ignore:Optional[torch.Tensor]=None, eps:float=1e-07)
*Assign ground truth to output_grid_boxes using SimOTA.
This method assigns predicted bounding boxes to ground truth boxes based on a computed cost matrix. It first extracts the predictions and scores of valid boxes, that is, output grid boxes inside any ground truth box or any center region. It then calculates the total cost matrix from the IoU and classification costs. Finally, it uses the cost matrix to assign each prediction to at most one ground truth box.*
Parameter | Type | Default | Details |
---|---|---|---|
pred_scores | Tensor | | Classification scores of each output grid box across all classes. |
output_grid_boxes | Tensor | | Output grid boxes of one image in [cx, cy, stride_w, stride_h] format. |
decoded_bboxes | Tensor | | Predicted bounding boxes of one image in [tl_x, tl_y, br_x, br_y] format. |
gt_bboxes | Tensor | | Ground truth bounding boxes of one image in [tl_x, tl_y, br_x, br_y] format. |
gt_labels | Tensor | | Ground truth labels of one image, with shape [num_gts]. |
gt_bboxes_ignore | Optional | None | Ground truth bounding boxes labelled as ignored, e.g., crowd boxes in COCO. |
eps | float | 1e-07 | A small value added to the denominator for numerical stability. |
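The cost-matrix step of assign can be sketched in pure Python for boxes that have already been filtered to the valid set. The details are assumptions taken from mmdetection's SimOTA: the classification cost is the binary cross-entropy between the square root of each score and the one-hot ground truth label, summed over classes; the IoU cost is -log(IoU + eps); and pairs where the output grid box is not in both the ground truth box and its center region receive a large penalty (INF = 100000.0). box_iou and simota_cost_matrix are hypothetical helpers, and clamping the scores away from 0 and 1 is a simplification rather than PyTorch's exact guard against log(0).

```python
import math
from typing import List, Sequence, Tuple

INF = 100000.0  # assumed penalty for pairs outside the box-and-center region (mmdetection's value)


def box_iou(a: Sequence[float], b: Sequence[float]) -> float:
    """IoU of two boxes in [tl_x, tl_y, br_x, br_y] format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def simota_cost_matrix(
    scores: Sequence[Sequence[float]],  # [num_valid][num_classes], values in [0, 1]
    boxes: Sequence[Sequence[float]],  # [num_valid][4] decoded boxes
    gt_boxes: Sequence[Sequence[float]],  # [num_gts][4]
    gt_labels: Sequence[int],  # [num_gts]
    in_box_and_center: Sequence[Sequence[bool]],  # [num_valid][num_gts]
    iou_weight: float = 3.0,
    cls_weight: float = 1.0,
    eps: float = 1e-7,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (cost, ious) as [num_valid][num_gts] nested lists."""
    ious = [[box_iou(p, g) for g in gt_boxes] for p in boxes]
    cost: List[List[float]] = []
    for i, pred_scores in enumerate(scores):
        row = []
        for j, label in enumerate(gt_labels):
            # Classification cost: BCE(sqrt(score), one-hot label), summed over classes.
            cls_cost = 0.0
            for c, s in enumerate(pred_scores):
                q = min(max(math.sqrt(s), eps), 1.0 - eps)  # keep log() finite
                target = 1.0 if c == label else 0.0
                cls_cost -= target * math.log(q) + (1.0 - target) * math.log(1.0 - q)
            # Regression cost: -log(IoU + eps).
            iou_cost = -math.log(ious[i][j] + eps)
            penalty = 0.0 if in_box_and_center[i][j] else INF
            row.append(cls_weight * cls_cost + iou_weight * iou_cost + penalty)
        cost.append(row)
    return cost, ious


cost, ious = simota_cost_matrix(
    scores=[[0.81, 0.04], [0.25, 0.36]],
    boxes=[[0, 0, 10, 10], [5, 5, 15, 15]],
    gt_boxes=[[0, 0, 10, 10]],
    gt_labels=[0],
    in_box_and_center=[[True], [False]],
)
print(round(cost[0][0], 4))  # 0.3285
print(cost[1][0] > INF)  # True
```

The first prediction overlaps the ground truth perfectly and scores high on its class, so its cost is small; the second lies outside the box-and-center region, so the penalty dominates and it is effectively never selected.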
SimOTAAssigner.get_in_gt_and_in_center_info
SimOTAAssigner.get_in_gt_and_in_center_info (output_grid_boxes:torch.Tensor, gt_bboxes)
*Get information about whether output_grid_boxes are inside ground truth boxes or their center regions.
This method determines which output grid boxes lie inside a ground truth box and which lie inside the center region of a ground truth box. It computes the center region of each ground truth box, checks whether each output grid box center falls inside each ground truth box and each center region, and then returns a mask marking the output grid boxes that are in any ground truth box or any center region, together with a mask marking which of them are in both.*
Parameter | Type | Details |
---|---|---|
output_grid_boxes | Tensor | All output grid boxes of one image, a 2D-Tensor with shape [num_output_grid_boxes, 4] in [cx, cy, stride_w, stride_h] format. |
gt_bboxes | | Ground truth bboxes of one image, a 2D-Tensor with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. |
Returns | Tuple | The first tensor indicates whether each output grid box is in any ground truth box or center region; the second specifies whether it is in both a ground truth box and its center region. |
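The check above can be sketched in pure Python, assuming mmdetection's geometry: an output grid box counts as inside a ground truth box when its center (cx, cy) lies strictly inside that box, and the center region is a square of half-size center_radius * stride around the ground truth center. in_gt_and_in_center_info is a hypothetical name; the real method works on tensors.

```python
from typing import List, Sequence, Tuple


def in_gt_and_in_center_info(
    grid_boxes: Sequence[Sequence[float]],  # [num_grid][4] as [cx, cy, stride_w, stride_h]
    gt_bboxes: Sequence[Sequence[float]],  # [num_gts][4] as [tl_x, tl_y, br_x, br_y]
    center_radius: float = 2.5,
) -> Tuple[List[bool], List[List[bool]]]:
    """Return (valid_mask, in_both) where in_both covers only the valid grid boxes."""
    in_gt: List[List[bool]] = []
    in_ct: List[List[bool]] = []
    for cx, cy, stride_w, stride_h in grid_boxes:
        row_gt, row_ct = [], []
        for x1, y1, x2, y2 in gt_bboxes:
            # Grid center strictly inside the ground truth box.
            row_gt.append(x1 < cx < x2 and y1 < cy < y2)
            # Grid center strictly inside the square center region around the gt center.
            gcx, gcy = (x1 + x2) / 2, (y1 + y2) / 2
            rx, ry = center_radius * stride_w, center_radius * stride_h
            row_ct.append(gcx - rx < cx < gcx + rx and gcy - ry < cy < gcy + ry)
        in_gt.append(row_gt)
        in_ct.append(row_ct)

    # Valid: inside any ground truth box OR any center region.
    valid_mask = [any(g) or any(c) for g, c in zip(in_gt, in_ct)]
    # For valid grid boxes only: inside the box AND the center region, per ground truth.
    in_both = [
        [g and c for g, c in zip(in_gt[i], in_ct[i])]
        for i in range(len(grid_boxes))
        if valid_mask[i]
    ]
    return valid_mask, in_both


valid_mask, in_both = in_gt_and_in_center_info(
    grid_boxes=[[4, 4, 8, 8], [25, 25, 8, 8], [60, 60, 8, 8]],
    gt_bboxes=[[0, 0, 20, 20]],
)
print(valid_mask)  # [True, True, False]
print(in_both)  # [[True], [False]]
```

The second grid box falls outside the ground truth box but inside its center region (which extends 2.5 * 8 = 20 pixels from the center at (10, 10)), so it is valid but not in both.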
SimOTAAssigner.dynamic_k_matching
SimOTAAssigner.dynamic_k_matching (cost:torch.Tensor, pairwise_ious:torch.Tensor, num_gt:int, valid_mask:torch.Tensor)
This method performs dynamic k-matching. For each ground truth box, it first computes a dynamic k (the sum of that ground truth's top candidate_topk IoUs, truncated to an integer and clamped to at least 1) and then selects the k predicted boxes with the smallest cost. If a predicted box is matched to multiple ground truths, it keeps only the match with the smallest cost. Finally, it returns the IoUs and matched ground truth indices for the valid predicted boxes.
Parameter | Type | Details |
---|---|---|
cost | Tensor | A 2D tensor representing the cost matrix calculated from both the classification cost and the regression IoU cost. Shape is [num_output_grid_boxes, num_gts]. |
pairwise_ious | Tensor | A 2D tensor of IoU scores between predictions and ground truths. Shape is [num_output_grid_boxes, num_gts]. |
num_gt | int | The number of ground truth boxes. |
valid_mask | Tensor | A 1D tensor marking which predicted boxes are valid, i.e., inside any ground truth box or any center region. Shape is [num_output_grid_boxes]. |
Returns | Tuple | (IoU scores for matched pairs, the indices of the ground truth for each output_grid_box) |
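The matching procedure can be sketched in pure Python over nested lists, following mmdetection's dynamic_k_matching as an assumption about this implementation. dynamic_k_matching_sketch is a hypothetical name; unlike the documented method, it also returns the foreground mask explicitly so the caller can see which rows were matched.

```python
from typing import List, Sequence, Tuple


def dynamic_k_matching_sketch(
    cost: Sequence[Sequence[float]],  # [num_valid][num_gts]
    ious: Sequence[Sequence[float]],  # [num_valid][num_gts]
    candidate_topk: int = 10,
) -> Tuple[List[float], List[int], List[bool]]:
    """Return (matched_ious, matched_gt_inds, fg_mask); fg_mask marks matched rows."""
    num_preds = len(cost)
    num_gts = len(cost[0]) if num_preds else 0
    q = min(candidate_topk, num_preds)
    matching = [[0] * num_gts for _ in range(num_preds)]

    for j in range(num_gts):
        column_ious = sorted((ious[i][j] for i in range(num_preds)), reverse=True)
        # Dynamic k: integer part of the sum of the top-q IoUs, but at least 1.
        k = max(1, int(sum(column_ious[:q])))
        column_cost = [cost[i][j] for i in range(num_preds)]
        # Select the k predictions with the lowest cost for this ground truth.
        for i in sorted(range(num_preds), key=column_cost.__getitem__)[:k]:
            matching[i][j] = 1

    # Resolve conflicts: a prediction matched to several ground truths keeps
    # only the one with the lowest cost.
    for i in range(num_preds):
        if sum(matching[i]) > 1:
            j_best = min(range(num_gts), key=cost[i].__getitem__)
            matching[i] = [1 if j == j_best else 0 for j in range(num_gts)]

    fg_mask = [any(row) for row in matching]
    fg_rows = [i for i in range(num_preds) if fg_mask[i]]
    matched_gt_inds = [matching[i].index(1) for i in fg_rows]
    matched_ious = [ious[i][j] for i, j in zip(fg_rows, matched_gt_inds)]
    return matched_ious, matched_gt_inds, fg_mask


cost = [[1.0, 5.0], [2.0, 1.5], [3.0, 0.5], [9.0, 9.0]]
ious = [[0.9, 0.1], [0.8, 0.9], [0.4, 0.7], [0.0, 0.5]]
matched_ious, matched_gt_inds, fg_mask = dynamic_k_matching_sketch(cost, ious)
print(matched_gt_inds)  # [0, 1, 1]
print(fg_mask)  # [True, True, True, False]
```

Both ground truths get k = 2 here (IoU sums of about 2.1 and 2.2). Prediction 1 is first selected by both and is then kept only for ground truth 1, where its cost is lower (1.5 versus 2.0).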