simota

An implementation of SimOTA label assignment for the YOLOX object detection model based on OpenMMLab’s implementation in the mmdetection library.


AssignResult

 AssignResult (num_ground_truth_boxes:int,
               ground_truth_box_indices:torch.LongTensor,
               max_iou_values:torch.FloatTensor,
               category_labels:torch.LongTensor=None)

Stores the assignments between predicted bounding boxes and ground truth bounding boxes.

Based on OpenMMLab’s implementation in the mmdetection library.
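The field names above do not say how `ground_truth_box_indices` is encoded. The sketch below assumes mmdetection's convention (0 means background and `i + 1` means "assigned to ground truth box `i`"); confirm this against the source before relying on it. `AssignResultSketch` is a hypothetical stand-in, not this library's class, and shows how positives and their zero-based ground truth indices can be read out.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass
class AssignResultSketch:
    """Hypothetical stand-in mirroring the AssignResult fields documented above."""
    num_ground_truth_boxes: int
    # Assumed mmdetection convention: 0 = background, i + 1 = assigned to gt i
    ground_truth_box_indices: torch.Tensor
    max_iou_values: torch.Tensor
    category_labels: Optional[torch.Tensor] = None

    def positive_indices(self) -> torch.Tensor:
        # Indices of predictions that were assigned to some ground truth box
        return torch.nonzero(self.ground_truth_box_indices > 0, as_tuple=False).squeeze(1)

    def matched_gt_indices(self) -> torch.Tensor:
        # Zero-based ground truth index for each positive prediction
        return self.ground_truth_box_indices[self.positive_indices()] - 1


result = AssignResultSketch(
    num_ground_truth_boxes=2,
    ground_truth_box_indices=torch.tensor([0, 2, 0, 1]),
    max_iou_values=torch.tensor([0.0, 0.7, 0.0, 0.55]),
    category_labels=torch.tensor([-1, 3, -1, 0]),
)
print(result.positive_indices())    # tensor([1, 3])
print(result.matched_gt_indices())  # tensor([1, 0])
```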



SimOTAAssigner

 SimOTAAssigner (center_radius:float=2.5, candidate_topk:int=10,
                 iou_weight:float=3.0, cls_weight:float=1.0)

The SimOTAAssigner class assigns predicted bounding boxes to their corresponding ground truth boxes in object detection tasks. It uses SimOTA, a simplified form of Optimal Transport Assignment (OTA) that approximates the optimal transport solution with a dynamic top-k strategy instead of solving the full transport problem.

It calculates a cost matrix based on classification and regression (Intersection over Union, IoU) costs. It then uses this cost matrix to dynamically assign each ground truth object to the best matching bounding box predictions while resolving conflicts to ensure each prediction pairs with a single ground truth.

Based on OpenMMLab’s implementation in the mmdetection library.

| | Type | Default | Details |
|---|---|---|---|
| center_radius | float | 2.5 | Radius, in units of the output grid box stride, of the region around each ground truth center used to judge whether an output_grid_box is in the center. |
| candidate_topk | int | 10 | Number of highest IoUs per ground truth that are summed to compute its dynamic k. |
| iou_weight | float | 3.0 | Scale factor for the regression (IoU) cost. |
| cls_weight | float | 1.0 | Scale factor for the classification cost. |
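The cost matrix weighted by `iou_weight` and `cls_weight` can be sketched as follows. This mirrors mmdetection's formulation as we understand it: the IoU cost is `-log(IoU + eps)`, and the classification cost is the binary cross-entropy between the square root of each prediction's scores (in mmdetection these scores are class probability times objectness, so the square root acts as a geometric mean) and each ground truth's one-hot label. Pairs outside both a box and its center region receive a large penalty. `pairwise_iou` and `simota_cost_matrix` are hypothetical helpers, and the `1e5` penalty is an assumed constant.

```python
from typing import Optional, Tuple

import torch
import torch.nn.functional as F


def pairwise_iou(boxes1: torch.Tensor, boxes2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """IoU between every box in boxes1 [N, 4] and boxes2 [M, 4], both [tl_x, tl_y, br_x, br_y]."""
    area1 = (boxes1[:, 2] - boxes1[:, 0]).clamp(min=0) * (boxes1[:, 3] - boxes1[:, 1]).clamp(min=0)
    area2 = (boxes2[:, 2] - boxes2[:, 0]).clamp(min=0) * (boxes2[:, 3] - boxes2[:, 1]).clamp(min=0)
    top_left = torch.max(boxes1[:, None, :2], boxes2[None, :, :2])      # [N, M, 2]
    bottom_right = torch.min(boxes1[:, None, 2:], boxes2[None, :, 2:])  # [N, M, 2]
    wh = (bottom_right - top_left).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    union = area1[:, None] + area2[None, :] - inter
    return inter / (union + eps)


def simota_cost_matrix(
    pred_scores: torch.Tensor,     # [N, C] probabilities in [0, 1]
    decoded_bboxes: torch.Tensor,  # [N, 4]
    gt_bboxes: torch.Tensor,       # [M, 4]
    gt_labels: torch.Tensor,       # [M]
    in_boxes_and_centers: Optional[torch.Tensor] = None,  # [N, M] bool
    iou_weight: float = 3.0,
    cls_weight: float = 1.0,
    eps: float = 1e-7,
) -> Tuple[torch.Tensor, torch.Tensor]:
    num_preds, num_classes = pred_scores.shape
    num_gts = gt_bboxes.shape[0]

    # Regression cost: small when a prediction overlaps a ground truth box well
    ious = pairwise_iou(decoded_bboxes, gt_bboxes, eps)
    iou_cost = -torch.log(ious + eps)

    # Classification cost: BCE of every prediction against every gt's one-hot label
    gt_onehot = F.one_hot(gt_labels.long(), num_classes).float()            # [M, C]
    scores = pred_scores.float().sqrt().unsqueeze(1).repeat(1, num_gts, 1)  # [N, M, C]
    targets = gt_onehot.unsqueeze(0).repeat(num_preds, 1, 1)               # [N, M, C]
    cls_cost = F.binary_cross_entropy(scores, targets, reduction="none").sum(-1)

    cost = cls_weight * cls_cost + iou_weight * iou_cost
    if in_boxes_and_centers is not None:
        # Assumed large penalty for pairs outside both the gt box and its center region
        cost = cost + (~in_boxes_and_centers).float() * 1e5
    return cost, ious
```

Multiplying the IoU term by 3.0 by default means localization quality dominates the classification term in the cost.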


SimOTAAssigner.assign

 SimOTAAssigner.assign (pred_scores:torch.Tensor,
                        output_grid_boxes:torch.Tensor,
                        decoded_bboxes:torch.Tensor,
                        gt_bboxes:torch.Tensor, gt_labels:torch.Tensor,
                        gt_bboxes_ignore:Optional[torch.Tensor]=None,
                        eps:float=1e-07)

Assign ground truth to output_grid_boxes using SimOTA.

This method assigns predicted bounding boxes to ground truth boxes based on a computed cost matrix. It first keeps only the predictions, and their scores, whose output grid boxes lie inside any ground truth box or center region. It then computes a cost matrix from weighted IoU and classification costs. Finally, it runs dynamic k-matching on that matrix so that each prediction is assigned to at most one ground truth box.

| | Type | Default | Details |
|---|---|---|---|
| pred_scores | Tensor | | Classification scores of each output grid box across all classes. |
| output_grid_boxes | Tensor | | Output grid boxes of one image in [cx, cy, stride_w, stride_h] format. |
| decoded_bboxes | Tensor | | Predicted bounding boxes of one image in [tl_x, tl_y, br_x, br_y] format. |
| gt_bboxes | Tensor | | Ground truth bounding boxes of one image in [tl_x, tl_y, br_x, br_y] format. |
| gt_labels | Tensor | | Ground truth labels of one image, a Tensor with shape [num_gts]. |
| gt_bboxes_ignore | typing.Optional[torch.Tensor] | None | Ground truth bounding boxes labelled as ignored, e.g., crowd boxes in COCO. |
| eps | float | 1e-07 | A value added to the denominator for numerical stability. |
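Because matching runs only on the valid subset of predictions, its results have to be written back to tensors covering every prediction. A minimal sketch of that last step, assuming mmdetection's encoding (0 for background, `i + 1` for ground truth `i`, label -1 for unassigned). `expand_to_full` and `scatter_assignments` are hypothetical helpers, and filling unassigned IoUs with 0.0 is a choice made for this sketch only.

```python
from typing import Tuple

import torch


def expand_to_full(valid_mask: torch.Tensor, fg_mask_in_valid: torch.Tensor) -> torch.Tensor:
    """Map a foreground mask over the valid subset back to a mask over all predictions."""
    pos_mask = torch.zeros_like(valid_mask)  # all False
    pos_mask[valid_mask] = fg_mask_in_valid
    return pos_mask


def scatter_assignments(
    pos_mask: torch.Tensor,         # [N] bool, True for matched predictions
    matched_gt_inds: torch.Tensor,  # [P] zero-based gt index per positive, in pos_mask order
    matched_ious: torch.Tensor,     # [P] IoU of each positive with its gt
    gt_labels: torch.Tensor,        # [M] class label per ground truth box
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    num_preds = pos_mask.numel()
    # 0 marks background; i + 1 marks "assigned to ground truth i"
    assigned_gt_inds = torch.zeros(num_preds, dtype=torch.long)
    assigned_gt_inds[pos_mask] = matched_gt_inds + 1
    # IoU with the assigned box; unassigned predictions keep 0.0 in this sketch
    max_ious = torch.zeros(num_preds)
    max_ious[pos_mask] = matched_ious.float()
    # Class label of the assigned box; -1 for unassigned predictions
    assigned_labels = torch.full((num_preds,), -1, dtype=torch.long)
    assigned_labels[pos_mask] = gt_labels[matched_gt_inds].long()
    return assigned_gt_inds, max_ious, assigned_labels
```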


SimOTAAssigner.get_in_gt_and_in_center_info

 SimOTAAssigner.get_in_gt_and_in_center_info (output_grid_boxes:torch.Tensor,
                                              gt_bboxes)

Determine whether each output grid box lies inside a ground truth box or inside a ground truth center region.

This method checks, for each output grid box, whether its center lies inside each ground truth box and whether it lies inside the center region of each ground truth box, which extends center_radius strides from the box center in each direction. It returns a mask of the grid boxes that fall in any ground truth box or any center region, and a mask indicating which of those grid boxes fall in both a ground truth box and that same box's center region.

| | Type | Details |
|---|---|---|
| output_grid_boxes | Tensor | All output_grid_boxes of one image, a 2D-Tensor with shape [num_output_grid_boxes, 4] in [cx, cy, stride_w, stride_h] format. |
| gt_bboxes | | Ground truth bboxes of one image, a 2D-Tensor with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format. |
| **Returns** | typing.Tuple[torch.Tensor, torch.Tensor] | The first tensor indicates whether each output_grid_box is in any ground truth box or center region; the second specifies whether each of those output_grid_boxes is in both a ground truth box and its center region. |
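A minimal sketch of the two checks, assuming strict inequalities and a center region of half-width `center_radius * stride` on each axis, as in mmdetection. `in_gt_and_in_center` is a hypothetical function, not this library's method; it returns the `[N]` candidate mask and a `[num_valid, M]` mask of boxes inside both a ground truth box and that box's center region.

```python
from typing import Tuple

import torch


def in_gt_and_in_center(
    grid_boxes: torch.Tensor,  # [N, 4] as [cx, cy, stride_w, stride_h]
    gt_bboxes: torch.Tensor,   # [M, 4] as [tl_x, tl_y, br_x, br_y]
    center_radius: float = 2.5,
) -> Tuple[torch.Tensor, torch.Tensor]:
    cx, cy = grid_boxes[:, 0:1], grid_boxes[:, 1:2]  # [N, 1] each
    stride_w, stride_h = grid_boxes[:, 2:3], grid_boxes[:, 3:4]

    # [N, M]: grid-box center strictly inside each gt box
    in_gts = (
        (cx > gt_bboxes[:, 0]) & (cx < gt_bboxes[:, 2])
        & (cy > gt_bboxes[:, 1]) & (cy < gt_bboxes[:, 3])
    )

    # [N, M]: grid-box center within center_radius strides of each gt center
    gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2
    gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2
    in_cts = ((cx - gt_cx).abs() < center_radius * stride_w) & (
        (cy - gt_cy).abs() < center_radius * stride_h
    )

    # [N] candidate mask; [num_valid, M] mask for "in box and in its center region"
    in_gts_or_centers = in_gts.any(dim=1) | in_cts.any(dim=1)
    in_boxes_and_centers = in_gts[in_gts_or_centers] & in_cts[in_gts_or_centers]
    return in_gts_or_centers, in_boxes_and_centers
```

For example, with a stride-8 grid and one ground truth box `[0, 0, 16, 8]`, a grid cell centered at (4, 20) lies outside the box but within 2.5 strides of its center (8, 4), so it is a candidate without being in both.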


SimOTAAssigner.dynamic_k_matching

 SimOTAAssigner.dynamic_k_matching (cost:torch.Tensor,
                                    pairwise_ious:torch.Tensor,
                                    num_gt:int, valid_mask:torch.Tensor)

This method performs dynamic k-matching. For each ground truth box, it computes a dynamic k by summing that box's top candidate_topk IoUs (truncated to an integer, with a minimum of 1) and selects the k predictions with the lowest cost. If a prediction is selected by multiple ground truths, it keeps only the ground truth with the lowest cost. Finally, it returns the IoUs and matched ground truth indices for the valid predictions that received a match.

| | Type | Details |
|---|---|---|
| cost | Tensor | A 2D tensor representing the cost matrix calculated from both classification cost and regression IoU cost. Shape is [num_output_grid_boxes, num_gts]. |
| pairwise_ious | Tensor | A 2D tensor representing IoU scores between predictions and ground truths. Shape is [num_output_grid_boxes, num_gts]. |
| num_gt | int | The number of ground truth boxes. |
| valid_mask | Tensor | A 1D tensor indicating which predicted boxes are valid based on being in gt bboxes and in centers. Shape is [num_output_grid_boxes]. |
| **Returns** | typing.Tuple[torch.Tensor, torch.Tensor] | (IoU scores for matched pairs, the indices of the ground truth for each output_grid_box) |
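The matching steps above can be sketched as a standalone function. This follows mmdetection's procedure as we understand it, but the signature is hypothetical: instead of taking `num_gt` and updating `valid_mask` in place, it infers the number of ground truths from `cost` and returns the foreground mask over the valid subset as a third value.

```python
from typing import Tuple

import torch


def dynamic_k_matching(
    cost: torch.Tensor,           # [N, M] cost for valid predictions vs. gts
    pairwise_ious: torch.Tensor,  # [N, M]
    candidate_topk: int = 10,
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    num_preds, num_gts = cost.shape
    matching = torch.zeros_like(cost)

    # Dynamic k per gt: sum of its top IoUs, truncated to an int, at least 1
    topk = min(candidate_topk, num_preds)
    topk_ious, _ = torch.topk(pairwise_ious, topk, dim=0)
    dynamic_ks = torch.clamp(topk_ious.sum(0).int(), min=1)

    # Each gt claims its k lowest-cost predictions
    for gt_idx in range(num_gts):
        _, pos_idx = torch.topk(cost[:, gt_idx], k=int(dynamic_ks[gt_idx]), largest=False)
        matching[pos_idx, gt_idx] = 1.0

    # A prediction claimed by several gts keeps only its lowest-cost gt
    multi = matching.sum(1) > 1
    if multi.any():
        rows = torch.nonzero(multi, as_tuple=False).squeeze(1)
        cost_argmin = cost[rows].argmin(dim=1)
        matching[rows] = 0.0
        matching[rows, cost_argmin] = 1.0

    fg_mask = matching.sum(1) > 0
    matched_gt_inds = matching[fg_mask].argmax(dim=1)
    matched_ious = (matching * pairwise_ious).sum(1)[fg_mask]
    return matched_ious, matched_gt_inds, fg_mask
```

In a small case with IoU columns summing to 2.1 and 1.4, the first ground truth gets k = 2 and the second k = 1. If both claim the same prediction, that prediction goes to whichever ground truth has the lower cost for it.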