simota

An implementation of SimOTA label assignment for the YOLOX object detection model based on OpenMMLab’s implementation in the mmdetection library.

AssignResult

 AssignResult (num_ground_truth_boxes:int,
               ground_truth_box_indices:torch.LongTensor,
               max_iou_values:torch.FloatTensor,
               category_labels:torch.LongTensor=None)

*Stores assignments between predicted bounding boxes and ground truth bounding boxes.

Based on OpenMMLab’s implementation in the mmdetection library:

OpenMMLab’s Implementation*
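
To make the stored fields concrete, here is a minimal sketch of how such a result might be read. The `AssignResultSketch` class is a hypothetical stand-in that only mirrors the field names in the signature above. The encoding it uses (0 for "unassigned", positive values as 1-based ground truth indices, -1 as the label of unassigned boxes) is an assumption carried over from mmdetection's convention and may differ in this library.

```python
from typing import NamedTuple, Optional

import torch


class AssignResultSketch(NamedTuple):
    """Hypothetical stand-in mirroring the documented AssignResult fields."""
    num_ground_truth_boxes: int
    ground_truth_box_indices: torch.Tensor
    max_iou_values: torch.Tensor
    category_labels: Optional[torch.Tensor] = None


# Four predictions, two ground truth boxes.
# Assumed encoding: 0 = unassigned, k > 0 = assigned to ground truth k - 1.
result = AssignResultSketch(
    num_ground_truth_boxes=2,
    ground_truth_box_indices=torch.tensor([1, 0, 2, 0]),
    max_iou_values=torch.tensor([0.9, 0.0, 0.7, 0.0]),
    category_labels=torch.tensor([3, -1, 5, -1]),
)

# Recover the positive predictions and their zero-based ground truth indices.
positive = result.ground_truth_box_indices > 0
matched_gt = result.ground_truth_box_indices[positive] - 1
```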

SimOTAAssigner

 SimOTAAssigner (center_radius:float=2.5, candidate_topk:int=10,
                 iou_weight:float=3.0, cls_weight:float=1.0)

*The SimOTAAssigner class assigns predicted bounding boxes to their corresponding ground truth boxes in object detection tasks. It uses SimOTA, which treats label assignment as an optimal transport problem and approximates its solution with a dynamic top-k strategy.

It calculates a cost matrix based on classification and regression (Intersection over Union, IoU) costs. It then uses this cost matrix to dynamically assign each ground truth object to the best matching bounding box predictions while resolving conflicts to ensure each prediction pairs with a single ground truth.

Based on OpenMMLab’s implementation in the mmdetection library:

OpenMMLab’s Implementation*

	Type	Default	Details
center_radius	float	2.5	Size of the center region around each ground truth box center, used to judge whether an output_grid_box is in the center.
candidate_topk	int	10	The number of top IoU candidates per ground truth that are used to calculate dynamic-k.
iou_weight	float	3.0	The scale factor for the regression IoU cost.
cls_weight	float	1.0	The scale factor for classification cost.
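
How `iou_weight` and `cls_weight` combine the two costs can be sketched as below. This is an illustration under stated assumptions, not the library's code: the function name `simota_cost`, the `INF` penalty, the negative-log IoU cost, and the binary cross-entropy classification cost are modeled on mmdetection's SimOTA assigner and simplified.

```python
import torch
import torch.nn.functional as F

INF = 1e8  # assumed large penalty for pairs outside the box-and-center region


def simota_cost(pred_scores, gt_labels, pairwise_ious, in_box_and_center,
                iou_weight=3.0, cls_weight=1.0, eps=1e-7):
    """Hypothetical helper that builds a [num_preds, num_gts] cost matrix.

    pred_scores:       [num_preds, num_classes] probabilities in [0, 1]
    gt_labels:         [num_gts] integer class indices
    pairwise_ious:     [num_preds, num_gts]
    in_box_and_center: [num_preds, num_gts] boolean mask
    """
    num_classes = pred_scores.shape[1]

    # Regression cost: low when the IoU is high.
    iou_cost = -torch.log(pairwise_ious + eps)

    # Classification cost: binary cross-entropy between every prediction's
    # scores and every ground truth's one-hot label, summed over classes.
    target = F.one_hot(gt_labels, num_classes).float().unsqueeze(0)  # [1, G, C]
    prob = pred_scores.unsqueeze(1).clamp(eps, 1 - eps)              # [P, 1, C]
    cls_cost = -(target * torch.log(prob)
                 + (1 - target) * torch.log(1 - prob)).sum(-1)       # [P, G]

    # Pairs outside the box-and-center region are pushed out of contention.
    penalty = (~in_box_and_center).float() * INF
    return cls_weight * cls_cost + iou_weight * iou_cost + penalty
```

With the default weights, the IoU term counts three times as much as the classification term, so localization quality dominates the ranking of candidates.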

SimOTAAssigner.assign

 SimOTAAssigner.assign (pred_scores:torch.Tensor,
                        output_grid_boxes:torch.Tensor,
                        decoded_bboxes:torch.Tensor,
                        gt_bboxes:torch.Tensor, gt_labels:torch.Tensor,
                        gt_bboxes_ignore:Optional[torch.Tensor]=None,
                        eps:float=1e-07)

*Assign ground truth to output_grid_boxes using SimOTA.

This method assigns predicted bounding boxes to ground truth boxes based on the computed cost matrix. It first extracts valid box predictions and scores. It then calculates the total cost matrix using IoU and classification costs. Finally, it uses the cost matrix to assign each prediction to a ground truth box.*

	Type	Default	Details
pred_scores	Tensor		Classification scores of each output grid box across all classes.
output_grid_boxes	Tensor		Output grid bounding boxes of one image in format [cx, cy, stride_w, stride_y].
decoded_bboxes	Tensor		Predicted bounding boxes of one image in format [tl_x, tl_y, br_x, br_y].
gt_bboxes	Tensor		Ground truth bounding boxes of one image in format [tl_x, tl_y, br_x, br_y].
gt_labels	Tensor		Ground truth labels of one image, a Tensor with shape [num_gts].
gt_bboxes_ignore	Optional	None	Ground truth bounding boxes that are labelled as `ignored`, e.g., crowd boxes in COCO.
eps	float	1e-07	A value added to the denominator for numerical stability.
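
The IoU part of the cost needs the overlap between every predicted box and every ground truth box. A minimal sketch of that pairwise computation for boxes in [tl_x, tl_y, br_x, br_y] format follows. The helper name `pairwise_iou` is hypothetical, and placing `eps` in the denominator is an assumption based on the parameter description above.

```python
import torch


def pairwise_iou(boxes_a, boxes_b, eps=1e-7):
    """Hypothetical helper: IoU between every box in a and every box in b.

    Both inputs are [N, 4] tensors in [tl_x, tl_y, br_x, br_y] format.
    Returns a [len(boxes_a), len(boxes_b)] tensor.
    """
    area_a = (boxes_a[:, 2] - boxes_a[:, 0]) * (boxes_a[:, 3] - boxes_a[:, 1])
    area_b = (boxes_b[:, 2] - boxes_b[:, 0]) * (boxes_b[:, 3] - boxes_b[:, 1])

    # Corners of the intersection rectangle, broadcast to [A, B, 2].
    top_left = torch.maximum(boxes_a[:, None, :2], boxes_b[None, :, :2])
    bottom_right = torch.minimum(boxes_a[:, None, 2:], boxes_b[None, :, 2:])

    # Negative extents mean the boxes do not overlap.
    wh = (bottom_right - top_left).clamp(min=0)
    intersection = wh[..., 0] * wh[..., 1]

    union = area_a[:, None] + area_b[None, :] - intersection
    return intersection / (union + eps)


decoded = torch.tensor([[0.0, 0.0, 10.0, 10.0]])
ground_truth = torch.tensor([[0.0, 0.0, 10.0, 10.0],
                             [5.0, 5.0, 15.0, 15.0],
                             [20.0, 20.0, 30.0, 30.0]])
ious = pairwise_iou(decoded, ground_truth)  # shape [1, 3]
```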

SimOTAAssigner.get_in_gt_and_in_center_info

 SimOTAAssigner.get_in_gt_and_in_center_info (output_grid_boxes:torch.Tensor,
                                               gt_bboxes)

*Get the information about whether output_grid_boxes are in ground truth boxes or center.

This method determines which output grid boxes fall inside a ground truth box and which fall inside the center region of a ground truth box. It computes the center of each ground truth box, checks each output grid box against both the ground truth boxes and their center regions, and returns two masks: one marking the boxes that lie in any ground truth box or any center region, and one marking the boxes that lie in both a ground truth box and its center region.*

	Type	Details
output_grid_boxes	Tensor	All output_grid_boxes of one image, a 2D-Tensor with shape [num_output_grid_boxes, 4] in [cx, cy, stride_w, stride_y] format.
gt_bboxes		Ground truth bboxes of one image, a 2D-Tensor with shape [num_gts, 4] in [tl_x, tl_y, br_x, br_y] format.
Returns	Tuple	The first tensor indicates whether each output_grid_box is in any ground truth box or center region; the second tensor indicates whether it is in both the ground truth box and its center region.
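
The two checks can be sketched with broadcasting as below. This is a simplified illustration, not the library's code: the function name `in_gt_and_in_center` is hypothetical, and sizing the center region as `center_radius` times the grid stride is an assumption taken from mmdetection's approach.

```python
import torch


def in_gt_and_in_center(grid_boxes, gt_bboxes, center_radius=2.5):
    """Hypothetical sketch of the in-box and in-center tests.

    grid_boxes: [P, 4] in [cx, cy, stride_w, stride_y] format
    gt_bboxes:  [G, 4] in [tl_x, tl_y, br_x, br_y] format
    """
    cx, cy = grid_boxes[:, 0:1], grid_boxes[:, 1:2]    # each [P, 1]
    sw, sh = grid_boxes[:, 2:3], grid_boxes[:, 3:4]    # each [P, 1]

    # Signed distances from each grid center to the four gt box sides.
    # The grid center is inside the box when all four are positive.
    box_deltas = torch.stack([cx - gt_bboxes[:, 0], cy - gt_bboxes[:, 1],
                              gt_bboxes[:, 2] - cx, gt_bboxes[:, 3] - cy],
                             dim=-1)                    # [P, G, 4]
    in_gt = box_deltas.min(dim=-1).values > 0           # [P, G]

    # Center region: a square around each gt center, scaled by the stride.
    gt_cx = (gt_bboxes[:, 0] + gt_bboxes[:, 2]) / 2
    gt_cy = (gt_bboxes[:, 1] + gt_bboxes[:, 3]) / 2
    center_deltas = torch.stack([cx - (gt_cx - center_radius * sw),
                                 cy - (gt_cy - center_radius * sh),
                                 (gt_cx + center_radius * sw) - cx,
                                 (gt_cy + center_radius * sh) - cy],
                                dim=-1)                 # [P, G, 4]
    in_center = center_deltas.min(dim=-1).values > 0    # [P, G]

    # First output: in any gt box or any center region, shape [P].
    in_gt_or_center = in_gt.any(dim=1) | in_center.any(dim=1)
    # Second output: in both, restricted to the rows kept by the first mask.
    in_gt_and_center = in_gt[in_gt_or_center] & in_center[in_gt_or_center]
    return in_gt_or_center, in_gt_and_center


grid = torch.tensor([[4.0, 4.0, 8.0, 8.0],
                     [20.0, 20.0, 8.0, 8.0],
                     [60.0, 60.0, 8.0, 8.0]])
gts = torch.tensor([[0.0, 0.0, 16.0, 16.0],
                    [90.0, 90.0, 110.0, 110.0]])
either, both = in_gt_and_in_center(grid, gts)
```

In this example the first grid box lies inside the first ground truth box and its center region, the second lies only inside the center region, and the third lies in neither, so it is dropped from the second mask.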

SimOTAAssigner.dynamic_k_matching

 SimOTAAssigner.dynamic_k_matching (cost:torch.Tensor,
                                    pairwise_ious:torch.Tensor,
                                    num_gt:int, valid_mask:torch.Tensor)

This method performs the dynamic k-matching process. For each ground truth box, it selects the k box predictions with the smallest cost, where k is derived per ground truth from its top candidate IoUs. If a predicted box matches multiple ground truths, it keeps only the match with the smallest cost. Finally, it returns the matched ground truth indices and IoUs for the valid predicted boxes.

	Type	Details
cost	Tensor	A 2D tensor representing the cost matrix calculated from both classification cost and regression IoU cost. Shape is [num_output_grid_boxes, num_gts].
pairwise_ious	Tensor	A 2D tensor representing IoU scores between predictions and ground truths. Shape is [num_output_grid_boxes, num_gts].
num_gt	int	The number of ground truth boxes.
valid_mask	Tensor	A 1D tensor representing which predicted boxes are valid based on being in gt bboxes and in centers. Shape is [num_output_grid_boxes].
Returns	Tuple	(IoU scores for matched pairs, indices of the matched ground truth for each output_grid_box)
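
The matching steps described above can be sketched as follows. This is a simplified illustration modeled on mmdetection's approach, not the library's code: the function name `dynamic_k_matching_sketch` is hypothetical, it omits `num_gt` and `valid_mask`, and it returns the foreground mask as a third value instead of updating `valid_mask` in place.

```python
import torch


def dynamic_k_matching_sketch(cost, pairwise_ious, candidate_topk=10):
    """Hypothetical sketch. Both inputs have shape [num_preds, num_gts]."""
    num_preds, num_gts = cost.shape
    matching = torch.zeros_like(cost, dtype=torch.bool)

    # Dynamic k per ground truth: the truncated sum of its top candidate
    # IoUs, with a minimum of one match.
    k = min(candidate_topk, num_preds)
    topk_ious, _ = torch.topk(pairwise_ious, k, dim=0)
    dynamic_ks = topk_ious.sum(dim=0).int().clamp(min=1)

    # For each ground truth, select its dynamic-k lowest-cost predictions.
    for gt_idx in range(num_gts):
        _, pos_idx = torch.topk(cost[:, gt_idx], k=int(dynamic_ks[gt_idx]),
                                largest=False)
        matching[pos_idx, gt_idx] = True

    # A prediction matched to several ground truths keeps the cheapest one.
    multiple = matching.sum(dim=1) > 1
    if multiple.any():
        cheapest_gt = cost[multiple].argmin(dim=1)
        matching[multiple] = False
        matching[multiple, cheapest_gt] = True

    foreground = matching.any(dim=1)
    matched_gt_inds = matching[foreground].int().argmax(dim=1)
    matched_ious = (matching * pairwise_ious).sum(dim=1)[foreground]
    return matched_ious, matched_gt_inds, foreground


# Four predictions, two ground truths. Prediction 1 is a candidate for both.
example_ious = torch.tensor([[0.9, 0.0],
                             [0.8, 0.9],
                             [0.5, 0.3],
                             [0.0, 0.0]])
example_cost = 1.0 - example_ious  # toy cost used only for this example
matched_ious, matched_gt_inds, foreground = dynamic_k_matching_sketch(
    example_cost, example_ious)
```

Here the first ground truth gets k = 2 and the second gets k = 1. Prediction 1 is selected by both, and the conflict is resolved in favor of the second ground truth because its cost is lower.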