model_type = MODEL_TYPES[0]
model_type
'yolox_tiny'
ConvModule (in_channels:int, out_channels:int, kernel_size:int, stride:int=1, padding:int=0, bias:bool=True, eps:float=1e-05, momentum:float=0.1, affine:bool=True, track_running_stats:bool=True, activation_function:Type[torch.nn.modules.module.Module]=<class 'torch.nn.modules.activation.SiLU'>)
*Configurable block used for Convolution2d-Normalization-Activation blocks.
Function forward(input x):
1. Pass the input (x) through the convolutional layer and store the result back to x.
2. Pass the output from the convolutional layer (now stored in x) through the batch normalization layer and store the result back to x.
3. Apply the activation function to the output of the batch normalization layer (x) and return the result.*
Parameter | Type | Default | Details |
---|---|---|---|
in_channels | int | | Number of channels in the input image. |
out_channels | int | | Number of channels produced by the convolution. |
kernel_size | int | | Size of the convolving kernel. |
stride | int | 1 | Stride of the convolution. |
padding | int | 0 | Zero-padding added to both sides of the input. |
bias | bool | True | If set to False, the layer will not learn an additive bias. |
eps | float | 1e-05 | A value added to the denominator for numerical stability in BatchNorm2d. |
momentum | float | 0.1 | The value used for the running_mean and running_var computation in BatchNorm2d. |
affine | bool | True | If set to True, this module has learnable affine parameters. |
track_running_stats | bool | True | If set to True, this module tracks the running mean and variance. |
activation_function | Type | SiLU | The activation function to be applied after batch normalization. |
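The three forward steps described for ConvModule can be sketched in plain PyTorch as follows. This is a minimal illustration rather than the library's source: the class name `ConvBNAct` is hypothetical, and only the documented constructor arguments are wired through.

```python
import torch
from torch import nn


class ConvBNAct(nn.Module):
    """Hypothetical stand-in for a Conv2d -> BatchNorm2d -> activation block."""

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, bias=True, eps=1e-5, momentum=0.1, affine=True,
                 track_running_stats=True, activation_function=nn.SiLU):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=padding, bias=bias)
        self.bn = nn.BatchNorm2d(out_channels, eps=eps, momentum=momentum,
                                 affine=affine,
                                 track_running_stats=track_running_stats)
        # The activation is passed as a class and instantiated here.
        self.act = activation_function()

    def forward(self, x):
        x = self.conv(x)    # step 1: convolution
        x = self.bn(x)      # step 2: batch normalization
        return self.act(x)  # step 3: activation


block = ConvBNAct(3, 16, kernel_size=3, stride=2, padding=1)
with torch.no_grad():
    out = block(torch.randn(1, 3, 64, 64))
out_shape = tuple(out.shape)  # stride 2 halves the 64x64 input to 32x32
```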
DarknetBottleneck (in_channels:int, out_channels:int, eps:float=0.001, momentum:float=0.03, affine:bool=True, track_running_stats:bool=True, add_identity:bool=True)
*Basic Darknet bottleneck block used in Darknet.
This class represents a basic bottleneck block used in Darknet, which consists of two convolutional layers with a possible identity shortcut.
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
in_channels | int | | The number of input channels to the block. |
out_channels | int | | The number of output channels from the block. |
eps | float | 0.001 | A value added to the denominator for numerical stability in the ConvModule’s BatchNorm layer. |
momentum | float | 0.03 | The value used for the running_mean and running_var computation in the ConvModule’s BatchNorm layer. |
affine | bool | True | A flag that when set to True, gives the ConvModule’s BatchNorm layer learnable affine parameters. |
track_running_stats | bool | True | If True, the ConvModule’s BatchNorm layer will track the running mean and variance. |
add_identity | bool | True | If True, add an identity shortcut (also known as skip connection) to the output. |
Returns | None | | |
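To make the "two convolutions plus optional identity shortcut" structure concrete, here is a rough sketch. The names `conv_bn_silu` and `BottleneckSketch` are hypothetical. The 1x1 then 3x3 kernel sizes, the halved hidden width, and the rule that the shortcut applies only when input and output channels match are assumptions borrowed from the mmdetection design, not details confirmed on this page.

```python
import torch
from torch import nn


def conv_bn_silu(c_in, c_out, k):
    """Conv2d -> BatchNorm2d -> SiLU, padded so odd kernels keep the spatial size."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, padding=k // 2, bias=False),
        nn.BatchNorm2d(c_out, eps=0.001, momentum=0.03),
        nn.SiLU(),
    )


class BottleneckSketch(nn.Module):
    """Hypothetical bottleneck: 1x1 conv, 3x3 conv, optional skip connection."""

    def __init__(self, in_channels, out_channels, add_identity=True):
        super().__init__()
        hidden = out_channels // 2  # assumption: hidden width is half the output
        self.conv1 = conv_bn_silu(in_channels, hidden, 1)
        self.conv2 = conv_bn_silu(hidden, out_channels, 3)
        # Assumption: the shortcut is used only when the shapes are compatible.
        self.add_identity = add_identity and in_channels == out_channels

    def forward(self, x):
        out = self.conv2(self.conv1(x))
        return out + x if self.add_identity else out


block = BottleneckSketch(32, 32)
with torch.no_grad():
    y = block(torch.randn(1, 32, 16, 16))
```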
CSPLayer (in_channels:int, out_channels:int, num_blocks:int, kernel_size:int=1, stride:int=1, padding:int=0, eps:float=0.001, momentum:float=0.03, affine:bool=True, track_running_stats:bool=True, add_identity:bool=True)
*Cross Stage Partial Layer (CSPLayer).
This layer consists of a series of convolutions, blocks of transformations, and a final convolution. The inputs are processed via two paths: a main path with blocks and a shortcut path. The results from both paths are concatenated and further processed before returning the final output.
The blocks are instances of the DarknetBottleneck class which perform additional transformations.
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
in_channels | int | | Number of input channels. |
out_channels | int | | Number of output channels. |
num_blocks | int | | Number of blocks in the bottleneck. |
kernel_size | int | 1 | Size of the convolving kernel. |
stride | int | 1 | Stride of the convolution. |
padding | int | 0 | Zero-padding added to both sides of the input. |
eps | float | 0.001 | A value added to the denominator for numerical stability. |
momentum | float | 0.03 | The value used for the running_mean and running_var computation. |
affine | bool | True | A flag that when set to True, gives the layer learnable affine parameters. |
track_running_stats | bool | True | Whether or not to track the running mean and variance during training. |
add_identity | bool | True | Whether or not to add an identity shortcut connection if the input and output are the same size. |
Returns | None | | |
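The two-path data flow described for CSPLayer (main path with blocks, shortcut path, concatenation, final convolution) might look roughly like this. `CSPSketch` is a hypothetical name, plain 3x3 convolutions stand in for the DarknetBottleneck blocks, and the half-width intermediate channels are an assumption.

```python
import torch
from torch import nn


class CSPSketch(nn.Module):
    """Hypothetical CSP-style layer with simplified inner blocks."""

    def __init__(self, in_channels, out_channels, num_blocks):
        super().__init__()
        mid = out_channels // 2  # assumption: each path carries half the output width
        self.main_conv = nn.Conv2d(in_channels, mid, kernel_size=1)
        self.short_conv = nn.Conv2d(in_channels, mid, kernel_size=1)
        # Stand-in for a stack of DarknetBottleneck blocks.
        self.blocks = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(mid, mid, kernel_size=3, padding=1), nn.SiLU())
            for _ in range(num_blocks)
        ])
        self.final_conv = nn.Conv2d(2 * mid, out_channels, kernel_size=1)

    def forward(self, x):
        x_main = self.blocks(self.main_conv(x))  # main path: conv, then blocks
        x_short = self.short_conv(x)             # shortcut path: a single conv
        merged = torch.cat((x_main, x_short), dim=1)
        return self.final_conv(merged)


layer = CSPSketch(64, 96, num_blocks=2)
with torch.no_grad():
    csp_out = layer(torch.randn(1, 64, 32, 32))
```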
Focus (in_channels:int, out_channels:int, kernel_size:int=1, stride:int=1, bias:bool=False, eps:float=0.001, momentum:float=0.03, affine:bool=True, track_running_stats:bool=True)
*Focus width and height information into channel space.
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
in_channels | int | | Number of input channels. |
out_channels | int | | Number of output channels. |
kernel_size | int | 1 | Size of the convolving kernel. |
stride | int | 1 | Stride of the convolution. |
bias | bool | False | If set to False, the layer will not learn an additive bias. |
eps | float | 0.001 | A value added to the denominator for numerical stability in the ConvModule’s BatchNorm layer. |
momentum | float | 0.03 | The value used for the running_mean and running_var computation in the ConvModule’s BatchNorm layer. |
affine | bool | True | A flag that when set to True, gives the ConvModule’s BatchNorm layer learnable affine parameters. |
track_running_stats | bool | True | Whether or not to track the running mean and variance during training. |
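"Focusing width and height information into channel space" is commonly done by sampling every second pixel at four offsets and stacking the results along the channel axis. The helper below, with the hypothetical name `focus_slices`, shows only that slicing step; the patch order follows the mmdetection convention as an assumption, and the convolution applied afterwards is omitted.

```python
import torch


def focus_slices(x):
    """Rearrange (N, C, H, W) into (N, 4C, H/2, W/2) by strided slicing."""
    top_left = x[..., ::2, ::2]
    top_right = x[..., ::2, 1::2]
    bot_left = x[..., 1::2, ::2]
    bot_right = x[..., 1::2, 1::2]
    # Assumed concatenation order: top-left, bottom-left, top-right, bottom-right.
    return torch.cat((top_left, bot_left, top_right, bot_right), dim=1)


# A 4x4 single-channel image holding the values 0..15 in row-major order.
x = torch.arange(16).reshape(1, 1, 4, 4)
focused = focus_slices(x)
```

No pixel is discarded: the spatial resolution halves in each dimension while the channel count quadruples.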
SPPBottleneck (in_channels:int, out_channels:int, pool_sizes:List[int]=[5, 9, 13], eps:float=0.001, momentum:float=0.03, affine:bool=True, track_running_stats:bool=True)
*Spatial Pyramid Pooling layer used in YOLOv3-SPP
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
in_channels | int | | The number of input channels. |
out_channels | int | | The number of output channels. |
pool_sizes | List | [5, 9, 13] | The sizes of the pooling areas. |
eps | float | 0.001 | A value added to the denominator for numerical stability in the BatchNorm layer. |
momentum | float | 0.03 | The value used for the running_mean and running_var computation in the BatchNorm layer. |
affine | bool | True | A flag that when set to True, gives the BatchNorm layer learnable affine parameters. |
track_running_stats | bool | True | Whether to keep track of running mean and variance in BatchNorm. |
Returns | None | | |
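The pooling part of a spatial pyramid pooling block can be sketched as below. It assumes stride-1 max pooling with padding of `k // 2`, so every pooled map keeps the input's spatial size, and it concatenates the pooled maps with the unpooled input. The 1x1 convolutions that usually surround this step are left out.

```python
import torch
from torch import nn

pool_sizes = [5, 9, 13]  # default from the signature above

# Stride 1 with padding k // 2 preserves height and width for odd k.
pools = nn.ModuleList(
    nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2) for k in pool_sizes
)

x = torch.randn(1, 48, 8, 8)
pooled = torch.cat([x] + [pool(x) for pool in pools], dim=1)
spp_shape = tuple(pooled.shape)  # channels grow by a factor of len(pool_sizes) + 1
```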
CSPDarknet (arch='P5', deepen_factor=1.0, widen_factor=1.0, out_indices=(2, 3, 4), spp_kernal_sizes=(5, 9, 13), momentum=0.03, eps=0.001)
*The CSPDarknet class implements a CSPDarknet backbone, a convolutional neural network (CNN) used in various image recognition tasks. The CSPDarknet backbone forms an integral part of the YOLOX object detection model.
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
arch | str | P5 | Architecture configuration, ‘P5’ or ‘P6’. |
deepen_factor | float | 1.0 | Factor to adjust the number of blocks in each CSP layer. |
widen_factor | float | 1.0 | Factor to adjust the number of channels in each layer. |
out_indices | tuple | (2, 3, 4) | Indices of the stages to output. |
spp_kernal_sizes | tuple | (5, 9, 13) | Sizes of the pooling operations in the Spatial Pyramid Pooling. |
momentum | float | 0.03 | Momentum for the moving average in batch normalization. |
eps | float | 0.001 | Epsilon for batch normalization to avoid numerical instability. |
csp_darknet_cfg = CSP_DARKNET_CFGS[model_type]
csp_darknet = CSPDarknet(**csp_darknet_cfg)
backbone_inp = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    backbone_out = csp_darknet(backbone_inp)
[out.shape for out in backbone_out]
[torch.Size([1, 96, 32, 32]),
torch.Size([1, 192, 16, 16]),
torch.Size([1, 384, 8, 8])]
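The shapes above can be reproduced with simple arithmetic. The sketch below assumes values that are not stated on this page: base channel widths of 256, 512 and 1024 and base block counts of 3, 9, 9 and 3 (the P5 layout in mmdetection), plus a `widen_factor` of 0.375 and a `deepen_factor` of 0.33 for `yolox_tiny`. The strides 8, 16 and 32 match the `YOLOXHead` default. Both helper names are hypothetical.

```python
def scale_channels(base_channels, widen_factor):
    """Width scaling: multiply each channel count by widen_factor."""
    return [int(c * widen_factor) for c in base_channels]


def scale_blocks(base_blocks, deepen_factor):
    """Depth scaling: multiply each block count by deepen_factor, keeping at least 1."""
    return [max(round(n * deepen_factor), 1) for n in base_blocks]


base_channels = [256, 512, 1024]  # assumption: widths of the three output stages
base_blocks = [3, 9, 9, 3]        # assumption: blocks per stage before scaling
strides = [8, 16, 32]             # downsampling factor of each output stage
input_size = 256                  # matches the 256x256 test input above

channels = scale_channels(base_channels, widen_factor=0.375)
blocks = scale_blocks(base_blocks, deepen_factor=0.33)
shapes = [(1, c, input_size // s, input_size // s)
          for c, s in zip(channels, strides)]

print(channels)  # [96, 192, 384]
print(blocks)    # [1, 3, 3, 1]
print(shapes)    # [(1, 96, 32, 32), (1, 192, 16, 16), (1, 384, 8, 8)]
```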
YOLOXPAFPN (in_channels, out_channels, num_csp_blocks=3, upsample_cfg={'scale_factor': 2, 'mode': 'nearest'}, momentum=0.03, eps=0.001)
*Path Aggregation Feature Pyramid Network (PAFPN) used in YOLOX.
In object detection tasks, this class merges the feature maps from different layers of the backbone network. It helps in aggregating multi-scale feature maps to enhance the detection of objects of various sizes.
Based on OpenMMLab’s implementation in the mmdetection library.*
pafpn_cfg = PAFPN_CFGS[model_type]
yolox_pafpn = YOLOXPAFPN(**pafpn_cfg)
with torch.no_grad():
    neck_out = yolox_pafpn(backbone_out)
[out.shape for out in neck_out]
[torch.Size([1, 96, 32, 32]),
torch.Size([1, 96, 16, 16]),
torch.Size([1, 96, 8, 8])]
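A single top-down fusion step of a PAFPN-style neck can be illustrated as follows: the deeper, coarser feature map is reduced in channels, upsampled with the `upsample_cfg` from the signature above, and concatenated with the next shallower map. The 1x1 reduction convolution and the variable names are assumptions made for illustration. The real neck also applies CSP blocks after each fusion and adds a bottom-up pass.

```python
import torch
from torch import nn

# Feature maps with the yolox_tiny backbone shapes shown earlier.
shallow = torch.randn(1, 192, 16, 16)
deep = torch.randn(1, 384, 8, 8)

reduce_conv = nn.Conv2d(384, 192, kernel_size=1)       # assumed channel reduction
upsample = nn.Upsample(scale_factor=2, mode='nearest')  # from upsample_cfg

with torch.no_grad():
    deep_up = upsample(reduce_conv(deep))          # (1, 192, 16, 16)
    fused = torch.cat((deep_up, shallow), dim=1)   # (1, 384, 16, 16)
```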
YOLOXHead (num_classes:int, in_channels:int, feat_channels=256, stacked_convs=2, strides=[8, 16, 32], momentum=0.03, eps=0.001)
*The YOLOXHead class is a PyTorch module that implements the head of a YOLOX model (https://arxiv.org/abs/2107.08430), used for bounding box prediction.
The head takes as input feature maps at multiple scale levels (e.g., from a feature pyramid network) and outputs predicted class scores, bounding box coordinates, and objectness scores for each scale level.
Based on OpenMMLab’s implementation in the mmdetection library.*
Parameter | Type | Default | Details |
---|---|---|---|
num_classes | int | | The number of target classes. |
in_channels | int | | The number of input channels. |
feat_channels | int | 256 | The number of feature channels. |
stacked_convs | int | 2 | The number of convolution layers to stack. |
strides | list | [8, 16, 32] | The stride of each scale level in the feature pyramid. |
momentum | float | 0.03 | The momentum for the moving average in batch normalization. |
eps | float | 0.001 | The epsilon to avoid division by zero in batch normalization. |
head_cfg = HEAD_CFGS[model_type]
yolox_head = YOLOXHead(num_classes=80, **head_cfg)
with torch.no_grad():
    cls_scores, bbox_preds, objectness = yolox_head(neck_out)
print(f"cls_scores: {[cls_score.shape for cls_score in cls_scores]}")
print(f"bbox_preds: {[bbox_pred.shape for bbox_pred in bbox_preds]}")
print(f"objectness: {[obj.shape for obj in objectness]}")
cls_scores: [torch.Size([1, 80, 32, 32]), torch.Size([1, 80, 16, 16]), torch.Size([1, 80, 8, 8])]
bbox_preds: [torch.Size([1, 4, 32, 32]), torch.Size([1, 4, 16, 16]), torch.Size([1, 4, 8, 8])]
objectness: [torch.Size([1, 1, 32, 32]), torch.Size([1, 1, 16, 16]), torch.Size([1, 1, 8, 8])]
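For post-processing, per-level outputs like these are often flattened into a single tensor with one row per grid cell. The helper below, with the hypothetical name `flatten_predictions`, shows one way to do that. The (box, objectness, class) channel ordering is an assumption, and this is not necessarily how this library decodes predictions.

```python
import torch


def flatten_predictions(cls_scores, bbox_preds, objectness):
    """Merge per-level (N, C, H, W) maps into one (N, sum(H*W), 4 + 1 + num_classes) tensor."""
    per_level = []
    for cls, box, obj in zip(cls_scores, bbox_preds, objectness):
        level = torch.cat((box, obj, cls), dim=1)             # (N, 5 + C, H, W)
        per_level.append(level.flatten(2).permute(0, 2, 1))   # (N, H*W, 5 + C)
    return torch.cat(per_level, dim=1)


# Dummy tensors with the same shapes as the head outputs printed above.
sizes = [32, 16, 8]
dummy_cls = [torch.randn(1, 80, s, s) for s in sizes]
dummy_box = [torch.randn(1, 4, s, s) for s in sizes]
dummy_obj = [torch.randn(1, 1, s, s) for s in sizes]

flat = flatten_predictions(dummy_cls, dummy_box, dummy_obj)
# 32*32 + 16*16 + 8*8 = 1344 grid cells, each with 4 + 1 + 80 = 85 values.
```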
YOLOX (backbone:__main__.CSPDarknet, neck:__main__.YOLOXPAFPN, bbox_head:__main__.YOLOXHead)
*Implementation of YOLOX: Exceeding YOLO Series in 2021
Function forward(input_tensor x):
Parameter | Type | Details |
---|---|---|
backbone | CSPDarknet | Backbone module for feature extraction. |
neck | YOLOXPAFPN | Neck module for feature aggregation. |
bbox_head | YOLOXHead | Bbox head module for predicting bounding boxes. |
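Given the three components in the table, the forward pass presumably chains them: backbone, then neck, then head. The sketch below illustrates that composition with small stand-in modules. `DetectorSketch` and the stand-ins are hypothetical, and the ordering is inferred from the component descriptions rather than taken from the library's source.

```python
import torch
from torch import nn


class DetectorSketch(nn.Module):
    """Hypothetical wrapper that chains backbone -> neck -> head."""

    def __init__(self, backbone, neck, bbox_head):
        super().__init__()
        self.backbone = backbone
        self.neck = neck
        self.bbox_head = bbox_head

    def forward(self, x):
        x = self.backbone(x)      # extract features
        x = self.neck(x)          # aggregate features
        return self.bbox_head(x)  # predict from aggregated features


# Tiny stand-ins so the sketch runs without the real YOLOX components.
detector = DetectorSketch(
    backbone=nn.Conv2d(3, 8, kernel_size=3, stride=8, padding=1),
    neck=nn.Identity(),
    bbox_head=nn.Conv2d(8, 5, kernel_size=1),
)
with torch.no_grad():
    det_out = detector(torch.randn(1, 3, 64, 64))
```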
yolox = YOLOX(csp_darknet, yolox_pafpn, yolox_head)
with torch.no_grad():
    cls_scores, bbox_preds, objectness = yolox(backbone_inp)
print(f"cls_scores: {[cls_score.shape for cls_score in cls_scores]}")
print(f"bbox_preds: {[bbox_pred.shape for bbox_pred in bbox_preds]}")
print(f"objectness: {[obj.shape for obj in objectness]}")
cls_scores: [torch.Size([1, 80, 32, 32]), torch.Size([1, 80, 16, 16]), torch.Size([1, 80, 8, 8])]
bbox_preds: [torch.Size([1, 4, 32, 32]), torch.Size([1, 4, 16, 16]), torch.Size([1, 4, 8, 8])]
objectness: [torch.Size([1, 1, 32, 32]), torch.Size([1, 1, 16, 16]), torch.Size([1, 1, 8, 8])]
init_head (head:__main__.YOLOXHead, num_classes:int)
*Initialize the YOLOXHead with appropriate class outputs and convolution layers.
This function configures the output channels in the YOLOX head to match the number of classes in the dataset. It also initializes multiple level convolutional layers for each stride in the YOLOX head.*
Parameter | Type | Details |
---|---|---|
head | YOLOXHead | The YOLOX head to be initialized. |
num_classes | int | The number of classes in the dataset. |
Returns | None |
ModuleList(
(0-2): 3 x Conv2d(96, 80, kernel_size=(1, 1), stride=(1, 1))
)
ModuleList(
(0-2): 3 x Conv2d(96, 19, kernel_size=(1, 1), stride=(1, 1))
)
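The two `ModuleList` printouts above show the per-level 1x1 classification convolutions going from 80 to 19 output channels. A minimal way to build such a list for a new class count is sketched below. `make_cls_convs` is a hypothetical helper, not the library's `init_head`, which may also reinitialize weights, biases, or other layers.

```python
from torch import nn


def make_cls_convs(feat_channels, num_classes, num_levels=3):
    """Build one 1x1 classification conv per feature level."""
    return nn.ModuleList(
        nn.Conv2d(feat_channels, num_classes, kernel_size=1)
        for _ in range(num_levels)
    )


cls_convs = make_cls_convs(feat_channels=96, num_classes=19)
print(cls_convs)
```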
build_model (model_type:str, num_classes:int, pretrained:bool=True, checkpoint_dir:str='./pretrained_checkpoints/')
Builds a YOLOX model based on the given parameters.
Parameter | Type | Default | Details |
---|---|---|---|
model_type | str | | Type of the model to be built. |
num_classes | int | | Number of classes for the model. |
pretrained | bool | True | Whether to load pretrained weights. |
checkpoint_dir | str | ./pretrained_checkpoints/ | Directory to store checkpoints. |
Returns | YOLOX | | The built YOLOX model. |
yolox = build_model(model_type, 19, pretrained=True)
test_inp = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    cls_scores, bbox_preds, objectness = yolox(test_inp)
print(f"cls_scores: {[cls_score.shape for cls_score in cls_scores]}")
print(f"bbox_preds: {[bbox_pred.shape for bbox_pred in bbox_preds]}")
print(f"objectness: {[obj.shape for obj in objectness]}")