ps. rpn_head (nn.Module): module that computes the objectness and regression deltas from the RPN rpn_pre_nms_top_n_train (int): number of proposals to keep before applying NMS during training rpn_pre_nms_top_n_test (int): number of proposals to keep before applying NMS during testing rpn_post_nms_top_n_train (int): number of proposals to keep after applying NMS during training rpn_post_nms_top_n_test (int): number of proposals to keep after applying NMS during testing rpn_nms_thresh (float): NMS threshold used for postprocessing the RPN proposals rpn_fg_iou_thresh (float): minimum IoU between the anchor and the GT box so that they can be considered as positive during training of the RPN. rpn_bg_iou_thresh (float): maximum IoU between the anchor and the GT box so that they can be considered as negative during training of the RPN. rpn_batch_size_per_image (int): number of anchors that are sampled during training of the RPN for computing the loss rpn_positive_fraction (float): proportion of positive anchors in a mini-batch during training of the RPN rpn_score_thresh (float): during inference, only return proposals with a classification score greater than rpn_score_thresh box_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in the locations indicated by the bounding boxes box_head (nn.Module): module that takes the cropped feature maps as input box_predictor (nn.Module): module that takes the output of box_head and returns the classification logits and box regression deltas. box_score_thresh (float): during inference, only return proposals with a classification score greater than box_score_thresh box_nms_thresh (float): NMS threshold for the prediction head. Used during inference box_detections_per_img (int): maximum number of detections per image, for all classes. box_fg_iou_thresh (float): minimum IoU between the proposals and the GT box so that they can be considered as positive during training of the classification head box_bg_iou_thresh (float): maximum IoU between the proposals and the GT box so that they can be considered as negative during training of the classification head box_batch_size_per_image (int): number of proposals that are sampled during training of the classification head box_positive_fraction (float): proportion of positive proposals in a mini-batch during training of the classification head bbox_reg_weights (Tuple[float, float, float, float]): weights for the encoding/decoding of the bounding boxes mask_roi_pool (MultiScaleRoIAlign): the module which crops and resizes the feature maps in the locations indicated by the bounding boxes, which will be used for the mask head. mask_head (nn.Module): module that takes the cropped feature maps as input mask_predictor (nn.Module): module that takes the output of the mask_head and returns the segmentation mask logits Example:: >>> import torch >>> import torchvision >>> from torchvision.models.detection import MaskRCNN >>> from torchvision.models.detection.anchor_utils import AnchorGenerator >>> >>> # load a pre-trained model for classification and return >>> # only the features >>> backbone = torchvision.models.mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT).features >>> # MaskRCNN needs to know the number of >>> # output channels in a backbone. For mobilenet_v2, it's 1280 >>> # so we need to add it here, >>> backbone.out_channels = 1280 >>> >>> # let's make the RPN generate 5 x 3 anchors per spatial >>> # location, with 5 different sizes and 3 different aspect >>> # ratios. We have a Tuple[Tuple[int]] because each feature >>> # map could potentially have different sizes and >>> # aspect ratios >>> anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),), >>> aspect_ratios=((0.5, 1.0, 2.0),)) >>> >>> # let's define what are the feature maps that we will >>> # use to perform the region of interest cropping, as well as >>> # the size of the crop after rescaling. >>> # if your backbone returns a Tensor, featmap_names is expected to >>> # be ['0']. More generally, the backbone should return an >>> # OrderedDict[Tensor], and in featmap_names you can choose which >>> # feature maps to use. >>> roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], >>> output_size=7, >>> sampling_ratio=2) >>> >>> mask_roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'], >>> output_size=14, >>> sampling_ratio=2) >>> # put the pieces together inside a MaskRCNN model >>> model = MaskRCNN(backbone, >>> num_classes=2, >>> rpn_anchor_generator=anchor_generator, >>> box_roi_pool=roi_pooler, >>> mask_roi_pool=mask_roi_pooler) >>> model.eval() >>> x = [torch.rand(3, 300, 400), torch.rand(3, 500, 400)] >>> predictions = model(x) Né