Finally, BboxInsideWeights and BboxOutsideWeights are used to specify whether each box contributes to the training loss.
Args:
rpn_rois(Variable): A 2-D LoDTensor with shape [N, 4]. N is the number of boxes output by GenerateProposalOp, and each element is a bounding box in [xmin, ymin, xmax, ymax] format. The data type can be float32 or float64.
gt_classes(Variable): A 2-D LoDTensor with shape [M, 1]. M is the number of groundtruth boxes, and each element is the class label of a groundtruth box. The data type must be int32.
is_crowd(Variable): A 2-D LoDTensor with shape [M, 1]. M is the number of groundtruth boxes, and each element is a flag indicating whether a groundtruth box is crowd. The data type must be int32.
gt_boxes(Variable): A 2-D LoDTensor with shape [M, 4]. M is the number of groundtruth boxes, and each element is a bounding box in [xmin, ymin, xmax, ymax] format.
im_info(Variable): A 2-D LoDTensor with shape [B, 3]. B is the number of input images, each element consists of im_height, im_width, im_scale.
batch_size_per_im(int): Batch size of RoIs per image. The data type must be int32.
fg_fraction(float): Foreground fraction in total batch_size_per_im. The data type must be float32.
fg_thresh(float): Overlap threshold which is used to choose foreground samples. The data type must be float32.
bg_thresh_hi(float): Overlap threshold upper bound which is used to choose background samples. The data type must be float32.
bg_thresh_lo(float): Overlap threshold lower bound which is used to choose background samples. The data type must be float32.
bbox_reg_weights(list|tuple): Box regression weights. The data type must be float32.
class_nums(int): The number of classes. The data type must be int32.
use_random(bool): Use random sampling to choose foreground and background boxes.
is_cls_agnostic(bool): If True, bbox regression is class-agnostic and only distinguishes foreground and background boxes.
is_cascade_rcnn(bool): If True, bounding boxes crossing the image boundary are filtered out.
Returns:
tuple:
A tuple with format ``(rois, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights)``.
- **rois**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 4]``. The data type is the same as ``rpn_rois``.
- **labels_int32**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 1]``. The data type must be int32.
- **bbox_targets**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 4 * class_num]``. The regression targets of all RoIs. The data type is the same as ``rpn_rois``.
- **bbox_inside_weights**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 4 * class_num]``. The weights of foreground boxes' regression loss. The data type is the same as ``rpn_rois``.
- **bbox_outside_weights**: 2-D LoDTensor with shape ``[batch_size_per_im * batch_size, 4 * class_num]``. The weights of regression loss. The data type is the same as ``rpn_rois``.
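As a quick reference, a minimal usage sketch follows. It assumes the op is exposed as ``fluid.layers.generate_proposal_labels`` (adjust to your PaddlePaddle version) and uses illustrative placeholder shapes:

.. code-block:: python

    import paddle.fluid as fluid

    # Placeholder inputs; shapes follow the Args above (RoIs are [N, 4],
    # groundtruth tensors are [M, 1] or [M, 4], im_info is [B, 3]).
    rpn_rois = fluid.layers.data(name='rpn_rois', shape=[2, 4],
                                 append_batch_size=False, dtype='float32')
    gt_classes = fluid.layers.data(name='gt_classes', shape=[8, 1],
                                   append_batch_size=False, dtype='int32')
    is_crowd = fluid.layers.data(name='is_crowd', shape=[8, 1],
                                 append_batch_size=False, dtype='int32')
    gt_boxes = fluid.layers.data(name='gt_boxes', shape=[8, 4],
                                 append_batch_size=False, dtype='float32')
    im_info = fluid.layers.data(name='im_info', shape=[10, 3],
                                append_batch_size=False, dtype='float32')

    rois, labels_int32, bbox_targets, bbox_inside_weights, bbox_outside_weights = \
        fluid.layers.generate_proposal_labels(
            rpn_rois, gt_classes, is_crowd, gt_boxes, im_info,
            class_nums=10)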
scores(Variable): A 4-D Tensor with shape [N, A, H, W] representing
the probability for each box to be an object. N is batch size, A is
the number of anchors, H and W are the height and width of the
feature map. The data type must be float32.
bbox_deltas(Variable): A 4-D Tensor with shape [N, 4*A, H, W]
representing the difference between the predicted box location and
the anchor location. The data type must be float32.
im_info(Variable): A 2-D Tensor with shape [N, 3] representing the
original image information for the N batch images. Info contains
height, width and the scale between the original image size and the
size of the feature map. The data type must be float32.
anchors(Variable): A 4-D Tensor representing the anchors with a layout
of [H, W, A, 4]. H and W are the height and width of the feature map,
and A is the number of anchor boxes at each position. Each anchor is
in (xmin, ymin, xmax, ymax) format and unnormalized. The data type
must be float32.
variances(Variable): A 4-D Tensor. The expanded variances of anchors
with a layout of [H, W, num_priors, 4]. Each variance is in
(xcenter, ycenter, w, h) format. The data type must be float32.
pre_nms_top_n(float): Number of total bboxes to be kept per
image before NMS. The data type must be float32. `6000` by default.
post_nms_top_n(float): Number of total bboxes to be kept per
image after NMS. The data type must be float32. `1000` by default.
nms_thresh(float): Threshold in NMS. The data type must be float32. `0.5` by default.
min_size(float): Remove predicted boxes with either height or
width < min_size. The data type must be float32. `0.1` by default.
eta(float): Applied in adaptive NMS: if `adaptive_threshold > 0.5`,
`adaptive_threshold = adaptive_threshold * eta` in each iteration.
Returns:
tuple:
A tuple with format ``(rpn_rois, rpn_roi_probs)``.
- **rpn_rois**: The generated RoIs. 2-D Tensor with shape ``[N, 4]`` while ``N`` is the number of RoIs. The data type is the same as ``scores``.
- **rpn_roi_probs**: The scores of generated RoIs. 2-D Tensor with shape ``[N, 1]`` while ``N`` is the number of RoIs. The data type is the same as ``scores``.
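The sketch below shows how these inputs might be wired together; it assumes the op is available as ``fluid.layers.generate_proposals``, and the shapes (a 5 x 5 feature map with 4 anchors per position) are placeholders only:

.. code-block:: python

    import paddle.fluid as fluid

    # N = 1 image, A = 4 anchors per position, 5 x 5 feature map.
    scores = fluid.layers.data(name='scores', shape=[1, 4, 5, 5],
                               append_batch_size=False, dtype='float32')
    bbox_deltas = fluid.layers.data(name='bbox_deltas', shape=[1, 16, 5, 5],
                                    append_batch_size=False, dtype='float32')
    im_info = fluid.layers.data(name='im_info', shape=[1, 3],
                                append_batch_size=False, dtype='float32')
    anchors = fluid.layers.data(name='anchors', shape=[5, 5, 4, 4],
                                append_batch_size=False, dtype='float32')
    variances = fluid.layers.data(name='variances', shape=[5, 5, 4, 4],
                                  append_batch_size=False, dtype='float32')

    rpn_rois, rpn_roi_probs = fluid.layers.generate_proposals(
        scores, bbox_deltas, im_info, anchors, variances,
        pre_nms_top_n=6000, post_nms_top_n=1000)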
the input feature map should be sampled to produce the transformed
output feature map.
.. code-block:: text
* Case 1:
Given:
theta = [[[x_11, x_12, x_13]
[x_14, x_15, x_16]]
[[x_21, x_22, x_23]
[x_24, x_25, x_26]]]
out_shape = [2, 3, 5, 5]
Step 1:
Generate normalized coordinates according to out_shape.
The values of the normalized coordinates are in the interval between -1 and 1.
The shape of the normalized coordinates is [2, H, W] as below:
C = [[[-1. -1. -1. -1. -1. ]
[-0.5 -0.5 -0.5 -0.5 -0.5]
[ 0. 0. 0. 0. 0. ]
[ 0.5 0.5 0.5 0.5 0.5]
[ 1. 1. 1. 1. 1. ]]
[[-1. -0.5 0. 0.5 1. ]
[-1. -0.5 0. 0.5 1. ]
[-1. -0.5 0. 0.5 1. ]
[-1. -0.5 0. 0.5 1. ]
[-1. -0.5 0. 0.5 1. ]]]
C[0] is the coordinates in height axis and C[1] is the coordinates in width axis.
Step 2:
Transpose and reshape C to shape [H * W, 2] and append ones to the last dimension. Then we get:
C_ = [[-1. -1. 1. ]
[-0.5 -1. 1. ]
[ 0. -1. 1. ]
[ 0.5 -1. 1. ]
[ 1. -1. 1. ]
[-1. -0.5 1. ]
[-0.5 -0.5 1. ]
[ 0. -0.5 1. ]
[ 0.5 -0.5 1. ]
[ 1. -0.5 1. ]
[-1. 0. 1. ]
[-0.5 0. 1. ]
[ 0. 0. 1. ]
[ 0.5 0. 1. ]
[ 1. 0. 1. ]
[-1. 0.5 1. ]
[-0.5 0.5 1. ]
[ 0. 0.5 1. ]
[ 0.5 0.5 1. ]
[ 1. 0.5 1. ]
[-1. 1. 1. ]
[-0.5 1. 1. ]
[ 0. 1. 1. ]
[ 0.5 1. 1. ]
[ 1. 1. 1. ]]
Step 3:
Compute output by equation $$Output[i] = C_ * Theta[i]^T$$
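To make the three steps above concrete, the following NumPy sketch reproduces the grid computation. It is illustrative only; the helper name ``affine_grid_reference`` is invented for this example and is not part of the API:

.. code-block:: python

    import numpy as np

    def affine_grid_reference(theta, out_shape):
        """Reproduce Steps 1-3 above with NumPy (illustration only)."""
        n, _, h, w = out_shape
        # Step 1: normalized coordinates in [-1, 1] along height and width.
        h_idx = np.linspace(-1.0, 1.0, h)
        w_idx = np.linspace(-1.0, 1.0, w)
        grid_h, grid_w = np.meshgrid(h_idx, w_idx, indexing='ij')
        # Step 2: flatten to [H * W, 2] (width coordinate first, as in C_)
        # and append a column of ones.
        c_ = np.stack([grid_w.reshape(-1), grid_h.reshape(-1),
                       np.ones(h * w)], axis=1)
        # Step 3: Output[i] = C_ * Theta[i]^T, reshaped to [H, W, 2].
        return np.stack([c_.dot(theta[i].T).reshape(h, w, 2)
                         for i in range(n)])

    theta = np.random.rand(2, 2, 3).astype('float32')
    grid = affine_grid_reference(theta, [2, 3, 5, 5])  # shape [2, 5, 5, 2]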
Args:
theta (Variable): A Tensor with shape [N, 2, 3]. It contains a batch of affine transform parameters. The data type can be float32 or float64.
out_shape (Variable | list | tuple): The shape of target output with format [batch_size, channel, height, width]. ``out_shape`` can be a Tensor, list, or tuple. The data type must be int32.
name(str|None): The default value is None. Normally there is no need for user to set this property. For more information, please refer to :ref:`api_guide_Name`.
Returns:
Variable: A Tensor with shape [batch_size, H, W, 2] while 'H' and 'W' are the height and width of the feature map in affine transformation. The data type is the same as ``theta``.
Raises:
ValueError: If the type of arguments is not supported.
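A minimal usage sketch follows, assuming the layer is exposed as ``fluid.layers.affine_grid`` and is typically paired with ``fluid.layers.grid_sampler`` to warp a feature map; shapes are placeholders:

.. code-block:: python

    import paddle.fluid as fluid

    # A batch of 2 affine transforms; out_shape is given as a plain list here.
    theta = fluid.layers.data(name='theta', shape=[2, 2, 3],
                              append_batch_size=False, dtype='float32')
    grid = fluid.layers.affine_grid(theta, out_shape=[2, 3, 5, 5])
    # grid has shape [2, 5, 5, 2]; it can be used to sample a feature map
    # of matching spatial size.
    x = fluid.layers.data(name='x', shape=[2, 3, 5, 5],
                          append_batch_size=False, dtype='float32')
    out = fluid.layers.grid_sampler(x=x, grid=grid)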