Performs deformable region-of-interest (ROI) pooling on inputs. As described
in `Deformable Convolutional Networks <https://arxiv.org/abs/1703.06211>`_, it obtains an offset for each bin after
ROI pooling so that pooling is performed at the correct region. After deformable_roi_pooling, the batch size becomes the number of region bounding boxes.
The operation has three steps:
1. Divide each region proposal into equal-sized sections according to pooled_width and pooled_height.
2. Add the offset to each pixel in the ROI to get its new location; the new value is computed directly through
bilinear interpolation with the four nearest pixels.
3. Sample several points in each bin and average their values to produce the output.
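The three steps above can be sketched for one ROI and one channel in plain Python. This is a minimal illustration under stated assumptions, not Paddle's kernel: the names `bilinear_sample` and `deformable_roi_pool_single` are hypothetical, each bin's offset is assumed to be a (dx, dy) pair already expressed in feature-map units, and border handling is simplified to clamping.

```python
import math


def bilinear_sample(feature, x, y):
    """Interpolate a 2-D list `feature` at a real-valued point (x, y)."""
    height, width = len(feature), len(feature[0])
    # Assumption: points outside the map are clamped to the border.
    x = min(max(x, 0.0), width - 1.0)
    y = min(max(y, 0.0), height - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    wx, wy = x - x0, y - y0
    # Blend the four nearest pixels, first along x, then along y.
    top = (1 - wx) * feature[y0][x0] + wx * feature[y0][x1]
    bottom = (1 - wx) * feature[y1][x0] + wx * feature[y1][x1]
    return (1 - wy) * top + wy * bottom


def deformable_roi_pool_single(feature, roi, offsets,
                               pooled_height, pooled_width,
                               sample_per_part=1):
    """Pool one ROI (x1, y1, x2, y2) of one channel into a grid of bins."""
    x1, y1, x2, y2 = roi
    # Step 1: divide the ROI into equal-sized bins.
    bin_w = (x2 - x1) / pooled_width
    bin_h = (y2 - y1) / pooled_height
    output = []
    for ph in range(pooled_height):
        row = []
        for pw in range(pooled_width):
            # Step 2: shift the bin by its offset (hypothetical per-bin format).
            dx, dy = offsets[ph][pw]
            start_x = x1 + pw * bin_w + dx
            start_y = y1 + ph * bin_h + dy
            # Step 3: sample a regular grid of points and average them.
            total = 0.0
            for iy in range(sample_per_part):
                for ix in range(sample_per_part):
                    sx = start_x + (ix + 0.5) * bin_w / sample_per_part
                    sy = start_y + (iy + 0.5) * bin_h / sample_per_part
                    total += bilinear_sample(feature, sx, sy)
            row.append(total / (sample_per_part ** 2))
        output.append(row)
    return output
```

With all offsets set to (0.0, 0.0) the sketch reduces to ordinary average ROI pooling over interpolated samples; a nonzero offset moves only the bin it belongs to.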
Args:
input (Variable): The input of deformable ROI pooling, a tensor of type float32. Its shape is
[N, C, H, W], where N is the batch size, C is the number of input channels,
H is the height of the feature, and W is the width of the feature.
rois (Variable): ROIs (Regions of Interest) to pool over, of type float32. It should be
a 2-D LoDTensor of shape (num_rois, 4) with a LoD level
of 1, given as [[x1, y1, x2, y2], ...], where (x1, y1) is
the top-left coordinate and (x2, y2) is the bottom-right
coordinate.
trans (Variable): Offsets of features on ROIs while pooling, of type float32. The format is [N, C, H, W], where
N is the number of ROIs, C is the number of channels, which indicate the offset distance
in the x and y directions, H is the pooled height, and W is the pooled width.
no_trans (bool): Whether to skip adding the offset while ROI pooling.
If True, no offset is added in the operation. Default: False.
spatial_scale (float): Ratio of input feature map height (or width) to raw image height (or width), of type float32.
Equals the reciprocal of the total stride in the convolutional layers. Default: 1.0.
group_size (list|tuple): The number of groups into which the input channels are divided, given as a list or tuple of int32 values. (E.g., the number of input channels
is k1 * k2 * (C + 1), where k1 and k2 are the group width and height and C + 1 is the number of output
channels.) For example, (4, 6) means the group height is 4 and the group width is 6. Default: [1, 1].
pooled_height (int): The pooled output height, of type int32. Default: 1.
pooled_width (int): The pooled output width, of type int32. Default: 1.
part_size (list|tuple): The height and width of the offset, given as a list or tuple of int32 values, e.g. (4, 6) for height 4 and width 6. The values always equal pooled_height
and pooled_width. Default: None, in which case [pooled_height, pooled_width] is used.
sample_per_part (int): The number of samples in each bin, of type int32. Larger values increase the computation cost. Default: 1.
trans_std (float): Coefficient of the offset, of type float32. It controls the weight of the offset. Default: 0.1.
position_sensitive (bool): Whether to use deformable PSROI pooling mode. If False, the input channel dimension equals the output channel dimension.
If True, the input channel dimension should be the output channel dimension * pooled_height * pooled_width. Default: False.
name (str|None): Name of the layer. Default: None.
Returns:
Variable: The output of deformable ROI pooling. If position_sensitive is False, the output channel dimension equals the input channel dimension. If position_sensitive is True,
the output channel dimension is the input channel dimension divided by (pooled_height * pooled_width).
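The channel relationship stated for position_sensitive (input channels = output channels * pooled_height * pooled_width) can be written as a small helper. This is a sketch under assumptions: `expected_output_shape` is a hypothetical name and not part of Paddle, and the output layout [num_rois, C_out, pooled_height, pooled_width] is inferred from the description (the batch size becomes the number of ROIs) rather than stated explicitly.

```python
def expected_output_shape(input_shape, num_rois, pooled_height, pooled_width,
                          position_sensitive=False):
    """Return the assumed output shape for an [N, C, H, W] input."""
    _, channels, _, _ = input_shape
    if position_sensitive:
        bins = pooled_height * pooled_width
        # In position-sensitive mode the input channels are split across bins.
        if channels % bins != 0:
            raise ValueError(
                "input channels must be divisible by "
                "pooled_height * pooled_width")
        out_channels = channels // bins
    else:
        # Otherwise the channel count is unchanged.
        out_channels = channels
    return [num_rois, out_channels, pooled_height, pooled_width]
```

For instance, a 192-channel input with an illustrative 8 x 8 pooled size gives 192 / 64 = 3 output channels in position-sensitive mode.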
Examples:
.. code-block:: python
# position_sensitive=True
import paddle.fluid as fluid
input = fluid.layers.data(name="input",
shape=[2, 192, 64, 64],
dtype='float32',
append_batch_size=False)
rois = fluid.layers.data(name="rois",
shape=[4],
dtype='float32',
lod_level=1)
trans = fluid.layers.data(name="trans",
shape=[2, 384, 64, 64],
dtype='float32',
append_batch_size=False)
x = fluid.layers.nn.deformable_roi_pooling(input=input,