From 423f7bcbc96b68e5d5f1205f2f1383bca32db2ce Mon Sep 17 00:00:00 2001
From: FlyingQianMM <245467267@qq.com>
Date: Mon, 23 Sep 2019 20:36:25 +0800
Subject: [PATCH] refine api_cn for retinanet_target_assign,
 sigmoid_focal_loss, retinanet_detection_output (#1258)

* test=document_preview refine api_cn for retinanet_target_assign_cn, sigmoid_focal_loss_cn, retinanet_detection_output

* test=document_preview refine the api_cn of op retinanet_target_assign, retinanet_detection_output, sigmoid_focal_loss

* test=document_preview refine api_cn of op retinanet_target_assign_cn
---
 .../retinanet_detection_output_cn.rst         | 58 +++++++-------
 .../layers_cn/retinanet_target_assign_cn.rst  | 75 ++++++++++---------
 .../layers_cn/sigmoid_focal_loss_cn.rst       | 35 ++++-----
 3 files changed, 83 insertions(+), 85 deletions(-)
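
Reviewer note (not part of the commit): the piecewise Focal Loss formula added to sigmoid_focal_loss_cn.rst can be checked against a small NumPy reference. This is only a sketch of the formula as written in the doc, not Paddle's kernel. The function name `sigmoid_focal_loss_ref` is hypothetical, and it assumes `label` holds 0 for negative samples and values in [1, D] for positive samples, as the parameter description states.

```python
import numpy as np


def sigmoid_focal_loss_ref(x, label, fg_num, gamma=2.0, alpha=0.25):
    """NumPy sketch of the documented formula (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float64)          # logits, shape [N, D]
    label = np.asarray(label).reshape(-1, 1)     # class targets, shape [N, 1]
    num_classes = x.shape[1]

    p = 1.0 / (1.0 + np.exp(-x))                 # sigma(x_{i,j})

    # Column j corresponds to class j + 1; label 0 (background) matches no column.
    class_ids = np.arange(1, num_classes + 1).reshape(1, num_classes)
    is_target = class_ids == label               # (j + 1) == label_{i,0}

    pos_term = -alpha * (1.0 - p) ** gamma * np.log(p)
    neg_term = -(1.0 - alpha) * p ** gamma * np.log(1.0 - p)

    # Every element is normalised by the number of positive samples.
    return np.where(is_target, pos_term, neg_term) / fg_num


# Two samples, three classes: the first is a positive of class 1,
# the second is a negative (label 0).
loss = sigmoid_focal_loss_ref(np.zeros((2, 3)), [[1], [0]], fg_num=1)
print(loss.shape)
```

With all-zero logits every sigmoid value is 0.5, so the only difference between entries is the alpha weighting of the target column versus the remaining columns.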

diff --git a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
index edfcfa297..0c8680267 100644
--- a/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_detection_output_cn.rst
@@ -5,32 +5,30 @@ retinanet_detection_output
 
 .. py:function:: paddle.fluid.layers.retinanet_detection_output(bboxes, scores, anchors, im_info, score_threshold=0.05, nms_top_k=1000, keep_top_k=100, nms_threshold=0.3, nms_eta=1.0)
 
-**Retinanet的检测输出层**
+**注意：该OP目前仅支持CPU** 。
 
-此操作通过执行以下步骤获取检测结果：
+在 `RetinaNet <https://arxiv.org/abs/1708.02002>`_ 中，有多个 `FPN <https://arxiv.org/abs/1612.03144>`_ 层会输出用于分类的预测值和位置回归的预测值，该OP通过执行以下步骤将这些预测值转换成最终的检测结果：
 
-1. 根据anchor框解码每个FPN级别的最高得分边界框预测。
-2. 合并所有级别的顶级预测并对其应用多级非最大抑制（NMS）以获得最终检测。
+1. 在每个FPN层上，先剔除分类预测值小于score_threshold的anchor，然后按分类预测值从大到小排序，选出排名前nms_top_k的anchor，并将这些anchor与其位置回归的预测值做解码操作得到检测框。
+2. 合并全部FPN层上的检测框，对这些检测框进行非极大值抑制操作（NMS）以获得最终的检测结果。
 
 
 参数：
-    - **bboxes**  (List) – 来自多个FPN级别的张量列表。每个元素都是一个三维张量，形状[N，Mi，4]代表Mi边界框的预测位置。N是batch大小，Mi是第i个FPN级别的边界框数，每个边界框有四个坐标值，布局为[xmin，ymin，xmax，ymax]。
-    - **scores**  (List) – 来自多个FPN级别的张量列表。每个元素都是一个三维张量，各张量形状为[N，Mi，C]，代表预测的置信度预测。 N是batch大小，C是类编号（不包括背景），Mi是第i个FPN级别的边界框数。对于每个边界框，总共有C个评分。
-    - **anchors**  (List) – 具有形状[Mi，4]的2-D Tensor表示来自所有FPN级别的Mi anchor框的位置。每个边界框有四个坐标值，布局为[xmin，ymin，xmax，ymax]。
-    - **im_info**  (Variable) – 形状为[N，3]的2-D LoDTensor表示图像信息。 N是batch大小，每个图像信息包括高度，宽度和缩放比例。
-    - **score_threshold**  (float) – 用置信度分数剔除边界框的过滤阈值。
-    - **nms_top_k**  (int) – 根据NMS之前的置信度保留每个FPN层的最大检测数。
-    - **keep_top_k**  (int) – NMS步骤后每个图像要保留的总边界框数。 -1表示在NMS步骤之后保留所有边界框。
-    - **nms_threshold**  (float) – NMS中使用的阈值.
-    - **nms_eta**  (float) – adaptive NMS的参数.
+    - **bboxes**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的位置回归预测值。列表中每个元素是一个维度为 :math:`[N, Mi, 4]` 的3-D Tensor，其中，第一维N表示批量训练时批量内的图片数量，第二维Mi表示每张图片第i个FPN层上的anchor数量，第三维4表示每个anchor有四个坐标值。数据类型为float32或float64。
+    - **scores**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的分类预测值。列表中每个元素是一个维度为 :math:`[N, Mi, C]` 的3-D Tensor，其中第一维N表示批量训练时批量内的图片数量，第二维Mi表示每张图片第i个FPN层上的anchor数量，第三维C表示类别数量（ **不包括背景类** ）。数据类型为float32或float64。
+    - **anchors**  (List) – 由来自不同FPN层的Tensor组成的列表，表示全部anchor的坐标值。列表中每个元素是一个维度为 :math:`[Mi, 4]` 的2-D Tensor，其中第一维Mi表示第i个FPN层上的anchor数量，第二维4表示每个anchor有四个坐标值（[xmin, ymin, xmax, ymax]）。数据类型为float32或float64。
+    - **im_info**  (Variable) – 维度为 :math:`[N, 3]` 的2-D Tensor，表示输入图片的尺寸信息。 其中，第一维N表示批量训练时各批量内的图片数量，第二维3表示各图片的尺寸信息，分别是网络输入尺寸的高和宽，以及原图缩放至网络输入大小时的缩放比例。数据类型为float32或float64。
+    - **score_threshold**  (float32) – 在NMS步骤之前，用于滤除每个FPN层的检测框的阈值，默认值为0.05。
+    - **nms_top_k**  (int32) – 在NMS步骤之前，保留每个FPN层的检测框的数量，默认值为1000。
+    - **keep_top_k**  (int32) – 在NMS步骤之后，每张图像要保留的检测框数量，默认值为100，若设为-1，则表示保留NMS步骤后剩下的全部检测框。
+    - **nms_threshold**  (float32) – NMS步骤中用于剔除检测框的Intersection-over-Union（IoU）阈值，默认为0.3。
+    - **nms_eta**  (float32) – `Adaptive NMS <https://arxiv.org/abs/1904.03629>`_ 中用于调整IoU阈值的参数，默认值为1.0，表示使用常规NMS。
+**注意：在模型输入尺寸特别小的情况，此时若用score_threshold滤除anchor，可能会导致没有任何检测框剩余。为避免这种情况出现，该OP不会对最高FPN层上的anchor做滤除。因此，要求bboxes、scores、anchors中最后一个元素是来自最高FPN层的Tensor** 。
 
+返回：维度是 :math:`[No, 6]` 的2-D LoDTensor，表示批量内的检测结果。第一维No表示批量内的检测框的总数，第二维6表示每行有六个值：[label, score, xmin, ymin, xmax, ymax]。该LoDTensor的LoD中存放了每张图片的检测框数量，第i张图片的检测框数量为 :math:`LoD[i + 1] - LoD[i]` 。如果 :math:`LoD[i + 1] - LoD[i]` 为0，则第i个图像没有检测结果。 如果批量内的全部图像都没有检测结果，则LoD中所有元素被设置为0，LoDTensor被赋为空（None）。
 
 
-返回：
-检测输出是具有形状[No，6]的LoDTensor。 每行有六个值：[标签，置信度，xmin，ymin，xmax，ymax]。 No是此mini batch中的检测总数。 对于每个实例，第一维中的偏移称为LoD，偏移值为N + 1，N是batch大小。 第i个图像具有LoD [i + 1]  -  LoD [i]检测结果，如果为0，则第i个图像没有检测到结果。 如果所有图像都没有检测到结果，则LoD将设置为0，输出张量为空（None）。
-
-
-返回类型：变量（Variable）
+返回类型：变量（Variable），数据类型为float32或float64。
 
 **代码示例**
 
@@ -38,24 +36,26 @@ retinanet_detection_output
 
   import paddle.fluid as fluid
 
-  bboxes = layers.data(name='bboxes', shape=[1, 21, 4],
+  bboxes_low = fluid.layers.data(name='bboxes_low', shape=[1, 44, 4],
+      append_batch_size=False, dtype='float32')
+  bboxes_high = fluid.layers.data(name='bboxes_high', shape=[1, 11, 4], append_batch_size=False, dtype='float32')
+  scores_low = fluid.layers.data(name='scores_low', shape=[1, 44, 10],
+      append_batch_size=False, dtype='float32')
+  scores_high = fluid.layers.data(name='scores_high', shape=[1, 11, 10],
       append_batch_size=False, dtype='float32')
-  scores = layers.data(name='scores', shape=[1, 21, 10],
+  anchors_low = fluid.layers.data(name='anchors_low', shape=[44, 4],
       append_batch_size=False, dtype='float32')
-  anchors = layers.data(name='anchors', shape=[21, 4],
+  anchors_high = fluid.layers.data(name='anchors_high', shape=[11, 4],
       append_batch_size=False, dtype='float32')
-  im_info = layers.data(name="im_info", shape=[1, 3],
+  im_info = fluid.layers.data(name="im_info", shape=[1, 3],
       append_batch_size=False, dtype='float32')
   nmsed_outs = fluid.layers.retinanet_detection_output(
-                                          bboxes=[bboxes, bboxes],
-                                          scores=[scores, scores],
-                                          anchors=[anchors, anchors],
+                                          bboxes=[bboxes_low, bboxes_high],
+                                          scores=[scores_low, scores_high],
+                                          anchors=[anchors_low, anchors_high],
                                           im_info=im_info,
                                           score_threshold=0.05,
                                           nms_top_k=1000,
                                           keep_top_k=100,
-                                          nms_threshold=0.3,
+                                          nms_threshold=0.45,
                                           nms_eta=1.)
-
-
-
diff --git a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
index e828914dd..9d188f5c0 100644
--- a/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/retinanet_target_assign_cn.rst
@@ -5,70 +5,71 @@ retinanet_target_assign
 
 .. py:function:: paddle.fluid.layers.retinanet_target_assign(bbox_pred, cls_logits, anchor_box, anchor_var, gt_boxes, gt_labels, is_crowd, im_info, num_classes=1, positive_overlap=0.5, negative_overlap=0.4)
 
-**Retinanet的目标分配层**
+**注意：该OP目前仅支持CPU** 。
 
-对于给定anchors和真实(ground-truth)框之间的Intersection-over-Union（IoU）重叠，该层可以为每个anchor分配分类和回归目标，同时这些目标标签用于训练Retinanet。每个anchor都分配有长度为num_classes的一个one-hot分类目标向量，以及一个4向量的框回归目标。分配规则如下：
+该OP是从输入anchor中找出训练检测模型 `RetinaNet <https://arxiv.org/abs/1708.02002>`_ 所需的正负样本，并为每个正负样本分配用于分类的目标值和位置回归的目标值，同时从全部anchor的类别预测值cls_logits、位置预测值bbox_pred中取出属于各正负样本的部分。
 
-1.在以下情况下，anchor被分配到真实框：
-（i）它与真实框具有最高的IoU重叠，或者（ii）与任何真实框具有高于positive_overlap（0.5）的IoU重叠。
+正负样本的查找准则如下：
+    - 若anchor与某个真值框之间的Intersection-over-Union（IoU）大于其他anchor与该真值框的IoU，则该anchor是正样本，且被分配给该真值框；
+    - 若anchor与某个真值框之间的IoU大于等于positive_overlap，则该anchor是正样本，且被分配给该真值框；
+    - 若anchor与某个真值框之间的IoU介于[0, negative_overlap)，则该anchor是负样本；
+    - 不满足以上准则的anchor不参与模型训练。
 
-2.对于所有真实框，当其IoU比率低于negative_overlap（0.4）时，将anchor点分配给背景。
-
-当为锚点分配了第i个类别的真实框时，其C向量目标中的第i项设置为1，所有其他条目设置为0.当anchor被分配支背景时，所有项都设置为0。未被分配的锚点不会影响训练目标。回归目标是与指定anchor相关联的已编码真实框。
+在RetinaNet中，对于每个anchor，模型都会预测一个C维的向量用于分类，和一个4维的向量用于位置回归，因此各正负样本的分类目标值也是一个C维向量，各正样本的位置回归目标值也是一个4维向量。对于正样本而言，若其被分配的真值框的类别是i，则其分类目标值的第i-1维为1，其余维度为0；其位置回归的目标值由anchor和真值框之间位置差值计算得到。对于负样本而言，其分类目标值的所有维度都为0，因负样本不参与位置回归的训练，故负样本无位置回归的目标值。
 
+分配结束后，从全部anchor的类别预测值cls_logits中取出属于各正负样本的部分，从针对全部anchor的位置预测值bbox_pred中取出属于各正样本的部分。
 
 
 参数：
-    - **bbox_pred**  (Variable) – 具有形状[N，M，4]的3-D张量表示M个边界框(bounding box)的预测位置。 N是batch大小，每个边界框有四个坐标值，为[xmin，ymin，xmax，ymax]。
-    - **cls_logits**  (Variable) – 具有形状[N，M，C]的3-D张量，表示预测的置信度。 N是batch大小，C是类别的数量（不包括背景），M是边界框的数量。
-    - **anchor_box**  (Variable) – 具有形状[M，4]的2-D张量，存有M个框，每个框表示为[xmin，ymin，xmax，ymax]，[xmin，ymin]是anchor的左上顶部坐标，如果输入是图像特征图，则它们接近坐标系的原点。 [xmax，ymax]是anchor的右下坐标。
-    - **anchor_var**  (Variable) – 具有形状[M，4]的2-D张量，存有anchor的扩展方差。
-    - **gt_boxes**  (Variable) – 真实框是具有形状[Ng，4]的2D LoDTensor，Ng是mini batch中真实框的总数。
-    - **gt_labels**  (variable) – 真实值标签是具有形状[Ng，1]的2D LoDTensor，Ng是mini batch输入真实值标签的总数。
-    - **is_crowd**  (Variable) – 1-D LoDTensor，标志真实值是聚群。
-    - **im_info**  (Variable) – 具有形状[N，3]的2-D LoDTensor。 N是batch大小，3分别为高度，宽度和比例。
-    - **num_classes**  (int32) – 种类数量。
-    - **positive_overlap**  (float) – 判定（anchor，gt框）对是一个正例的anchor和真实框之间最小重叠阀值。
-    - **negative_overlap**  (float) – （锚点，gt框）对是负例时anchor和真实框之间允许的最大重叠阈值。
+    - **bbox_pred**  (Variable) – 维度为 :math:`[N, M, 4]` 的3-D Tensor，表示全部anchor的位置回归预测值。其中，第一维N表示批量训练时批量内的图片数量，第二维M表示每张图片的全部anchor的数量，第三维4表示每个anchor有四个坐标值。数据类型为float32或float64。
+    - **cls_logits**  (Variable) – 维度为 :math:`[N, M, C]` 的3-D Tensor，表示全部anchor的分类预测值。 其中，第一维N表示批量训练时批量内的图片数量，第二维M表示每张图片的全部anchor的数量，第三维C表示每个anchor需预测的类别数量（ **注意：不包括背景** ）。数据类型为float32或float64。
+
+    - **anchor_box**  (Variable) – 维度为 :math:`[M, 4]` 的2-D Tensor，表示全部anchor的坐标值。其中，第一维M表示每张图片的全部anchor的数量，第二维4表示每个anchor有四个坐标值 :math:`[xmin, ymin, xmax, ymax]` ，:math:`[xmin, ymin]` 是anchor的左上顶部坐标，:math:`[xmax, ymax]` 是anchor的右下坐标。数据类型为float32或float64。anchor_box的生成请参考OP `anchor_generator <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/detection.html#anchor-generator>`_ 。
+    - **anchor_var**  (Variable) – 维度为 :math:`[M, 4]` 的2-D Tensor，表示在后续计算损失函数时anchor坐标值的缩放比例。其中，第一维M表示每张图片的全部anchor的数量，第二维4表示每个anchor有四个坐标缩放因子。数据类型为float32或float64。anchor_var的生成请参考OP `anchor_generator <https://www.paddlepaddle.org.cn/documentation/docs/en/1.5/api/layers/detection.html#anchor-generator>`_ 。
+    - **gt_boxes**  (Variable) – 维度为 :math:`[G, 4]` 且LoD level必须为1的2-D LoDTensor，表示批量训练时批量内的真值框位置。其中，第一维G表示批量内真值框的总数，第二维表示每个真值框有四个坐标值。数据类型为float32或float64。
+    - **gt_labels**  (Variable) – 维度为 :math:`[G, 1]` 且LoD level必须为1的2-D LoDTensor，表示批量训练时批量内的真值框类别，数值范围为 :math:`[1, C]` 。其中，第一维G表示批量内真值框的总数，第二维表示每个真值框只有1个类别。数据类型为int32。
+    - **is_crowd**  (Variable) – 维度为 :math:`[G]` 且LoD level必须为1的1-D LoDTensor，表示各真值框是否位于重叠区域，值为1表示重叠，则不参与训练。第一维G表示批量内真值框的总数。数据类型为int32。
+    - **im_info**  (Variable) – 维度为 :math:`[N, 3]` 的2-D Tensor，表示输入图片的尺寸信息。其中，第一维N表示批量训练时批量内的图片数量，第二维3表示各图片的尺寸信息，分别是网络输入尺寸的高和宽，以及原图缩放至网络输入尺寸的缩放比例。数据类型为float32或float64。
+    - **num_classes**  (int32) – 分类的类别数量，默认值为1。
+    - **positive_overlap**  (float32) – 判定anchor是一个正样本时anchor和真值框之间的最小IoU，默认值为0.5。
+    - **negative_overlap**  (float32) – 判定anchor是一个负样本时anchor和真值框之间的最大IoU，默认值为0.4。该参数的设定值应小于等于positive_overlap的设定值，若大于，则positive_overlap的取值为negative_overlap的设定值。
 
 
 返回：
-返回元组（predict_scores，predict_location，target_label，target_bbox，bbox_inside_weight，fg_num）。 predict_scores和predict_location是Retinanet的预测结果。target_label和target_bbox为真实值。 predict_location是形为[F，4]的2D张量，target_bbox的形状与predict_location的形状相同，F是前景anchor的数量。 predict_scores是具有形状[F + B，C]的2D张量，target_label的形状是[F + B，1]，B是背景anchor的数量，F和B取决于此算子的输入。 Bbox_inside_weight标志预测位置是否为假前景，形状为[F，4]。 Fg_num是focal loss所需的前景数（包括假前景）。
+    - **predict_scores** (Variable) – 维度为 :math:`[F + B, C]` 的2-D Tensor，表示正负样本的分类预测值。其中，第一维F为批量内正样本的数量，B为批量内负样本的数量，第二维C为分类的类别数量。数据类型为float32或float64。
+    - **predict_location** (Variable) – 维度为 :math:`[F, 4]` 的2-D Tensor，表示正样本的位置回归预测值。其中，第一维F为批量内正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **target_label** (Variable) – 维度为 :math:`[F + B, 1]` 的2-D Tensor，表示正负样本的分类目标值。其中，第一维F为正样本的数量，B为负样本的数量，第二维1表示每个样本的真值类别只有1类。数据类型为int32。
+    - **target_bbox** (Variable) – 维度为 :math:`[F, 4]` 的2-D Tensor，表示正样本的位置回归目标值。其中，第一维F为正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **bbox_inside_weight** (Variable) – 维度为 :math:`[F, 4]` 的2-D LoDTensor，表示位置回归预测值中是否属于假正样本，若某个正样本为假，则bbox_inside_weight中对应维度的值为0，否则为1。第一维F为正样本的数量，第二维4表示每个样本有4个坐标值。数据类型为float32或float64。
+    - **fg_num** (Variable) – 维度为 :math:`[N, 1]` 的2-D Tensor，表示正样本的数量。其中，第一维N表示批量内的图片数量。 **注意：由于正样本数量会用作后续损失函数的分母，为避免出现除以0的情况，该OP已将每张图片的正样本数量做加1操作** 。数据类型为int32。
 
 
-返回类型：tuple
+返回类型：元组(tuple)，元组中的元素predict_scores，predict_location，target_label，target_bbox，bbox_inside_weight，fg_num都是Variable。
+
 
 **代码示例**
 
 .. code-block:: python
 
     import paddle.fluid as fluid
-    bbox_pred = layers.data(name='bbox_pred', shape=[1, 100, 4],
+    import numpy as np
+ 
+    bbox_pred = fluid.layers.data(name='bbox_pred', shape=[1, 100, 4],
                       append_batch_size=False, dtype='float32')
-    cls_logits = layers.data(name='cls_logits', shape=[1, 100, 10],
+    cls_logits = fluid.layers.data(name='cls_logits', shape=[1, 100, 10],
                       append_batch_size=False, dtype='float32')
-    anchor_box = layers.data(name='anchor_box', shape=[100, 4],
+    anchor_box = fluid.layers.data(name='anchor_box', shape=[100, 4],
                       append_batch_size=False, dtype='float32')
-    anchor_var = layers.data(name='anchor_var', shape=[100, 4],
+    anchor_var = fluid.layers.data(name='anchor_var', shape=[100, 4],
                       append_batch_size=False, dtype='float32')
-    gt_boxes = layers.data(name='gt_boxes', shape=[10, 4],
+    gt_boxes = fluid.layers.data(name='gt_boxes', shape=[10, 4],
                       append_batch_size=False, dtype='float32')
-    gt_labels = layers.data(name='gt_labels', shape=[10, 1],
+    gt_labels = fluid.layers.data(name='gt_labels', shape=[10, 1],
                       append_batch_size=False, dtype='float32')
     is_crowd = fluid.layers.data(name='is_crowd', shape=[1],
                       append_batch_size=False, dtype='float32')
-    im_info = fluid.layers.data(name='im_infoss', shape=[1, 3],
+    im_info = fluid.layers.data(name='im_info', shape=[1, 3],
                       append_batch_size=False, dtype='float32')
     loc_pred, score_pred, loc_target, score_target, bbox_inside_weight, fg_num =
           fluid.layers.retinanet_target_assign(bbox_pred, cls_logits, anchor_box,
           anchor_var, gt_boxes, gt_labels, is_crowd, im_info, 10)
-
-
-
-
-
-
-
-
-
-
diff --git a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
index 90a5f0ac0..aadbe1545 100644
--- a/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
+++ b/doc/fluid/api_cn/layers_cn/sigmoid_focal_loss_cn.rst
@@ -5,35 +5,36 @@ sigmoid_focal_loss
 
 .. py:function:: paddle.fluid.layers.sigmoid_focal_loss(x, label, fg_num, gamma=2, alpha=0.25)
 
-**Sigmoid Focal loss损失计算**
+`Focal Loss <https://arxiv.org/abs/1708.02002>`_ 被提出用于解决计算机视觉任务中前景-背景不平衡的问题。该OP先计算输入x中每个元素的sigmoid值，然后计算sigmoid值与类别目标值label之间的Focal Loss。
 
-focal损失用于解决在one-stage探测器的训练阶段存在的前景 - 背景类不平衡问题。 此运算符计算输入张量中每个元素的sigmoid值，然后计算focal损失。
-
-focal损失计算过程：
+Focal Loss的计算过程如下：
 
 .. math::
 
-  loss_j = (-label_j * alpha * {(1 - \sigma(x_j))}^{gamma} * \log(\sigma(x_j)) -
-  (1 - labels_j) * (1 - alpha) * {(\sigma(x_j)}^{ gamma} * \log(1 - \sigma(x_j)))
-  / fg\_num, j = 1,...,K
+  \mathop{loss_{i,\,j}}\limits_{i\in\mathbb{[0,\,N-1]},\,j\in\mathbb{[0,\,C-1]}}=\left\{
+  \begin{array}{rcl}
+  - \frac{1}{fg\_num} * \alpha * {(1 - \sigma(x_{i,\,j}))}^{\gamma} * \log(\sigma(x_{i,\,j})) & & {(j +1) = label_{i,\,0}}\\
+  - \frac{1}{fg\_num} * (1 - \alpha) * {\sigma(x_{i,\,j})}^{ \gamma} * \log(1 - \sigma(x_{i,\,j})) & & {(j +1) \neq label_{i,\,0}}
+  \end{array} \right.
 
 其中，已知：
 
 .. math::
 
-  \sigma(x_j) = \frac{1}{1 + \exp(-x_j)}
+  \sigma(x_{i,\,j}) = \frac{1}{1 + \exp(-x_{i,\,j})}
+
 
 参数：
-    - **x**  (Variable) – 具有形状[N，D]的2-D张量，其中N是batch大小，D是类的数量（不包括背景）。 此输入是由前一个运算符计算出的logits张量。
-    - **label**  (Variable) – 形状为[N，1]的二维张量，是所有可能的标签。
-    - **fg_num**  (Variable) – 具有形状[1]的1-D张量，是前景的数量。
-    - **gamma**  (float) –  用于平衡简单和复杂实例的超参数。 默认值设置为2.0。
-    - **alpha**  (float) – 用于平衡正面和负面实例的超参数。 默认值设置为0.25。
+    - **x**  (Variable) – 维度为 :math:`[N, D]` 的2-D Tensor，表示全部样本的分类预测值。其中，第一维N是批量内参与训练的样本数量，例如在目标检测中，样本为框级别，N为批量内所有图像的正负样本的数量总和；在图像分类中，样本为图像级别，N为批量内的图像数量总和。第二维D是类别数量（ **不包括背景类** ）。数据类型为float32或float64。
+    - **label**  (Variable) – 维度为 :math:`[N, 1]` 的2-D Tensor，表示全部样本的分类目标值。其中，第一维N是批量内参与训练的样本数量，第二维1表示每个样本只有一个类别目标值。正样本的目标类别值的取值范围是 :math:`[1, D]` , 负样本的目标类别值是0。数据类型为int32。
+    - **fg_num**  (Variable) – 维度为 :math:`[1]` 的1-D Tensor，表示批量内正样本的数量，需在进入此OP前获取正样本的数量。数据类型为int32。
+    - **gamma**  (float) –  用于平衡易分样本和难分样本的超参数， 默认值设置为2.0。
+    - **alpha**  (float) – 用于平衡正样本和负样本的超参数，默认值设置为0.25。
 
 
-返回：  具有形状[N，D]的2-D张量，即focal损失。
+返回：  输入x中每个元素的Focal loss，即维度为 :math:`[N, D]` 的2-D Tensor。
 
-返回类型： out(Variable)
+返回类型： 变量（Variable），数据类型为float32或float64。
 
 **代码示例**
 
@@ -53,7 +54,3 @@ focal损失计算过程：
                                            fg_num=fg_num,
                                            gamma=2.,
                                            alpha=0.25)
-
-
-
-
-- 
GitLab