"support sparse_accessor_class: ['DownpourSparseValueAccessor', 'DownpourCtrAccessor', 'DownpourCtrDoubleAccessor', 'DownpourUnitAccessor', 'DownpourDoubleUnitAccessor'], but actual %s"
"'inputs' was not specified when Model initialization, so the input shape to be saved will be the shape derived from the user's actual inputs. The input shape to be saved is %s. For saving correct input shapes, please provide 'inputs' for Model initialization."
By default, this operator implements the cross entropy loss function with softmax. This function
combines the calculation of the softmax operation and the cross entropy loss function
to provide a more numerically stable computation.
...
...
@@ -2399,21 +2402,13 @@ def cross_entropy(
Parameters:
- **input** (Tensor)
Input tensor, the data type is float32, float64. Shape is
:math:`[N_1, N_2, ..., N_k, C]`, where C is number of classes , ``k >= 1`` .
input (Tensor): the data type is float32, float64. Shape is :math:`[N_1, N_2, ..., N_k, C]`, where C is the number of classes, ``k >= 1`` .
Note:
1. when use_softmax=True, it expects unscaled logits. This operator should not be used with the
output of softmax operator, which will produce incorrect results.
1. when use_softmax=True, it expects unscaled logits. This operator should not be used with the output of the softmax operator, as that would produce incorrect results.
2. when use_softmax=False, it expects the output of softmax operator.
- **label** (Tensor)
label (Tensor):
1. If soft_label=False, the shape is
:math:`[N_1, N_2, ..., N_k]` or :math:`[N_1, N_2, ..., N_k, 1]`, k >= 1.
the data type is int32, int64, float32, float64, where each value is in [0, C-1].
...
...
@@ -2421,48 +2416,27 @@ def cross_entropy(
2. If soft_label=True, the shape and data type should be the same as ``input`` ,
and the sum of the labels for each sample should be 1.
- **weight** (Tensor, optional)
a manual rescaling weight given to each class.
weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size C and the data type is float32, float64.
Default is ``None`` .
- **ignore_index** (int64, optional)
Specifies a target value that is ignored
ignore_index (int64, optional): Specifies a target value that is ignored
and does not contribute to the loss. A negative value means that no label
value needs to be ignored. Only valid when soft_label = False.
Default is ``-100`` .
- **reduction** (str, optional)
Indicate how to average the loss by batch_size,
reduction (str, optional): Indicate how to average the loss by batch_size,
the candidates are ``'none'`` | ``'mean'`` | ``'sum'``.
If :attr:`reduction` is ``'mean'``, the reduced mean loss is returned;
If :attr:`reduction` is ``'sum'``, the reduced sum loss is returned.
If :attr:`reduction` is ``'none'``, the unreduced loss is returned.
Default is ``'mean'``.
- **soft_label** (bool, optional)
Indicate whether label is soft.
Default is ``False``.
- **axis** (int, optional)
The index of dimension to perform softmax calculations.
soft_label (bool, optional): Indicate whether label is soft. Default is ``False``.
axis (int, optional): The index of dimension to perform softmax calculations.
It should be in range :math:`[-1, rank - 1]`, where :math:`rank` is the
number of dimensions of input :attr:`input`.
Default is ``-1`` .
- **use_softmax** (bool, optional)
Indicate whether compute softmax before cross_entropy.
use_softmax (bool, optional): Indicate whether to compute softmax before cross_entropy.
Default is ``True``.
- **name** (str, optional)
The name of the operator. Default is ``None`` .
name (str, optional): The name of the operator. Default is ``None`` .
For more information, please refer to :ref:`api_guide_Name` .
Returns:
...
...
@@ -2478,9 +2452,7 @@ def cross_entropy(
2. if soft_label = True, the dimension of return value is :math:`[N_1, N_2, ..., N_k, 1]` .
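To make the shape and reduction semantics above concrete, the following is a minimal sketch of calling the functional API with hard labels; the tensor values are illustrative only.
.. code-block:: python

    import paddle
    import paddle.nn.functional as F

    # unscaled logits for 2 samples over C=3 classes (use_softmax=True by default)
    input = paddle.to_tensor([[2.0, 0.5, 0.3], [0.1, 1.5, 0.2]])
    # hard labels: one class index in [0, C-1] per sample
    label = paddle.to_tensor([0, 1], dtype='int64')

    loss_mean = F.cross_entropy(input, label)                    # reduction='mean'
    loss_none = F.cross_entropy(input, label, reduction='none')  # shape [2]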
@@ -139,6 +140,7 @@ class BCEWithLogitsLoss(Layer):
class CrossEntropyLoss(Layer):
r"""
By default, this operator implements the cross entropy loss function with softmax. This function
combines the calculation of the softmax operation and the cross entropy loss function
to provide a more numerically stable computation.
...
...
@@ -251,60 +253,35 @@ class CrossEntropyLoss(Layer):
Parameters:
- **weight** (Tensor, optional)
a manual rescaling weight given to each class.
weight (Tensor, optional): a manual rescaling weight given to each class.
If given, has to be a Tensor of size C and the data type is float32, float64.
Default is ``None`` .
- **ignore_index** (int64, optional)
Specifies a target value that is ignored
ignore_index (int64, optional): Specifies a target value that is ignored
and does not contribute to the loss. A negative value means that no label
value needs to be ignored. Only valid when soft_label = False.
Default is ``-100`` .
- **reduction** (str, optional)
Indicate how to average the loss by batch_size,
reduction (str, optional): Indicate how to average the loss by batch_size,
the candidates are ``'none'`` | ``'mean'`` | ``'sum'``.
If :attr:`reduction` is ``'mean'``, the reduced mean loss is returned;
If :attr:`reduction` is ``'sum'``, the reduced sum loss is returned.
If :attr:`reduction` is ``'none'``, the unreduced loss is returned.
Default is ``'mean'``.
- **soft_label** (bool, optional)
Indicate whether label is soft.
soft_label (bool, optional): Indicate whether label is soft.
If soft_label=False, the label is hard. If soft_label=True, the label is soft.
Default is ``False``.
- **axis** (int, optional)
The index of dimension to perform softmax calculations.
axis (int, optional): The index of dimension to perform softmax calculations.
It should be in range :math:`[-1, rank - 1]`, where :math:`rank` is the number
of dimensions of input :attr:`input`.
Default is ``-1`` .
- **use_softmax** (bool, optional)
Indicate whether compute softmax before cross_entropy.
use_softmax (bool, optional): Indicate whether to compute softmax before cross_entropy.
Default is ``True``.
- **name** (str, optional)
The name of the operator. Default is ``None`` .
name (str, optional): The name of the operator. Default is ``None`` .
For more information, please refer to :ref:`api_guide_Name` .
Shape:
- **input** (Tensor)
Input tensor, the data type is float32, float64. Shape is
- **input** (Tensor), the data type is float32, float64. Shape is
:math:`[N_1, N_2, ..., N_k, C]`, where C is the number of classes, ``k >= 1`` .
Note:
1. when use_softmax=True, it expects unscaled logits. This operator should not be used with the
...
...
@@ -312,7 +289,6 @@ class CrossEntropyLoss(Layer):
2. when use_softmax=False, it expects the output of softmax operator.
- **label** (Tensor)
1. If soft_label=False, the shape is
...
...
@@ -322,14 +298,9 @@ class CrossEntropyLoss(Layer):
2. If soft_label=True, the shape and data type should be the same as ``input`` ,
and the sum of the labels for each sample should be 1.
- **output** (Tensor)
Return the softmax cross_entropy loss of ``input`` and ``label``.
- **output** (Tensor), the softmax cross_entropy loss of ``input`` and ``label``.
The data type is the same as input.
If :attr:`reduction` is ``'mean'`` or ``'sum'`` , the dimension of return value is ``1``.
If :attr:`reduction` is ``'none'``:
1. If soft_label = False, the dimension of return value is the same with ``label`` .
...
...
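A minimal sketch of the class form, mirroring the functional example above (tensor values are illustrative only):
.. code-block:: python

    import paddle

    ce = paddle.nn.CrossEntropyLoss()    # weight=None, reduction='mean' by default
    logits = paddle.to_tensor([[2.0, 0.5, 0.3], [0.1, 1.5, 0.2]])
    labels = paddle.to_tensor([0, 1], dtype='int64')
    loss = ce(logits, labels)            # reduced mean loss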
@@ -634,6 +605,7 @@ class MSELoss(Layer):
class L1Loss(Layer):
r"""
Construct a callable object of the ``L1Loss`` class.
The L1Loss layer calculates the L1 Loss of ``input`` and ``label`` as follows.
...
...
@@ -663,10 +635,10 @@ class L1Loss(Layer):
name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
Shape:
input (Tensor): The input tensor. The shapes is [N, *], where N is batch size and `*` means any number of additional dimensions. It's data type should be float32, float64, int32, int64.
label (Tensor): label. The shapes is [N, *], same shape as ``input`` . It's data type should be float32, float64, int32, int64.
output (Tensor): The L1 Loss of ``input`` and ``label``.
If `reduction` is ``'none'``, the shape of output loss is [N, *], the same as ``input`` .
- input (Tensor): The input tensor. The shape is ``[N, *]``, where N is batch size and `*` means any number of additional dimensions. Its data type should be float32, float64, int32, int64.
- label (Tensor): label. The shape is ``[N, *]``, the same shape as ``input`` . Its data type should be float32, float64, int32, int64.
- output (Tensor): The L1 Loss of ``input`` and ``label``.
If `reduction` is ``'none'``, the shape of output loss is ``[N, *]``, the same as ``input`` .
If `reduction` is ``'mean'`` or ``'sum'``, the shape of output loss is ``[1]``.
Examples:
...
...
@@ -692,6 +664,7 @@ class L1Loss(Layer):
print(output)
# [[0.20000005 0.19999999]
# [0.2 0.79999995]]
"""
def __init__(self, reduction='mean', name=None):
...
...
@@ -712,6 +685,7 @@ class L1Loss(Layer):
class BCELoss(Layer):
"""
This interface is used to construct a callable object of the ``BCELoss`` class.
The BCELoss layer measures the binary_cross_entropy loss between input predictions ``input``
and target labels ``label`` . The binary_cross_entropy loss can be described as:
...
...
@@ -755,13 +729,13 @@ class BCELoss(Layer):
For more information, please refer to :ref:`api_guide_Name`.
Shape:
input (Tensor): 2-D tensor with shape: [N, *], N is batch_size, `*` means
- input (Tensor): 2-D tensor with shape: ``[N, *]``, N is batch_size, `*` means
number of additional dimensions. The input ``input`` should always
be the output of sigmoid. Available dtype is float32, float64.
label (Tensor): 2-D tensor with the same shape as ``input``. The target
- label (Tensor): 2-D tensor with the same shape as ``input``. The target
labels whose values should be numbers between 0 and 1. Available
dtype is float32, float64.
output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
- output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
same as ``input`` , else the shape of output is scalar.
Returns:
...
...
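Since ``input`` must already be probabilities (e.g. sigmoid outputs), a minimal sketch looks like the following; the tensor values are illustrative only.
.. code-block:: python

    import paddle

    # probabilities in (0, 1), e.g. produced by paddle.nn.functional.sigmoid
    input = paddle.to_tensor([[0.5, 0.6], [0.7, 0.4]])
    label = paddle.to_tensor([[1.0, 0.0], [1.0, 1.0]])

    bce = paddle.nn.BCELoss()    # reduction='mean' by default
    loss = bce(input, label)     # scalar output for 'mean'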
@@ -914,6 +888,7 @@ class NLLLoss(Layer):
class KLDivLoss(Layer):
r"""
Generate a callable object of 'KLDivLoss' to calculate the
Kullback-Leibler divergence loss between Input(X) and
Input(Target). Note that Input(X) is the log-probability
...
...
@@ -933,14 +908,10 @@ class KLDivLoss(Layer):
Default is ``'mean'``.
Shape:
- input (Tensor): (N, *), where * means, any number of additional dimensions.
- label (Tensor): (N, *), same shape as input.
- input (Tensor): ``(N, *)``, where ``*`` means any number of additional dimensions.
- label (Tensor): ``(N, *)``, same shape as input.
- output (Tensor): tensor with shape: ``[1]`` by default.
Examples:
.. code-block:: python
...
...
@@ -970,6 +941,7 @@ class KLDivLoss(Layer):
kldiv_criterion = nn.KLDivLoss(reduction='none')
pred_loss = kldiv_criterion(x, target)
# shape=[5, 20]
"""
def __init__(self, reduction='mean'):
...
...
@@ -1720,6 +1692,7 @@ class TripletMarginLoss(Layer):
class SoftMarginLoss(Layer):
r"""
Creates a criterion that measures a two-class soft margin loss between input predictions ``input``
and target labels ``label`` . It can be described as:
...
...
@@ -1738,16 +1711,13 @@ class SoftMarginLoss(Layer):
name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.
Shapes:
Input (Tensor): The input tensor with shape: [N, *],
- Input (Tensor): The input tensor with shape: ``[N, *]``,
N is batch_size, `*` means any number of additional dimensions. The ``input`` ranges from -inf to inf.
Available dtype is float32, float64.
Label (Tensor): The target labels tensor with the same shape as
- Label (Tensor): The target labels tensor with the same shape as
``input``. The target labels whose values should be -1 or 1.
Available dtype is int32, int64, float32, float64.
Output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
- Output (Tensor): If ``reduction`` is ``'none'``, the shape of output is
same as ``input`` , else the shape of output is [1].
Returns:
...
...
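A minimal sketch of the layer described above; note that ``label`` must take values -1 or 1 (tensor values are illustrative only).
.. code-block:: python

    import paddle

    input = paddle.to_tensor([[0.5, -0.6], [1.2, 0.3]])
    label = paddle.to_tensor([[1.0, -1.0], [-1.0, 1.0]])   # values must be -1 or 1

    soft_margin = paddle.nn.SoftMarginLoss()    # reduction='mean' by default
    loss = soft_margin(input, label)            # shape [1] for 'mean'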
@@ -1780,6 +1750,7 @@ class SoftMarginLoss(Layer):
@@ -1661,6 +1661,7 @@ class MultiplicativeDecay(LRScheduler):
class OneCycleLR(LRScheduler):
r"""
Sets the learning rate according to the one cycle learning rate scheduler.
The scheduler adjusts the learning rate from an initial learning rate to the maximum learning rate and then
from that maximum learning rate to the minimum learning rate, which is much less than the initial learning rate.
...
...
@@ -1674,22 +1675,25 @@ class OneCycleLR(LRScheduler):
Also note that you should update learning rate each step.
Args:
max_learning_rate (float): The maximum learning rate. It is a python float number.
Functionally, it defines the initial learning rate by ``divide_factor`` .
max_learning_rate (float): The maximum learning rate. It is a python float number. Functionally, it defines the initial learning rate via ``divide_factor`` .
total_steps (int): Number of total training steps.
divide_factor (float): Initial learning rate will be determined by initial_learning_rate = max_learning_rate / divide_factor. Default: 25.
divide_factor (float, optional): Initial learning rate will be determined by initial_learning_rate = max_learning_rate / divide_factor. Default: 25.
end_learning_rate (float, optional): The minimum learning rate during training; it should be much less than the initial learning rate.
phase_pct (float): The percentage of total steps used to increase the learning rate. Default: 0.3.
anneal_strategy (str, optional): Strategy of adjusting learning rate.'cos' for cosine annealing,
'linear' for linear annealing. Default: 'cos'.
anneal_strategy (str, optional): Strategy of adjusting the learning rate. 'cos' for cosine annealing, 'linear' for linear annealing. Default: 'cos'.
three_phase (bool, optional): Whether to use a three-phase schedule; a usage sketch follows the examples below.
If ``True``:
1. The learning rate will first increase from the initial learning rate to the maximum learning rate.
2. Then it will decrease to the initial learning rate. The number of steps in this phase is the same as in the first phase.
3. Finally, it will decrease to the minimum learning rate, which is much less than the initial learning rate.
If ``False``:
1. The learning rate will increase to the maximum learning rate.
2. Then it will directly decrease to the minimum learning rate.
last_epoch (int, optional): The index of the last epoch. Can be set to restart training. Default: -1, meaning the initial learning rate.
verbose (bool, optional): If ``True``, prints a message to stdout for each update. Default: ``False`` .
...
...
@@ -1741,6 +1745,7 @@ class OneCycleLR(LRScheduler):
},
fetch_list=loss.name)
scheduler.step() # You should update learning rate each step
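For comparison with the static-graph example above, here is a minimal dynamic-graph sketch using the ``three_phase`` option; the hyperparameter values are illustrative only.
.. code-block:: python

    import paddle

    scheduler = paddle.optimizer.lr.OneCycleLR(
        max_learning_rate=1.0, total_steps=100,
        divide_factor=25, phase_pct=0.3, three_phase=True)
    linear = paddle.nn.Linear(10, 10)
    sgd = paddle.optimizer.SGD(learning_rate=scheduler, parameters=linear.parameters())
    for step in range(100):
        x = paddle.uniform([4, 10])
        loss = paddle.mean(linear(x))
        loss.backward()
        sgd.step()
        sgd.clear_grad()
        scheduler.step()    # update the learning rate each step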
- Singular labels are called free labels; duplicates are dummy labels. Dummy labeled
dimensions will be reduced and removed in the output.
- Output labels can be explicitly specified on the right hand side of `->` or omitted.
In the latter case, the output labels will be inferred from the input labels.
- Output labels can be explicitly specified on the right hand side of `->` or omitted. In the latter case, the output labels will be inferred from the input labels.
- Inference of output labels
- Broadcasting label `...`, if present, is placed in the leftmost position.
- Free labels are reordered alphabetically and put after `...`.
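As a quick check of these inference rules, here is a sketch in which the dummy label `j` is contracted and the free labels are emitted alphabetically:
.. code-block:: python

    import paddle

    x = paddle.rand([2, 3])
    y = paddle.rand([3, 4])

    # 'j' is a dummy label and is summed out; the output labels are
    # inferred as the alphabetically ordered free labels 'ik'
    z = paddle.einsum('ij,jk', x, y)     # equivalent to 'ij,jk->ik', shape [2, 4]
    t = paddle.einsum('ij->i', x)        # explicit output labels, shape [2]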