In programming languages, control flow determines the order in which statements are executed. Common control flows include sequential execution, branching, and looping. PaddlePaddle Fluid inherits this concept and provides a variety of control flow APIs to control the execution logic of a deep learning model during training or inference.
IfElse
======
Conditional branching: for a batch of inputs, execute the logic in either :code:`true_block` or :code:`false_block` according to the given condition, then merge the outputs of the two branches into one after execution. In general, the conditional expression can be generated by logical comparison APIs such as :ref:`api_fluid_layers_less_than` and :ref:`api_fluid_layers_equal`.
Please refer to :ref:`api_fluid_layers_IfElse`
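A minimal sketch of the usage pattern (the shapes, layer sizes and threshold below are illustrative assumptions, not part of the API):

.. code-block:: python

    import paddle.fluid as fluid

    image = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    limit = fluid.layers.fill_constant(shape=[1], dtype='int64', value=5)
    cond = fluid.layers.less_than(x=label, y=limit)  # per-sample bool condition

    ie = fluid.layers.IfElse(cond)
    with ie.true_block():
        true_image = ie.input(image)   # samples whose condition is True
        prob = fluid.layers.fc(input=true_image, size=10, act='softmax')
        ie.output(prob)
    with ie.false_block():
        false_image = ie.input(image)  # samples whose condition is False
        prob = fluid.layers.fc(input=false_image, size=10, act='softmax')
        ie.output(prob)
    out = ie()  # the outputs of the two branches merged back into one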
Switch
======
Switch, like the :code:`switch-case` statement commonly found in programming languages, selects a branch to execute depending on the value of the input expression. Specifically, the :code:`Switch` control flow defined by Fluid has the following characteristics:
* The condition of each case is a boolean value, held in a tensor-typed Variable in the Program;
* The cases are checked one by one, the first case whose condition is satisfied is selected, and the block is exited once its execution completes;
* If no case's condition is met, the default case is selected for execution.
Please refer to :ref:`api_fluid_layers_Switch`
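A rough sketch of the pattern, in the style of a learning-rate schedule (the step threshold and rates are made-up values):

.. code-block:: python

    import paddle.fluid as fluid

    lr = fluid.layers.create_global_var(
        shape=[1], value=0.0, dtype='float32', persistable=True,
        name='learning_rate')
    step = fluid.layers.fill_constant(shape=[1], dtype='float32', value=50.0)
    warmup_steps = fluid.layers.fill_constant(shape=[1], dtype='float32', value=100.0)
    lr_warmup = fluid.layers.fill_constant(shape=[1], dtype='float32', value=0.001)
    lr_normal = fluid.layers.fill_constant(shape=[1], dtype='float32', value=0.01)

    with fluid.layers.Switch() as switch:
        with switch.case(fluid.layers.less_than(x=step, y=warmup_steps)):
            fluid.layers.assign(lr_warmup, lr)  # first satisfied case wins
        with switch.default():
            fluid.layers.assign(lr_normal, lr)  # runs when no case matches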
While
=====
While the condition is true, the logic in the :code:`block` attached to the :code:`While` flow is executed repeatedly, and the loop ends once the condition is judged to be false. The related APIs, combined in the sketch below, are as follows:
* :ref:`api_fluid_layers_increment` : usually used to count loop iterations;
* :ref:`api_fluid_layers_array_read` : reads a Variable from a specified position in a :code:`LOD_TENSOR_ARRAY` to take part in the computation;
* :ref:`api_fluid_layers_array_write` : writes a Variable back to a specified position in a :code:`LOD_TENSOR_ARRAY`, storing the result of the computation.
Please refer to :ref:`api_fluid_layers_While`
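A minimal sketch combining the three APIs above (the loop simply accumulates a value for ten iterations; all constants are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    i = fluid.layers.fill_constant(shape=[1], dtype='int64', value=0)
    ten = fluid.layers.fill_constant(shape=[1], dtype='int64', value=10)
    one = fluid.layers.fill_constant(shape=[1], dtype='float32', value=1.0)
    arr = fluid.layers.array_write(one, i=i)  # LOD_TENSOR_ARRAY of per-step results

    cond = fluid.layers.less_than(x=i, y=ten)
    while_op = fluid.layers.While(cond=cond)
    with while_op.block():
        prev = fluid.layers.array_read(array=arr, i=i)            # read previous result
        now = fluid.layers.elementwise_add(x=prev, y=one)         # one step of computation
        i = fluid.layers.increment(x=i, value=1, in_place=True)   # count the loop
        fluid.layers.array_write(now, i=i, array=arr)             # store this step's result
        fluid.layers.less_than(x=i, y=ten, cond=cond)             # refresh the condition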
DynamicRNN
==========
Dynamic RNN can process a batch of variable-length sequence data; it accepts a Variable with :code:`lod_level=1` as input. In the :code:`block` of :code:`DynamicRNN`, the user needs to customize the RNN's single-step computation logic. At each time step, the user can write the state to be remembered into the :code:`memory` of :code:`DynamicRNN` and write the required output to its :code:`output`.
:ref:`api_fluid_layers_sequence_last_step` gets the output of the last time step of :code:`DynamicRNN`.
Please refer to :ref:`api_fluid_layers_DynamicRNN`
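A condensed sketch of a single-layer RNN over variable-length sentences (the vocabulary and hidden sizes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    sentence = fluid.layers.data(name='sentence', shape=[1], dtype='int64',
                                 lod_level=1)
    embedding = fluid.layers.embedding(input=sentence, size=[65536, 32])

    drnn = fluid.layers.DynamicRNN()
    with drnn.block():
        word = drnn.step_input(embedding)  # input of the current time step
        prev = drnn.memory(shape=[200])    # state remembered across steps
        hidden = fluid.layers.fc(input=[word, prev], size=200, act='relu')
        drnn.update_memory(prev, hidden)   # carry the new state forward
        drnn.output(hidden)                # per-step output

    rnn_out = drnn()                                 # all step outputs
    last = fluid.layers.sequence_last_step(rnn_out)  # output of the last step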
StaticRNN
=========
Static RNN can only process fixed-length sequence data, and accepts a Variable with :code:`lod_level=0` as input. Similar to :code:`DynamicRNN`, at each single time step of the RNN, the user needs to customize the computation logic and export the state and output.
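A similar sketch for fixed-length sequences, where the time dimension is laid out explicitly (sequence length, batch size and layer sizes are illustrative assumptions):

.. code-block:: python

    import paddle.fluid as fluid

    # fixed-length sequences laid out as [seq_len, batch_size, input_dim]
    x = fluid.layers.data(name='x', shape=[16, 32, 64], dtype='float32',
                          append_batch_size=False)

    rnn = fluid.layers.StaticRNN()
    with rnn.step():
        step_in = rnn.step_input(x)  # slice of one time step
        prev = rnn.memory(shape=[-1, 128], batch_ref=step_in)  # zero-initialized state
        hidden = fluid.layers.fc(input=[step_in, prev], size=128, act='relu')
        rnn.update_memory(prev, hidden)  # export the new state
        rnn.step_output(hidden)          # export the per-step output

    outputs = rnn()  # step outputs stacked along the time dimension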
The loss function quantifies the difference between the inference result and the ground truth. As the optimization target, it directly determines whether the model is trained well, and much research also focuses on the design of loss functions.
Paddle Fluid offers diverse types of loss functions for a variety of tasks. Let's take a look at the commonly-used loss functions included in Paddle Fluid.
Regression
===========
The squared error loss uses the square of the error between the predicted value and the ground-truth value as the sample loss, and is the most basic loss function in regression problems.
For API Reference, please refer to :ref:`api_fluid_layers_square_error_cost`.
Smooth L1 loss (:code:`smooth_l1`) is a piecewise loss function that is relatively insensitive to outliers and therefore more robust.
For API Reference, please refer to :ref:`api_fluid_layers_smooth_l1`.
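A minimal sketch applying both regression losses to a one-dimensional prediction (shapes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    pred = fluid.layers.data(name='pred', shape=[1], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='float32')

    sq_loss = fluid.layers.square_error_cost(input=pred, label=label)  # (pred - label)^2
    l1_loss = fluid.layers.smooth_l1(x=pred, y=label)  # piecewise, robust to outliers
    avg_loss = fluid.layers.mean(sq_loss)  # reduce to a scalar before optimizing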
Classification
================
`Cross entropy <https://en.wikipedia.org/wiki/Cross_entropy>`_ is the most widely used loss function in classification problems. Paddle Fluid provides two cross-entropy interfaces: one accepting normalized probability values as input and another accepting non-normalized logits. Fluid also supports two types of labels, namely soft labels and hard labels.
For API Reference, please refer to :ref:`api_fluid_layers_cross_entropy` and :ref:`api_fluid_layers_softmax_with_cross_entropy`.
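A sketch contrasting the two interfaces on a ten-class problem with hard labels (the layer sizes are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    image = fluid.layers.data(name='image', shape=[784], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')  # hard label

    # normalized probabilities as input
    prob = fluid.layers.fc(input=image, size=10, act='softmax')
    ce_loss = fluid.layers.cross_entropy(input=prob, label=label)

    # non-normalized logits as input; the softmax is fused into the loss
    logits = fluid.layers.fc(input=image, size=10)
    sce_loss = fluid.layers.softmax_with_cross_entropy(logits=logits, label=label)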
Multi-label classification
----------------------------
In multi-label classification, a sample may belong to several categories at once; for example, an article may be tagged as both politics and technology. The loss is computed by treating each category as an independent binary classification problem, for which the sigmoid_cross_entropy_with_logits loss function is provided.
For API Reference, please refer to :ref:`api_fluid_layers_sigmoid_cross_entropy_with_logits`.
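A sketch with a made-up number of categories; each output unit is an independent binary classifier, so the label carries one 0/1 entry per category:

.. code-block:: python

    import paddle.fluid as fluid

    NUM_TAGS = 5  # illustrative number of categories

    feat = fluid.layers.data(name='feat', shape=[128], dtype='float32')
    tags = fluid.layers.data(name='tags', shape=[NUM_TAGS], dtype='float32')

    logits = fluid.layers.fc(input=feat, size=NUM_TAGS)  # raw logits, no activation
    loss = fluid.layers.sigmoid_cross_entropy_with_logits(x=logits, label=tags)
    avg_loss = fluid.layers.mean(loss)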
Large-scale classification
-----------------------------
For large-scale classification problems, special methods and corresponding loss functions are usually needed to speed up the training. The commonly used methods are
`Noise contrastive estimation (NCE) <http://proceedings.mlr.press/v9/gutmann10a/gutmann10a.pdf>`_ and `Hierarchical sigmoid <http://www.iro.umontreal.ca/~lisa/pointeurs/hierarchical-nnlm-aistats05.pdf>`_ .
* NCE converts the multi-class classification problem into a binary classification problem that discriminates between the true data distribution and a noise distribution. Maximum likelihood estimation is performed on this binary classification problem, which avoids computing the normalization factor over the full class space and thus reduces the computational complexity.
* Hierarchical sigmoid realizes multi-class classification through hierarchical binary classification over a binary tree. The loss of each sample is the sum of the cross-entropy losses of the binary classifications at each node on its coding path, which likewise avoids computing the normalization factor and reduces the computational complexity.
The loss functions for both methods are available in Paddle Fluid. For API Reference please refer to :ref:`api_fluid_layers_nce` and :ref:`api_fluid_layers_hsigmoid`.
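A sketch of both losses over a large made-up class space (the class count, feature size and negative-sample count are assumptions):

.. code-block:: python

    import paddle.fluid as fluid

    NUM_CLASSES = 10000  # illustrative class count

    feat = fluid.layers.data(name='feat', shape=[128], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')

    # NCE: maximum likelihood on a true-vs-noise binary problem
    nce_loss = fluid.layers.nce(input=feat, label=label,
                                num_total_classes=NUM_CLASSES,
                                num_neg_samples=10)

    # hierarchical sigmoid: binary classification along a binary-tree coding path
    hs_loss = fluid.layers.hsigmoid(input=feat, label=label,
                                    num_classes=NUM_CLASSES)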
Sequence classification
-------------------------
Sequence classification can be divided into the following three types:
* In a Sequence Classification problem, the entire sequence corresponds to a single prediction label, as in text classification. This is a common classification problem, and cross entropy can be used as the loss function.
* In a Segment Classification problem, each segment in the sequence corresponds to its own category tag, as in named entity recognition. For this sequence labeling problem, `the (Linear Chain) Conditional Random Field (CRF) <http://www.cs.columbia.edu/~mcollins/fb.pdf>`_ is a commonly used model. The method uses the likelihood probability at the sentence level, so that the labels at different positions in the sequence are no longer conditionally independent, which effectively mitigates the label bias problem. Support for the CRF loss function is available in Paddle Fluid. For API Reference please refer to :ref:`api_fluid_layers_linear_chain_crf` .
* A Temporal Classification problem requires labeling unsegmented sequences, as in speech recognition. For this time-based classification problem, the `CTC (Connectionist Temporal Classification) <http://people.idsia.ch/~santiago/papers/icml2006.pdf>`_ loss function does not require the input data and labels to be aligned, and enables end-to-end training. Paddle Fluid provides a warpctc interface to calculate the corresponding loss; both it and the CRF loss are sketched after this list. For API Reference, please refer to :ref:`api_fluid_layers_warpctc` .
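The sketch below shows how the two sequence losses are wired up (the tag-set size and output alphabet size are made up; in practice the emission scores and logits would come from upstream network layers rather than data layers):

.. code-block:: python

    import paddle.fluid as fluid

    NUM_TAGS = 7       # illustrative tag-set size for segment classification
    NUM_TOKENS = 5000  # illustrative output alphabet size for CTC

    # linear-chain CRF, e.g. for named entity recognition
    emission = fluid.layers.data(name='emission', shape=[NUM_TAGS],
                                 dtype='float32', lod_level=1)
    tags = fluid.layers.data(name='tags', shape=[1], dtype='int64', lod_level=1)
    crf_cost = fluid.layers.linear_chain_crf(
        input=emission, label=tags, param_attr=fluid.ParamAttr(name='crf_w'))

    # CTC, e.g. for speech recognition; no frame-level alignment is needed
    logits = fluid.layers.data(name='logits', shape=[NUM_TOKENS + 1],
                               dtype='float32', lod_level=1)
    label = fluid.layers.data(name='label', shape=[1], dtype='int32', lod_level=1)
    ctc_cost = fluid.layers.warpctc(input=logits, label=label, blank=NUM_TOKENS)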
Rank
=========
`Ranking problems <https://en.wikipedia.org/wiki/Learning_to_rank>`_ can be solved with Pointwise, Pairwise, or Listwise learning methods. Different methods require different loss functions:
* The Pointwise method solves the ranking problem by approximating it as a regression problem, so the loss functions for regression can be used.
* The Pairwise method requires special loss functions. It solves the ranking problem by approximating it as a classification problem: given the relevance scores of two documents with respect to a query, their partial order is used as the binary classification label when computing the loss. Paddle Fluid provides two commonly used loss functions for Pairwise methods, sketched below. For API Reference please refer to :ref:`api_fluid_layers_rank_loss` and :ref:`api_fluid_layers_margin_rank_loss`.
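A sketch of the two Pairwise losses (in practice the scores would be produced by the same scoring sub-network applied to both documents; the margin is an illustrative value):

.. code-block:: python

    import paddle.fluid as fluid

    # model scores for the "left" and "right" documents of a pair, plus a
    # 0/1 label encoding which one is more relevant to the query
    left = fluid.layers.data(name='left', shape=[1], dtype='float32')
    right = fluid.layers.data(name='right', shape=[1], dtype='float32')
    label = fluid.layers.data(name='label', shape=[1], dtype='float32')

    r_loss = fluid.layers.rank_loss(label=label, left=left, right=right)
    m_loss = fluid.layers.margin_rank_loss(label=label, left=left, right=right,
                                           margin=0.1)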
More
====
For more complex loss functions, try combinations of other operators: the :ref:`api_fluid_layers_dice_loss` provided in Paddle Fluid for image segmentation tasks is an example built from combinations of other operators (it calculates the average likelihood probability at each pixel position). Multi-objective loss functions can be composed similarly; for example, Faster RCNN uses the weighted sum of cross entropy and smooth_l1 loss as its loss function.
**Note**: after defining the loss function, in order to optimize with :ref:`api_guide_optimizer_en`, you usually need to use :ref:`api_fluid_layers_mean` or another operation to reduce the high-dimensional Tensor returned by the loss function to a scalar value, as sketched below.
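A minimal sketch of the reduction step (the network and learning rate are illustrative):

.. code-block:: python

    import paddle.fluid as fluid

    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    pred = fluid.layers.fc(input=x, size=1)

    loss = fluid.layers.square_error_cost(input=pred, label=y)  # shape [batch, 1]
    avg_loss = fluid.layers.mean(loss)                          # scalar

    sgd = fluid.optimizer.SGD(learning_rate=0.01)
    sgd.minimize(avg_loss)  # the optimizer expects a scalar target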
During or after training a neural network, it is necessary to evaluate the training effect of the model. Evaluation generally means calculating the distance between the overall predictions and the overall labels. Different types of tasks call for different evaluation methods, or combinations of them; within a specific task, one or more evaluation methods can be selected. Let's take a look at the commonly used evaluation methods, grouped by type of task.
Classification task evaluation
-------------------------------
The most common classification task is binary classification, and multi-class classification tasks can also be transformed into combinations of binary classification tasks. The metrics commonly adopted in binary classification tasks are precision, accuracy, recall, AUC, and average precision.
- :code:`Precision`, which measures the proportion of correctly recalled (true positive) values among all recalled values in binary classification (see the sketch after this list).
For API Reference, please refer to :ref:`api_fluid_metrics_Precision`
- :code:`Accuracy`, which measures the proportion of correctly recalled ground-truth values in the total number of samples in binary classification. Note that precision and accuracy are defined differently; they can be analogized to :code:`Variance` and :code:`Bias` in error analysis.
For API Reference, please refer to :ref:`api_fluid_metrics_Accuracy`
- :code:`Recall`, which measures the ratio of recalled values to the number of values that should have been recalled in binary classification. Precision and recall are mutually constrained, and trade-offs are needed in a practical model. Refer to the documentation `Precision_and_recall <https://en.wikipedia.org/wiki/Precision_and_recall>`_ .
For API Reference, please refer to :ref:`api_fluid_metrics_Recall`
- :code:`Area Under Curve`, applicable to binary classification models, measures the cumulative area under the `ROC curve <https://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve>`_ . :code:`Auc` is implemented in Python; if you are concerned about performance, you can use :code:`fluid.layers.auc` instead.
For API Reference, please refer to :ref:`api_fluid_metrics_Auc`
- :code:`Average Precision`, commonly used in object detection tasks such as Faster R-CNN and SSD. It averages the precision obtained under different recall conditions. For details, please refer to the documents `Average precision <https://sanchom.wordpress.com/tag/average-precision/>`_ and `SSD: Single Shot MultiBox Detector <https://arxiv.org/abs/1512.02325>`_ .
For API Reference, please refer to :ref:`api_fluid_metrics_DetectionMAP`
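A rough sketch of the Python-side metric objects, fed with made-up numpy mini-batch results (assuming 0/1 predictions for Precision/Recall and two-column class probabilities for Auc; in practice these arrays come from the executor's fetched outputs):

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    # illustrative mini-batch: hard 0/1 predictions, class probabilities, labels
    preds = np.array([[0], [1], [1], [0]], dtype='int32')
    probs = np.array([[0.9, 0.1], [0.3, 0.7], [0.2, 0.8], [0.6, 0.4]],
                     dtype='float32')
    labels = np.array([[0], [1], [0], [0]], dtype='int64')

    precision = fluid.metrics.Precision()
    recall = fluid.metrics.Recall()
    auc = fluid.metrics.Auc(name='auc')

    # statistics accumulate across update() calls, one call per mini-batch
    precision.update(preds=preds, labels=labels)
    recall.update(preds=preds, labels=labels)
    auc.update(preds=probs, labels=labels)
    print(precision.eval(), recall.eval(), auc.eval())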
Sequence labeling task evaluation
----------------------------------
In sequence labeling tasks, a group of tokens is called a chunk, and the model groups and classifies the input tokens simultaneously. The commonly used evaluation method is chunk evaluation.
- The chunk evaluation method :code:`ChunkEvaluator` receives the output of the :code:`chunk_eval` interface, accumulates the chunk statistics of each mini-batch, and finally calculates the precision, recall and F1 values. :code:`ChunkEvaluator` supports four labeling schemes: IOB, IOE, IOBES and IO. You can refer to the documentation `Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>`_.
For API Reference, please refer to :ref:`api_fluid_metrics_ChunkEvaluator`
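A rough sketch of the accumulation step; the three chunk counts per mini-batch are the ones returned by the :code:`chunk_eval` interface (the numbers below are made up):

.. code-block:: python

    import paddle.fluid as fluid

    metric = fluid.metrics.ChunkEvaluator()

    # per-mini-batch counts as produced by fluid.layers.chunk_eval:
    # chunks predicted, gold chunks, and correctly matched chunks
    metric.update(num_infer_chunks=12, num_label_chunks=10,
                  num_correct_chunks=8)
    precision, recall, f1 = metric.eval()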
Generation/Synthesis task evaluation
------------------------------------
A generation task produces output directly from the input. In tasks such as speech recognition, a new string is generated. There are several ways to evaluate the distance between the generated string and the target string, such as the multi-class evaluation methods above; another commonly used method is edit distance.
- Edit distance: :code:`EditDistance` measures the similarity of two strings. You can refer to the documentation `Edit_distance <https://en.wikipedia.org/wiki/Edit_distance>`_.
For API Reference, please refer to :ref:`api_fluid_metrics_EditDistance`
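A rough sketch; the per-sample distances would normally come from the :code:`fluid.layers.edit_distance` op, and the numbers here are made up:

.. code-block:: python

    import numpy as np
    import paddle.fluid as fluid

    metric = fluid.metrics.EditDistance(name='edit_distance')

    # edit distances of one mini-batch of three decoded sequences
    distances = np.array([[1.0], [0.0], [3.0]], dtype='float32')
    metric.update(distances, seq_num=3)

    avg_distance, instance_error_rate = metric.eval()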