Differentiate cost layers from other layers
Created by: emailweixu
Currently, there is no way to tell whether a layer is a cost (loss) layer or not. However, there is a crucial difference between cost layers and non-cost layers: during backpropagation, a cost layer does not need a gradient from its output; the gradient of its output is implicitly assumed to be all 1's. There are several benefits to adding a mechanism that differentiates cost layers from other layers:
- Prevent the incorrect use of the output of a cost layer as the input to other layers.
- When a model has multiple outputs, including both cost and non-cost layers, the trainer should sum only over the cost layers when computing the total cost (using Argument::sumCost), excluding the non-cost layers, so that it reports the correct cost during training.
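To illustrate the proposed mechanism, here is a minimal sketch in Python (the `Layer` class, `is_cost` flag, and `sum_cost` function are hypothetical names for illustration, not the actual PaddlePaddle API). It shows the two behaviors described above: a cost layer's backward pass defaults its output gradient to all 1's, and cost summation skips non-cost outputs.

```python
import numpy as np

class Layer:
    """Minimal layer with a flag marking whether it is a cost (loss) layer.

    This is an illustrative sketch, not the real framework class.
    """
    def __init__(self, name, is_cost=False):
        self.name = name
        self.is_cost = is_cost
        self.output = None

    def backward(self, output_grad=None):
        # A cost layer needs no gradient from its output:
        # the gradient is implicitly assumed to be all 1's.
        if output_grad is None:
            if not self.is_cost:
                raise ValueError(
                    f"non-cost layer '{self.name}' requires an output gradient")
            output_grad = np.ones_like(self.output)
        return output_grad

def sum_cost(output_layers):
    """Sum only over cost layers, excluding non-cost outputs
    (analogous to what Argument::sumCost should do)."""
    return float(sum(np.sum(l.output) for l in output_layers if l.is_cost))

# Example: a model with one cost output and one non-cost output.
loss = Layer("cross_entropy", is_cost=True)
loss.output = np.array([0.5, 1.5])
features = Layer("fc_out")
features.output = np.array([10.0, 20.0])

# Only the cost layer contributes to the reported cost.
print(sum_cost([loss, features]))  # 2.0, not 32.0
```

With such a flag, the trainer can also reject graph configurations in which a cost layer's output is wired into another layer's input, preventing the first kind of misuse described above.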