Commit 8bf37994, authored by dangqingqing

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into lstm_fix

# Design: Sequence Decoder Generating LoDTensors
In tasks such as machine translation and image-to-text generation,
a [sequence decoder](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md) is necessary to generate sequences.
This document describes how to implement the sequence decoder as an operator.
## Beam Search based Decoder
The [beam search algorithm](https://en.wikipedia.org/wiki/Beam_search) is necessary when generating sequences.
It is a heuristic search algorithm that explores paths by expanding the most promising nodes in a limited set.
In the old version of PaddlePaddle, a C++ class `RecurrentGradientMachine` implements the general sequence decoder based on beam search.
Because of its complexity, the implementation relies on many special data structures,
which are cumbersome and hard for users to customize.
There are many heuristic tricks in sequence generation tasks,
so the flexibility of the sequence decoder is very important to users.
During PaddlePaddle's refactoring work,
some new concepts were proposed, such as [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and [TensorArray](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/design/tensor_array.md), that better support sequence usage.
They can help make the implementation of the beam search based sequence decoder **more transparent and modular**.
For example, the RNN states, candidate IDs and probabilities of beam search can be represented as `LoDTensors`;
the selected candidate IDs in each time step can be stored in a `TensorArray` and `Packed` into the translated sentences.
## Changing LoD's absolute offset to relative offsets
The current `LoDTensor` is designed to store levels of variable-length sequences.
It stores several arrays of integers, each of which represents a level.
The integers in each level represent the begin and end (exclusive) offsets of a sequence **in the underlying tensor**;
let's call this format the **absolute-offset LoD** for clarity.
The absolute-offset LoD can retrieve any sequence quickly but fails to represent empty sequences. For example, consider the following two-level LoD:
```python
[[0, 3, 9],
 [0, 2, 3, 3, 3, 9]]
```
The first level says that there are two sequences:
- the first's offset is `[0, 3)`
- the second's offset is `[3, 9)`

On the second level, there are several empty sequences that both begin and end at `3`.
It is impossible to tell how many of these empty second-level sequences belong to each first-level sequence.
Many scenarios rely on representing empty sequences.
In machine translation or image-to-text generation, for example, an instance may have no translations, or a prefix may have an empty candidate set.
So let's introduce another format of LoD:
it stores **the offsets of the lower-level sequences** and is called the **relative-offset** LoD.
For example, the same sequences as in the data above can be represented as
```python
[[0, 3, 5],
 [0, 2, 3, 3, 3, 9]]
```
The first level says that there are two sequences,
whose offsets in the second-level LoD are `[0, 3)` and `[3, 5)`.
The second level is the same as in the absolute-offset example because the lower level is a tensor.
It is now easy to see where the empty sequences belong: the first first-level sequence ends with one empty sequence, and the second begins with one.
The following examples are based on the relative-offset LoD.
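
To make the two formats concrete, here is a minimal Python sketch over plain nested lists. The helper names `to_absolute` and `sub_sequences` are hypothetical and not part of PaddlePaddle, and the first level `[0, 3, 5]` is chosen to match the offsets `[0, 3)` and `[3, 5)` quoted above.

```python
def to_absolute(relative_lod):
    """Convert a relative-offset LoD (a list of levels) to absolute offsets.

    The lowest level already indexes the tensor, so it is copied as-is;
    every higher level is resolved through the level below it.
    """
    absolute = [list(relative_lod[-1])]
    for level in reversed(relative_lod[:-1]):
        lower = absolute[0]
        absolute.insert(0, [lower[i] for i in level])
    return absolute


def sub_sequences(relative_lod, level, index):
    """List the [begin, end) pairs of the lower-level sequences owned by
    sequence `index` of `level`."""
    begin, end = relative_lod[level][index], relative_lod[level][index + 1]
    lower = relative_lod[level + 1]
    return [(lower[i], lower[i + 1]) for i in range(begin, end)]


relative = [[0, 3, 5], [0, 2, 3, 3, 3, 9]]
absolute = to_absolute(relative)        # [[0, 3, 9], [0, 2, 3, 3, 3, 9]]
first = sub_sequences(relative, 0, 0)   # [(0, 2), (2, 3), (3, 3)]
second = sub_sequences(relative, 0, 1)  # [(3, 3), (3, 9)]
```

Converting to absolute offsets loses information: the two first-level sequences still map to `[0, 3)` and `[3, 9)`, but which of them owns each empty pair `(3, 3)` can only be read from the relative form.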
## Usage in a simple machine translation model
Let's start with a simple machine translation model, simplified from the [machine translation chapter](https://github.com/PaddlePaddle/book/tree/develop/08.machine_translation), to draw a blueprint of what a sequence decoder can do and how to use it.
The model has an encoder that learns a semantic vector from a sequence,
and a decoder that uses the sequence decoder to generate new sentences.
**Encoder**
```python
# Pseudocode: `pd` stands for the Python API sketched in this design.
import paddle as pd

dict_size = 8000
source_dict_size = dict_size
target_dict_size = dict_size
word_vector_dim = 128
encoder_dim = 128
decoder_dim = 128
beam_size = 5
max_length = 120

# encoder
src_word_id = pd.data(
    name='source_language_word',
    type=pd.data.integer_value_sequence(source_dict_size))
# embedding table shape: [vocabulary size, word vector dimension]
src_embedding = pd.embedding(size=[source_dict_size, word_vector_dim])
src_word_vec = pd.lookup(src_embedding, src_word_id)

encoder_out_seq = pd.gru(input=src_word_vec, size=encoder_dim)
encoder_ctx = pd.last_seq(encoder_out_seq)
# encoder_ctx_proj is the learned semantic vector
encoder_ctx_proj = pd.fc(
    encoder_ctx, size=decoder_dim, act=pd.activation.Tanh(), bias=None)
```
**Decoder**
```python
def generate():
    # `trg_embedding` is the target-language embedding table; it is assumed
    # to be defined in the same way as `src_embedding` in the encoder.
    decoder = pd.while_loop()
    with decoder.step():
        decoder_mem = decoder.memory(init=encoder_ctx)  # mark the memory
        generated_ids = decoder.memory()  # TODO init to batch_size <s>s
        generated_scores = decoder.memory()  # TODO init to batch_size 1s or 0s

        target_word = pd.lookup(trg_embedding, generated_ids)
        # expand encoder_ctx's batch to fit target_word's lod
        # for example
        # decoder_mem.lod is
        # [[0, 1, 3],
        #  [0, 1, 3, 6]]
        # its tensor content is [a1 a2 a3 a4 a5]
        # which means there are 2 sentences to translate
        #   - the first sentence has 1 translation prefix, the offsets are [0, 1)
        #   - the second sentence has 2 translation prefixes, the offsets are [1, 3) and [3, 6)
        # the target_word.lod is
        # [[0, 1, 5],
        #  [0, 2, 4, 7, 9, 12]]
        # which means 2 sentences to translate, which have 1 and 4 prefixes
        # the first prefix has 2 candidates
        # the following have 2, 3, 2, 3 candidates
        # the encoder_ctx_expanded's content will be
        # [a1 a1 a2 a2 a3 a3 a3 a4 a4 a5 a5 a5]
        encoder_ctx_expanded = pd.lod_expand(encoder_ctx, target_word)
        decoder_input = pd.fc(
            act=pd.activation.Linear(),
            input=[target_word, encoder_ctx_expanded],
            size=3 * decoder_dim)
        gru_out, cur_mem = pd.gru_step(
            decoder_input, mem=decoder_mem, size=decoder_dim)
        scores = pd.fc(
            gru_out,
            size=target_dict_size,
            bias=None,
            act=pd.activation.Softmax())
        # K is a configurable constant: the number of candidates kept per prefix
        topk_scores, topk_ids = pd.top_k(scores, K)
        topk_generated_scores = pd.add_scalar(topk_scores, generated_scores)

        selected_ids, selected_generation_scores = decoder.beam_search(
            topk_ids, topk_generated_scores)

        # update the states
        decoder_mem.update(cur_mem)  # tells how to update state
        generated_ids.update(selected_ids)
        generated_scores.update(selected_generation_scores)

        decoder.output(selected_ids)
        decoder.output(selected_generation_scores)

    translation_ids, translation_scores = decoder()
    return translation_ids, translation_scores
```
`decoder.beam_search` is an operator that, given the candidates and the scores of the translations including those candidates,
returns the result of the beam search algorithm.
In this way, users can customize anything on the inputs or outputs of beam search. For example, there are three ways to prune some translation prefixes:
1. Make the corresponding elements in `topk_generated_scores` zero or some small value, so that beam search will discard these candidates.
2. Remove specific candidates from `selected_ids`.
3. Get the final `translation_ids` and remove the unwanted translation sequences from it.
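
The first pruning option can be pictured with a small sketch on nested Python lists, one inner list per prefix. The function `mask_candidates` and its `banned_ids` and `floor` parameters are hypothetical and only illustrate the idea; the real inputs would be LoDTensors.

```python
def mask_candidates(topk_ids, topk_scores, banned_ids, floor=0.0):
    """Return a copy of `topk_scores` in which every banned candidate gets
    the score `floor`, so a score-ranked beam search would rank it last."""
    return [
        [floor if cid in banned_ids else score
         for cid, score in zip(ids, scores)]
        for ids, scores in zip(topk_ids, topk_scores)
    ]


# two prefixes, each with its top-2 candidate ids and accumulated scores
topk_ids = [[4, 7], [7, 9]]
topk_generated_scores = [[0.6, 0.3], [0.5, 0.2]]
masked = mask_candidates(topk_ids, topk_generated_scores, banned_ids={7})
```

Because the masking happens before beam search, the operator itself needs no knowledge of the pruning rule.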
The implementation of the sequence decoder can reuse the C++ class [RNNAlgorithm](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/paddle/operators/dynamic_recurrent_op.h#L30),
so the Python syntax is quite similar to that of an [RNN](https://github.com/Superjom/Paddle/blob/68cac3c0f8451fe62a4cdf156747d6dc0ee000b3/doc/design/block.md#blocks-with-for-and-rnnop).
Both the candidate IDs and their scores are two-level `LoDTensors`:
- the first level represents `batch_size` of (source) sentences;
- the second level represents the candidate ID sets for translation prefix.

For example, there may be 3 source sentences to translate, which have 2, 3, and 1 candidates respectively.
Unlike in an RNN, in the sequence decoder the previous state and the current state have different LoDs and shapes,
so a `lod_expand` operator is used to expand the LoD of the previous state to fit the current state.
For example, the previous state
* LoD is `[[0, 1, 3], [0, 2, 5, 6]]`
* content of tensor is `a1 a2 b1 b2 b3 c1`

the current state stored in `encoder_ctx_expanded`
* LoD is `[[0, 2, 6], [0, 3, 5, 8, 9, 11, 11]]`
* the content is
    - a1 a1 a1 (a1 has 3 candidates, so the state should be copied 3 times, once for each candidate)
    - a2 a2
    - b1 b1 b1
    - b2
    - b3 b3
    - None (c1 has 0 candidates, so c1 is dropped)

Thanks to the relative-offset LoD, an empty candidate set can be represented naturally.
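
The expansion can be sketched in plain Python as follows. This is only an illustration under the relative-offset convention: `lod_expand_sketch` and `candidate_counts` are hypothetical names, and the rows are strings standing in for state vectors.

```python
def lod_expand_sketch(rows, lod, candidate_counts):
    """Copy each previous-state row once per candidate of that row.

    rows:             previous-state rows, one per translation prefix
    lod:              two-level relative-offset LoD of the previous state
    candidate_counts: number of candidates selected for each row
    """
    first, second = lod
    # Each sentence now owns the rows that its prefix groups used to cover.
    new_first = [second[i] for i in first]
    expanded, new_second = [], [0]
    for row, count in zip(rows, candidate_counts):
        expanded.extend([row] * count)  # a count of 0 drops the row
        new_second.append(new_second[-1] + count)
    return expanded, [new_first, new_second]


rows = ["a1", "a2", "b1", "b2", "b3", "c1"]
expanded, new_lod = lod_expand_sketch(
    rows, [[0, 1, 3], [0, 2, 5, 6]], candidate_counts=[3, 2, 3, 1, 2, 0])
```

With six prefixes in total, the new first level comes out as `[0, 2, 6]`, and the repeated final offset `11` in the second level records that `c1` has an empty candidate set.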
The status in each time step can be stored in a `TensorArray` and `Pack`ed into a final LoDTensor. The corresponding syntax is
```python
decoder.output(selected_ids)
decoder.output(selected_generation_scores)
```
`selected_ids` holds the candidate IDs for the prefixes.
It will be `Packed` by `TensorArray` into a two-level `LoDTensor`, in which
the first level represents the source sequences and
the second level represents the generated sequences.
Packing `selected_scores` yields a `LoDTensor` that stores the scores of each translation candidate.
Packing `selected_generation_scores` yields a `LoDTensor` in which the last element of each sequence is the probability of the translation.
## LoD and shape changes during decoding
<p align="center">
<img src="./images/LOD-and-shape-changes-during-decoding.jpg"/>
</p>
According to the image above, the only phase that changes the LoD is beam search.
## Beam search design
The beam search algorithm will be implemented as one method of the sequence decoder. It has 3 inputs:
1. `topk_ids`, the top K candidate IDs for each prefix.
2. `topk_scores`, the corresponding scores for `topk_ids`.
3. `generated_scores`, the scores of the prefixes.

All of them are LoDTensors, so that the sequence affiliation is clear.
Beam search will keep a beam for each prefix and select a smaller candidate set for each prefix.
It will return three variables:
1. `selected_ids`, the final candidates that the beam search function selected for the next step.
2. `selected_scores`, the scores for the candidates.
3. `generated_scores`, the updated scores for each prefix (with the new candidates appended).
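
As an illustration only, one selection step could look like the following sketch on nested Python lists. It assumes that the candidate scores already include the prefix scores (like `topk_generated_scores` in the decoder example) and that each source sentence keeps at most `beam_size` candidates across all of its prefixes. `beam_search_step` and `source_lod` are hypothetical names, and the sketch returns only the selected IDs and scores.

```python
import heapq


def beam_search_step(topk_ids, topk_scores, source_lod, beam_size):
    """Select the best candidates for every source sentence.

    topk_ids / topk_scores: one list of candidates per prefix
    source_lod:             relative offsets mapping sentences to prefixes
    """
    selected_ids = [[] for _ in topk_ids]
    selected_scores = [[] for _ in topk_ids]
    for s in range(len(source_lod) - 1):
        begin, end = source_lod[s], source_lod[s + 1]
        # gather (score, prefix index, candidate id) for this sentence
        candidates = [
            (topk_scores[p][k], p, topk_ids[p][k])
            for p in range(begin, end)
            for k in range(len(topk_ids[p]))
        ]
        for score, p, cid in heapq.nlargest(beam_size, candidates):
            selected_ids[p].append(cid)
            selected_scores[p].append(score)
    return selected_ids, selected_scores


selected_ids, selected_scores = beam_search_step(
    topk_ids=[[1, 2], [3, 4], [5, 6]],
    topk_scores=[[0.9, 0.1], [0.5, 0.4], [0.3, 0.2]],
    source_lod=[0, 1, 3],
    beam_size=2)
```

In this run the third prefix keeps no candidate at all, which is exactly the empty-sequence case that the relative-offset LoD is meant to represent.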
## Introducing the LoD-based `Pack` and `Unpack` methods in `TensorArray`
`selected_ids`, `selected_scores` and `generated_scores` are LoDTensors
that exist in each time step,
so it is natural to store them in arrays.
Currently, PaddlePaddle has a module called `TensorArray` that can store an array of tensors,
so it is a good place to store the results of beam search.
The `Pack` and `Unpack` methods in `TensorArray` are used to pack the tensors in the array into a `LoDTensor`, or to split a `LoDTensor` into an array of tensors.
They need some extensions to support packing or unpacking an array of `LoDTensors`.
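
A simplified picture of such a round trip, assuming each array entry is just a Python list of rows for one sequence, is sketched below. `pack` and `unpack` are hypothetical stand-ins and not `TensorArray`'s actual methods.

```python
def pack(sequences):
    """Concatenate variable-length sequences into flat data plus offsets."""
    data, offsets = [], [0]
    for seq in sequences:
        data.extend(seq)
        offsets.append(len(data))
    return data, offsets


def unpack(data, offsets):
    """Split flat data back into the sequences described by `offsets`."""
    return [data[b:e] for b, e in zip(offsets, offsets[1:])]


data, offsets = pack([[1, 2], [], [3]])
restored = unpack(data, offsets)
```

The empty middle sequence survives the round trip because it is recorded as a repeated offset rather than as data.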
@@ -49,7 +49,7 @@ void ScaleSubRegionLayer::forward(PassType passType) {
   shape_ = TensorShape({batchSize, channelsNum_, imgH_, imgW_});
   resetOutput(batchSize, imgV->getWidth());
-  auto out = getOutput();
+  auto& out = getOutput();
   out.setFrameHeight(imgH_);
   out.setFrameWidth(imgW_);
......
@@ -53,7 +53,7 @@ TEST(Operator, dot_mul) {
 TEST(Projection, context) {
   for (auto contextStart : {-5, -3, -1, 0, 3}) {
     for (auto contextLength : {1, 2, 5, 7}) {
-      for (auto batchSize : {1, 2, 5, 20, 50}) {
+      for (auto batchSize : {1, 2, 5, 20}) {
         for (auto trainablePadding : {false, true}) {
           LOG(INFO) << " contextStart=" << contextStart
                     << " contextLength=" << contextLength
@@ -585,14 +585,14 @@ TEST(Layer, maxoutLayer) {
 }
 void testFcLayer(string format, size_t nnz) {
   TestConfig config;
-  config.biasSize = 4096;
+  config.biasSize = 1024;
   config.layerConfig.set_type("fc");
-  config.layerConfig.set_size(4096);
+  config.layerConfig.set_size(1024);
   config.layerConfig.set_active_type("sigmoid");
   config.layerConfig.set_drop_rate(0.1);
   config.inputDefs.push_back(
-      {INPUT_DATA, "layer_0", 8192, nnz, ParaSparse(format)});
+      {INPUT_DATA, "layer_0", 2048, nnz, ParaSparse(format)});
   config.layerConfig.add_inputs();
   LOG(INFO) << config.inputDefs[0].sparse.sparse << " "
@@ -609,9 +609,9 @@ void testFcLayer(string format, size_t nnz) {
 }
 TEST(Layer, fcLayer) {
-  testFcLayer("", 4096 * 4096 * 2);
-  testFcLayer("csc", 4096 * 40);
-  testFcLayer("csr", 4096 * 40);
+  testFcLayer("", 1024 * 1024 * 2);
+  testFcLayer("csc", 1024 * 10);
+  testFcLayer("csr", 1024 * 10);
 }
 TEST(Layer, SelectiveFullyConnectedLayer) {
@@ -1995,7 +1995,7 @@ TEST(Layer, multibox_loss) {
 TEST(Layer, TransLayer) {
   TestConfig config;
   const int height = 128;
-  const int width = 1028;
+  const int width = 256;
   config.layerConfig.set_type("trans");
   config.layerConfig.set_size(width);
......
@@ -789,10 +789,9 @@ class MixedLayerType(LayerOutput):
     :type size: int
     :param act: Activation type.
     :type act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: Extra Layer Attribute.
     :type layer_attr: ExtraLayerAttribute or None
@@ -889,10 +888,9 @@ def mixed_layer(size=0,
     then this function will just return layer's name.
     :param act: Activation Type. LinearActivation is the default.
     :type act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: The extra layer config. Default is None.
     :type layer_attr: ExtraLayerAttribute
@@ -1034,10 +1032,9 @@ def fc_layer(input,
     :type act: BaseActivation
     :param param_attr: The Parameter Attribute|list.
     :type param_attr: ParameterAttribute
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: Extra Layer config.
     :type layer_attr: ExtraLayerAttribute | None
@@ -1390,10 +1387,9 @@ def pooling_layer(input,
     :type pooling_type: BasePoolingType | None
     :param stride: The step size between successive pooling regions.
     :type stride: Int
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: The Extra Attributes for layer, such as dropout.
     :type layer_attr: ExtraLayerAttribute | None
@@ -1491,10 +1487,9 @@ def lstmemory(input,
     :type gate_act: BaseActivation
     :param state_act: state activation type, TanhActivation by default.
     :type state_act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr: Parameter Attribute.
     :type param_attr: ParameterAttribute | None | False
@@ -1617,10 +1612,9 @@ def grumemory(input,
     This activation affects the :math:`z_t` and :math:`r_t`. It is the
     :math:`\\sigma` in the above formula.
     :type gate_act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr: Parameter Attribute.
     :type param_attr: ParameterAttribute | None | False
@@ -1817,10 +1811,9 @@ def expand_layer(input,
     :type expand_as: LayerOutput
     :param name: The name of this layer. It is optional.
     :type name: basestring
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param expand_level: whether input layer is timestep(default) or sequence.
     :type expand_level: ExpandLevel
@@ -1939,10 +1932,9 @@ def seq_reshape_layer(input,
     :type act: BaseActivation
     :param layer_attr: extra layer attributes.
     :type layer_attr: ExtraLayerAttribute.
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :return: LayerOutput object.
     :rtype: LayerOutput
@@ -2326,10 +2318,9 @@ def hsigmoid(input,
     :type num_classes: int | None
     :param name: The name of this layer. It is optional.
     :type name: basestring
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr: Parameter Attribute. None means default parameter.
     :type param_attr: ParameterAttribute | None
@@ -2469,10 +2460,9 @@ def img_conv_layer(input,
     :type dilation: int | tuple | list
     :param dilation_y: The y dimension of the dilation.
     :type dilation_y: int
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param num_channels: number of input channels. If None will be set
                          automatically from previous output.
@@ -3219,10 +3209,9 @@ def addto_layer(input, act=None, name=None, bias_attr=None, layer_attr=None):
     :type input: LayerOutput | list | tuple
     :param act: Activation Type. LinearActivation is the default.
     :type act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: Extra Layer attribute.
     :type layer_attr: ExtraLayerAttribute
@@ -3375,10 +3364,9 @@ def seq_concat_layer(a, b, act=None, name=None, layer_attr=None,
     :type act: BaseActivation
     :param layer_attr: Extra Layer Attribute.
     :type layer_attr: ExtraLayerAttribute
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :return: LayerOutput object.
     :rtype: LayerOutput
@@ -3558,10 +3546,9 @@ def lstm_step_layer(input,
     :type gate_act: BaseActivation
     :param state_act: State Activation Type. TanhActivation is the default.
     :type state_act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: layer's extra attribute.
     :type layer_attr: ExtraLayerAttribute
@@ -3617,10 +3604,9 @@ def gru_step_layer(input,
     :param name: The name of this layer. It is optional.
     :param gate_act: Activation type of this layer's two gates. Default is Sigmoid.
     :type gate_act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr: the parameter_attribute for transforming the output_mem
                        from previous step.
@@ -3680,10 +3666,9 @@ def gru_step_naive_layer(input,
     :type act: BaseActivation
     :param gate_act: Activation type of this layer's two gates. Default is Sigmoid.
     :type gate_act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr:
     :param layer_attr:
@@ -3813,10 +3798,9 @@ def recurrent_layer(input,
     :type input: LayerOutput
     :param act: Activation type. TanhActivation is the default.
     :type act: BaseActivation
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param param_attr: parameter attribute.
     :type param_attr: ParameterAttribute
@@ -4806,10 +4790,9 @@ def tensor_layer(a,
     :type act: BaseActivation
     :param param_attr: The Parameter Attribute.
     :type param_attr: ParameterAttribute
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: Extra Layer config.
     :type layer_attr: ExtraLayerAttribute | None
@@ -4871,10 +4854,9 @@ def selective_fc_layer(input,
     :type act: BaseActivation
     :param param_attr: The Parameter Attribute.
     :type param_attr: ParameterAttribute
-    :param bias_attr: The Bias Attribute. If the parameter is set to
-                      False or something not type of ParameterAttribute,
-                      no bias is defined. If the parameter is set to
-                      True, the bias is initialized to zero.
+    :param bias_attr: The bias attribute. If the parameter is set to False or an object
+                      whose type is not ParameterAttribute, no bias is defined. If the
+                      parameter is set to True, the bias is initialized to zero.
     :type bias_attr: ParameterAttribute | None | bool | Any
     :param layer_attr: Extra Layer config.
     :type layer_attr: ExtraLayerAttribute | None
...@@ -5497,7 +5479,11 @@ def crf_decoding_layer(input, ...@@ -5497,7 +5479,11 @@ def crf_decoding_layer(input,
return LayerOutput(name, LayerType.CRF_DECODING_LAYER, parents, size=1) return LayerOutput(name, LayerType.CRF_DECODING_LAYER, parents, size=1)
- @wrap_act_default(act=SigmoidActivation())
+ """
+ Following are cost Layers.
+ """
@wrap_bias_attr_default(has_bias=True) @wrap_bias_attr_default(has_bias=True)
@wrap_param_attr_default() @wrap_param_attr_default()
@wrap_name_default() @wrap_name_default()
...@@ -5505,7 +5491,6 @@ def crf_decoding_layer(input, ...@@ -5505,7 +5491,6 @@ def crf_decoding_layer(input,
def nce_layer(input, def nce_layer(input,
label, label,
num_classes=None, num_classes=None,
- act=None,
param_attr=None, param_attr=None,
weight=None, weight=None,
num_neg_samples=10, num_neg_samples=10,
...@@ -5514,9 +5499,12 @@ def nce_layer(input, ...@@ -5514,9 +5499,12 @@ def nce_layer(input,
bias_attr=None, bias_attr=None,
layer_attr=None): layer_attr=None):
""" """
- Noise-contrastive estimation.
- Implements the method in the following paper:
- A fast and simple algorithm for training neural probabilistic language models.
+ Noise-contrastive estimation. This layer implements the method in the
+ following paper:
+ Reference:
+ A fast and simple algorithm for training neural probabilistic language
+ models. https://www.cs.toronto.edu/~amnih/papers/ncelm.pdf
The example usage is: The example usage is:
...@@ -5528,32 +5516,37 @@ def nce_layer(input, ...@@ -5528,32 +5516,37 @@ def nce_layer(input,
:param name: The name of this layer. It is optional. :param name: The name of this layer. It is optional.
:type name: basestring :type name: basestring
- :param input: The input layers. It could be a LayerOutput of list/tuple of LayerOutput.
+ :param input: The input layers. It should be a LayerOutput or a list/tuple
+ of LayerOutput.
  :type input: LayerOutput | list | tuple | collections.Sequence
- :param label: label layer
+ :param label: The ground truth.
  :type label: LayerOutput
- :param weight: weight layer, can be None(default)
+ :param weight: The weight layer defines a weight for each sample in the
+ mini-batch. The default value is None.
  :type weight: LayerOutput
- :param num_classes: number of classes.
+ :param num_classes: The class number.
  :type num_classes: int
- :param act: Activation type. SigmoidActivation is the default.
- :type act: BaseActivation
- :param param_attr: The Parameter Attribute|list.
- :type param_attr: ParameterAttribute
- :param num_neg_samples: number of negative samples. Default is 10.
+ :param param_attr: The parameter attributes.
+ :type param_attr: ParameterAttribute|list
+ :param num_neg_samples: The number of sampled negative labels. The default
+ value is 10.
  :type num_neg_samples: int
- :param neg_distribution: The distribution for generating the random negative labels.
- A uniform distribution will be used if not provided.
- If not None, its length must be equal to num_classes.
+ :param neg_distribution: The discrete noisy distribution over the output
+ space from which num_neg_samples negative labels
+ are sampled. If this parameter is not set, a
+ uniform distribution will be used. A user defined
+ distribution is a list whose length must be equal
+ to the num_classes. Each member of the list defines
+ the probability of a class given input x.
  :type neg_distribution: list | tuple | collections.Sequence | None
- :param bias_attr: The Bias Attribute. If the parameter is set to
- False or something not type of ParameterAttribute,
- no bias is defined. If the parameter is set to
- True, the bias is initialized to zero.
+ :param bias_attr: The attribute for bias. If this parameter is set False or
+ any object whose type is not ParameterAttribute, no bias
+ is added. If this parameter is set True, the bias is
+ initialized to zero.
  :type bias_attr: ParameterAttribute | None | bool | Any
  :param layer_attr: Extra Layer Attribute.
  :type layer_attr: ExtraLayerAttribute
- :return: layer name.
+ :return: The LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
if isinstance(input, LayerOutput): if isinstance(input, LayerOutput):
...@@ -5576,8 +5569,6 @@ def nce_layer(input, ...@@ -5576,8 +5569,6 @@ def nce_layer(input,
assert isinstance(neg_distribution, collections.Sequence) assert isinstance(neg_distribution, collections.Sequence)
assert len(neg_distribution) == num_classes assert len(neg_distribution) == num_classes
assert abs(sum(neg_distribution) - 1.0) < 1e-5 assert abs(sum(neg_distribution) - 1.0) < 1e-5
- if not isinstance(act, BaseActivation):
- raise TypeError()
ipts_for_layer = [] ipts_for_layer = []
parents = [] parents = []
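As an illustration of the three `neg_distribution` assertions in the hunk above, here is a minimal sketch of how a caller might build and pre-check a user-defined noise distribution before passing it to `nce_layer`. The helper names `unigram_neg_distribution` and `check_neg_distribution` are hypothetical and not part of PaddlePaddle. The sketch uses `collections.abc.Sequence` (Python 3), whereas the diffed code uses the Python 2 spelling `collections.Sequence`.

```python
from collections.abc import Sequence


def unigram_neg_distribution(class_counts):
    """Hypothetical helper: normalize per-class counts into probabilities."""
    total = float(sum(class_counts))
    return [count / total for count in class_counts]


def check_neg_distribution(neg_distribution, num_classes):
    """Repeat the checks that nce_layer applies to neg_distribution."""
    # It must be a sequence with one entry per class ...
    assert isinstance(neg_distribution, Sequence)
    assert len(neg_distribution) == num_classes
    # ... and the entries must sum to 1 within a small tolerance.
    assert abs(sum(neg_distribution) - 1.0) < 1e-5
    return True


counts = [50, 30, 15, 5]  # made-up label frequencies for 4 classes
dist = unigram_neg_distribution(counts)
ok = check_neg_distribution(dist, num_classes=len(counts))
```

Checking the list up front gives a clearer failure point than the bare `assert` statements inside the layer helper.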
...@@ -5599,7 +5590,7 @@ def nce_layer(input, ...@@ -5599,7 +5590,7 @@ def nce_layer(input,
type=LayerType.NCE_LAYER, type=LayerType.NCE_LAYER,
num_classes=num_classes, num_classes=num_classes,
neg_sampling_dist=neg_distribution, neg_sampling_dist=neg_distribution,
- active_type=act.name,
+ active_type=SigmoidActivation().name,
num_neg_samples=num_neg_samples, num_neg_samples=num_neg_samples,
inputs=ipts_for_layer, inputs=ipts_for_layer,
bias=ParamAttr.to_bias(bias_attr), bias=ParamAttr.to_bias(bias_attr),
...@@ -5609,12 +5600,7 @@ def nce_layer(input, ...@@ -5609,12 +5600,7 @@ def nce_layer(input,
LayerType.NCE_LAYER, LayerType.NCE_LAYER,
parents=parents, parents=parents,
size=l.config.size, size=l.config.size,
- activation=act)
- """
- following are cost Layers.
- """
+ activation=SigmoidActivation())
@wrap_name_default() @wrap_name_default()
...@@ -5773,20 +5759,21 @@ def cross_entropy(input, ...@@ -5773,20 +5759,21 @@ def cross_entropy(input,
:param input: The first input layer. :param input: The first input layer.
:type input: LayerOutput. :type input: LayerOutput.
:param label: The input label. :param label: The input label.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param name: The name of this layer. It is optional.
- :type name: None | basestring.
- :param coeff: The cost is multiplied with coeff.
- The coefficient affects the gradient in the backward.
- :type coeff: float.
+ :type name: basestring
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
+ :type coeff: float
  :param weight: The cost of each sample is multiplied with each weight.
  The weight should be a layer with size=1. Note that gradient
  will not be calculated for weight.
  :type weight: LayerOutout
- :param layer_attr: Extra Layer Attribute.
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
  :type layer_attr: ExtraLayerAttribute
  :return: LayerOutput object.
- :rtype: LayerOutput.
+ :rtype: LayerOutput
""" """
ipts, parents = __cost_input__(input, label, weight) ipts, parents = __cost_input__(input, label, weight)
...@@ -5819,19 +5806,21 @@ def cross_entropy_with_selfnorm(input, ...@@ -5819,19 +5806,21 @@ def cross_entropy_with_selfnorm(input,
label=label_layer) label=label_layer)
:param input: The first input layer. :param input: The first input layer.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param label: The input label.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param name: The name of this layer. It is optional.
- :type name: None | basestring.
- :param coeff: The coefficient affects the gradient in the backward.
- :type coeff: float.
+ :type name: basestring
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
+ :type coeff: float
  :param softmax_selfnorm_alpha: The scale factor affects the cost.
- :type softmax_selfnorm_alpha: float.
- :param layer_attr: Extra Layer Attribute.
+ :type softmax_selfnorm_alpha: float
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
  :type layer_attr: ExtraLayerAttribute
  :return: LayerOutput object.
- :rtype: LayerOutput.
+ :rtype: LayerOutput
""" """
Layer( Layer(
name=name, name=name,
...@@ -5852,7 +5841,7 @@ def cross_entropy_with_selfnorm(input, ...@@ -5852,7 +5841,7 @@ def cross_entropy_with_selfnorm(input,
@layer_support() @layer_support()
def sum_cost(input, name=None, layer_attr=None): def sum_cost(input, name=None, layer_attr=None):
""" """
- A loss layer which calculate the sum of the input as loss
+ A loss layer which calculates the sum of the input as loss.
The example usage is: The example usage is:
...@@ -5861,10 +5850,11 @@ def sum_cost(input, name=None, layer_attr=None): ...@@ -5861,10 +5850,11 @@ def sum_cost(input, name=None, layer_attr=None):
cost = sum_cost(input=input_layer) cost = sum_cost(input=input_layer)
:param input: The input of this layer. :param input: The input of this layer.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param name: The name of this layer. It is optional.
- :type name: None | basestring.
- :param layer_attr: Extra Layer Attribute.
+ :type name: basestring
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput. :rtype: LayerOutput.
...@@ -5904,16 +5894,18 @@ def huber_regression_cost(input, ...@@ -5904,16 +5894,18 @@ def huber_regression_cost(input,
cost = huber_regression_cost(input=input_layer, label=label_layer) cost = huber_regression_cost(input=input_layer, label=label_layer)
:param input: The first input layer. :param input: The first input layer.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param label: The input label.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param name: The name of this layer. It is optional.
- :type name: None | basestring.
+ :type name: basestring
  :param delta: The difference between the observed and predicted values.
- :type delta: float.
- :param coeff: The coefficient affects the gradient in the backward.
- :type coeff: float.
- :param layer_attr: Extra Layer Attribute.
+ :type delta: float
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
+ :type coeff: float
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput. :rtype: LayerOutput.
...@@ -5954,17 +5946,19 @@ def huber_classification_cost(input, ...@@ -5954,17 +5946,19 @@ def huber_classification_cost(input,
cost = huber_classification_cost(input=input_layer, label=label_layer) cost = huber_classification_cost(input=input_layer, label=label_layer)
:param input: The first input layer. :param input: The first input layer.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param label: The input label.
- :type input: LayerOutput.
+ :type input: LayerOutput
  :param name: The name of this layer. It is optional.
- :type name: None | basestring.
- :param coeff: The coefficient affects the gradient in the backward.
- :type coeff: float.
- :param layer_attr: Extra Layer Attribute.
+ :type name: basestring
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
+ :type coeff: float
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
  :type layer_attr: ExtraLayerAttribute
  :return: LayerOutput object.
- :rtype: LayerOutput.
+ :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput) assert isinstance(input, LayerOutput)
if input.size is not None: if input.size is not None:
...@@ -6001,10 +5995,12 @@ def multi_binary_label_cross_entropy(input, ...@@ -6001,10 +5995,12 @@ def multi_binary_label_cross_entropy(input,
:param label: The input label. :param label: The input label.
:type input: LayerOutput :type input: LayerOutput
:param name: The name of this layer. It is optional. :param name: The name of this layer. It is optional.
- :type name: None | basestring
- :param coeff: The coefficient affects the gradient in the backward.
+ :type name: basestring
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
  :type coeff: float
- :param layer_attr: Extra Layer Attribute.
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -6107,7 +6103,7 @@ def cross_entropy_over_beam(input, name=None): ...@@ -6107,7 +6103,7 @@ def cross_entropy_over_beam(input, name=None):
:param input: Input beams for this layer. :param input: Input beams for this layer.
:type input: BeamInput :type input: BeamInput
- :param name: The name of this layer.
+ :param name: The name of this layer. It is optional.
:type name: basestring :type name: basestring
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -6142,7 +6138,7 @@ def cross_entropy_over_beam(input, name=None): ...@@ -6142,7 +6138,7 @@ def cross_entropy_over_beam(input, name=None):
def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None): def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None):
""" """
This is a L1 loss but more smooth. It requires that the This is a L1 loss but more smooth. It requires that the
- size of input and label are equal. The formula is as follows,
+ sizes of input and label are equal. The formula is as follows,
.. math:: .. math::
...@@ -6154,8 +6150,9 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None): ...@@ -6154,8 +6150,9 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None):
smooth_{L1}(x) = \\begin{cases} 0.5x^2& \\text{if} \\ |x| < 1 \\\\ |x|-0.5& \\text{otherwise} \end{cases} smooth_{L1}(x) = \\begin{cases} 0.5x^2& \\text{if} \\ |x| < 1 \\\\ |x|-0.5& \\text{otherwise} \end{cases}
- More details can be found by referring to `Fast R-CNN
- <https://arxiv.org/pdf/1504.08083v2.pdf>`_
+ Reference:
+ Fast R-CNN
+ https://arxiv.org/pdf/1504.08083v2.pdf
The example usage is: The example usage is:
...@@ -6169,10 +6166,12 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None): ...@@ -6169,10 +6166,12 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None):
:param label: The input label. :param label: The input label.
:type input: LayerOutput :type input: LayerOutput
:param name: The name of this layer. It is optional. :param name: The name of this layer. It is optional.
- :type name: None | basestring
- :param coeff: The coefficient affects the gradient in the backward.
+ :type name: basestring
+ :param coeff: The weight of the gradient in the back propagation.
+ 1.0 is the default.
  :type coeff: float
- :param layer_attr: Extra Layer Attribute.
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
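The piecewise `smooth_{L1}` formula quoted in the `smooth_l1_cost` docstring above can be checked numerically with a few lines of plain Python. This is a minimal sketch, not the layer's actual implementation: summing the per-element terms into one cost per sample is an assumption (that part of the docstring is elided in this diff), and `smooth_l1_row` is a hypothetical name.

```python
def smooth_l1(x):
    """Piecewise formula from the docstring: 0.5*x^2 if |x| < 1, else |x| - 0.5."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def smooth_l1_row(pred, label):
    """Hypothetical per-sample cost: sum of smooth_l1 over element differences."""
    # The docstring requires input and label to have equal sizes.
    assert len(pred) == len(label)
    return sum(smooth_l1(p - t) for p, t in zip(pred, label))


small = smooth_l1(0.5)    # quadratic branch
large = smooth_l1(-2.0)   # linear branch
row_cost = smooth_l1_row([0.5, 3.0], [0.0, 1.0])
```

The two branches meet at `|x| = 1`, where both give 0.5, which is why the loss is described as a smoother L1.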
...@@ -6194,12 +6193,12 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None): ...@@ -6194,12 +6193,12 @@ def smooth_l1_cost(input, label, name=None, coeff=1.0, layer_attr=None):
@wrap_name_default() @wrap_name_default()
def multiplex_layer(input, name=None, layer_attr=None): def multiplex_layer(input, name=None, layer_attr=None):
""" """
- This layer multiplex multiple layers according to the index,
- which is provided by the first input layer.
- inputs[0]: the index of the layer to output of size batchSize.
+ This layer multiplex multiple layers according to the indexes,
+ which are provided by the first input layer.
+ inputs[0]: the indexes of the layers to form the output of size batchSize.
  inputs[1:N]; the candidate output data.
- For each index i from 0 to batchSize -1, the output is the i-th row of the
- (index[i] + 1)-th layer.
+ For each index i from 0 to batchSize - 1, the i-th row of the output is the
+ the same to the i-th row of the (index[i] + 1)-th layer.
For each i-th row of output: For each i-th row of output:
.. math:: .. math::
...@@ -6218,7 +6217,8 @@ def multiplex_layer(input, name=None, layer_attr=None): ...@@ -6218,7 +6217,8 @@ def multiplex_layer(input, name=None, layer_attr=None):
:type input: list of LayerOutput :type input: list of LayerOutput
:param name: The name of this layer. It is optional. :param name: The name of this layer. It is optional.
:type name: basestring :type name: basestring
- :param layer_attr: extra layer attributes.
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute. :type layer_attr: ExtraLayerAttribute.
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
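The row-selection rule in the `multiplex_layer` docstring above can be sketched in plain Python. This is only an illustration on nested lists, not the layer's implementation. The function name `multiplex` is hypothetical, and `candidates` stands for `inputs[1:N]`, so `candidates[index[i]]` corresponds to the "(index[i] + 1)-th layer" of the full input list.

```python
def multiplex(index, candidates):
    """For each row i, copy row i of the candidate selected by index[i]."""
    batch_size = len(index)
    # Every candidate must provide one row per sample in the batch.
    for layer in candidates:
        assert len(layer) == batch_size
    return [list(candidates[index[i]][i]) for i in range(batch_size)]


index = [1, 0, 1]                        # plays the role of inputs[0]
layer_a = [[1, 1], [2, 2], [3, 3]]       # plays the role of inputs[1]
layer_b = [[10, 10], [20, 20], [30, 30]] # plays the role of inputs[2]
out = multiplex(index, [layer_a, layer_b])
```

Row 0 and row 2 come from `layer_b` and row 1 comes from `layer_a`, so the output always has `batchSize` rows.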
...@@ -6322,14 +6322,14 @@ def row_conv_layer(input, ...@@ -6322,14 +6322,14 @@ def row_conv_layer(input,
:type context_len: int :type context_len: int
:param act: Activation Type. LinearActivation is the default. :param act: Activation Type. LinearActivation is the default.
:type act: BaseActivation :type act: BaseActivation
- :param param_attr: The Parameter Attribute. If None, the parameter will be
- initialized smartly. It's better to set it by yourself.
+ :param param_attr: The parameter attribute. See ParameterAttribute for
+ details.
  :type param_attr: ParameterAttribute
- :param layer_attr: Extra Layer config.
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute | None :type layer_attr: ExtraLayerAttribute | None
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput) assert isinstance(input, LayerOutput)
assert context_len > 0, "the context_len must be greatet than 0." assert context_len > 0, "the context_len must be greatet than 0."
...@@ -6354,7 +6354,7 @@ def prelu_layer(input, ...@@ -6354,7 +6354,7 @@ def prelu_layer(input,
param_attr=None, param_attr=None,
layer_attr=None): layer_attr=None):
""" """
- The Parameter Relu activation that actives outputs with a learnable weight.
+ The Parametric Relu activation that actives outputs with a learnable weight.
Reference: Reference:
Delving Deep into Rectifiers: Surpassing Human-Level Performance on Delving Deep into Rectifiers: Surpassing Human-Level Performance on
...@@ -6374,16 +6374,17 @@ def prelu_layer(input, ...@@ -6374,16 +6374,17 @@ def prelu_layer(input,
:type name: basestring :type name: basestring
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
- :param partial_sum: this parameter makes a group of inputs share a same weight.
+ :param partial_sum: this parameter makes a group of inputs share the same weight.
  - partial_sum = 1, indicates the element-wise activation: each element has a weight.
- - partial_sum = number of elements in one channel, indicates the channel-wise activation, elements in a channel share a same weight.
- - partial_sum = number of outputs, indicates all elements share a same weight.
+ - partial_sum = number of elements in one channel, indicates the channel-wise activation, elements in a channel share the same weight.
+ - partial_sum = number of outputs, indicates all elements share the same weight.
  :type partial_sum: int
  :param param_attr: The parameter attribute. See ParameterAttribute for details.
- :type param_attr: ParameterAttribute | None
- :param layer_attr: Extra layer configurations. Default is None.
+ :type param_attr: ParameterAttribute
+ :param layer_attr: The extra layer attribute. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute | None :type layer_attr: ExtraLayerAttribute | None
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
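To make the three `partial_sum` cases in the `prelu_layer` docstring above concrete, here is a minimal sketch on a flat list of values. It assumes that each run of `partial_sum` consecutive elements shares one weight, so element `i` uses `weights[i // partial_sum]`. That indexing is an assumption for illustration, and `prelu_forward` is a hypothetical name rather than a PaddlePaddle function.

```python
def prelu_forward(inputs, weights, partial_sum):
    """Apply PReLU where each group of partial_sum elements shares a weight."""
    # The number of learnable weights is (number of outputs) / partial_sum.
    assert len(inputs) % partial_sum == 0
    assert len(weights) == len(inputs) // partial_sum
    return [
        x if x > 0 else weights[i // partial_sum] * x
        for i, x in enumerate(inputs)
    ]


values = [-1.0, 2.0, -3.0, -4.0]

# Two groups of two elements: one weight per group (channel-wise style).
grouped = prelu_forward(values, weights=[0.5, 0.25], partial_sum=2)

# partial_sum equal to the number of outputs: a single shared weight.
shared = prelu_forward(values, weights=[0.5], partial_sum=4)
```

With `partial_sum=1` the same function would need one weight per element, which matches the element-wise case in the docstring.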
...@@ -6439,34 +6440,34 @@ def gated_unit_layer(input, ...@@ -6439,34 +6440,34 @@ def gated_unit_layer(input,
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
- :param size: output size of the gated unit.
+ :param size: The dimension of this layer's output.
  :type size: int
- :param act: Activation type of the projected input. LinearActivation is the default.
+ :param act: Activation type of the projection. LinearActivation is the default.
  :type act: BaseActivation
  :param name: The name of this layer. It is optional.
  :type name: basestring
- :param gate_attr: Attributes to tune the gate output, for example, error
- clipping threshold, dropout and so on. See ExtraLayerAttribute for
- more details.
+ :param gate_attr: The extra layer attribute of the gate. See ExtraLayerAttribute for
+ details.
  :type gate_attr: ExtraLayerAttribute | None
- :param gate_param_attr: Attributes to tune the learnable projected matrix
- parameter of the gate.
- :type gate_param_attr: ParameterAttribute | None
- :param gate_bias_attr: Attributes to tune the learnable bias of the gate.
- :type gate_bias_attr: ParameterAttribute | None
- :param inproj_attr: Attributes to the tune the projected input, for
- example, error clipping threshold, dropout and so on. See
- ExtraLayerAttribute for more details.
+ :param gate_param_attr: The parameter attribute of the gate. See ParameterAttribute
+ for details.
+ :type gate_param_attr: ParameterAttribute
+ :param gate_bias_attr: The bias attribute of the gate. If the parameter is set to False or
+ an object whose type is not ParameterAttribute, no bias is defined.
+ If the parameter is set to True, the bias is initialized to zero.
+ :type gate_bias_attr: ParameterAttribute | bool | None | Any
+ :param inproj_attr: Extra layer attributes of the projection. See ExtraLayerAttribute for
+ details.
  :type inproj_attr: ExtraLayerAttribute | None
- :param inproj_param_attr: Attributes to tune the learnable parameter of
- the projection of input.
- :type inproj_param_attr: ParameterAttribute | None
- :param inproj_bias_attr: Attributes to tune the learnable bias of
- projection of the input.
- :type inproj_bias_attr: ParameterAttribute | None
- :param layer_attr: Attributes to tune the final output of the gated unit,
- for example, error clipping threshold, dropout and so on. See
- ExtraLayerAttribute for more details.
+ :param inproj_param_attr: The parameter attribute of the projection. See ParameterAttribute
+ for details.
+ :type inproj_param_attr: ParameterAttribute
+ :param inproj_bias_attr: The bias attribute of the projection. If the parameter is set to False
+ or an object whose type is not ParameterAttribute, no bias is defined.
+ If the parameter is set to True, the bias is initialized to zero.
+ :type inproj_bias_attr: ParameterAttribute | bool | None | Any
+ :param layer_attr: Extra layer attribute of the product. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute | None :type layer_attr: ExtraLayerAttribute | None
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -6662,9 +6663,9 @@ def clip_layer(input, min, max, name=None): ...@@ -6662,9 +6663,9 @@ def clip_layer(input, min, max, name=None):
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput. :type input: LayerOutput.
:param min: The lower threshold for clipping. :param min: The lower threshold for clipping.
- :type min: double
+ :type min: float
  :param max: The upper threshold for clipping.
- :type max: double
+ :type max: float
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
...@@ -6712,7 +6713,6 @@ def seq_slice_layer(input, starts, ends, name=None): ...@@ -6712,7 +6713,6 @@ def seq_slice_layer(input, starts, ends, name=None):
:type ends: LayerOutput | None :type ends: LayerOutput | None
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
""" """
assert isinstance(input, LayerOutput), ( assert isinstance(input, LayerOutput), (
...@@ -6833,20 +6833,21 @@ def img_conv3d_layer(input, ...@@ -6833,20 +6833,21 @@ def img_conv3d_layer(input,
:param padding: The numbers of padding along three axises. If the parameter is set to :param padding: The numbers of padding along three axises. If the parameter is set to
one integer, they will be same. one integer, they will be same.
:type padding: int | tuple | list :type padding: int | tuple | list
- :param bias_attr: The Bias Attribute. If the parameter is set to
- False or something not type of ParameterAttribute,
- no bias is defined. If the parameter is set to
- True, the bias is initialized to zero.
+ :param bias_attr: The bias attribute. If the parameter is set to False or an object
+ whose type is not ParameterAttribute, no bias is defined. If the
+ parameter is set to True, the bias is initialized to zero.
:type bias_attr: ParameterAttribute | None | bool | Any :type bias_attr: ParameterAttribute | None | bool | Any
:param num_channels: The number of input channels. If the parameter is not set or :param num_channels: The number of input channels. If the parameter is not set or
set to None, its actual value will be automatically set to set to None, its actual value will be automatically set to
the channels number of the input . the channels number of the input .
:type num_channels: int :type num_channels: int
- :param param_attr: The parameter attribute of the convolution.
+ :param param_attr: The parameter attribute of the convolution. See ParameterAttribute for
+ details.
  :type param_attr: ParameterAttribute
  :param shared_biases: Whether biases will be shared between filters or not.
  :type shared_biases: bool
- :param layer_attr: Extra layer attributes.
+ :param layer_attr: The extra layer attributes. See ExtraLayerAttribute for
+ details.
:type layer_attr: ExtraLayerAttribute :type layer_attr: ExtraLayerAttribute
:param trans: True if it is a convTransLayer, False if it is a convLayer :param trans: True if it is a convTransLayer, False if it is a convLayer
:type trans: bool :type trans: bool
...@@ -6953,12 +6954,12 @@ def scale_shift_layer(input, name=None, param_attr=None, bias_attr=None): ...@@ -6953,12 +6954,12 @@ def scale_shift_layer(input, name=None, param_attr=None, bias_attr=None):
:type name: basestring :type name: basestring
:param input: The input of this layer. :param input: The input of this layer.
:type input: LayerOutput :type input: LayerOutput
- :param param_attr: The parameter attribute of scaling.
+ :param param_attr: The parameter attribute of scaling. See ParameterAttribute for
+ details.
  :type param_attr: ParameterAttribute
- :param bias_attr: The Bias Attribute. If the parameter is set to
- False or something not type of ParameterAttribute,
- no bias is defined. If the parameter is set to
- True, the bias is initialized to zero.
+ :param bias_attr: The bias attribute. If the parameter is set to False or an object
+ whose type is not ParameterAttribute, no bias is defined. If the
+ parameter is set to True, the bias is initialized to zero.
:type bias_attr: ParameterAttribute | None | bool | Any :type bias_attr: ParameterAttribute | None | bool | Any
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
...@@ -7016,10 +7017,9 @@ def sub_seq_layer(input, offsets, sizes, act=None, bias_attr=None, name=None): ...@@ -7016,10 +7017,9 @@ def sub_seq_layer(input, offsets, sizes, act=None, bias_attr=None, name=None):
:type sizes: LayerOutput :type sizes: LayerOutput
:param act: Activation type, LinearActivation is the default. :param act: Activation type, LinearActivation is the default.
:type act: BaseActivation. :type act: BaseActivation.
- :param bias_attr: The Bias Attribute. If the parameter is set to
- False or something not type of ParameterAttribute,
- no bias is defined. If the parameter is set to
- True, the bias is initialized to zero.
+ :param bias_attr: The bias attribute. If the parameter is set to False or an object
+ whose type is not ParameterAttribute, no bias is defined. If the
+ parameter is set to True, the bias is initialized to zero.
:type bias_attr: ParameterAttribute | None | bool | Any :type bias_attr: ParameterAttribute | None | bool | Any
:return: LayerOutput object. :return: LayerOutput object.
:rtype: LayerOutput :rtype: LayerOutput
......