Unverified commit 4d3eefbb, authored by xiemoyuan, committed by GitHub

Modify the docs for Transformer's APIs. test=document_fix (#27729)

Parent ab85a891
@@ -644,7 +644,7 @@ class TransformerDecoderLayer(Layer):
             `weight_attr` to create parameters. Default: None, which means the
             default weight parameter property is used. See usage for details
             in :ref:`api_fluid_ParamAttr` .
-        bias_attr (ParamAttr|tuple, optional): To specify the bias parameter property.
+        bias_attr (ParamAttr|tuple|bool, optional): To specify the bias parameter property.
             If it is a tuple, `bias_attr[0]` would be used as `bias_attr` for
             self attention, `bias_attr[1]` would be used as `bias_attr` for
             cross attention, and `bias_attr[2]` would be used as `bias_attr`
@@ -982,12 +982,12 @@ class Transformer(Layer):
     applies another layer normalization on the output of last encoder/decoder layer.
 
     Parameters:
-        d_model (int): The expected feature size in the encoder/decoder input
-            and output.
-        nhead (int): The number of heads in multi-head attention(MHA).
-        num_encoder_layers (int): The number of layers in encoder.
-        num_encoder_layers (int): The number of layers in decoder.
-        dim_feedforward (int): The hidden layer size in the feedforward network(FFN).
+        d_model (int, optional): The expected feature size in the encoder/decoder input
+            and output. Default 512
+        nhead (int, optional): The number of heads in multi-head attention(MHA). Default 8
+        num_encoder_layers (int, optional): The number of layers in encoder. Default 6
+        num_decoder_layers (int, optional): The number of layers in decoder. Default 6
+        dim_feedforward (int, optional): The hidden layer size in the feedforward network(FFN). Default 2048
         dropout (float, optional): The dropout probability used in pre-process
             and post-process of MHA and FFN sub-layer. Default 0.1
         activation (str, optional): The activation function in the feedforward
@@ -1015,7 +1015,7 @@ class Transformer(Layer):
             Default: None, which means the default weight parameter property is used.
             See usage for details
             in :code:`ParamAttr` .
-        bias_attr (ParamAttr|tuple, optional): To specify the bias parameter property.
+        bias_attr (ParamAttr|tuple|bool, optional): To specify the bias parameter property.
             If it is a tuple, the length of `bias_attr` could be 1, 2 or 3. If it is 3,
             `bias_attr[0]` would be used as `bias_attr` for self attention, `bias_attr[1]`
             would be used as `bias_attr` for cross attention of `TransformerDecoder`,
@@ -1028,9 +1028,9 @@ class Transformer(Layer):
             The `False` value means the corresponding layer would not have trainable
             bias parameter. See usage for details in :code:`ParamAttr` .
             Default: None, which means the default bias parameter property is used.
-        custom_encoder (Layer): If custom encoder is provided, use it as the encoder.
+        custom_encoder (Layer, optional): If custom encoder is provided, use it as the encoder.
             Default None
-        custom_decoder (Layer): If custom decoder is provided, use it as the decoder.
+        custom_decoder (Layer, optional): If custom decoder is provided, use it as the decoder.
             Default None
 
     Examples:
......