.add_fields('uint32', Doc('input_order', 'The sequence data layout; allows the user to select 3! = 6 different data layouts, i.e. permutations of the BEAM, BATCH and TIME dimensions.'), '0')
.add_fields('bool', Doc('reslink', 'Whether to add the input query to the final output.'), 'false')
.add_fields('bool', Doc('training', 'Whether it is in training mode.'), 'true')
.add_fields('bool', Doc('bias', 'Whether to add a linear bias.'), 'false')
.add_fields('bool', Doc('attn_mask', 'Whether to add attn_mask.'), 'false')
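The ``input_order`` field above selects one of the 3! = 6 permutations of the BEAM, BATCH and TIME dimensions. A minimal sketch of that enumeration (illustrative only — the actual integer encoding of each layout is defined by the operator, not by this snippet):

```python
from itertools import permutations

# All possible orderings of the three sequence dimensions.
layouts = list(permutations(("BEAM", "BATCH", "TIME")))
assert len(layouts) == 6  # 3! = 6 data layouts

for idx, layout in enumerate(layouts):
    print(idx, layout)
```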
See :class:`~.module.MultiHeadAttn` for more details.
Note: This API is experimental and may change in subsequent releases. Currently only the CUDA platform is supported. With cuDNN >= 8.6.0 the results are fully correct. With cuDNN >= 8.0.4 but < 8.6.0, the forward and backward computations are correct when there is no bias; when a bias is used, only the dbias result computed in the backward pass is incorrect. With cuDNN < 8.0.4 this operator is not supported.
Args:
query, key, value: the query and the set of key-value pairs that are mapped to an output.
See "Attention Is All You Need" for more details.
embed_dim: total dimension of the model.
num_heads: number of parallel attention heads.
attn_drop: dropout probability applied to the attention matrix (probability of an element to be zeroed).
out_drop: dropout probability applied to the final output (probability of an element to be zeroed).
io_weight_bias: input/output projection weights and biases packed into a single tensor, as required by the cuDNN API.
bias: whether ``io_weight_bias`` contains bias terms; used for the cuDNN API.
reslink: whether to add the input query to the final output (residual link).
training: applies dropout if ``True``.
attn_mask: whether to add a mask to the attention matrix.
By default, the strictly upper triangle of the mask is ``-inf``, and the diagonal and lower triangle are ``0`` (a causal mask).
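The arguments above can be tied together with a minimal single-head sketch of scaled dot-product attention in NumPy. This is an illustration of the computation only, not MegEngine's implementation; the function names are hypothetical, and multi-head projection, ``reslink`` and ``out_drop`` are omitted:

```python
import numpy as np

def default_causal_mask(seq_len):
    """Default mask: strictly upper triangle is -inf, diagonal and below are 0."""
    return np.triu(np.full((seq_len, seq_len), float("-inf")), k=1)

def scaled_dot_product_attention(q, k, v, attn_mask=None, attn_drop=0.0, rng=None):
    """q, k, v: arrays of shape (seq_len, head_dim).

    attn_mask, if given, is added to the score matrix before the softmax,
    so -inf entries receive zero attention weight.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (seq_len, seq_len)
    if attn_mask is not None:
        scores = scores + attn_mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    if attn_drop > 0.0:  # inverted dropout on the attention matrix (training only)
        rng = rng or np.random.default_rng(0)
        keep = rng.random(weights.shape) >= attn_drop
        weights = weights * keep / (1.0 - attn_drop)
    return weights @ v
```

With the default causal mask, row ``i`` of the attention matrix attends uniformly to positions ``0..i`` when all scores are equal, which is easy to verify by hand.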
Note: This API is experimental, and there is a possibility of subsequent changes. Currently, only the cuda platform is supported, and if the cudnn version >=8.6.0, the calculation results are completely correct; If the cudnn version >=8.0.4 but <8.6.0, if there is a bias, only the dbias result calculated from the backward is incorrect. If there is no bias, the forward and backward calculations are correct; If the cudnn version is less than 8.0.4, this operator is not supported.
Args:
embed_dim: Total dimension of the model.
num_heads: Number of parallel attention heads. Note that ``embed_dim`` will be split
across ``num_heads`` (i.e. each head will have dimension ``embed_dim // num_heads``).
dropout: Dropout probability on ``attn_output_weights``. Default: ``0.0`` (no dropout).
bias: If specified, adds bias to input / output projection layers. Default: ``True``.
kdim: Total number of features for keys. Default: ``None`` (uses ``kdim=embed_dim``).
vdim: Total number of features for values. Default: ``None`` (uses ``vdim=embed_dim``).
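The ``embed_dim`` / ``num_heads`` relationship described above can be checked with a small shape sketch (illustrative only; the values 512 and 8 are arbitrary examples, not defaults):

```python
import numpy as np

embed_dim, num_heads = 512, 8
assert embed_dim % num_heads == 0, "embed_dim must be divisible by num_heads"
head_dim = embed_dim // num_heads  # per-head dimension: 512 // 8 = 64

# Split a (seq_len, embed_dim) projection into num_heads separate heads.
seq_len = 10
x = np.zeros((seq_len, embed_dim))
heads = x.reshape(seq_len, num_heads, head_dim).transpose(1, 0, 2)
assert heads.shape == (num_heads, seq_len, head_dim)
```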