diff --git a/develop/doc/operators.json b/develop/doc/operators.json
index 5ab375097b27689f43c7de4a6452b615eb45182b..24f9b8b81e2de988f5d8d13e78a2efa424e55aa0 100644
--- a/develop/doc/operators.json
+++ b/develop/doc/operators.json
@@ -2236,29 +2236,6 @@
       "comment" : "(int, default 5(FP32)) Output tensor data type",
       "generated" : 0
     } ]
-},{
-  "type" : "logical_xor",
-  "comment" : "logical_xor Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$\n",
-  "inputs" : [
-    {
-      "name" : "X",
-      "comment" : "(LoDTensor) Left hand operand of logical_xor operator",
-      "duplicable" : 0,
-      "intermediate" : 0
-    }, {
-      "name" : "Y",
-      "comment" : "(LoDTensor) Right hand operand of logical_xor operator",
-      "duplicable" : 0,
-      "intermediate" : 0
-    } ],
-  "outputs" : [
-    {
-      "name" : "Out",
-      "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$",
-      "duplicable" : 0,
-      "intermediate" : 0
-    } ],
-  "attrs" : [ ]
 },{
   "type" : "pad",
   "comment" : "\nPad Operator.\n\nPad input into output, as specified by paddings and pad_value. \nThe input should be a k-D tensor(k > 0 and k < 7). As an example:\n\nGiven:\n\nX = [[1, 2],\n [3, 4]],\n\npaddings = [0, 1, 1, 2],\n\nand\n\npad_value = 0,\n\nwe have:\n\nOut = [[0, 1, 2, 0, 0]\n [0, 3, 4, 0, 0]\n [0, 0, 0, 0, 0]]\n\n",
@@ -2844,6 +2821,40 @@
       "comment" : "The exponential factor of Pow",
       "generated" : 0
     } ]
+},{
+  "type" : "lookup_table",
+  "comment" : "\nLookup Table Operator.\n\nThis operator is used to perform lookups on the parameter W,\nthen concatenated into a dense tensor.\n\nThe input Ids can carry the LoD (Level of Details) information,\nor not. And the output only shares the LoD information with input Ids.\n\n",
+  "inputs" : [
+    {
+      "name" : "W",
+      "comment" : "An input represents embedding tensors, which is a learnable parameter.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "Ids",
+      "comment" : "An input with type int32 or int64 contains the ids to be looked up in W. Ids must be a column vector with rank = 2. The 2nd dimension size must be 1.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    } ],
+  "outputs" : [
+    {
+      "name" : "Out",
+      "comment" : "The lookup results, which have the same type as W.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    } ],
+  "attrs" : [
+    {
+      "name" : "is_sparse",
+      "type" : "bool",
+      "comment" : "(boolean, default false) Sparse update",
+      "generated" : 0
+    }, {
+      "name" : "padding_idx",
+      "type" : "long",
+      "comment" : "(int64, default -1) If the value is -1, it makes no effect to lookup. Otherwise the given value indicates padding the output with zeros whenever lookup encounters it in Ids.",
+      "generated" : 0
+    } ]
 },{
   "type" : "unpool",
   "comment" : "\nInput shape is: $(N, C_{in}, H_{in}, W_{in})$, Output shape is:\n$(N, C_{out}, H_{out}, W_{out})$, where\n$$\nH_{out} = (H_{in}−1) * strides[0] − 2 * paddings[0] + ksize[0] \\\\\nW_{out} = (W_{in}−1) * strides[1] − 2 * paddings[1] + ksize[1]\n$$\nPaper: http://www.matthewzeiler.com/wp-content/uploads/2017/07/iccv2011.pdf\n",
@@ -3394,6 +3405,105 @@
       "comment" : "(float, default 1.0)The scaling factor of the scale operator.",
       "generated" : 0
     } ]
+},{
+  "type" : "lstmp",
+  "comment" : "\nLong-Short Term Memory with recurrent Projection layer (LSTMP) Operator.\n\nLSTMP has a separate projection layer after the LSTM layer, projecting the \noriginal hidden state to a lower-dimensional one, which is proposed to reduce \nthe number of total parameters and furthermore computational complexity for \nthe LSTM, espeacially for the case that the size of output units is relative \nlarge (https://research.google.com/pubs/archive/43905.pdf). \n\nThe formula is as follows:\n\n$$\ni_t = \\sigma(W_{ix}x_{t} + W_{ir}r_{t-1} + W_{ic}c_{t-1} + b_i) \\\\\n\nf_t = \\sigma(W_{fx}x_{t} + W_{fr}r_{t-1} + W_{fc}c_{t-1} + b_f) \\\\\n\n\\tilde{c_t} = act_g(W_{cx}x_t + W_{cr}r_{t-1} + b_c) \\\\\n\no_t = \\sigma(W_{ox}x_{t} + W_{or}r_{t-1} + W_{oc}c_t + b_o) \\\\\n\nc_t = f_t \\odot c_{t-1} + i_t \\odot \\tilde{c_t} \\\\\n\nh_t = o_t \\odot act_h(c_t) \\\\\n\nr_t = \\overline{act_h}(W_{rh}h_t)\n$$\n\nwhere the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix\nof weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$\nare diagonal weight matrices for peephole connections. In our implementation,\nwe use vectors to reprenset these diagonal weight matrices. The b terms\ndenote bias vectors ($b_i$ is the input gate bias vector), $\\sigma$\nis the activation, such as logistic sigmoid function, and\n$i, f, o$ and $c$ are the input gate, forget gate, output gate,\nand cell activation vectors, respectively, all of which have the same size as\nthe cell output activation vector $h$. Here $h$ is usually called the hidden \nstate and $r$ denotes its recurrent projection. And $\\tilde{c_t}$ is also \ncalled the candidate hidden state, whose computation is based on the current \ninput and previous hidden state.\n\nThe $\\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$\nare the cell input and cell output activation functions and `tanh` is usually\nused for them. $\\overline{act_h}$ is the activation function for the \nprojection output, usually using `identity` or same as $act_h$.\n\nNote that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$\noperations on the input $x_{t}$ are NOT included in this operator.\nUsers can choose to use fully-connected operator before LSTMP operator.\n\n",
+  "inputs" : [
+    {
+      "name" : "Input",
+      "comment" : "(LoDTensor) the input for sequence data, which supports variable-time length input sequence. The underlying tensor in this LoDTensor is a matrix with shape (T X 4D), where T is the total time steps in this mini-batch, D is the hidden size.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "H0",
+      "comment" : "(Tensor, optional) the initial hidden state is an optional input. This is a tensor with shape (N x D), where N is the batch size and D is the hidden size.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "C0",
+      "comment" : "(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `C0` should not be null if `H0` provided.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "Weight",
+      "comment" : "(Tensor) the learnable hidden-hidden weights. - The shape is (P x 4D), where P is the projection layer size and D is the hidden size. - Weight = {W_cr, W_ir, W_fr, W_or}",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "ProjWeight",
+      "comment" : "(Tensor) the learnable weight of the projection layer. - The shape is (D x P), where P is the recurrent projection layer size and D is the hidden size. - ProjWeight = {W_rh}",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "Bias",
+      "comment" : "(Tensor) the learnable biases, which contains two parts: input-hidden biases and peephole connections weights if setting `use_peepholes` to `True`. 1. `use_peepholes = False` - The shape is (1 x 4D). - Bias = {b_c, b_i, b_f, b_o}.2. `use_peepholes = True` - The shape is (1 x 7D). - Bias = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    } ],
+  "outputs" : [
+    {
+      "name" : "Projection",
+      "comment" : "(LoDTensor) the projection of the hidden state of LSTMP operator. The shape is (T x P), and LoD is the same with the `Input`.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "Cell",
+      "comment" : "(LoDTensor) the cell state of LSTMP operator. The shape is (T x D), and lod is the same with the `Input`.",
+      "duplicable" : 0,
+      "intermediate" : 0
+    }, {
+      "name" : "BatchGate",
+      "comment" : "(LoDTensor) This LoDTensor contains input gate, forget gate and output gate after the activations. This LoDTensor has the same shape as the reorganized input, which is also be called batch input. The LoD size is 2. The first-level LoD is the batch offsets and the second contains the indices, which denotes the position of reorganized sequence in the raw input.",
+      "duplicable" : 0,
+      "intermediate" : 1
+    }, {
+      "name" : "BatchCellPreAct",
+      "comment" : "(LoDTensor) the pre-activation cell state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
+      "duplicable" : 0,
+      "intermediate" : 1
+    }, {
+      "name" : "BatchHidden",
+      "comment" : "(LoDTensor) the hidden state reorganized in batch. This LoDTensor is obtained in the forward and used in the backward.",
+      "duplicable" : 0,
+      "intermediate" : 1
+    }, {
+      "name" : "OrderedP0",
+      "comment" : "(Tensor) the projection of the initial hidden state H0. This is a tensor with shape (N x P), where N is the batch size and P is the hidden size.",
+      "duplicable" : 0,
+      "intermediate" : 1
+    } ],
+  "attrs" : [
+    {
+      "name" : "use_peepholes",
+      "type" : "bool",
+      "comment" : "(bool, defalut: True) whether to enable diagonal/peephole connections.",
+      "generated" : 0
+    }, {
+      "name" : "is_reverse",
+      "type" : "bool",
+      "comment" : "(bool, defalut: False) whether to compute reversed LSTMP.",
+      "generated" : 0
+    }, {
+      "name" : "gate_activation",
+      "type" : "string",
+      "comment" : "(string, default: sigmoid)The activation for input gate, forget gate and output gate, `sigmoid` by default.",
+      "generated" : 0
+    }, {
+      "name" : "cell_activation",
+      "type" : "string",
+      "comment" : "(string, default: tanh)The activation for cell output, `tanh` by defalut.",
+      "generated" : 0
+    }, {
+      "name" : "candidate_activation",
+      "type" : "string",
+      "comment" : "(string, default: tanh)The activation for candidate hidden state, `tanh` by default.",
+      "generated" : 0
+    }, {
+      "name" : "proj_activation",
+      "type" : "string",
+      "comment" : "(string, default: tanh)The activation for projection output, `tanh` by defalut.",
+      "generated" : 0
+    } ]
 },{
   "type" : "mean",
   "comment" : "\nMean Operator.\n\nOut is a scalar which is the mean of all elements in X. \n\n",
@@ -3413,58 +3523,47 @@
     } ],
   "attrs" : [ ]
 },{
-  "type" : "lookup_table",
-  "comment" : "\nLookup Table Operator.\n\nThis operator is used to perform lookups on the parameter W,\nthen concatenated into a dense tensor.\n\nThe input Ids can carry the LoD (Level of Details) information,\nor not. And the output only shares the LoD information with input Ids.\n\n",
+  "type" : "lod_tensor_to_array",
+  "comment" : "",
   "inputs" : [
     {
-      "name" : "W",
-      "comment" : "An input represents embedding tensors, which is a learnable parameter.",
+      "name" : "X",
+      "comment" : "",
       "duplicable" : 0,
       "intermediate" : 0
     }, {
-      "name" : "Ids",
-      "comment" : "An input with type int32 or int64 contains the ids to be looked up in W. Ids must be a column vector with rank = 2. The 2nd dimension size must be 1.",
+      "name" : "RankTable",
+      "comment" : "",
       "duplicable" : 0,
       "intermediate" : 0
     } ],
   "outputs" : [
     {
       "name" : "Out",
-      "comment" : "The lookup results, which have the same type as W.",
+      "comment" : "",
       "duplicable" : 0,
       "intermediate" : 0
     } ],
-  "attrs" : [
-    {
-      "name" : "is_sparse",
-      "type" : "bool",
-      "comment" : "(boolean, default false) Sparse update",
-      "generated" : 0
-    }, {
-      "name" : "padding_idx",
-      "type" : "long",
-      "comment" : "(int64, default -1) If the value is -1, it makes no effect to lookup. Otherwise the given value indicates padding the output with zeros whenever lookup encounters it in Ids.",
-      "generated" : 0
-    } ]
+  "attrs" : [ ]
 },{
-  "type" : "lod_tensor_to_array",
-  "comment" : "",
+  "type" : "logical_xor",
+  "comment" : "logical_xor Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$\n",
   "inputs" : [
     {
       "name" : "X",
-      "comment" : "",
+      "comment" : "(LoDTensor) Left hand operand of logical_xor operator",
       "duplicable" : 0,
       "intermediate" : 0
     }, {
-      "name" : "RankTable",
-      "comment" : "",
+      "name" : "Y",
+      "comment" : "(LoDTensor) Right hand operand of logical_xor operator",
       "duplicable" : 0,
       "intermediate" : 0
     } ],
   "outputs" : [
     {
       "name" : "Out",
-      "comment" : "",
+      "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$",
       "duplicable" : 0,
       "intermediate" : 0
     } ],
@@ -5034,6 +5133,24 @@
       "intermediate" : 0
     } ],
   "attrs" : [ ]
+},{
+  "type" : "floor",
+  "comment" : "\nFloor Activation Operator.\n\n$out = floor(x)$\n\n",
+  "inputs" : [
+    {
+      "name" : "X",
+      "comment" : "Input of Floor operator",
+      "duplicable" : 0,
+      "intermediate" : 0
+    } ],
+  "outputs" : [
+    {
+      "name" : "Out",
+      "comment" : "Output of Floor operator",
+      "duplicable" : 0,
+      "intermediate" : 0
+    } ],
+  "attrs" : [ ]
 },{
   "type" : "sequence_concat",
   "comment" : "\nThe sequence_concat operator concatenates multiple LoDTensors.\nIt only supports sequence (LoD Tensor with level number is 1)\nor a nested sequence (LoD tensor with level number is 2) as its input.\n- Case1:\n If the axis is other than 0(here, axis is 1 and level is 1),\n each input should have the same LoD information and the LoD\n information of the output keeps the same as the input.\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,2,4}, {0,1,2,3,4}}; Dims(x1) = (4,4,4)\n LoD(Out) = {{0,2,4}, {0,1,2,3,4}}; Dims(Out) = (4,7,4)\n\n- Case2:\n If the axis is 0(here, leve is 0), the inputs are concatenated along\n time steps, the LoD information of the output need to re-compute.\n The LoD information of level-1 should be same.\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,2,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,2,4}, {0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\n- Case3:\n If the axis is 0(here, level is 1).\n\n LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,3,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,5,8}, {0,1,2,3,5,7,8,9,11}}; Dims(Out) = (11,3,4)\n\n- Case4:\n If the LoD number is 1, axis is 0, level is 0\n\n LoD(x0) = {{0,1,2,3,4}}; Dims(x0) = (4,3,4)\n LoD(x1) = {{0,1,3,5,7}}; Dims(x1) = (7,3,4)\n LoD(Out) = {{0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\nNOTE: The levels of all the inputs should be the same.\n ",
@@ -5063,24 +5180,6 @@
       "comment" : "(int, default 0) The level at which the inputs will be joined. If the level is 0, the inputs will be joined at the nested sequence level. If the level is 1, the inputs will be joined at the sequence level. The level should be less than the level number of inputs.",
       "generated" : 0
     } ]
-},{
-  "type" : "floor",
-  "comment" : "\nFloor Activation Operator.\n\n$out = floor(x)$\n\n",
-  "inputs" : [
-    {
-      "name" : "X",
-      "comment" : "Input of Floor operator",
-      "duplicable" : 0,
-      "intermediate" : 0
-    } ],
-  "outputs" : [
-    {
-      "name" : "Out",
-      "comment" : "Output of Floor operator",
-      "duplicable" : 0,
-      "intermediate" : 0
-    } ],
-  "attrs" : [ ]
 },{
   "type" : "cast",
   "comment" : "\nCast Operator.\n\nThis Operator casts the input tensor to another data type and\nreturns tha Output Tensor.\n\n",