operators.json

[
{
 "type" : "sgd",
 "comment" : "\n\nSGD operator\n\nThis operator implements one step of the stochastic gradient descent algorithm.\n\n$$param\\_out = param - learning\\_rate * grad$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor) Learning rate of SGD",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "adagrad",
 "comment" : "\n\nAdaptive Gradient Algorithm (Adagrad).\n\nThe update is done as follows:\n\n$$moment\\_out = moment + grad * grad \\\\\nparam\\_out = param - \\frac{learning\\_rate * grad}{\\sqrt{moment\\_out} + \\epsilon}\n$$\n\nThe original paper(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)\ndoes not have the epsilon attribute. It is added here in our implementation\nas also proposed here: http://cs231n.github.io/neural-networks-3/#ada\nfor numerical stability to avoid the division by zero error.\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment",
   "comment" : "(Tensor) Second moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor) Learning rate",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MomentOut",
   "comment" : "(Tensor) Output second moment",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1.0e-6) Constant for numerical stability",
   "generated" : 0
 } ] 
},{
 "type" : "conv3d",
 "comment" : "\nConvolution3D Operator.\n\nThe convolution operation calculates the output based on the input, filter\nand strides, paddings, dilations, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCDHW format, where N is batch\nsize, C is the number of channels,D is the depth of the feature, H is the height of\nthe feature, and W is the width of the feature.\nFilters(Input) is MCDHW format, where M is the number of output image channels,\nC is the number of input image channels, D is the depth of the filter,\nH is the height of the filter, and W is the width of the filter.\nParameters(strides, paddings, dilations) are three elements. These three elements\nrepresent depth, height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, D_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{out}, C_{in}, D_f, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out}= \\frac{(D_{in} + 2 * paddings[0] - (dilations[0] * (D_f - 1) + 1))}{ strides[0]}+ 1 \\\\\n       H_{out}= \\frac{(H_{in} + 2 * paddings[1] - (dilations[1] * (H_f - 1) + 1))}{ strides[1]}+ 1 \\\\\n       W_{out}= \\frac{(W_{in} + 2 * paddings[2] - (dilations[2] * (W_f - 1) + 1))}{ strides[2]}+ 1\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution operator. The format of input tensor is NCDHW. Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution operator. The format of the filter tensor is MCDHW, where M is the number of output image channels, C is the number of input image channels, D is the depth of the filter, H is the height of the filter, and W is the width of the filter.If the groups attribute is greater than 1, C equals the number of input image channels divided by the groups.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution operator.The format of output tensor is also NCDHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default:{1, 1, 1}), the strides(d_stride, h_stride, w_stride) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default:{0, 0, 0}), the paddings(d_pad, h_pad, w_pad) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "groups",
   "type" : "int",
   "comment" : "(int default:1), the groups number of the convolution operator. According to grouped convolution in Alex Krizhevsky's Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1, 1}), the dilations(d_dilation, h_dilation, w_dilation) of convolution operator.",
   "generated" : 0
 } ] 
},{
 "type" : "conv2d",
 "comment" : "\nConvolution Operator.\n\nThe convolution operation calculates the output based on the input, filter\nand strides, paddings, dilations, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and Output(Output) are in NCHW format. Where N is batch\nsize, C is the number of channels, H is the height of the feature, and W is\nthe width of the feature.\nFilters(Input) is MCHW format. Where M is the number of output image channels, C is\nthe number of input image channels, H is the height of the filter, and W\nis the width of the filter.\nParameters(strides, paddings, dilations) are two elements. These two elements represent\nheight and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{out}, C_{in}, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, H_{out}, W_{out})$\n  Where\n$$\n       H_{out}= \\frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (H_f - 1) + 1))}{strides[0]}+ 1 \\\\\n       W_{out}= \\frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (W_f - 1) + 1))}{strides[1]}+ 1\n$$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution operator. The format of input tensor is NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution operator. The format of the filter tensor is MCHW, where M is the number of output image channels, C is the number of input image channels, H is the height of the filter, and W is the width of the filter. If the groups attribute is greater than 1, C equals the number of input image channels divided by the groups.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution operator. The format of output tensor is also NCHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the strides(h_stride, w_stride) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "groups",
   "type" : "int",
   "comment" : "(int default:1), the groups number of the convolution operator. According to grouped convolution in Alex Krizhevsky's Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the dilations(h_dilation, w_dilation) of convolution operator.",
   "generated" : 0
 } ] 
},{
 "type" : "pool3d",
 "comment" : "\nPool3d Operator.\n\nThe pooling3d operation calculates the output based on\nthe input, pooling_type, ksize, strides, and paddings parameters.\nInput(X) and output(Out) are in NCDHW format, where N is batch\nsize, C is the number of channels, and D, H and W are the depth, height and\nwidth of the feature, respectively. Parameters(ksize, strides, paddings) \nare three elements. These three elements represent depth, height and \nwidth, respectively. The input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       X shape: $(N, C, D_{in}, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out} = \\frac{(D_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       H_{out} = \\frac{(H_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[2] + 2 * paddings[2])}{strides[2]} + 1\n  $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCDHW, where N is batch size, C is the number of channels, and D, H and W is the depth, height and width of the feature, respectively.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator.The format of output tensor is also NCDHW, where N is batch size, C is the number of channels, and D, H and W is the depth, height and width of the feature, respectively.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "pooling_type",
   "type" : "string",
   "comment" : "(string) Pooling type, can be \"max\" for max-pooling and \"avg\" for average-pooling.",
   "generated" : 0
 }, { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(depth, height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default false) Whether to use the global pooling. If global_pooling = true, ksize and paddings wille be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1,1,1}) Strides(depth, height, width) of the pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default {0,0,0}), paddings(depth, height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "pool2d",
 "comment" : "\nPool2d Operator.\n\nThe pooling2d operation calculates the output based on\nthe input, pooling_type and ksize, strides, paddings parameters.\nInput(X) and output(Out) are in NCHW format, where N is batch size, C is the\nnumber of channels, H is the height of the feature, and W is the width of the feature.\nParameters(ksize, strides, paddings) are two elements.\nThese two elements represent height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:   \n  Input:\n       X shape: $(N, C, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, H_{out}, W_{out})$\n  Where\n       $$ \n       H_{out} = \\frac{(H_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1\n       $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator. The format of output tensor is also NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "pooling_type",
   "type" : "string",
   "comment" : "(string), pooling type, can be \"max\" for max-pooling and \"avg\" for average-pooling.",
   "generated" : 0
 }, { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(height, width) of the pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default false) Whether to use the global pooling. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1, 1}), strides(height, width) of pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default {0,0}), paddings(height, width) of pooling operator.If global_pooling = true, paddings and ksize will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "max_pool3d_with_index",
 "comment" : "\nMaxPool3d Operator.\n\nThe maxpooling3d with index operation calculates the output and the mask\nbased on the input and ksize, strides, paddings parameters.\nInput(X) and output(Out, Mask) are in NCDHW format, where N is batch\nsize, C is the number of channels, and D, H and W are the depth, height and\nwidth of the feature, respectively. \nParameters(ksize, strides, paddings) are three elements.\nThese three elements represent depth, height and width, respectively.\nThe input(X) size and output(Out, Mask) size may be different.\n\nExample:\n  Input:\n       X shape: $(N, C, D_{in}, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$\n       Mask shape: $(N, C, D_{out}, H_{out}, W_{out})$\n  Where\n       $$\n       D_{out} = \\frac{(D_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       H_{out} = \\frac{(H_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[2] + 2 * paddings[2])}{strides[2]} + 1\n       $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCDHW, where N is batch size, C is the number of channels, and D, H and W are the depth, height and width of the image, respectively",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator. The format of output tensor is also NCDHW, where N is the batch size, C is the number of channels, and D, H and W are the depth, height and width of the image, respectively.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mask",
   "comment" : "(Tensor) The Mask tensor of pooling operator. The format of output tensor is also NCDHW, where N is the batch size, C is the number of channels, and D, H and W are the depth, height and width of the image, respectively. It represents the index in the current feature map.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(depth, height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default false) Whether to use the global pooling. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1,1,1}), strides(depth, height, width) of pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector, default {0,0,0}), paddings(depth, height, width) of pooling operator. If global_pooling = true, paddings and ksize will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "lod_rank_table",
 "comment" : "Create LoDRanTable by LoDTensor\n\nLoD Rank Table stores the `level` of `lod` which is ordered by sequence\nlength in descending order. It is useful when implement dynamic RNN and is\nshared by dynamic RNN memory, dynamic RNN slice input and dynamic RNN slice\noutput operators.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) input lod tensor, must contain lod information.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDRankTable) The rank table of specific level.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "level",
   "type" : "int",
   "comment" : "(int) the specific lod level to rank.",
   "generated" : 0
 } ] 
},{
 "type" : "array_to_lod_tensor",
 "comment" : "This Op build a big LoDTensor from a std::vector<LoDTensor> \n          and a LoDRankTable. It is supposed to be used in getting dynamic RNN's\n          outputs back to a normal LoDTensor. The std::vector<LoDTensor> \n          would be the output of RNN Op and the LoDRankTable would be build \n          with RNN's input.",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(std::vector<LodTensor>) A vector of tensors that is going to be casted to a big LoDTensor.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "RankTable",
   "comment" : "(LoDRankTable) RankTable provides the coarse lod infomation to build the output LoDTensor. See 'paddle/framework/lod_rank_table.h' for more details.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) The LoDTensor formed by input tensor array.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "sequence_conv",
 "comment" : "\nSequence Conv Operator.\n\nSequenceConvOp performs convolution operation on features of contextLength\ntime-steps of each instance. The convolution operation calculates the output\nbased on the input, filter, strides and paddings parameters.\nThe size of each dimension of the parameters is checked during infer-shape.\nIn order to ensure the equal length of sequence before and after convolution,\nit is necessary to fill the top and bottom of each sequence based on\ncontext_length, context_stride and context_start.\n\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the input(X) is a LodTensor, which supports variable-time length input sequence. The underlying tensor in this LoDTensor is a matrix with shape (T, N), where T is the total time steps in this mini-batch and N is the input_hidden_size.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "PaddingData",
   "comment" : "(Tensor, optional) the input(PaddingData) is an optional parameter, and it is learnable. This is a tensor with shape (P, N), where P is the top_pad + bottom_pad, N is the input_hidden_size. In order to ensure the equal length of sequence before and after convolution, it is necessary to fill the top and bottom of each sequence according to context_length, context_stride and context_start",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) the input(Filter) is an learnable parameter.This is a tensor with shape (K, M), where K is the context_length * input_hidden_size, M is the output feature size.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) the output(Out) is a LodTensor, which support variable-time length output sequence. The underlying tensor in this LoDTensor is a matrix with shape (T, M), where, T is the total time steps in this mini-batch, M is the output feature size.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "paddingTrainable",
   "type" : "bool",
   "comment" : "(bool, default:false) the padding data of SequenceConvOp is trainable or not.",
   "generated" : 0
 }, { 
   "name" : "contextLength",
   "type" : "int",
   "comment" : "(int) the contextLength of SequenceConvOp is the height of the convolution kernel.",
   "generated" : 0
 }, { 
   "name" : "contextStart",
   "type" : "int",
   "comment" : "(int, default:0) the contextStart of SequenceConvOp represents the beginning of the convolution of the number of rows of sequence, which can be negative. The negative number means to pad contextStart time-steps of zeros or learnable parameters at the beginning of each instance. The positive number means to skip contextStart time-steps of each instance.",
   "generated" : 0
 }, { 
   "name" : "contextStride",
   "type" : "int",
   "comment" : "(int, default:1) the contextStride of SequenceConvOp represents the stride length of convolution kernel. Currently, SequenceConvOp only supportscontextStride=1.",
   "generated" : 0
 } ] 
},{
 "type" : "sequence_pool",
 "comment" : "\nSequence Pool Operator.\n\nThe SequencePoolOp pools features of all time-steps of each instance.\nIt supports six pooling types:\n1. AVERAGE: $$Out[i] = \\frac{\\sum_i X_i}{N}$$\n2. SUM:     $$Out[i] = \\sum_jX_{ij}$$\n3. SQRT:    $$Out[i] = \\frac{\\sum_jX_{ij}}{\\sqrt{len(X_i)}}$$\n4. LAST:    Out[i] = last instance in i-th sequence X[i]\n5. FIRST:   Out[i] = first instance in i-th sequence X[i]\n6. MAX:     $$Out[i] = max(X_i)$$\n\nThe following example explains how this works:\nFor a mini-batch of 3 variable-length sentences,\ncontaining 2, 3, and 2 time-steps:\n\nAssume X is a [7,M,N] LoDTensor, and X->lod()[0] = [0, 2, 5, 7], 7=2+3+2.\nBesides, for the sake of simplicity, we assume M=1 and N=1,\nand the value of X = [[1, 3], [2, 4, 6], [5, 1]].\n\nThus, Out is a [3,1,1] Tensor without LoD infomation.\nAnd for different pooltype, the value of Out is as follows:\n\n- AVERAGE: [2, 4, 3], where 2=(1+3)/2, 4=(2+4+6)/3, 3=(5+1)/2\n- SUM: [4, 12, 6], where 4=1+3, 12=2+4+6, 6=5+1\n- SQRT: [2.82, 6.93, 4.24], where 2.82=(1+3)/sqrt(2),\n           6.93=(2+4+6)/sqrt(3), 4.24=(5+1)/sqrt(2)\n- MAX: [3, 6, 5], where 3=max(1,3), 6=max(2,4,6), 5=max(5,1)\n- LAST: [3, 6, 1], where 3=last(1,3), 6=last(2,4,6), 1=last(5,1)\n- FIRST: [1, 2, 5], where 1=first(1,3), 2=first(2,4,6), 5=first(5,1)\n\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) The variable-length input of SequencePoolOp",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output of SequencePoolOp does not contain LoD infomation.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MaxIndex",
   "comment" : "(Tensor<int>) This tensor is used for the sequence max-pooling to record the max indexes.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "pooltype",
   "type" : "string",
   "comment" : "(int, default AVERAGE) the pooling pooltype of SequencePoolOp.",
   "generated" : 0
 } ] 
},{
 "type" : "lstm",
 "comment" : "\nLong-Short Term Memory (LSTM) Operator.\n\nThe defalut implementation is diagonal/peephole connection\n(https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:\n\n$$\ni_t = \\sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i) \\\\\n\nf_t = \\sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f) \\\\\n\n\\tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c) \\\\\n\no_t = \\sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o) \\\\\n\nc_t = f_t \\odot c_{t-1} + i_t \\odot \\tilde{c_t} \\\\\n\nh_t = o_t \\odot act_h(c_t)\n$$\n\nwhere the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix\nof weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$\nare diagonal weight matrices for peephole connections. In our implementation,\nwe use vectors to reprenset these diagonal weight matrices. The b terms\ndenote bias vectors ($b_i$ is the input gate bias vector), $\\sigma$\nis the non-line activations, such as logistic sigmoid function, and\n$i, f, o$ and $c$ are the input gate, forget gate, output gate,\nand cell activation vectors, respectively, all of which have the same size as\nthe cell output activation vector $h$.\n\nThe $\\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$\nare the cell input and cell output activation functions and `tanh` is usually\nused for them. $\\tilde{c_t}$ is also called candidate hidden state,\nwhich is computed based on the current input and the previous hidden state.\n\nSet `use_peepholes` False to disable peephole connection. The formula\nis omitted here, please refer to the paper\nhttp://www.bioinf.jku.at/publications/older/2604.pdf for details.\n\nNote that these $W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}$\noperations on the input $x_{t}$ are NOT included in this operator.\nUsers can choose to use fully-connect operator before LSTM operator.\n\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(LoDTensor) the first input is a LodTensor, which support variable-time length input sequence. The underlying tensor in this LoDTensor is a matrix with shape (T X 4D), where T is the total time steps in this mini-batch, D is the hidden size.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "H0",
   "comment" : "(Tensor, optional) the initial hidden state is an optional input. This is a tensor with shape (N x D), where N is the batch size and D is the hidden size.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "C0",
   "comment" : "(Tensor, optional) the initial cell state is an optional input. This is a tensor with shape (N x D), where N is the batch size. `H0` and `C0` can be NULL but only at the same time",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "(Tensor) the learnable hidden-hidden weights. - The shape is (D x 4D), where D is the hidden size.  - Weight = {W_ch, W_ih, W_fh, W_oh}",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "(Tensor) the learnable weights, which contains two parts: input-hidden bias weight and peephole connections weight if setting `use_peepholes` True. 1. `use_peepholes = False`  - The shape is (1 x 4D).  - Bias = {b_c, b_i, b_f, b_o}.2. `use_peepholes = True`  - The shape is (1 x 7D).  - Bias = {b_c, b_i, b_f, b_o, W_ic, W_fc, W_oc}.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Hidden",
   "comment" : "(LoDTensor) the hidden state of LSTM operator. The shape is (T x D), and lod is the same with the `Input`.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Cell",
   "comment" : "(LoDTensor) the cell state of LSTM operator. The shape is (T x D), and lod is the same with the `Input`.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "BatchGate",
   "comment" : "(LoDTensor) This LoDTensor contains input gate, forget gate and output gate after the nonlinear computation. This LoDTensor has the same shape as the reorganized input, which is also be called batch input. The LoD size is 2. The first LoD is the batch offsets and the second LoD contains the indexes, which denote the position of reorganized sequence in the raw input.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "BatchCellPreAct",
   "comment" : "(LoDTensor) This LoDTensor is obtained in the forward and used in the backward.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "use_peepholes",
   "type" : "bool",
   "comment" : "(bool, defalut: True) whether to enable diagonal/peephole connections.",
   "generated" : 0
 }, { 
   "name" : "is_reverse",
   "type" : "bool",
   "comment" : "(bool, defalut: False) whether to compute reversed LSTM.",
   "generated" : 0
 }, { 
   "name" : "gate_activation",
   "type" : "string",
   "comment" : "(string, default: sigmoid)The activation for input gate, forget gate and output gate, `sigmoid` by default.",
   "generated" : 0
 }, { 
   "name" : "cell_activation",
   "type" : "string",
   "comment" : "(string, default: tanh)The activation for cell output, `tanh` by defalut.",
   "generated" : 0
 }, { 
   "name" : "candidate_activation",
   "type" : "string",
   "comment" : "(string, default: tanh)The activation for candidate hidden state, `tanh` by default.",
   "generated" : 0
 } ] 
},{
 "type" : "conv3d_transpose",
 "comment" : "\nConvolution3D Transpose Operator.\n\nThe convolution transpose operation calculates the output based on the input, filter\nand strides, paddings, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCDHW format. Where N is batch size, C is the\nnumber of channels, D is the depth of the feature, H is the height of the feature,\nand W is the width of the feature.\nFilter(Input) is in MCDHW format. Where M is the number of input feature channels,\nC is the number of output feature channels, D is the depth of the filter,H is the\nheight of the filter, and W is the width of the filter.\nParameters(strides, paddings) are three elements. These three elements represent\ndepth, height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:   \n  Input:\n       Input shape: $(N, C_{in}, D_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{in}, C_{out}, D_f, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out} = (D_{in} - 1) * strides[0] - 2 * paddings[0] + D_f \\\\\n       H_{out} = (H_{in} - 1) * strides[1] - 2 * paddings[1] + H_f \\\\\n       W_{out} = (W_{in} - 1) * strides[2] - 2 * paddings[2] + W_f\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution transpose operator.The format of input tensor is NCDHW. Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution transpose operator.The format of the filter tensor is MCDHW, where M is the number of input feature channels, C is the number of output feature channels, D is the depth of the filter, H is the height of the filter, and W is the width of the filter.We enforce groups number == 1 and padding == 0 in the convolution3d transpose scenario.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution transpose operator.The format of output tensor is also NCDHW.Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1, 1}), the strides{d_stride, h_stride, w_stride} of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0, 0}), paddings(d_pad, h_pad, w_pad) of convolution transpose operator.",
   "generated" : 0
 } ] 
},{
 "type" : "conv2d_transpose",
 "comment" : "\nConvolution2D Transpose Operator.\n\nThe convolution transpose operation calculates the output based on the input, filter\nand strides, paddings, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCHW format. Where N is batchsize, C is the\nnumber of channels, H is the height of the feature, and W is the width of the feature.\nFilter(Input) is in MCHW format. Where M is the number of input feature channels,\nC is the number of output feature channels, H is the height of the filter,\nand W is the width of the filter.\nParameters(strides, paddings) are two elements. These two elements represent height\nand width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{in}, C_{out}, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       H_{out} = (H_{in} - 1) * strides[0] - 2 * paddings[0] + H_f \\\\\n       W_{out} = (W_{in} - 1) * strides[1] - 2 * paddings[1] + W_f\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution transpose operator. The format of input tensor is NCHW. Where N is batch size, C is the number of input channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution transpose operator. The format of the filter tensor is MCHW, where M is the number of input feature channels, C is the number of output feature channels,H is the height of the filter, and W is the width of the filter. We enforce groups number == 1 in the convolution transpose scenario.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution transpose operator. The format of output tensor is also NCHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the strides(h_stride, w_stride) of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution transpose operator.",
   "generated" : 0
 } ] 
},{
 "type" : "gru",
 "comment" : "\nGRU Operator implements part calculations of the complete GRU as following:\n\n\\f[\nupdate \\ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\\\\nreset \\ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r)  \\\\\noutput \\ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\\\\noutput: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)\n\\f]\n\n@note To implement the complete GRU, fully-connected operator must be used  \nbefore to feed xu, xr and xc as the Input of GRU operator.\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(LoDTensor) The first input is a LodTensor, which supports variable-time length input sequence. The underlying tensor in this LoDTenosr is a matrix with shape (T X 3D), where, T is the total time steps in this mini-batch, D is the hidden size.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "H0",
   "comment" : "(Tensor, optional) The initial hidden state is an optional input. This is a tensor with shape (N x D), where N is the batch size, D is the hidden size.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "(Tensor) The learnable hidden-hidden weight matrix with shape (D x 3D), where D is the hidden size. The elements continuous in memory can be divided into two parts. The first part are weights of the update gate and reset gate with shape (D x 2D), and the second part are weights of output candidate with shape (D x D).",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "(Tensor, optional) Bias vector with shape (1 x 3D) concating bias of the update gate, reset gate and output candidate.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "BatchGate",
   "comment" : "(LoDTensor) To compute with batches, sequence data will be reorganized into several successive batches each containing data from the same time step. The LoDTensor BatchGate contains the update gate, reset gate and output candidate values organized in batches. The LoD size is 2. The first LoD contains the batch offsets and the second LoD contains the indexes in the raw sequence data.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "BatchResetHiddenPrev",
   "comment" : "(LoDTensor) The reseted hidden state LoDTensor organized in batches. This LoDTensor is a matrix with shape (T X D) and has the same LoD with `BatchGate`.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "BatchHidden",
   "comment" : "(LoDTensor) The hidden state LoDTensor organized in batches.  This LoDTensor is a matrix with shape (T X D) and has the same LoD with `BatchGate`.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Hidden",
   "comment" : "(LoDTensor) the hidden state LoDTensor organized in sequences. This LoDTensor is a matrix with shape (T X D) and has the same LoD with `BatchGate`.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "activation",
   "type" : "string",
   "comment" : "(string, default tanh) The activation type used for output candidate {h}_t.",
   "generated" : 0
 }, { 
   "name" : "gate_activation",
   "type" : "string",
   "comment" : "(string, default sigmoid) The activation type used in update gate and reset gate.",
   "generated" : 0
 }, { 
   "name" : "is_reverse",
   "type" : "bool",
   "comment" : "(bool, defalut: False) whether to compute reversed GRU.",
   "generated" : 0
 } ] 
},{
 "type" : "recurrent",
 "comment" : "\nStatic Length Recurrent Operator.\n\nThe static length recurrent operator can only operate on fixed size sequence\ndata, i.e. in each mini-batch, the sequence length of all inputs are the same.\n\n",
 "inputs" : [ 
 { 
   "name" : "inputs",
   "comment" : "rnn inputs",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "initial_states",
   "comment" : "rnn initial states",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "parameters",
   "comment" : "Parameters are used by step block as its input. However, the input is not a sequence tensor. Every time step, each operator in step block just use the parameter directly.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "outputs",
   "comment" : "The output sequence of RNN. The sequence length must be same.",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "step_scopes",
   "comment" : "StepScopes contain all local variables in each time step.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "ex_states",
   "type" : "string array",
   "comment" : "The ex-state variable names.\nThe ex-state means the state value in the ex-timestep or the previous time step\n[ex_states, states, initial_states@GRAD] must be the same order",
   "generated" : 0
 }, { 
   "name" : "states",
   "type" : "string array",
   "comment" : "The state variable names. [ex_states, states, initial_states@GRAD] must be the same order",
   "generated" : 0
 }, { 
   "name" : "step_block",
   "type" : "block id",
   "comment" : "The step block inside RNN",
   "generated" : 0
 }, { 
   "name" : "reverse",
   "type" : "bool",
   "comment" : "Calculate RNN reversely or not.\nBy default reverse=False\n\nAssume the input data is [A, B, C, D]\n\nif reverse is False:\n  the computation of RNN is like\n      A          B          C         D\n      |          |          |         |\n      v          v          v         v\n     rnn -----> rnn -----> rnn ----> rnn\n      |          |          |         |\n      v          v          v         v\n      o          o          o         o\n\nif reverse is True\n  the computation of RNN is like\n      A          B          C         D\n      |          |          |         |\n      v          v          v         v\n     rnn <----- rnn <----- rnn <---- rnn\n      |          |          |         |\n      v          v          v         v\n      o          o          o         o\n",
   "generated" : 0
 }, { 
   "name" : "is_train",
   "type" : "bool",
   "comment" : "",
   "generated" : 0
 } ] 
},{
 "type" : "save",
 "comment" : "\nSave operator\n\nThis operator will serialize and write a tensor variable to file on disk.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor ) Input tensor to be saved",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [  ], 
 "attrs" : [ 
 { 
   "name" : "overwrite",
   "type" : "bool",
   "comment" : "(boolean, default true)Overwrite the output file if exist",
   "generated" : 0
 }, { 
   "name" : "file_path",
   "type" : "string",
   "comment" : "(string)The \"file_path\" where the variable will be saved.",
   "generated" : 0
 } ] 
},{
 "type" : "load",
 "comment" : "\nLoad Operator.\n\nLoad operator will load a tensor variable from disk file.\n\n",
 "inputs" : [  ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The tensor need to be loaded",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "file_path",
   "type" : "string",
   "comment" : "(string) Variable will be loaded from \"file_path\".",
   "generated" : 0
 } ] 
},{
 "type" : "auc",
 "comment" : "\nArea Under The Curve (AUC) Operator.\n\nThis implementation computes the AUC according to forward output and label.\nIt is used very widely in binary classification evaluation. As a note:\nIf input label contains values other than 0 and 1, it will be cast\nto bool. You can find the relevant definitions here:\nhttps://en.wikipedia.org/wiki/Receiver_operating_characteristic#Area_under_the_curve\n\nThere are two types of possible curves:\n1. ROC: Receiver operating characteristic\n2. PR: Precision Recall\n",
 "inputs" : [ 
 { 
   "name" : "Out",
   "comment" : "A floating point 2D tensor, values are in the range [0, 1].Each row is sorted in descending order. This input should be theoutput of topk.Typically, this tensor indicates the probability of each label",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Indices",
   "comment" : "An int 2D tensor, indicating the indices of originaltensor before sorting. Typically, this tensor indicates which label the probability stands for.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "A 2D int tensor indicating the label of the training data.The height is batch size and width is always 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "AUC",
   "comment" : "A scalar representing the current area-under-the-curve.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "curve",
   "type" : "string",
   "comment" : "Curve type, can be 'ROC' or 'PR'.",
   "generated" : 0
 }, { 
   "name" : "num_thresholds",
   "type" : "int",
   "comment" : "The number of thresholds to use when discretizing the roc curve.",
   "generated" : 0
 } ] 
},{
 "type" : "hard_sigmoid",
 "comment" : "\nHardSigmoid Activation Operator.\n\nSegment-wise linear approximation of sigmoid(https://arxiv.org/abs/1603.00391), \nwhich is much faster than sigmoid.\n\n$y = \\max(0, \\min(1, slope * x + shift))$\n\nThe slope should be positive. The offset can be either positive or negative.\nThe default slope and shift are set according to the above reference.\nIt is recommended to use the defaults for this activation.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of HardSigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of HardSigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "slope",
   "type" : "float",
   "comment" : "Slope for linear approximation of sigmoid",
   "generated" : 0
 }, { 
   "name" : "offset",
   "type" : "float",
   "comment" : "Offset for linear approximation of sigmoid",
   "generated" : 0
 } ] 
},{
 "type" : "cond",
 "comment" : "\nSample Dependent Conditional Operator.\n\nGiven Cond[i] as a 1/0 vector to indicate true/false:\nOut[i] = subnet_true[i], if Cond[i] == true\nOut[i] = subnet_false[i], if Cond[i] == false\n\n",
 "inputs" : [ 
 { 
   "name" : "Cond",
   "comment" : "The condition, which is a bool vector",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Xs",
   "comment" : "Inputs of Subnets",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Outs",
   "comment" : "Outputs of Cond_Op after merge",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "SubScopes",
   "comment" : "sub scopes for true and false branches",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "IndexTensors",
   "comment" : "Index Tensors contains indices for true/false",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "max_pool2d_with_index",
 "comment" : "\nMaxPool2d Operator.\n\nThe maxPooling2d with index operation calculates the output and the mask\nbased on the input, ksize, strides, and paddings parameters. Input(X) and\noutput(Out, Mask) are in NCHW format, where N is batch size, C is the\nnumber of channels, H is the height of the feature, \nand W is the width of the feature.\nParameters(ksize, strides, paddings) are two elements.\nThese two elements represent height and width, respectively.\nThe input(X) size and output(Out, Mask) size may be different.\n\nExample:\n  Input:\n       X shape: $(N, C, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, H_{out}, W_{out})$\n       Mask shape: $(N, C, H_{out}, W_{out})$\n  Where\n       $$\n       H_{out} = \\frac{(H_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1\n       $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCHW, where N is batch size, C is the number of channels, H is the height of the image, and W is the width of the image.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator. The format of output tensor is also NCHW, where N is batch size, C is the number of channels, H is the height of the image and W is the width of the image.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mask",
   "comment" : "(Tensor) The Mask tensor of pooling operator.The format of output tensor is also NCHW, where N is batch size, C is the number of channels, H is the height of the image, and W is the width of the image. It represents the index in the current feature map.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default:false) Whether to use the global pooling. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1, 1}), strides(height, width) of pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default:{0, 0}), paddings(height, width) of pooling operator. If global_pooling = true, paddings and will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "thresholded_relu",
 "comment" : "\nThresholdedRelu Activation Operator.\n\n$$\ny = \\begin{cases} \n    x, \\text{if } x > threshold \\\\\n    0,  \\text{otherwise}\n    \\end{cases}\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of ThresholdedRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of ThresholdedRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "threshold",
   "type" : "float",
   "comment" : "The threshold location of activation",
   "generated" : 0
 } ] 
},{
 "type" : "hard_shrink",
 "comment" : "\nHardShrink Activation Operator.\n\n$$\ny = \\begin{cases} \n    x, \\text{if } x > \\lambda \\\\\n    x, \\text{if } x < -\\lambda \\\\\n    0,  \\text{otherwise}\n    \\end{cases}\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of HardShrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of HardShrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "threshold",
   "type" : "float",
   "comment" : "The value of threshold for HardShrink",
   "generated" : 0
 } ] 
},{
 "type" : "relu6",
 "comment" : "\nRelu6 Activation Operator.\n\n$y = \\min(\\max(0, x), 6)$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Relu6 operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Relu6 operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "threshold",
   "type" : "float",
   "comment" : "The threshold value of Relu6",
   "generated" : 0
 } ] 
},{
 "type" : "elu",
 "comment" : "\nELU Activation Operator.\n\nApplies the following element-wise computation on the input according to\nhttps://arxiv.org/abs/1511.07289.\n\n$y = \\max(0, x) + \\min(0, \\alpha * (e^x - 1))$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of ELU operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of ELU operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "alpha",
   "type" : "float",
   "comment" : "The alpha value of ELU",
   "generated" : 0
 } ] 
},{
 "type" : "leaky_relu",
 "comment" : "\nLeakyRelu Activation Operator.\n\n$y = \\max(x, \\alpha * x)$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of LeakyRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of LeakyRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "alpha",
   "type" : "float",
   "comment" : "The small negative slope",
   "generated" : 0
 } ] 
},{
 "type" : "top_k",
 "comment" : "\nTop K operator\n\nIf the input is a vector (1d tensor), this operator finds the k largest \nentries in the vector and outputs their values and indices as vectors. \nThus values[j] is the j-th largest entry in input, and its index is indices[j].\n\nFor matrices, this operator computes the top k entries in each row. ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input of Topk op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of Topk op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Indices",
   "comment" : "(Tensor) The indices of Topk elements of input",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "k",
   "type" : "int",
   "comment" : "(int, default 1) Number of top elements to look for along the last dimension (along each row for matrices).",
   "generated" : 0
 } ] 
},{
 "type" : "sequence_softmax",
 "comment" : "\nSequence Softmax Operator.\n\nSequenceSoftmaxOp computes the softmax activation among all time-steps for each\nsequence. The dimension of each time-step should be 1. Thus, the shape of\ninput Tensor can be either [N, 1] or [N], where N is the sum of the length\nof all sequences.\n\nThe algorithm works as follows:\n    for i-th sequence in a mini-batch:\n        $$Out(X[lod[i]:lod[i+1]], :) =\n            \\frac{\\exp(X[lod[i]:lod[i+1], :])}\n            {\\sum(\\exp(X[lod[i]:lod[i+1], :]))}$$\n\nFor example, for a mini-batch of 3 sequences with variable-length,\neach containing 2, 3, 2 time-steps, the lod of which is [0, 2, 5, 7],\nthen softmax will be computed among X[0:2, :], X[2:5, :], X[5:7, :]\nand N turns out to be 7.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) 1-D or 2-D input LoDTensor with the 2-nd dimension of length 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) 1-D or 2-D output LoDTensor with the 2-nd dimension of length 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "decayed_adagrad",
 "comment" : "\nDecayed Adagrad Optimizer.\n\nThe update is done as follows:\n\n$$\nmoment\\_out = decay * moment + (1 - decay) * grad * grad \\\\\nparam\\_out = param - \\frac{learning\\_rate * grad}{\\sqrt{moment\\_out} + epsilon}\n$$\n\nThe original paper(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)\ndoes not have an epsilon attribute. It is added here for numerical\nstability to avoid the division by zero error.\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment",
   "comment" : "(Tensor) Second moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor) Learning rate",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MomentOut",
   "comment" : "(Tensor) Output second moment",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "decay",
   "type" : "float",
   "comment" : "(float, default 0.95) Discounting factor for coming gradient",
   "generated" : 0
 }, { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1.0e-6) Constant for numerical stability",
   "generated" : 0
 } ] 
},{
 "type" : "scale",
 "comment" : "\nScale operator\n\n$$Out = scale*X$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) Input tensor of scale operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) Output tensor of scale operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "scale",
   "type" : "float",
   "comment" : "(float, default 0)The scaling factor of the scale operator.",
   "generated" : 0
 } ] 
},{
 "type" : "increment",
 "comment" : "\nIncrement Operator.\n\nThe equation is: \n$$Out = X + step$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of increment operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of increment operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "step",
   "type" : "float",
   "comment" : "(float, default 1.0) The step size by which the input tensor will be incremented.",
   "generated" : 0
 } ] 
},{
 "type" : "expand",
 "comment" : "\nExpand operator tiles the input by given times number. You should set times\nnumber for each dimension by providing attribute 'expand_times'. The rank of X\nshould be in [1, 6]. Please notice that size of 'expand_times' must be same with\nX's rank. Following is a using case:\n\nInput(X) is a 3-D tensor with shape [2, 3, 1]:\n\n        [\n           [[1], [2], [3]],\n           [[4], [5], [6]]\n        ]\n\nAttr(expand_times):  [1, 2, 2]\n\nOutput(Out) is a 3-D tensor with shape [2, 6, 2]:\n\n        [\n            [[1, 1], [2, 2], [3, 3], [1, 1], [2, 2], [3, 3]],\n            [[4, 4], [5, 5], [6, 6], [4, 4], [5, 5], [6, 6]]\n        ]\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank in [1, 6].X is the input tensor to be expanded.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank in [1, 6].The rank of Output(Out) is same as Input(X) except that each dimension size of Output(Out) is equal to corresponding dimension size of Input(X) multiplying corresponding value of Attr(expand_times).",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "expand_times",
   "type" : "int array",
   "comment" : "Expand times number for each dimension.",
   "generated" : 0
 } ] 
},{
 "type" : "lod_array_length",
 "comment" : "\nLoDArrayLength Operator.\n\nThis operator obtains the length of lod tensor array:\n\n$$Out = len(X)$$\n\nNOTE: The output is a CPU Tensor since the control variable should be only in\nCPU and the length of LoDTensorArray should be used as control variables.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensorArray) The input tensor array.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) 1x1 CPU Tensor of length, int64_t",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "reduce_sum",
 "comment" : "\n{ReduceOp} Operator.\n\nThis operator computes the sum of input tensor along the given dimension. \nThe result tensor has 1 fewer dimension than the input unless keep_dim is true.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor. Tensors with rank at most 6 are supported.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The result tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dim",
   "type" : "int",
   "comment" : "(int, default 0) The dimension to reduce. Must be in the range [-rank(input), rank(input)). If `dim < 0`, the dim to reduce is `rank + dim`. Note that reducing on the first dim will make the LoD info lost.",
   "generated" : 0
 }, { 
   "name" : "keep_dim",
   "type" : "bool",
   "comment" : "(bool, default false) If true, retain the reduced dimension with length 1.",
   "generated" : 0
 } ] 
},{
 "type" : "tanh_shrink",
 "comment" : "\nTanhShrink Activation Operator.\n\n$$y = x - \\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of TanhShrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of TanhShrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "adam",
 "comment" : "\nAdam Optimizer.\n\nThis implements the Adam optimizer from Section 2 of the Adam\npaper : https://arxiv.org/abs/1412.6980.\nAdam is a first-order gradient-based optimization method based on\nadaptive estimates of lower-order moments.\n\nAdam updates:\n\n$$\nmoment\\_1\\_out = \\beta_1 * moment\\_1 + (1 - \\beta_1) * grad \\\\\nmoment\\_2_\\out = \\beta_2 * moment\\_2 + (1 - \\beta_2) * grad * grad \\\\\nlearning\\_rate = learning\\_rate *\n                  \\frac{\\sqrt{1 - \\beta_{2\\_pow}}}{1 - \\beta_{1\\_pow}} \\\\\nparam\\_out = param - learning\\_rate * \\frac{moment\\_1}{\\sqrt{moment\\_2} + \\epsilon}\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor) Learning rate",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment1",
   "comment" : "(Tensor) Input first moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment2",
   "comment" : "(Tensor) Input second moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Beta1Pow",
   "comment" : "(Tensor) Input beta1 power accumulator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Beta2Pow",
   "comment" : "(Tensor) Input beta2 power accumulator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment1Out",
   "comment" : "(Tensor) Output first moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment2Out",
   "comment" : "(Tensor) Output second moment",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "beta1",
   "type" : "float",
   "comment" : "(float, default 0.9) Exponential decay rate for the first moment estimates.",
   "generated" : 0
 }, { 
   "name" : "beta2",
   "type" : "float",
   "comment" : "(float, default 0.999) exponential decay rate for the second moment estimates.",
   "generated" : 0
 }, { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1.0e-8) Constant for numerical stability",
   "generated" : 0
 } ] 
},{
 "type" : "reduce_min",
 "comment" : "\n{ReduceOp} Operator.\n\nThis operator computes the min of input tensor along the given dimension. \nThe result tensor has 1 fewer dimension than the input unless keep_dim is true.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor. Tensors with rank at most 6 are supported.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The result tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dim",
   "type" : "int",
   "comment" : "(int, default 0) The dimension to reduce. Must be in the range [-rank(input), rank(input)). If `dim < 0`, the dim to reduce is `rank + dim`. Note that reducing on the first dim will make the LoD info lost.",
   "generated" : 0
 }, { 
   "name" : "keep_dim",
   "type" : "bool",
   "comment" : "(bool, default false) If true, retain the reduced dimension with length 1.",
   "generated" : 0
 } ] 
},{
 "type" : "lod_reset",
 "comment" : "LoDReset operator\n\nReset LoD of Input(X) into a new one specified by Input(TargetLoD) or\nAttr(target_lod), or set LoD for Input(X) if it doesn't have one.\nCurrently the lod_reset operator only supports the reset of level 0 LoD.\nAt least one of Input(TargetLoD) and Attr(target_lod) must be set,\nand if both of them are set, Input(TargetLoD) will be chosen as the\ntarget LoD.\n\nAn example:\nGiven a float LoDTensor X with shape (6, 1), its transpose form represents\n\n    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],\n\nwith LoD = [[0, 2, 5, 6]] and the three (transposed) sequences look like\n\n    [1.0, 2.0], [3.0, 4.0, 5.0], [6.0].\n\nIf target LoD = [0, 4, 6], the lod_reset operator will reset the LoD and\nthe sequences that the LoDTensor Output(Out) contains becomes:\n\n    [1.0, 2.0, 3.0, 4.0], [5.0, 6.0].\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) The input tensor of lod_reset operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "TargetLoD",
   "comment" : "(Tensor, optional) The target level 0 LoD from Input().",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) The output tensor of lod_reset operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "target_lod",
   "type" : "int array",
   "comment" : "The target level 0 LoD from Attr().",
   "generated" : 0
 } ] 
},{
 "type" : "write_to_array",
 "comment" : "\nWriteToArray Operator.\n\nThis operator writes a LoDTensor to a LoDTensor array.\n\nAssume $T$ is LoDTensor, $i$ is the subscript of the array, and $A$ is the array. The\nequation is\n\n$$A[i] = T$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the tensor will be written to tensor array",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "I",
   "comment" : "(Tensor) the subscript index in tensor array. The number of element should be 1",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(TensorArray) the tensor array will be written",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "reshape",
 "comment" : "\nReshape Operator.\n\nReshape Input(X) into the shape specified by Attr(shape).\n\nAn example:\nGiven a 2-D tensor X with 2 rows and 2 columns\n\n    [[1, 2], [3, 4]]\n\nand target shape = [1, 4], the reshape operator will transform\nthe tensor X into a 1-D tensor:\n\n    [1, 2, 3, 4]\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor of reshape operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output tensor of reshape operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "(vector<int>) Target shape of reshape operator.",
   "generated" : 0
 } ] 
},{
 "type" : "fill_constant",
 "comment" : "\nFillConstantBatchSizeLike Operator.\n\nFill up a variable with specified constant value.\n\n",
 "inputs" : [  ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) Tensor of specified shape will be filled with the specified value",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5 (FP32)) Output data type",
   "generated" : 0
 }, { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "(vector<int>) The shape of the output",
   "generated" : 0
 }, { 
   "name" : "value",
   "type" : "float",
   "comment" : "(float, default 0) The value to be filled",
   "generated" : 0
 }, { 
   "name" : "force_cpu",
   "type" : "bool",
   "comment" : "(bool, default false) Force fill output variable to cpu memory. Otherwise, fill output variable to the running device",
   "generated" : 0
 } ] 
},{
 "type" : "elementwise_div",
 "comment" : "\nLimited Elementwise Div Operator.\n\nThe equation is:\n\n$Out = X / Y$\n\nX is a tensor of any dimension and the dimensions of tensor Y must be smaller than\nor equal to the dimensions of X. \n\nThere are two cases for this operator:\n1. The shape of Y is same with X;\n2. The shape of Y is a subset of X.\n\nFor case 2:\nY will be broadcasted to match the shape of X and axis should be \nthe starting dimension index for broadcasting Y onto X.\n\nexample:\n  shape(X) = (2, 3, 4, 5), shape(Y) = (,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (5,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1\n  shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0\n\nBoth the input X and Y can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The first input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor) The second input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default -1) The starting dimension index for broadcasting Y onto X",
   "generated" : 0
 } ] 
},{
 "type" : "conv2d_cudnn",
 "comment" : "\nConvolution Operator.\n\nThe convolution operation calculates the output based on the input, filter\nand strides, paddings, dilations, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and Output(Output) are in NCHW format. Where N is batch\nsize, C is the number of channels, H is the height of the feature, and W is\nthe width of the feature.\nFilters(Input) is MCHW format. Where M is the number of output image channels, C is\nthe number of input image channels, H is the height of the filter, and W\nis the width of the filter.\nParameters(strides, paddings, dilations) are two elements. These two elements represent\nheight and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{out}, C_{in}, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, H_{out}, W_{out})$\n  Where\n$$\n       H_{out}= \\frac{(H_{in} + 2 * paddings[0] - (dilations[0] * (H_f - 1) + 1))}{strides[0]}+ 1 \\\\\n       W_{out}= \\frac{(W_{in} + 2 * paddings[1] - (dilations[1] * (W_f - 1) + 1))}{strides[1]}+ 1\n$$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution operator. The format of input tensor is NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution operator. The format of the filter tensor is MCHW, where M is the number of output image channels, C is the number of input image channels, H is the height of the filter, and W is the width of the filter. If the groups attribute is greater than 1, C equals the number of input image channels divided by the groups.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution operator. The format of output tensor is also NCHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the strides(h_stride, w_stride) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "groups",
   "type" : "int",
   "comment" : "(int default:1), the groups number of the convolution operator. According to grouped convolution in Alex Krizhevsky's Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the dilations(h_dilation, w_dilation) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "workspace_size_MB",
   "type" : "int",
   "comment" : "workspace size for cudnn, in MB, workspace is a section of GPU memory which will be allocated/freed each time the operator runs, larger workspace size can increase performance but also requires better hardware. This size should be chosen carefully.",
   "generated" : 0
 } ] 
},{
 "type" : "mul",
 "comment" : "\nMul Operator. \n\nThis operator is used to perform matrix multiplication for input X and Y.\n\nThe equation is:\n\n    $$Out = X * Y$$\n\nBoth the input `X` and `Y` can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input `X`.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The first input of mul op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The second input of mul op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of mul op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "x_num_col_dims",
   "type" : "int",
   "comment" : "(int, default 1) mul_op can take tensors with more than two dimensions as input `X`,\n            in that case, tensors will be reshaped to a matrix. The matrix's first\n            dimension(column length) will be the product of tensor's last\n            `num_col_dims` dimensions, and the matrix's second dimension(row length)\n            will be the product of tensor's first `rank - num_col_dims` dimensions.\n        ",
   "generated" : 0
 }, { 
   "name" : "y_num_col_dims",
   "type" : "int",
   "comment" : "(int, default 1) mul_op can take tensors with more than two dimensions as input `Y`,\n             in that case, tensors will be reshaped to a matrix. Just like input `X`.\n        ",
   "generated" : 0
 } ] 
},{
 "type" : "margin_rank_loss",
 "comment" : "\nMarginRankLoss Operator.\n\nThis operator measures the loss given a pair of training sample\n{`X1`, `X2`} and the `Label` with attribute `margin`, where `Label = +1` \nindicating X1 is ranked higher than `X2` and `Label = -1` otherwise. The loss \nis calculated as:\n\n$loss(X1, X2, Label) = \\max(0, -Label * (X1 - X2) + margin)$\n\nThe attribute `margin` here helps make the predictions more robust.\nDenote the item ranked higher as the positive sample, otherwise the negative \nsample. If the score of the two samples satisfies \n\n$positive sample - negative sample < margin$\n\nthe pair of samples will contribute to the final loss, which will backpropagate \nand train the ranking model to enlarge the difference between the two scores.\n\nFor batch input with size `batch_size`, `X1`, `X2` and `Label`\nall have the same shape [batch_size x 1].\n\n",
 "inputs" : [ 
 { 
   "name" : "X1",
   "comment" : "(2-D tensor with shape [batch_size x 1]) The score for one item X1 to be ranked, from pairwise ranking model.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "X2",
   "comment" : "(2-D tensor with shape [batch_size x 1]) The score for another item X2 to be ranked, from pairwise ranking model.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(2-D tensor with shape [batch_size x 1]) The label indicating X1 ranked higher than X2 or not, can only be +1 or -1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Activated",
   "comment" : "(2-D tensor with shape [batch_size x 1]) Intermediate tensor to indicate whether each element of Output(Out) is activated.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Out",
   "comment" : "(2-D tensor with shape [batch_size x 1]) The output loss of MarginRankLoss operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "margin",
   "type" : "float",
   "comment" : "(scalar, default 0) Margin for MarginRankLossOp.",
   "generated" : 0
 } ] 
},{
 "type" : "greater_equal",
 "comment" : "greater_equal Operator\n\nIt operates element-wise on X and Y, and returns the Out. Each of them is a\nN-dim tensor. X and Y could be any type.  The each element of the Out tensor is\ncalculated by Out = X >= Y\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the left hand operand of greater_equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) the right hand operand of greater_equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is Out = X >= Y",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "reciprocal",
 "comment" : "\nReciprocal Activation Operator.\n\n$$y = \\frac{1}{x}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Reciprocal operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Reciprocal operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "squared_l2_norm",
 "comment" : "\nSquaredL2Norm Operator.\n\nComputes the squared L2 norm of a tensor.\n\n$$Out = \\sum_{i} X_{i}^2$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input of squared_l2_norm op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Scalar) The output of squared_l2_norm op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "shrink_rnn_memory",
 "comment" : "\n        In dynamic RNN, we are able to handle sequences of different lengths. \n        Because of the multiple lengths, the size of each step input can be \n        different, which may lead to a mismatching between the input of\n        the current step and the memory generated by the previous one. This \n        operator shrinks memory according to the size of the next step input, \n        to make sure that they can match each other.\n        ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) The RNN step memory to be shrinked.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "RankTable",
   "comment" : "(LoDRankTable) The lod_rank_table of dynamic RNN.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "I",
   "comment" : "(LoDTensor) The step index. The RNN step memory 'X' will be shrinked to match the size of the input of the index'th step.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) The shrinked RNN step memory.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "conditional_block",
 "comment" : "Conditional block operator\n\nRun the sub-block if X is not empty. Params is the other inputs and Out is the\noutputs of the sub-block.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The conditional variable of this operator. If X is empty, the whole sub-block will not be executed.",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "Params",
   "comment" : "The input variables of the sub-block.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output variables of the sub-block.",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "Scope",
   "comment" : "(std::vector<Scope*>) The step scope of conditional block. To unify the conditional block, rnn and while op, the type of scope is std::vector<Scope*>",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "block",
   "type" : "block id",
   "comment" : "The step block of conditional block operator",
   "generated" : 0
 } ] 
},{
 "type" : "lookup_table",
 "comment" : "\nLookup Table Operator.\n\nThis operator is used to perform lookups on the parameter W,\nthen concatenated into a dense tensor.\n\nThe input Ids can carry the LoD (Level of Details) information,\nor not. And the output only shares the LoD information with input Ids.\n\n",
 "inputs" : [ 
 { 
   "name" : "W",
   "comment" : "An input represents embedding tensors, which is a learnable parameter.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Ids",
   "comment" : "An input with type int32 or int64 contains the ids to be looked up in W. Ids must be a column vector with rank = 2. The 2nd dimension size must be 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The lookup results, which have the same type as W.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "is_sparse",
   "type" : "bool",
   "comment" : "(boolean, default false) Sparse update",
   "generated" : 0
 } ] 
},{
 "type" : "pad",
 "comment" : "\nPad Operator.\n\nPad input into output, as specified by paddings and pad_value. \nThe input should be a k-D tensor(k > 0 and k < 7). As an example:\n\nGiven:\n\nX = [[1, 2],\n     [3, 4]],\n\npaddings = [0, 1, 1, 2],\n\nand\n\npad_value = 0,\n\nwe have:\n\nOut = [[0, 1, 2, 0, 0]\n       [0, 3, 4, 0, 0]\n       [0, 0, 0, 0, 0]]\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of pad op. The input should be a k-D tensor(k > 0 and k < 7)",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of pad op. A tensor with the same shape as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>) A list<int> to describe the padding rules for each dimension. For 2-D image tensor, paddings=[0, 1, 2, 3] means padding 0 row to top, 1 row to bottom, 2 columns to left and 3 columns to right. Size of paddings should be equal to 2 * dimension size of the input tensor.",
   "generated" : 0
 }, { 
   "name" : "pad_value",
   "type" : "float",
   "comment" : "(float, default 0.0) The value to fill the padded areas.",
   "generated" : 0
 } ] 
},{
 "type" : "split_lod_tensor",
 "comment" : "\n        Split a LoDTensor with a Mask at certain level. The input LoDTensor\n        has 3 sequence at certain lod level. The Mask is a bool column vector,\n        such as [0, 1, 0] at the same level. The first and third sequence will\n        be send to False Output LoDTensor; whereas the second sequence will\n        be send to True Output LoDTensor. Please refer to MergeLoDTensorOp.",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input LoDTensor",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mask",
   "comment" : "A bool column vector which mask the input",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "OutTrue",
   "comment" : "True branch of input LoDTensor",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "OutFalse",
   "comment" : "False branch of input LoDTensor",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "level",
   "type" : "int",
   "comment" : "(int) the specific lod level to split.",
   "generated" : 0
 } ] 
},{
 "type" : "max_sequence_len",
 "comment" : "Calculate the max sequence length through lod_rank_table.",
 "inputs" : [ 
 { 
   "name" : "RankTable",
   "comment" : "The lod_rank_table.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The max sequence length.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "multiplex",
 "comment" : "\nMultiplex Operator.\n\nMultiplex multiple tensors according to the index provided by the index tensor.\n\nIds: the index tensor.\nX[0 : N - 1]: the candidate tensors for output (N >= 2).\nFor each index i from 0 to batchSize - 1, the output is the i-th row of the\nthe (Ids[i])-th tensor.\n\nFor i-th row of the output tensor:\n\n$$y[i] = x_{k}[i]$$\n\nwhere `y` is the output tensor, `x_{k}` is the k-th input tensor,\nand `k = Ids[i]`.\n\n",
 "inputs" : [ 
 { 
   "name" : "Ids",
   "comment" : "The index tensor of multiplex operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "X",
   "comment" : "The candidate tensors of multiplex operator.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output tensor of multiplex operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "stanh",
 "comment" : "\nSTanh Activation Operator.\n\n$$y = b * \\frac{e^{a * x} - e^{-a * x}}{e^{a * x} + e^{-a * x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of STanh operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of STanh operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "scale_a",
   "type" : "float",
   "comment" : "The scale parameter of a for the input",
   "generated" : 0
 }, { 
   "name" : "scale_b",
   "type" : "float",
   "comment" : "The scale parameter of b for the input",
   "generated" : 0
 } ] 
},{
 "type" : "adamax",
 "comment" : "\nAdamax Optimizer.\n\nWe implement the Adamax optimizer from Section 7 of the Adam\npaper: https://arxiv.org/abs/1412.6980. Adamax is a variant of the\nAdam algorithm based on the infinity norm.\n\nAdamax updates:\n\n$$\nmoment\\_out = \\beta_1 * moment + (1 - \\beta_1) * grad \\\\\ninf\\_norm\\_out = max(\\beta_2 * inf\\_norm + \\epsilon, |grad|) \\\\\nlearning\\_rate = \\frac{learning\\_rate}{1 - \\beta_{1\\_pow}} \\\\\nparam\\_out = param - learning\\_rate * \\frac{moment\\_out}{inf\\_norm\\_out}\n$$\n\nThe original paper does not have an epsilon attribute.\nHowever, it is added here for numerical stability to prevent the\ndivision by 0 error.\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor) Learning rate",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment",
   "comment" : "(Tensor) First moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "InfNorm",
   "comment" : "(Tensor) Input exponentially weighted infinity norm",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Beta1Pow",
   "comment" : "(Tensor) Input beta1 power accumulator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MomentOut",
   "comment" : "(Tensor) Output first moment",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "InfNormOut",
   "comment" : "(Tensor) Output exponentially weighted infinity norm",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "beta1",
   "type" : "float",
   "comment" : "(float, default 0.9) Exponential decay rate for the 1st moment estimates.",
   "generated" : 0
 }, { 
   "name" : "beta2",
   "type" : "float",
   "comment" : "(float, default 0.999) exponential decay rate for the weighted infinity norm estimates.",
   "generated" : 0
 }, { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1.0e-8) Constant for numerical stability",
   "generated" : 0
 } ] 
},{
 "type" : "l1_norm",
 "comment" : "\nL1 Norm Operator.\n\nComputes the L1 norm of a tensor.\n\n$$Out = \\sum{|X|}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input of l1_norm op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Scalar) The output of l1_norm op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "dropout",
 "comment" : "\nDropout Operator.\n\nDropout refers to randomly dropping out units in a nerual network. It is a\nregularization technique for reducing overfitting by preventing neuron\nco-adaption during training. The dropout operator randomly set (according to\nthe given dropout probability) the outputs of some units to zero, while others\nare set equal to their corresponding inputs.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of dropout op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of dropout op.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mask",
   "comment" : "The random sampled dropout mask.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "dropout_prob",
   "type" : "float",
   "comment" : "Probability of setting units to zero.",
   "generated" : 0
 }, { 
   "name" : "is_test",
   "type" : "bool",
   "comment" : "True if in test phase.",
   "generated" : 0
 }, { 
   "name" : "seed",
   "type" : "int",
   "comment" : "Dropout random seed.",
   "generated" : 0
 } ] 
},{
 "type" : "lod_tensor_to_array",
 "comment" : "",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "RankTable",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "pool2d_cudnn",
 "comment" : "\nPool2d Operator.\n\nThe pooling2d operation calculates the output based on\nthe input, pooling_type and ksize, strides, paddings parameters.\nInput(X) and output(Out) are in NCHW format, where N is batch size, C is the\nnumber of channels, H is the height of the feature, and W is the width of the feature.\nParameters(ksize, strides, paddings) are two elements.\nThese two elements represent height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:   \n  Input:\n       X shape: $(N, C, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, H_{out}, W_{out})$\n  Where\n       $$ \n       H_{out} = \\frac{(H_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1\n       $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator. The format of output tensor is also NCHW, where N is batch size, C is the number of channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "pooling_type",
   "type" : "string",
   "comment" : "(string), pooling type, can be \"max\" for max-pooling and \"avg\" for average-pooling.",
   "generated" : 0
 }, { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(height, width) of the pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default false) Whether to use the global pooling. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1, 1}), strides(height, width) of pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default {0,0}), paddings(height, width) of pooling operator.If global_pooling = true, paddings and ksize will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "conv2d_transpose_cudnn",
 "comment" : "\nConvolution2D Transpose Operator.\n\nThe convolution transpose operation calculates the output based on the input, filter\nand strides, paddings, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCHW format. Where N is batchsize, C is the\nnumber of channels, H is the height of the feature, and W is the width of the feature.\nFilter(Input) is in MCHW format. Where M is the number of input feature channels,\nC is the number of output feature channels, H is the height of the filter,\nand W is the width of the filter.\nParameters(strides, paddings) are two elements. These two elements represent height\nand width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{in}, C_{out}, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       H_{out} = (H_{in} - 1) * strides[0] - 2 * paddings[0] + H_f \\\\\n       W_{out} = (W_{in} - 1) * strides[1] - 2 * paddings[1] + W_f\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution transpose operator. The format of input tensor is NCHW. Where N is batch size, C is the number of input channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution transpose operator. The format of the filter tensor is MCHW, where M is the number of input feature channels, C is the number of output feature channels,H is the height of the filter, and W is the width of the filter. We enforce groups number == 1 in the convolution transpose scenario.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution transpose operator. The format of output tensor is also NCHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1}), the strides(h_stride, w_stride) of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0}), the paddings(h_pad, w_pad) of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "dilations of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "workspace_size_MB",
   "type" : "int",
   "comment" : "workspace size for cudnn, in MB, workspace is a section of GPU memory which will be allocated/freed each time the operator runs, larger workspace size can increase performance but also requires better hardward. This size should be carefully setted.",
   "generated" : 0
 } ] 
},{
 "type" : "gaussian_random",
 "comment" : "\nGaussianRandom Operator.\n\nUsed to initialize tensors with gaussian random generator.\n\n",
 "inputs" : [  ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "Output matrix of gaussian random op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "(vector<int>) The dimension of random tensor.",
   "generated" : 0
 }, { 
   "name" : "mean",
   "type" : "float",
   "comment" : "(float, default 0.0) mean of random tensor.",
   "generated" : 0
 }, { 
   "name" : "std",
   "type" : "float",
   "comment" : "(float, default 1.0) std of random tensor.",
   "generated" : 0
 }, { 
   "name" : "seed",
   "type" : "int",
   "comment" : "(int, default 0) Random seed of generator.0 means use system wide seed.",
   "generated" : 0
 }, { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5(FP32)) Output data type.",
   "generated" : 0
 } ] 
},{
 "type" : "lstm_unit",
 "comment" : "\nLstm Unit Operator\n\nEquation:\n\n$$\ni, f, o, j = split(X) \\\\\nC = C_{prev} * sigm(f + forget\\_bias) + sigm(i) * tanh(j) \\\\\nH = C * sigm(o)\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "FC input before the non-linear activation.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "C_prev",
   "comment" : "The cell state tensor of last time-step in the Lstm Unit operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "C",
   "comment" : "The cell tensor of Lstm Unit operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "H",
   "comment" : "The hidden state tensor of Lstm Unit operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "forget_bias",
   "type" : "float",
   "comment" : "(float, default 0.0) The forget bias of Lstm Unit.",
   "generated" : 0
 } ] 
},{
 "type" : "sign",
 "comment" : "\nSign operator\n\n$$Out = X.sign()$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) Input tensor of sign operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) Output tensor of sign operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "pow",
 "comment" : "\nPow Activation Operator.\n\n$y = x^{factor}$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Pow operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Pow operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "factor",
   "type" : "float",
   "comment" : "The exponential factor of Pow",
   "generated" : 0
 } ] 
},{
 "type" : "clip",
 "comment" : "\nClip Operator.\n\nThe clip operator limits the value of given input within an interval. The interval is\nspecified with arguments 'min' and 'max':\n\n$$\nOut = \\min(\\max(X, min), max)\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor)The input of clip op.The number of dimensions must be between [1, 9].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor)The output of clip op with shape as input(X)",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "min",
   "type" : "float",
   "comment" : "(float)Minimum value, under which element is replaced by min.",
   "generated" : 0
 }, { 
   "name" : "max",
   "type" : "float",
   "comment" : "(float)Maximum value, above which element is replaced by max",
   "generated" : 0
 } ] 
},{
 "type" : "huber_loss",
 "comment" : "\nHuberLoss Operator.\n\nHuber loss is a loss function used in robust regression. We define X as the\ninput value and Y as the target value. Huber loss can evaluate the fitness of\nX to Y. Different from MSE loss, Huber loss is more robust for outliers. The\nshape of X and Y are [batch_size, 1]. The equation is:\n\n$$\nOut_{\\delta}(X, Y)_i =\n\\begin{cases}\n0.5 * (Y_i - X_i)^2,\n\\quad |Y_i - X_i| \\leq \\delta \\\\\n\\delta * (|Y_i - X_i| - 0.5 * \\delta),\n\\quad otherwise\n\\end{cases}\n$$\n\nIn the above equation, $Out_\\delta(X, Y)_i$, $X_i$ and $Y_i$ represent the ith\nelement of Out, X and Y.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input value of huber loss op.X is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The target value of huber loss op.Y is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Residual",
   "comment" : "Intermediate tensor to cache residual value between Y and X.The shape is same as Input(X) and will be reused in backward.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Out",
   "comment" : "The output tensor with shape [batch_size, 1] which represents the huber loss.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "delta",
   "type" : "float",
   "comment" : "Hyper parameter in huber loss.",
   "generated" : 0
 } ] 
},{
 "type" : "smooth_l1_loss",
 "comment" : "\nSmooth L1 Loss Operator.\n\nThis operator computes the smooth l1 loss for X and Y.\nThe operator takes the first dimension of X and Y as batch size.\nFor each instance, it computes the smooth l1 loss element by element first\nand then sums all the losses. So the shape of Out is [batch_size, 1].\n\nThe equation is:\n$$\nOut_{\\sigma}(X, Y)_i = \\begin{cases}\n0.5 * (\\sigma * (X_i - Y_i)) ^ 2\n\\quad |X_i - Y_i| \\lt \\frac{1} {{\\sigma} ^ 2} \\\\\n\\frac{|X_i - Y_i| - 0.5}{{\\sigma}^2},\n\\quad otherwise\n\\end{cases}\n$$\n\nIn the above equation, $Out_{\\sigma}(X, Y)_i$, $X_i$ and $Y_i$ represent the ith\nelement of Out, X and Y.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank at least 2. The input value of smooth l1 loss op with shape [batch_size, dim1, ..., dimN].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank at least 2. The target value of smooth l1 loss op with same shape as X.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "InsideWeight",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank at least 2. This input is optional and should have same shape with X. If provided, the result of (X - Y) will be multiplied by this tensor element by element.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "OutsideWeight",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank at least 2. This input is optional and should have same shape with X. If provided, the out smooth l1 loss will be multiplied by this tensor element by element.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Diff",
   "comment" : "Intermediate variable to cache InsideWeight * (X - Y).",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Out",
   "comment" : "(Tensor, default Tensor<float>) A tensor with rank be 2. The output smooth l1 loss with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "sigma",
   "type" : "float",
   "comment" : "Hyper parameter of smooth l1 loss op.A float scalar with default value 3.0.",
   "generated" : 0
 } ] 
},{
 "type" : "beam_search",
 "comment" : "This is a beam search operator that help to generate sequences.",
 "inputs" : [ 
 { 
   "name" : "pre_ids",
   "comment" : "ids in previous step",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "ids",
   "comment" : "a LoDTensor of shape of [None,k]",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "scores",
   "comment" : "a LoDTensor that has the same shape and LoD with `ids`",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "selected_ids",
   "comment" : "a LoDTensor that stores the IDs selected by beam search",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "selected_scores",
   "comment" : "a LoDTensor that has the same shape and LoD with `selected_ids`",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "level",
   "type" : "int",
   "comment" : "the level of LoDTensor",
   "generated" : 0
 }, { 
   "name" : "beam_size",
   "type" : "int",
   "comment" : "beam size for beam search",
   "generated" : 0
 }, { 
   "name" : "end_id",
   "type" : "int",
   "comment" : "the token id which indicates the end of a sequence",
   "generated" : 0
 } ] 
},{
 "type" : "sum",
 "comment" : "\nSum operator.\n\nThis operators sums the input tensors. All the inputs can carry the \nLoD (Level of Details) information. However, the output only shares \nthe LoD information with the first input.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(vector<Tensor>) The input tensors of sum operator.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of sum operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "concat",
 "comment" : "\nConcat Operator.\n\nConcatenate the input tensors along dimension axis.\nExamples:\n  Input[0] = [[1,2],[3,4]]\n  Input[1] = [[5,6]]\n  axis = 0\n  Output = [[1,2],\n            [3,4],\n            [5,6]]\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input tensors of concat operator.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "Output tensor of concat operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "The axis along which the input tensors will be concatenated.",
   "generated" : 0
 } ] 
},{
 "type" : "softmax_with_cross_entropy",
 "comment" : "\nSoftmax With Cross Entropy Operator.\n\nCross entropy loss with softmax is used as the output layer extensively. This\noperator computes the softmax normalized values for each row of the input\ntensor, after which cross-entropy loss is computed. This provides a more\nnumerically stable gradient.\n\nBecause this operator performs a softmax on logits internally, it expects\nunscaled logits. This operator should not be used with the output of\nsoftmax operator since that would produce incorrect results.\n\nWhen the attribute soft_label is set false, this operators expects mutually\nexclusive hard labels, each sample in a batch is in exactly one class with a\nprobability of 1.0. Each sample in the batch will have a single label.\n\nThe equation is as follows:\n\n1) Hard label (one-hot label, so every sample has exactly one class)\n\n$$Loss_j =  -\\text{Logit}_{Label_j} +\n\\log\\left(\\sum_{i=0}^{K}\\exp(\\text{Logit}_i)\\right),\nj = 1,..., K$$\n\n2) Soft label (each sample can have a distribution over all classes)\n\n$$Loss_j =  -\\sum_{i=0}^{K}\\text{Label}_i \\left(\\text{Logit}_i -\n\\log\\left(\\sum_{i=0}^{K}\\exp(\\text{Logit}_i)\\right)\\right),\nj = 1,...,K$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Logits",
   "comment" : "(Tensor, default: Tensor<float>), The unscaled log probabilities which is a 2-D tensor with shape [N x K]. N is the batch_size, and K is the class number.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor) The ground truth which is a 2-D tensor. If soft_label is set to false, Label is a Tensor<int64> with shape [N x 1]. If soft_label is set to true, Label is a Tensor<float/double> with shape [N x K].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Softmax",
   "comment" : "(Tensor, default: Tensor<float>), A 2-D tensor with shape [N x K]. The outputs value of softmax activation by given the input batch, which will be used in backward calculation.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Loss",
   "comment" : "(Tensor, default: Tensor<float>), A 2-D tensor. The cross entropy loss with shape [N x 1].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "soft_label",
   "type" : "bool",
   "comment" : "(bool, default: false), A flag to indicate whether to interpretate the given labels as soft labels.",
   "generated" : 0
 } ] 
},{
 "type" : "fill_constant_batch_size_like",
 "comment" : "\nFillConstantBatchSizeLike Operator.\n\nFill up a variable with specified constant value.\n\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) Tensor whose dim_idx th dimension is used to specify the batch_size",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) Tensor of specified shape will be filled with the specified value",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5 (FP32)) Output data type",
   "generated" : 0
 }, { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "(vector<int>) The shape of the output",
   "generated" : 0
 }, { 
   "name" : "input_dim_idx",
   "type" : "int",
   "comment" : "(int, default 0) The index of input's batch size dimension",
   "generated" : 0
 }, { 
   "name" : "output_dim_idx",
   "type" : "int",
   "comment" : "(int, default 0) The index of output's batch size dimension",
   "generated" : 0
 }, { 
   "name" : "value",
   "type" : "float",
   "comment" : "(float, default 0) The value to be filled",
   "generated" : 0
 } ] 
},{
 "type" : "adadelta",
 "comment" : "\nAdadelta Optimizer.\n\nAdadelta optimizer is implemented as explained in:\nhttps://arxiv.org/abs/1212.5701\nAdadelta is a per-dimension adaptive learning rate method used\nfor gradient descent.\n\nAdadelta updates are as follows:\n\n$$\navg\\_squared\\_grad\\_out = \\rho * avg\\_squared\\_grad + (1 - \\rho) * grad * grad \\\\\nparam\\_update =  - \\sqrt{\\frac{avg\\_squared\\_update + \\epsilon}{avg\\_squared\\_grad\\_out + \\epsilon}} * grad \\\\\navg\\_squared\\_update\\_out = \\rho * avg\\_squared\\_update + (1 - \\rho) * {param\\_update}^2 \\\\\nparam\\_out = param + param\\_update\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor) Input parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor) Input gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AvgSquaredGrad",
   "comment" : "(Tensor) Input average of squared gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AvgSquaredUpdate",
   "comment" : "(Tensor) Input average of squared parameter updates",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AvgSquaredGradOut",
   "comment" : "(Tensor) Output average of squared gradient",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AvgSquaredUpdateOut",
   "comment" : "(Tensor) Output average of squared parameter updates",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "rho",
   "type" : "float",
   "comment" : "(float, default 0.95) Exponential decay rate for squared gradients.",
   "generated" : 0
 }, { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1.0e-6) Constant for numerical stability",
   "generated" : 0
 } ] 
},{
 "type" : "log",
 "comment" : "\nLog Activation Operator.\n\n$y = \\ln(x)$\n\nNatural logarithm of x.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Log operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Log operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "conv3d_cudnn",
 "comment" : "\nConvolution3D Operator.\n\nThe convolution operation calculates the output based on the input, filter\nand strides, paddings, dilations, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCDHW format, where N is batch\nsize, C is the number of channels,D is the depth of the feature, H is the height of\nthe feature, and W is the width of the feature.\nFilters(Input) is MCDHW format, where M is the number of output image channels,\nC is the number of input image channels, D is the depth of the filter,\nH is the height of the filter, and W is the width of the filter.\nParameters(strides, paddings, dilations) are three elements. These three elements\nrepresent depth, height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       Input shape: $(N, C_{in}, D_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{out}, C_{in}, D_f, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out}= \\frac{(D_{in} + 2 * paddings[0] - (dilations[0] * (D_f - 1) + 1))}{ strides[0]}+ 1 \\\\\n       H_{out}= \\frac{(H_{in} + 2 * paddings[1] - (dilations[1] * (H_f - 1) + 1))}{ strides[1]}+ 1 \\\\\n       W_{out}= \\frac{(W_{in} + 2 * paddings[2] - (dilations[2] * (W_f - 1) + 1))}{ strides[2]}+ 1\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution operator. The format of input tensor is NCDHW. Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution operator. The format of the filter tensor is MCDHW, where M is the number of output image channels, C is the number of input image channels, D is the depth of the filter, H is the height of the filter, and W is the width of the filter.If the groups attribute is greater than 1, C equals the number of input image channels divided by the groups.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution operator.The format of output tensor is also NCDHW.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default:{1, 1, 1}), the strides(d_stride, h_stride, w_stride) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default:{0, 0, 0}), the paddings(d_pad, h_pad, w_pad) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "groups",
   "type" : "int",
   "comment" : "(int default:1), the groups number of the convolution operator. According to grouped convolution in Alex Krizhevsky's Deep CNN paper: when group=2, the first half of the filters is only connected to the first half of the input channels, while the second half of the filters is only connected to the second half of the input channels.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1, 1}), the dilations(d_dilation, h_dilation, w_dilation) of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "workspace_size_MB",
   "type" : "int",
   "comment" : "workspace size for cudnn, in MB, workspace is a section of GPU memory which will be allocated/freed each time the operator runs, larger workspace size can increase performance but also requires better hardware. This size should be chosen carefully.",
   "generated" : 0
 } ] 
},{
 "type" : "conv3d_transpose_cudnn",
 "comment" : "\nConvolution3D Transpose Operator.\n\nThe convolution transpose operation calculates the output based on the input, filter\nand strides, paddings, groups parameters. The size of each dimension of the\nparameters is checked in the infer-shape.\nInput(Input) and output(Output) are in NCDHW format. Where N is batch size, C is the\nnumber of channels, D is the depth of the feature, H is the height of the feature,\nand W is the width of the feature.\nFilter(Input) is in MCDHW format. Where M is the number of input feature channels,\nC is the number of output feature channels, D is the depth of the filter,H is the\nheight of the filter, and W is the width of the filter.\nParameters(strides, paddings) are three elements. These three elements represent\ndepth, height and width, respectively.\nThe input(X) size and output(Out) size may be different.\n\nExample:   \n  Input:\n       Input shape: $(N, C_{in}, D_{in}, H_{in}, W_{in})$\n       Filter shape: $(C_{in}, C_{out}, D_f, H_f, W_f)$\n  Output:\n       Output shape: $(N, C_{out}, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out} = (D_{in} - 1) * strides[0] - 2 * paddings[0] + D_f \\\\\n       H_{out} = (H_{in} - 1) * strides[1] - 2 * paddings[1] + H_f \\\\\n       W_{out} = (W_{in} - 1) * strides[2] - 2 * paddings[2] + W_f\n  $$\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) The input tensor of convolution transpose operator.The format of input tensor is NCDHW. Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor) The filter tensor of convolution transpose operator.The format of the filter tensor is MCDHW, where M is the number of input feature channels, C is the number of output feature channels, D is the depth of the filter, H is the height of the filter, and W is the width of the filter.We enforce groups number == 1 and padding == 0 in the convolution3d transpose scenario.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Output",
   "comment" : "(Tensor) The output tensor of convolution transpose operator.The format of output tensor is also NCDHW.Where N is batch size, C is the number of channels, D is the depth of the feature, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int> default:{1, 1, 1}), the strides{d_stride, h_stride, w_stride} of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int> default:{0, 0, 0}), paddings(d_pad, h_pad, w_pad) of convolution transpose operator.",
   "generated" : 0
 }, { 
   "name" : "dilations",
   "type" : "int array",
   "comment" : "dilations of convolution operator.",
   "generated" : 0
 }, { 
   "name" : "workspace_size_MB",
   "type" : "int",
   "comment" : "workspace size for cudnn, in MB, workspace is a section of GPU memory which will be allocated/freed each time the operator runs, larger workspace size can increase performance but also requires better hardward. This size should be carefully setted.",
   "generated" : 0
 } ] 
},{
 "type" : "cross_entropy",
 "comment" : "\nCrossEntropy Operator.\n\nIt supports both standard cross-entropy and soft-label cross-entropy loss\ncomputation.\n1) One-hot cross-entropy:\n    soft_label = false, Label[i, 0] indicates the class index for sample i:\n\n                $Y[i] = -\\log(X[i, Label[i]])$\n\n2) Soft-label cross-entropy:\n    soft_label = true, Label[i, j] indicates the soft label of class j\n    for sample i:\n\n                $Y[i] = \\sum_j{-Label[i, j] * log(X[i, j])}$\n\n   Please make sure that in this case the summuation of each row of Label\n   equals one.\n\n3) One-hot cross-entropy with vecterized Input(Label):\n     As a special case of 2), when each row of Input(Label) has only one\n     non-zero element (equals 1), soft-label cross-entropy degenerates to a\n     one-hot cross-entropy with one-hot label representation.\n\nBoth the input X and Label can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape N x D, where N is the batch size and D is the number of classes. This input is a probability computed by the previous operator, which is almost always the result of a softmax operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor), the ground truth which is a 2-D tensor. When soft_label is set to false, Label is a Tensor<int64> with shape [N x 1]. When soft_label is set to true, Label is a Tensor<float/double> with shape [N x K].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape [N x 1]. The cross entropy loss.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "soft_label",
   "type" : "bool",
   "comment" : "(bool, default false), a flag indicating whether to interpretate the given labels as soft labels.",
   "generated" : 0
 } ] 
},{
 "type" : "matmul",
 "comment" : "\nMatMul Operator.\n\n\nThis operator is used to perform (batched) matrix multiplication\nover the last two dimensions of the input tensors `X` and `Y`.\n\nIf a transpose flag is specified, the last two dimensions of the\ntensor are transposed. If the tensor is rank-1 of shape [D], then\nfor `X` it is treated as [1, D] in nontransposed form and as [D, 1]\nin transposed form, whereas for `Y` it is the opposite: It is treated\nas [D, 1] in nontransposed form and as [1, D] in transposed form.\n\nExamples without transpose:\n- X: [K], Y: [K] => Out: [1]\n- X: [K], Y: [K, N] => Out: [N]\n- X: [B, M, K], Y: [K] => Out: [B, M]\n- X: [M, K], Y: [B, K, N] => Out: [B, M, N]\n- X: [B, M, K], Y: [B, K, N] => Out: [B, M, N]\n\nThe behavior is designed to be similar to the `numpy.matmul` function.\nThe differences are:\n- Currently only rank 1 to rank 3 input tensors are supported.\n- We add `transpose_X` and `transpose_Y` flags.\n\nBoth the input `X` and `Y` can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input `X`.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The first input of MatMul op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The second input of MatMul op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of MatMul op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "transpose_X",
   "type" : "bool",
   "comment" : "If true, use the transpose of `X`.\n        ",
   "generated" : 0
 }, { 
   "name" : "transpose_Y",
   "type" : "bool",
   "comment" : "If true, use the transpose of `Y`.\n        ",
   "generated" : 0
 } ] 
},{
 "type" : "brelu",
 "comment" : "\nBRelu Activation Operator.\n\n$y = \\max(\\min(x, t_{min}), t_{max})$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of BRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of BRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "t_min",
   "type" : "float",
   "comment" : "The min marginal value of BRelu",
   "generated" : 0
 }, { 
   "name" : "t_max",
   "type" : "float",
   "comment" : "The max marginal value of BRelu",
   "generated" : 0
 } ] 
},{
 "type" : "crf_decoding",
 "comment" : "\nThe crf_decoding operator reads the emission feature weights and the transition\nfeature weights learned by the linear_chain_crf operator. It implements the\nViterbi algorithm which is a dynamic programming algorithm for finding the most\nlikely sequence of hidden states, called the Viterbi path, that results in a\nsequence of observed tags.\n\nThe output of this operator changes according to whether Input(Label) is given:\n\n1. Input(Label) is given:\n\nThis happens in training. This operator is used to co-work with the chunk_eval\noperator.\n\nWhen Input(Label) is given, the crf_decoding operator returns a row vector\nwith shape [N x 1] whose values are fixed to be 0, indicating an incorrect\nprediction, or 1 indicating a tag is correctly predicted. Such an output is the\ninput to chunk_eval operator.\n\n2. Input(Label) is not given:\n\nThis is the standard decoding process.\n\nThe crf_decoding operator returns a row vector with shape [N x 1] whose values\nrange from 0 to maximum tag number - 1. Each element indicates an index of a\npredicted tag.\n",
 "inputs" : [ 
 { 
   "name" : "Emission",
   "comment" : "(LoDTensor, default: LoDTensor<float>). A LoDTensor with shape [N x D] where N is the size of the mini-batch and D is the total tag number. This input is the unscaled emission weight matrix of the linear_chain_crf operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Transition",
   "comment" : "(Tensor, default: Tensor<float>). A Tensor with shape [(D + 2) x D]. This input is the transition weights learned by the linear_chain_crf operator, denoted as w. The 1st row of w are transition weights for the start mask. The 2nd row of w are transition weights for the end mask. Transition weights between other tags begin from the 3rd row of w. See more details in comments of the linear_chain_crf operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(LoDTensor,  LoDTensor<int64_t>). The ground truth with shape [N x 1]. This input is optional. See more details in the operator's comments.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ViterbiPath",
   "comment" : "(LoDTensor, LoDTensor<int64_t>). The decoding results. What to return changes depending on whether the Input(Label) (the ground truth) is given. See more details in the operator's comment.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "clip_by_norm",
 "comment" : "\nClipByNorm Operator.\n\nThis operator limits the L2 norm of the input $X$ within $max\\_norm$.\nIf the L2 norm of $X$ is less than or equal to $max\\_norm$, $Out$ will be\nthe same as $X$. If the L2 norm of $X$ is greater than $max\\_norm$, $X$ will\nbe linearly scaled to make the L2 norm of $Out$ equal to $max\\_norm$, as\nshown in the following formula:\n\n$$\nOut = \\frac{max\\_norm * X}{norm(X)},\n$$\n\nwhere $norm(X)$ represents the L2 norm of $X$.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input of clip_by_norm op.The number of dimensions must be between [1, 9].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output of clip_by_norm op with shape as input(X)",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "max_norm",
   "type" : "float",
   "comment" : "(float) The maximum norm value.",
   "generated" : 0
 } ] 
},{
 "type" : "gather",
 "comment" : "\nGather Operator.\n\n$Out = X[Index]$\n\nOut is obtained by gathering entries of the outer-most dimension \nof X indexed by Index and concatenate them together.\n\nExample:\n\nX = [[1, 2],\n     [3, 4],\n     [5, 6]]\n\nIndex = [[1, 2]]\n\nThen:\n\nOut = [[3, 4],\n       [5, 6]]\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The source input of gather op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Index",
   "comment" : "The index input of gather op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of gather op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "pool3d_cudnn",
 "comment" : "\nPool3d Operator.\n\nThe pooling3d operation calculates the output based on\nthe input, pooling_type, ksize, strides, and paddings parameters.\nInput(X) and output(Out) are in NCDHW format, where N is batch\nsize, C is the number of channels, and D, H and W are the depth, height and\nwidth of the feature, respectively. Parameters(ksize, strides, paddings) \nare three elements. These three elements represent depth, height and \nwidth, respectively. The input(X) size and output(Out) size may be different.\n\nExample:\n  Input:\n       X shape: $(N, C, D_{in}, H_{in}, W_{in})$\n  Output:\n       Out shape: $(N, C, D_{out}, H_{out}, W_{out})$\n  Where\n  $$\n       D_{out} = \\frac{(D_{in} - ksize[0] + 2 * paddings[0])}{strides[0]} + 1 \\\\\n       H_{out} = \\frac{(H_{in} - ksize[1] + 2 * paddings[1])}{strides[1]} + 1 \\\\\n       W_{out} = \\frac{(W_{in} - ksize[2] + 2 * paddings[2])}{strides[2]} + 1\n  $$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of pooling operator. The format of input tensor is NCDHW, where N is batch size, C is the number of channels, and D, H and W is the depth, height and width of the feature, respectively.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of pooling operator.The format of output tensor is also NCDHW, where N is batch size, C is the number of channels, and D, H and W is the depth, height and width of the feature, respectively.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "pooling_type",
   "type" : "string",
   "comment" : "(string) Pooling type, can be \"max\" for max-pooling and \"avg\" for average-pooling.",
   "generated" : 0
 }, { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector<int>) The pooling window size(depth, height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 }, { 
   "name" : "global_pooling",
   "type" : "bool",
   "comment" : "(bool, default false) Whether to use the global pooling. If global_pooling = true, ksize and paddings wille be ignored.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector<int>, default {1,1,1}) Strides(depth, height, width) of the pooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector<int>, default {0,0,0}), paddings(depth, height, width) of pooling operator. If global_pooling = true, ksize and paddings will be ignored.",
   "generated" : 0
 } ] 
},{
 "type" : "crop",
 "comment" : "\nCrop Operator.\n\nCrop input into output, as specified by offsets and shape.\n\nThere are two ways to set shape:\n1. reference input: crop input X into the same shape as reference input.\n                    The dimension of reference input should\n                    be the same as the dimension of input X.\n2. shape list: crop input X into the shape described by a list<int>.\n               The size of shape list should be the same as\n               the dimension size of input X.\n\nThe input should be a k-D tensor(k > 0 and k < 7). As an example:\n\nGiven:\n\n    X = [[0, 1, 2, 0, 0]\n         [0, 3, 4, 0, 0]\n         [0, 0, 0, 0, 0]],\n\nand\n\n    offsets = [0, 1],\n\nand\n\n    shape = [2, 2],\n\nwe get:\n\n    Out = [[1, 2],\n           [3, 4]].\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of pad op. The input should be a k-D tensor(k > 0 and k < 7).",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The input used as reference for cropping, which is of the same dimensions as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of crop op, which is of the same dimensions as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "offsets",
   "type" : "int array",
   "comment" : "A list<int> describing offsets to be cropped. The size of offsets list should be the same as the dimension size of input X.",
   "generated" : 0
 }, { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "A list<int> describing the shape of output. The size of shape list should be the same as the dimension size of input X.",
   "generated" : 0
 } ] 
},{
 "type" : "merge_lod_tensor",
 "comment" : "\n        Merge True and False branches of LoDTensor into a single Output,\n        with a mask at certain lod level. X is used to obtain complete\n        lod information. Please refer to SplitLoDTensorOp.",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input LoDTensor, contains complete lod information to construct the output",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mask",
   "comment" : "A bool column vector which mask the input",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "InTrue",
   "comment" : "The True branch to be merged",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "InFalse",
   "comment" : "The False branch to be merged",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The merged output LoDTensor",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "level",
   "type" : "int",
   "comment" : "(int) the specific lod level to rank.",
   "generated" : 0
 } ] 
},{
 "type" : "elementwise_mul",
 "comment" : "\nLimited Elementwise Mul Operator.\n\nThe equation is:\n\n$Out = X \\odot\\ Y$\n\nX is a tensor of any dimension and the dimensions of tensor Y must be smaller than\nor equal to the dimensions of X. \n\nThere are two cases for this operator:\n1. The shape of Y is same with X;\n2. The shape of Y is a subset of X.\n\nFor case 2:\nY will be broadcasted to match the shape of X and axis should be \nthe starting dimension index for broadcasting Y onto X.\n\nexample:\n  shape(X) = (2, 3, 4, 5), shape(Y) = (,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (5,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1\n  shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0\n\nBoth the input X and Y can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The first input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor) The second input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default -1) The starting dimension index for broadcasting Y onto X",
   "generated" : 0
 } ] 
},{
 "type" : "rmsprop",
 "comment" : "\nRmsprop Optimizer. \n\n$$\nMeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad \\\\\nMomentOut = momentum * Moment +\n            \\frac{LearningRate * Grad}{\\sqrt{MeanSquareOut + epsilon}} \\\\\nParamOut = Param -  MomentOut\n$$\n\nThe original slides that proposed Rmsprop: Slide 29 of\nhttp://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor, default Tensor<float>) Input parameter value that has to be updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MeanSquare",
   "comment" : "(Tensor, default Tensor<float>) The mean square value that gets updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor, default Tensor<float>) The learning rate should be a tensor of size 1.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor, default Tensor<float>) Input gradient of the parameter.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment",
   "comment" : "(Tensor, default Tensor<float>) The moment that gets updated.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output updated parameter value.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MomentOut",
   "comment" : "(Tensor) Output updated moment.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MeanSquareOut",
   "comment" : "(Tensor) Output Mean squared updated value.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "(float, default 1e-10) Constant for numerical stability.",
   "generated" : 0
 }, { 
   "name" : "decay",
   "type" : "float",
   "comment" : "(float, default 0.9) Discounting factor for coming gradient.",
   "generated" : 0
 }, { 
   "name" : "momentum",
   "type" : "float",
   "comment" : "(float, default 0.0) Constant value.",
   "generated" : 0
 } ] 
},{
 "type" : "proximal_gd",
 "comment" : "\nProximalGD Operator.\n\nOptimizer that implements the proximal gradient descent algorithm:\n\n$$\nprox\\_param = param - learning\\_rate * grad \\\\\nparam = sign(prox\\_param) / (1 + learning\\_rate * l2) *\n        \\max(|prox\\_param| - learning\\_rate * l1, 0)\n$$        \n\nThe paper that proposed Proximal Gradient Descent:\n(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor, default Tensor<float>) Input parameter value that has to be updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor, default Tensor<float>) Input gradient of the parameter.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor, default Tensor<float>) The learning rate should be a tensor of size 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output updated parameter value.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "l1",
   "type" : "float",
   "comment" : "(float, default 0.0) L1 regularization strength.",
   "generated" : 0
 }, { 
   "name" : "l2",
   "type" : "float",
   "comment" : "(float, default 0.0) L2 regularization strength.",
   "generated" : 0
 } ] 
},{
 "type" : "positive_negative_pair",
 "comment" : "\n        PositiveNegativePairOp can be used to evaluate Learning To Rank(LTR) \n        model performance. \n        Within some context, e.g. the \"query\", a LTR model generates scores\n        for a list of items, which gives a partial order of the items.\n        PositiveNegativePairOp takes a list of reference rank order \n        (Input(\"Label\")) and the model generated scores (Input(Score)) as \n        inputs and counts the pairs that ranked correctly and incorrectly.\n",
 "inputs" : [ 
 { 
   "name" : "Score",
   "comment" : "(Tensor, float) Model Score on an item (with respect to QueryID). It's a 2-D tensor with shape [batch_size, depth], where the column specified by the attribute \"column\" is used as item score.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor, float) Label of an item (with repsect to QueryId). It's a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "QueryID",
   "comment" : "(Tensor, int64) Query ID that indicates the context. Its shape should be the same as Label.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AccumulatePositivePair",
   "comment" : "(float) Optional. The accumulated number of positive pairs over a stream of data. If provided, the output PositivePair will be initialized with this number rather than 0. it won't be modified in place.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AccumulateNegativePair",
   "comment" : "(float) Optional. The accumulated number of negative pairs over a stream of data. If provided, the output NegativePair will be initialized with this number rather than 0. it won't be modified in place.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AccumulateNeutralPair",
   "comment" : "(float) Optional. The accumulated number of neutral pairs over a stream of data. If provided, the output NeutralPair will be initialized with this number rather than 0. it won't be modified in place.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "(float) Optional. Weight of current item. If specified, its shape should be the same as Label, and the meaning of the output changes from numbers of pairs to the total sum of pairs' weights. Weight of a pair of items is the average of their weights.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "PositivePair",
   "comment" : "(float) Number of positive pairs, i.e. the pairs of items that are ranked correctly.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "NegativePair",
   "comment" : "(float) Number of negative pairs, i.e. the pairs of items that are ranked incorrectly.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "NeutralPair",
   "comment" : "(float) Number of neutral pairs, i.e. the pairs of items that have the same score.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "column",
   "type" : "int",
   "comment" : "(int, default -1) The column position of Score used to rank items in descending order. It must be in the range of [-rank(Score), rank(Score)). If `dim < 0`, the dim to reduce is `rank + dim`. Noting that reducing on the first dim will make the LoD info lost.",
   "generated" : 0
 } ] 
},{
 "type" : "log_loss",
 "comment" : "\nLogLoss Operator.\n\nLog loss is a loss function used for binary classification. Log Loss quantifies\nthe accuracy of a classifier by penalising false classifications. Minimising the\nLog Loss is equivalent to maximising the accuracy of the classifier. We define\nPredicted as the values predicted by our model and Labels as the target ground\ntruth value. Log loss can evaluate how close the predicted values are to the\ntarget. The shapes of Predicted and Labels are both [batch_size, 1].\nThe equation is:\n\n$$\nLoss = - Labels * log(Predicted + \\epsilon) -\n        (1 - Labels) * log(1 - Predicted + \\epsilon)\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Predicted",
   "comment" : "The input value (Predicted) of Log loss op.Predicted is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Labels",
   "comment" : "The target value (Labels) of Log loss op.Labels is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Loss",
   "comment" : "The output tensor with shape [batch_size, 1] which represents the log loss.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "Epsilon in log loss.",
   "generated" : 0
 } ] 
},{
 "type" : "mean",
 "comment" : "\nMean Operator.\n\nOut is a scalar which is the mean of all elements in X. \n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of mean op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of mean op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "elementwise_add",
 "comment" : "\nLimited Elementwise Add Operator.\n\nThe equation is:\n\n$Out = X + Y$\n\nX is a tensor of any dimension and the dimensions of tensor Y must be smaller than\nor equal to the dimensions of X. \n\nThere are two cases for this operator:\n1. The shape of Y is same with X;\n2. The shape of Y is a subset of X.\n\nFor case 2:\nY will be broadcasted to match the shape of X and axis should be \nthe starting dimension index for broadcasting Y onto X.\n\nexample:\n  shape(X) = (2, 3, 4, 5), shape(Y) = (,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (5,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1\n  shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0\n\nBoth the input X and Y can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The first input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor) The second input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default -1) The starting dimension index for broadcasting Y onto X",
   "generated" : 0
 } ] 
},{
 "type" : "fill_zeros_like",
 "comment" : "\nFillZerosLike Operator.\n\nFill up a variable with zeros.\nThe output will have the same size as the input.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of fill-zeros-like op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "The variable will be filled up with zeros.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "prelu",
 "comment" : "\nPRelu Operator.\n\nThe equation is:\n\n$$\nf(x) =\n\\begin{cases}\n\\alpha * x, \\quad  \\text{if} \\ x < 0 \\\\\nx,         \\qquad  \\text{if} \\ x >= 0\n\\end{cases}\n$$\n\nThe input `X` can carry the LoD (Level of Details) information,\nor not. And the output shares the LoD information with input `X`.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor of prelu operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Alpha",
   "comment" : "The alpha weight of prelu operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output tensor of prelu operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "fill",
 "comment" : "Fill operator\n\nFill an tensor with `value` and `shape`. The type of the tensor is specify by\n`dtype`.\n",
 "inputs" : [  ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) The output tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "value",
   "type" : "float array",
   "comment" : "The float values of tensor, which are flatten in row major",
   "generated" : 0
 }, { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "The shape of output tensor",
   "generated" : 0
 }, { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "The data type of output tensor, Default is float",
   "generated" : 0
 }, { 
   "name" : "force_cpu",
   "type" : "bool",
   "comment" : "Whether the output tensor must be at CPU memory or not. Default is false.",
   "generated" : 0
 } ] 
},{
 "type" : "sigmoid_cross_entropy_with_logits",
 "comment" : "\nSigmoidCrossEntropyWithLogits Operator.\n\nThis measures the element-wise probability error in classification tasks\nin which each class is independent. This can be thought of as predicting labels\nfor a data-point, where labels are not mutually exclusive.\nFor example, a news article can be about politics, technology or sports\nat the same time or none of these.\n\nThe logistic loss is given as follows:\n\n       $$loss = -Labels * \\log(\\sigma(X)) - (1 - Labels) * \\log(1 - \\sigma(X))$$\n\nWe know that $$\\sigma(X) = (1 / (1 + \\exp(-X)))$$. By substituting this we get:\n\n       $$loss = X - X * Labels + \\log(1 + \\exp(-X))$$\n\nFor stability and to prevent overflow of $$\\exp(-X)$$ when X < 0,\nwe reformulate the loss as follows:\n\n       $$loss = \\max(X, 0) - X * Labels + \\log(1 + \\exp(-|X|))$$\n\nBoth the input `X` and `Labels` can carry the LoD (Level of Details) information.\nHowever the output only shares the LoD with input `X`.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape N x D, where N is the batch size and D is the number of classes. This input is a tensor of logits computed by the previous  operator. Logits are unscaled log probabilities given as log(p/(1-p)).",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor of the same type and shape as X. This input is a tensor of probabalistic labels for each logit",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape N x D  of elementwise logistic losses.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "modified_huber_loss",
 "comment" : "\nModified Huber Loss Operator.\n\nThis operator is used in binary classification problem. The shape of\ninput X and target Y are both [N, 1] and so is the shape of the output loss.\nSince target Y is not differentiable, calculating gradient for Y is illegal.\nThe formula of modified huber loss is:\n\n$$\nL(y, f(x)) = \n\\begin{cases}\n(\\max(0, 1 - yf(x)))^2,  \\text{if} \\  yf(x) >= -1    \\\\\n             -4yf(x),    \\quad \\text{otherwise}\n\\end{cases}\n$$\n\nMake sure the values of target label Y are in {0, 1} here. This operator will\nscale values of Y to {-1, +1} when computing losses and gradients.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor of modified huber loss op. X is 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The target labels of modified huber loss op. The shape of Y is the same as X. Values of Y must be 0 or 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "IntermediateVal",
   "comment" : "Variable to save intermediate result which will be reused in backward processing.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Out",
   "comment" : "Classification loss for X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "elementwise_sub",
 "comment" : "\nLimited Elementwise Sub Operator.\n\nThe equation is:\n\n$Out = X - Y$\n\nX is a tensor of any dimension and the dimensions of tensor Y must be smaller than\nor equal to the dimensions of X. \n\nThere are two cases for this operator:\n1. The shape of Y is same with X;\n2. The shape of Y is a subset of X.\n\nFor case 2:\nY will be broadcasted to match the shape of X and axis should be \nthe starting dimension index for broadcasting Y onto X.\n\nexample:\n  shape(X) = (2, 3, 4, 5), shape(Y) = (,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (5,)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (4, 5)\n  shape(X) = (2, 3, 4, 5), shape(Y) = (3, 4), with axis=1\n  shape(X) = (2, 3, 4, 5), shape(Y) = (2), with axis=0\n\nBoth the input X and Y can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The first input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor) The second input tensor of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of elementwise op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default -1) The starting dimension index for broadcasting Y onto X",
   "generated" : 0
 } ] 
},{
 "type" : "reduce_mean",
 "comment" : "\n{ReduceOp} Operator.\n\nThis operator computes the mean of input tensor along the given dimension. \nThe result tensor has 1 fewer dimension than the input unless keep_dim is true.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor. Tensors with rank at most 6 are supported.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The result tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dim",
   "type" : "int",
   "comment" : "(int, default 0) The dimension to reduce. Must be in the range [-rank(input), rank(input)). If `dim < 0`, the dim to reduce is `rank + dim`. Note that reducing on the first dim will make the LoD info lost.",
   "generated" : 0
 }, { 
   "name" : "keep_dim",
   "type" : "bool",
   "comment" : "(bool, default false) If true, retain the reduced dimension with length 1.",
   "generated" : 0
 } ] 
},{
 "type" : "square",
 "comment" : "\nSquare Activation Operator.\n\n$y = x^2$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Square operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Square operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "reduce_max",
 "comment" : "\n{ReduceOp} Operator.\n\nThis operator computes the max of input tensor along the given dimension. \nThe result tensor has 1 fewer dimension than the input unless keep_dim is true.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor. Tensors with rank at most 6 are supported.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The result tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dim",
   "type" : "int",
   "comment" : "(int, default 0) The dimension to reduce. Must be in the range [-rank(input), rank(input)). If `dim < 0`, the dim to reduce is `rank + dim`. Note that reducing on the first dim will make the LoD info lost.",
   "generated" : 0
 }, { 
   "name" : "keep_dim",
   "type" : "bool",
   "comment" : "(bool, default false) If true, retain the reduced dimension with length 1.",
   "generated" : 0
 } ] 
},{
 "type" : "logical_or",
 "comment" : "logical_or Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = X || Y$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) Left hand operand of logical_or operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) Right hand operand of logical_or operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = X || Y$$",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "less_than",
 "comment" : "less_than Operator\n\nIt operates element-wise on X and Y, and returns the Out. Each of them is a\nN-dim tensor. X and Y could be any type.  The each element of the Out tensor is\ncalculated by Out = X < Y\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the left hand operand of less_than operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) the right hand operand of less_than operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is Out = X < Y",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "gru_unit",
 "comment" : "\nGRUUnit Operator implements partial calculations of the GRU unit as following:\n\n$$\nupdate \\ gate: u_t = actGate(xu_t + W_u * h_{t-1} + b_u) \\\\\nreset \\ gate: r_t = actGate(xr_t + W_r * h_{t-1} + b_r)  \\\\\noutput \\ candidate: {h}_t = actNode(xc_t + W_c * dot(r_t, h_{t-1}) + b_c) \\\\\noutput: h_t = dot((1 - u_t), h_{t-1}) + dot(u_t, {h}_t)\n$$\n\nwhich is same as one time step of GRU Operator.\n\n@note To implement the complete GRU unit, fully-connected operator must be \nused before to feed xu, xr and xc as the Input of GRUUnit operator.\n\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) Matrix with shape [batch_size, frame_size * 3] for the input.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "HiddenPrev",
   "comment" : "(Tensor) Matrix with shape [batch_size, frame_size] for the states of previous time step.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "(Tensor) Weight matrix with shape [frame_size, frame_size * 3]. The elements continuous in memory can be divided into two parts. The first part are weights of the update gate and reset gate with shape [frame_size, frame_size * 2], and the second part are weights of output candidate with shape [frame_size, frame_size].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "(Tensor) Bias vector with shape [1, frame_size * 3] concatenating bias of the update gate, reset gate and output candidate.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Gate",
   "comment" : "(Tensor) Matrix with shape [batch_size, frame_size * 3] for the output of update gate, reset gate and output candidate.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "ResetHiddenPrev",
   "comment" : "(Tensor) Matrix with shape [batch_size, frame_size] for the reseted hidden state of previous time step.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Hidden",
   "comment" : "(Tensor) The GRU hidden state of the current time step with shape [batch_size, frame_size].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "activation",
   "type" : "int",
   "comment" : "(enum int, default tanh) The activation type used for output candidate {h}_t.",
   "generated" : 0
 }, { 
   "name" : "gate_activation",
   "type" : "int",
   "comment" : "(enum int, default sigmoid) The activation type used in update gate and reset gate.",
   "generated" : 0
 } ] 
},{
 "type" : "swish",
 "comment" : "\nSwish Activation Operator.\n\n$$y = \\frac{x}{1 + e^{- \\beta x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Swish operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Swish operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "beta",
   "type" : "float",
   "comment" : "Constant beta of swish operator",
   "generated" : 0
 } ] 
},{
 "type" : "is_empty",
 "comment" : "\nIsEmpty Operator which checks whether a tensor is empty.\n\nIt will just return product(tensor.ddims()) > 0;\n              ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) Tensor which is to be checked.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) a boolean Tensor that indicate empty or not.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "sequence_concat",
 "comment" : "\nThe sequence_concat operator concatenates multiple LoDTensors. \nIt only supports sequence (LoD Tensor with level number is 1) \nor a nested sequence (LoD tensor with level number is 2) as its input.\n- Case1:\n  If the axis is other than 0(here, axis is 1 and level is 1),\n  each input should have the same LoD information and the LoD \n  information of the output keeps the same as the input.\n\n  LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n  LoD(x1) = {{0,2,4}, {0,1,2,3,4}}; Dims(x1) = (4,4,4)\n  LoD(Out) = {{0,2,4}, {0,1,2,3,4}}; Dims(Out) = (4,7,4)\n\n- Case2:\n  If the axis is 0(here, leve is 0), the inputs are concatenated along \n  time steps, the LoD information of the output need to re-compute.\n  The LoD information of level-1 should be same.\n\n  LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n  LoD(x1) = {{0,2,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n  LoD(Out) = {{0,2,4}, {0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\n- Case3:\n  If the axis is 0(here, level is 1).\n\n  LoD(x0) = {{0,2,4}, {0,1,2,3,4}}; Dims(x0) = (4,3,4)\n  LoD(x1) = {{0,3,4}, {0,1,3,5,7}}; Dims(x1) = (7,3,4)\n  LoD(Out) = {{0,5,8}, {0,1,2,3,5,7,8,9,11}}; Dims(Out) = (11,3,4)\n\n- Case4:\n  If the LoD number is 1, axis is 0, level is 0\n\n  LoD(x0) = {{0,1,2,3,4}}; Dims(x0) = (4,3,4)\n  LoD(x1) = {{0,1,3,5,7}}; Dims(x1) = (7,3,4)\n  LoD(Out) = {{0,2,5,8,11}}; Dims(Out) = (11,3,4)\n\nNOTE: The levels of all the inputs should be the same.\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LodTensorArray) Input is a vector of LoDTensor, each of which is a variable-length sequence or nested sequence.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor), Variable-length output of sequence_concat Op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default 0) The axis along which the inputs will be joined. If axis is 0, the inputs will be joined with LoD index.",
   "generated" : 0
 }, { 
   "name" : "level",
   "type" : "int",
   "comment" : "(int, default 0) The level at which the inputs will be joined. If the level is 0, the inputs will be joined at the nested sequence level. If the level is 1, the inputs will be joined at the sequence level. The level should be less than the level number of inputs.",
   "generated" : 0
 } ] 
},{
 "type" : "floor",
 "comment" : "\nFloor Activation Operator.\n\n$y = floor(x)$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Floor operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Floor operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "cast",
 "comment" : "\nCast Operator.\n\nThis Operator casts the input tensor to another data type and\nreturns tha Output Tensor.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor of cast op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output tensor of cast op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "out_dtype",
   "type" : "int",
   "comment" : "output data type",
   "generated" : 0
 }, { 
   "name" : "in_dtype",
   "type" : "int",
   "comment" : "input data type",
   "generated" : 0
 } ] 
},{
 "type" : "ceil",
 "comment" : "\nCeil Activation Operator.\n\n$y = ceil(x)$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Ceil operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Ceil operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "tanh",
 "comment" : "\nTanh Activation Operator.\n\n$$y = \\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Tanh operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Tanh operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "feed",
 "comment" : "\nFeed Operator.\n\nIt should not be configured by users directly.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of feed op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of feed op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "col",
   "type" : "int",
   "comment" : "(int) The column of feed",
   "generated" : 0
 } ] 
},{
 "type" : "rnn_memory_helper",
 "comment" : "",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5 (FP32)) Output data type",
   "generated" : 0
 } ] 
},{
 "type" : "unpool",
 "comment" : "\n        \"Input shape: $(N, C_{in}, H_{in}, W_{in})$\n        Output shape: $(N, C_{out}, H_{out}, W_{out})$\n        Where\n          $$\n            H_{out} = (H_{in}−1) * strides[0] − 2 * paddings[0] + ksize[0] \\\\\n            W_{out} = (W_{in}−1) * strides[1] − 2 * paddings[1] + ksize[1]\n          $$\n        Paper: http://www.matthewzeiler.com/wp-content/uploads/2017\n        /07/iccv2011.pdf\n        ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of unpool operator. The format of input tensor is NCHW. Where N is batch size, C is the number of channels, H and W is the height and width of feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Indices",
   "comment" : "(Tensor) The input tensor of the indices given out by MaxPool2d. The format of input tensor is NCHW. Where N is batch size, C is the number of channels, H and W is the height and width of feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of unpool operator.The format of output tensor is also NCHW.Where N is batch size, C is the number of channels, H and W is the height and width of feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "ksize",
   "type" : "int array",
   "comment" : "(vector), the unpooling window size(height, width) of unpooling operator.",
   "generated" : 0
 }, { 
   "name" : "strides",
   "type" : "int array",
   "comment" : "(vector, default:{1, 1}), strides (height, width) of unpooling operator.",
   "generated" : 0
 }, { 
   "name" : "paddings",
   "type" : "int array",
   "comment" : "(vector defalut:{0,0}), paddings (height, width) of unpooling operator.",
   "generated" : 0
 }, { 
   "name" : "unpooling_type",
   "type" : "string",
   "comment" : "(string), unpooling type, can be \"max\" for max-unpooling ",
   "generated" : 0
 } ] 
},{
 "type" : "transpose",
 "comment" : "\nTranspose Operator.\n\nThe input tensor will be permuted according to the axis values given.\nThe op functions similar to how numpy.transpose works in python.\nFor example:\n >> input = numpy.arange(6).reshape((2,3))\n >> input\n array([[0, 1, 2],\n        [3, 4, 5]])\n >> axis = [1, 0]\n >> output = input.transpose(axis)\n >> output\n array([[0, 3],\n        [1, 4],\n\t\t[2, 5]])\nSo, given a input tensor of shape(N, C, H, W) and the axis is {0, 2, 3, 1},\nthe output tensor shape will be (N, H, W, C)\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor)The input tensor, tensors with rank at most 6 are supported",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor)The output tensor",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "axis",
   "type" : "int array",
   "comment" : "(vector<int>)A list of values, and the size of the list should be the same with the input tensor rank, the tensor will permute the axes according the the values given",
   "generated" : 0
 } ] 
},{
 "type" : "rnn_memory_helper_grad",
 "comment" : "",
 "inputs" : [ 
 { 
   "name" : "Out@GRAD",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "X",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Out",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "X@GRAD",
   "comment" : "",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5 (FP32)) Output data type",
   "generated" : 0
 } ] 
},{
 "type" : "momentum",
 "comment" : "\nMomentum Optimizer.\n\nThis optimizer has a flag for Nestrov Momentum.\nThe update equations are as follows:\n\n$$\nvelocity = mu * velocity + gradient \\\\\nif (use\\_nesterov):   \\\\\n  param = param - gradient * learning\\_rate + mu * velocity * learning\\_rate \\\\\nelse:   \\\\\n  param = param - learning\\_rate * velocity. \\\\\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor, default Tensor<float>) Input parameter that has to be updated",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor, default Tensor<float>) Input gradient of the parameter",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Velocity",
   "comment" : "(Tensor, default Tensor<float>) Input velocity (corresponding to the parameter) that has to be updated",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor, default Tensor<float>) Input learning rate",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) This output is updated parameter. It shared memory with Input(Param).",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "VelocityOut",
   "comment" : "(Tensor) This output is updated velocity. It shared memory with Input(Velocity).",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "mu",
   "type" : "float",
   "comment" : "(float) Momentum coefficient",
   "generated" : 0
 }, { 
   "name" : "use_nesterov",
   "type" : "bool",
   "comment" : "(bool, default false) Use Nesterov Momentum",
   "generated" : 0
 } ] 
},{
 "type" : "scatter",
 "comment" : "\nScatter Operator.\n\nThis operator obtains output by updating the input on selected indices on the first axis:\n\n$$\nOut = Ref \\\\\nOut[Index] = Ref[Index] + Updates\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "Ref",
   "comment" : "The source input of scatter op",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Index",
   "comment" : "The index input of scatter op where Ref will be updated",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Updates",
   "comment" : "The updated value of updates op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of add op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "less_equal",
 "comment" : "less_equal Operator\n\nIt operates element-wise on X and Y, and returns the Out. Each of them is a\nN-dim tensor. X and Y could be any type.  The each element of the Out tensor is\ncalculated by Out = X <= Y\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the left hand operand of less_equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) the right hand operand of less_equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is Out = X <= Y",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "rank_loss",
 "comment" : "\nRankLoss Operator.\n\nRankLoss operator for RankNet\n(http://icml.cc/2015/wp-content/uploads/2015/06/icml_ranking.pdf). \nRankNet is a pairwise ranking model with\none training sample consisting of a pair of doc A and B, and the label P\nindicating that A is ranked higher than B or not:\n\nP = {0, 1} or {0, 0.5, 1}, where 0.5 means no information about the rank of\nthe input pair.\n\nThe RankLoss operator takes three inputs: Left (o_i), Right (o_j) and Label\n(P_{i,j}), which represent the output score of RankNet for the two docs and \nthe label respectively, and yields the rank loss C_{i,j} using the following \nequation:\n\n$$\n  C_{i,j} = -\\tilde{P_{ij}} * o_{i,j} + \\log(1 + e^{o_{i,j}}) \\\\\n  o_{i,j} =  o_i - o_j  \\\\\n  \\tilde{P_{i,j}} = \\left \\{0, 0.5, 1 \\right \\} \\ or \\ \\left \\{0, 1 \\right \\}\n$$\n\nThe operator can take batch inputs with size batch_size (batch_size >= 1).\n\n",
 "inputs" : [ 
 { 
   "name" : "Label",
   "comment" : "(2-D Tensor with shape [batch_size x 1]) The label indicating A ranked higher than B or not.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Left",
   "comment" : "(2-D Tensor with shape [batch_size x 1]) The output of RankNet for doc A.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Right",
   "comment" : "(2-D Tensor with shape [batch_size x 1]) The output of RankNet for doc B.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(2-D Tensor with shape [batch_size x 1]) The output loss of RankLoss operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "greater_than",
 "comment" : "greater_than Operator\n\nIt operates element-wise on X and Y, and returns the Out. Each of them is a\nN-dim tensor. X and Y could be any type.  The each element of the Out tensor is\ncalculated by Out = X > Y\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the left hand operand of greater_than operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) the right hand operand of greater_than operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is Out = X > Y",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "equal",
 "comment" : "equal Operator\n\nIt operates element-wise on X and Y, and returns the Out. Each of them is a\nN-dim tensor. X and Y could be any type.  The each element of the Out tensor is\ncalculated by Out = X == Y\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) the left hand operand of equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) the right hand operand of equal operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is Out = X == Y",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "uniform_random",
 "comment" : "\nUniform random operator.\n\nThis operator initializes a tensor with random values sampled from a \nuniform distribution.\n\n",
 "inputs" : [  ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of uniform random op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "shape",
   "type" : "int array",
   "comment" : "(vector<int>) The shape of the output tensor",
   "generated" : 0
 }, { 
   "name" : "min",
   "type" : "float",
   "comment" : "(float, default -1.0) Minimum value of uniform random",
   "generated" : 0
 }, { 
   "name" : "max",
   "type" : "float",
   "comment" : "(float, default 1.0) Maximun value of uniform random",
   "generated" : 0
 }, { 
   "name" : "seed",
   "type" : "int",
   "comment" : "(int, default 0) Random seed used for generating samples. 0 means use a seed generated by the system.",
   "generated" : 0
 }, { 
   "name" : "dtype",
   "type" : "int",
   "comment" : "(int, default 5(FP32)) Output tensor data type",
   "generated" : 0
 } ] 
},{
 "type" : "roi_pool",
 "comment" : "\nROIPool operator\n\nROI Pooling for Faster-RCNN. The link below is a further introduction: \nhttps://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor), the input of ROIPoolOp. The format of input tensor is NCHW. Where N is batch size, C is the number of input channels, H is the height of the feature, and W is the width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "ROIs",
   "comment" : "(Tensor), ROIs (Regions of Interest) to pool over. should be a 2-D tensor of shape (num_rois, 5)given as [[batch_id, x1, y1, x2, y2], …]. Where batch_id is the id of the data, (x1, y1) is the top left coordinates, and (x2, y2) is the bottom right coordinates.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor), The output of ROIPoolOp is a 4-D tensor with shape (num_rois, channels, pooled_h, pooled_w).",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Argmax",
   "comment" : "(Tensor), Argmaxes corresponding to indices in X used for gradient computation. Only output if arg “is_test” is false.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "spatial_scale",
   "type" : "float",
   "comment" : "(float, default 1.0), Multiplicative spatial scale factor to translate ROI coords from their input scale to the scale used when pooling.",
   "generated" : 0
 }, { 
   "name" : "pooled_height",
   "type" : "int",
   "comment" : "(int, default 1), The pooled output height.",
   "generated" : 0
 }, { 
   "name" : "pooled_width",
   "type" : "int",
   "comment" : "(int, default 1), The pooled output width.",
   "generated" : 0
 } ] 
},{
 "type" : "softmax",
 "comment" : "\nSoftmax Operator.\n\nThe input of the softmax operator is a 2-D tensor with shape N x K (N is the\nbatch_size, K is the dimension of input feature). The output tensor has the\nsame shape as the input tensor.\n\nFor each row of the input tensor, the softmax operator squashes the\nK-dimensional vector of arbitrary real values to a K-dimensional vector of real\nvalues in the range [0, 1] that add up to 1.\nIt computes the exponential of the given dimension and the sum of exponential\nvalues of all the other dimensions in the K-dimensional vector input.\nThen the ratio of the exponential of the given dimension and the sum of\nexponential values of all the other dimensions is the output of the softmax\noperator.\n\nFor each row $i$ and each column $j$ in Input(X), we have:\n    $$Y[i, j] = \\frac{\\exp(X[i, j])}{\\sum_j(exp(X[i, j])}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor of softmax. 2-D with shape [batch_size, input_feature_dimensions].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "The normalized values with the same shape as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "seq_expand",
 "comment" : "\nSeq Expand Operator.\n\nThis operator expands input(X) according to LOD of input(Y).\nFollowing are cases to better explain how this works:\nCase 1:\n\nGiven 2-level a LoDTensor input(X)\n    X.lod = [[0,       2, 3],\n             [0, 1,    3, 4]]\n    X.data = [a, b, c, d]\n    X.dims = [4, 1]\nand input(Y)\n    Y.lod = [[0,    2,    4],\n             [0, 3, 6, 7, 8]]\nwith condition len(Y.lod[-1]) -1 == X.dims[0]\nthen we get 2-level LoDTensor\n    Out.lod = [[0,                2,    4],\n               [0,       3,       6, 7, 8]]\n    Out.data = [a, a, a, b, b, b, c, d]\n    Out.dims = [8, 1]\n\nCase 2:\n\nGiven a 0-level LoDTensor input(X)\n    X.data = [a, b, c]\n    X.lod = NULL\n    X.dims = [3, 1]\nand input(Y)\n    Y.lod = [[0, 2, 3, 6]]\nwith condition len(Y.lod[-1]) -1 == X.dims[0]\nthen we get 1-level LoDTensor\n    Out.lod = [[0,    2, 3,      6]]\n    Out.data = [a, a, b, c, c, c]\n    Out.dims = [6, 1]\n\nCase 3:\n\nGiven a 0-level LoDTensor input(X)\n    X.data = [[a, b], [c, d], [e, f]]\n    X.lod = NULL\n    X.dims = [3, 2]\nand input(Y)\n    Y.lod = [[0, 2, 3, 6]]\nwith condition len(Y.lod[-1]) -1 == X.dims[0]\nthen we get 1-level LoDTensor\n    Out.lod = [[0,           2,     3,                     6]]\n    Out.data = [[a,b], [a,b] [c,d], [e, f], [e, f], [e, f]]\n    Out.dims = [6, 2]\n\nCase 4:\n\nGiven 2-level a LoDTensor input(X)\n    X.lod = [[0,       2, 3],\n             [0, 1,    3, 4]]\n    X.data = [a, b, c, d]\n    X.dims = [4, 1]\nand input(Y)\n    Y.lod = [[0,    2,    4],\n             [0, 3, 6, 6, 8]]\nwith condition len(Y.lod[-1]) -1 == X.dims[0]\nthen we get 2-level LoDTensor\n    Out.lod = [[0,                2,    4],\n               [0,       3,       6, 6, 8]]\n    Out.data = [a, a, a, b, b, b, d, d]\n    Out.dims = [8, 1]\n\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor or LoDTensor) The input(X) of this operator can be a LoDTensor or a base Tensor.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor)The reference input(Y) of seq_expand op.It must be a LoDTensor with k-level(k>0).The input(X) will be expanded according to LOD of input(Y).The element numbers of last level in input(Y) must be equal to dims[0] of input(X).",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LodTensor)The output of seq_expand op.The lod of output will be as same as input(Y)'s lod.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "sqrt",
 "comment" : "\nSqrt Activation Operator.\n\n$y = \\sqrt{x}$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Sqrt operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Sqrt operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "logical_and",
 "comment" : "logical_and Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = X \\&\\& Y$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) Left hand operand of logical_and operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) Right hand operand of logical_and operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = X \\&\\& Y$$",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "logical_not",
 "comment" : "logical_not Operator\n\nIt operates element-wise on X, and returns the Out. X and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = !X$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) Operand of logical_not operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = !X$$",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "abs",
 "comment" : "\nAbs Activation Operator.\n\n$y = |x|$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Abs operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Abs operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "logical_xor",
 "comment" : "logical_xor Operator\n\nIt operates element-wise on X and Y, and returns the Out. X, Y and Out are N-dim boolean tensors.\nEach element of Out is calculated by $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor) Left hand operand of logical_xor operator",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(LoDTensor) Right hand operand of logical_xor operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) n-dim bool tensor. Each element is $$Out = (X || Y) \\, \\&\\& \\, !(X \\&\\& Y)$$",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "sequence_slice",
 "comment" : "\nSequence slice operator\n\nThe operator crops a subsequence from given sequence with given start offset and subsequence length.\nIt only supports sequence (LoD Tensor with level number is 1).\n- Case:\n    X = [[a1, a2;\n        b1, b2;\n        c1, c2]\n       [d1, d2;\n        e1, e2]]\n    LoD(X) = {{0, 3, 5}}; Dims(X) = (5, 2)\n    Offset = [[0], [1]]; Length = [[2], [1]]\n\n    Out = [[a1, a2;\n            b1, b2]\n            [e1, e2]]\n    LoD(Out) = {{0, 2, 3}}; Dims(Out) = (3, 2)\nNOTE: The first dimension size of input, the size of offset and Length, should be equal. The offset start from 0.\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor), the input of SequenceSliceOp.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Offset",
   "comment" : "(Tensor), a vector<int> to describe the offset of every input sequence for sub sequence item.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Length",
   "comment" : "(Tensor), a vector<int> to describe the length of every input sequence for sub sequence item.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor), the output of SequenceSliceOp.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "hinge_loss",
 "comment" : "\nHingeLoss Operator.\n\nLet x be a logit (prediction) and y be the actual label. The logit can\ntake any values from (-inf, inf), but the labels should be either -1 or 1.\nThen, the hinge loss is computed as follows:\n\n$$\nL_(x, y) = max(1 - y.x, 0) \n$$\n\nNote that the labels passed as input will have values as either 0 or 1.\n\n",
 "inputs" : [ 
 { 
   "name" : "Logits",
   "comment" : "The input value (Logits) of Hinge loss op.Logits is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Labels",
   "comment" : "The target value (Labels) of Hinge loss op.Labels is a 2-D tensor with shape [batch_size, 1].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Loss",
   "comment" : "The output tensor with shape [batch_size, 1] which represents the hinge loss.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "bilinear_tensor_product",
 "comment" : "\nBilinear Tensor Product operator.\nGiven input X and Y, a 3D tensor Weight and a Bias. Each column of the\nOutput is computed by one slice $i = 1, . . . , k$ of the tensor:\n\n$$\nM =  (X W_i) * Y \\\\\nOut_i = \\sum_j {M_j} + Bias_i\n$$\n\nWhere $W_i$ is the $i$-th slice of Input(Weight);\n      $M_j$ is the $j$-th column of $M$;\n      $Out_i$ is the $i$-th column of Output(Out);\n      $Bias_i$ is a column vector, each element of it is equal to\n        the $i$-th element of $Bias$;\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The first input of bilinear_tensor_product operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The second input of bilinear_tensor_product operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "The learnable parameters of bilinear_tensor_product operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "The learnable bias of bilinear_tensor_product operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of bilinear_tensor_product operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "lrn",
 "comment" : "\nLocal Response Normalization Operator.\n\nThis operator comes from the paper:\n<<ImageNet Classification with Deep Convolutional Neural Networks>>.\n\nThe original formula is:\n\n$$\nOutput(i, x, y) = Input(i, x, y) / \\left(\nk + \\alpha \\sum\\limits^{\\min(C, c + n/2)}_{j = \\max(0, c - n/2)}\n(Input(j, x, y))^2\n\\right)^{\\beta}\n$$\n\nFunction implementation:\n\nInputs and outpus are in NCHW format, while input.shape.ndims() equals 4.\nAnd dimensions 0 ~ 3 represent batch size, feature maps, rows,\nand columns, respectively.\n\nInput and Output in the formula above is for each map(i) of one image, and\nInput(i, x, y), Output(i, x, y) represents an element in an image.\n\nC is the number of feature maps of one image. n is a hyper-parameter\nconfigured when operator is initialized. The sum in the denominator\nis the sum of the same positions in the neighboring maps.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input of LRN operator. It must be a 4D tenor with NCHW format.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output of LRN operator, which is also the 4D tensor with NCHW format.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MidOut",
   "comment" : "(Tensor) Middle result of LRN operator. It's computed in forward process and also used in backward process.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "n",
   "type" : "int",
   "comment" : "(int default 5) n is the \"adjacent\" kernel that maps at the same spatial position.",
   "generated" : 0
 }, { 
   "name" : "k",
   "type" : "float",
   "comment" : "(float, default 2.0) k is the bias.",
   "generated" : 0
 }, { 
   "name" : "alpha",
   "type" : "float",
   "comment" : "(float, default 0.0001) alpha is the scale number.",
   "generated" : 0
 }, { 
   "name" : "beta",
   "type" : "float",
   "comment" : "(float, default 0.75) beta is the power number.",
   "generated" : 0
 } ] 
},{
 "type" : "beam_search_decode",
 "comment" : "\nPack the result of Beam search op into SentenceIds and SentenceScores.\n",
 "inputs" : [ 
 { 
   "name" : "Ids",
   "comment" : "(LodTensorArray)score of the candidate words in each step",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Scores",
   "comment" : "(LodTensorArray)score of the candidate words in each step",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "SentenceIds",
   "comment" : "(LodTensor)All possible result sentences of word ids",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SentenceScores",
   "comment" : "(LodTensor)All possible result sentences of word scores",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "assign",
 "comment" : "Assign Operator\n\nOut = X,  when type in [LoDTensor/SelectedRows/LoDTensorArray]\nraise error if the type is not listed above.\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor, SelectedRows or LoDTensorArray) The input variable could be LoDTensor, SelectedRows or LoDTensorArray.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor, SelectedRows or LoDTensorArray) The type of output is the same as input X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "split",
 "comment" : "\nSplit operator\n\nThis operator splits the input tensor into multiple sub-tensors.\n\nExample:\n  Input = [[1,2],\n           [3,4],\n           [5,6]]\n  sections = [2,1]\n  axis = 0\n  Output[0] = [[1,2],\n               [3,4]]\n  Output[1] = [[5,6]]\n\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) Input tensor of the split operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) Output tensors of the split operator.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "sections",
   "type" : "int array",
   "comment" : "(vector<int>) the length of each output along the specified axis.",
   "generated" : 0
 }, { 
   "name" : "num",
   "type" : "int",
   "comment" : "(int, default 0)Number of sub-tensors. This must evenly divide Input.dims()[axis]",
   "generated" : 0
 }, { 
   "name" : "axis",
   "type" : "int",
   "comment" : "(int, default 0) The axis which the input will be splited on.",
   "generated" : 0
 } ] 
},{
 "type" : "chunk_eval",
 "comment" : "\nFor some basics of chunking, please refer to\n‘Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>’.\n\n\nChunkEvalOp computes the precision, recall, and F1-score of chunk detection,\nand supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.\nHere is a NER example of labeling for these tagging schemes:\n\n \t     Li     Ming    works  at  Agricultural   Bank   of    China  in  Beijing.\n  IO:    I-PER  I-PER   O      O   I-ORG          I-ORG  I-ORG I-ORG  O   I-LOC\n  IOB:   B-PER  I-PER   O      O   B-ORG          I-ORG  I-ORG I-ORG  O   B-LOC\n  IOE:   I-PER  E-PER   O      O   I-ORG          I-ORG  I-ORG E-ORG  O   E-LOC\n  IOBES: B-PER  E-PER   O      O   B-ORG          I-ORG  I-ORG E-ORG  O   S-LOC\n\nThere are three chunk types (named entity types): PER (person), ORG (organization)\nand LOC (location), and the labels have the form <tag type>-<chunk type>.\n\nSince the calculations actually use label ids rather than labels, extra attention\nshould be paid when mapping labels to ids to make ChunkEvalOp work. The key point\nis that the ids must satisfy the following equations.\n\n    tag_type = label % num_tag_type\n    chunk_type = label / num_tag_type\n\nwhere `num_tag_type` is the number of tag types in the tagging scheme, `num_chunk_type`\nis the number of chunk types, and `tag_type` gets its value from the following table.\n\n    Scheme Begin Inside End   Single\n     plain   0     -      -     -\n     IOB     0     1      -     -\n     IOE     -     0      1     -\n     IOBES   0     1      2     3\n\nStill using NER as an example, assume the tagging scheme is IOB and the chunk types are ORG,\nPER and LOC. To satisfy the above equations, the label map can be like this:\n\n    B-ORG  0\n    I-ORG  1\n    B-PER  2\n    I-PER  3\n    B-LOC  4\n    I-LOC  5\n    O      6\n\nIt’s not hard to verify the equations noting that the number of chunk types\nis 3 and the number of tag types in the IOB scheme is 2. For example, the label\nid of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of\nI-LOC is 2, which is consistent with the results from the equations.\n",
 "inputs" : [ 
 { 
   "name" : "Inference",
   "comment" : "(Tensor, default: Tensor<int64_t>). Predictions from the network.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor, default: Tensor<int64_t>). The true tag sequences.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Precision",
   "comment" : "(float). The evaluated precision (called positive predictive value) of chunks on the given mini-batch.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Recall",
   "comment" : "(float). The evaluated recall (true positive rate or sensitivity) of chunks on the given mini-batch.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "F1-Score",
   "comment" : "(float). The evaluated F1-Score on the given mini-batch.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "num_chunk_types",
   "type" : "int",
   "comment" : "(int). The number of chunk types. See below for details.",
   "generated" : 0
 }, { 
   "name" : "chunk_scheme",
   "type" : "string",
   "comment" : "(string, default IOB). The labeling scheme indicating how to encode the chunks. Must be IOB, IOE, IOBES or plain. See below for details.",
   "generated" : 0
 }, { 
   "name" : "excluded_chunk_types",
   "type" : "int array",
   "comment" : "(list<int>) A list including chunk type ids indicating chunk types that are not counted. See below for details.",
   "generated" : 0
 } ] 
},{
 "type" : "sigmoid",
 "comment" : "\nSigmoid Activation Operator\n\n$$y = \\frac{1}{1 + e^{-x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Sigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Sigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "squared_l2_distance",
 "comment" : "\nSquaredL2Distance operator\n\nThis operator calculates the squared L2 distance between the input and \nthe target. The number of distance values equals the first dimension \nof the input. The first dimension of the target can equal that of the input or be 1. \nIf the first dimension of the target is 1, the operator broadcasts the target's \nfirst dimension to the input's first dimension. During backward propagation, \nthe user can decide whether to calculate the gradient of the input or \nthe target or both.\n\nBoth the input X and Y can carry the LoD (Level of Details) information. \nHowever, the output only shares the LoD information with input X.\n    ",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) Input of SquaredL2DistanceOp.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor) Target of SquaredL2DistanceOp.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "sub_result",
   "comment" : "(Tensor) Buffering subtraction result which will be reused in backward.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "Out",
   "comment" : "(Tensor) Squared l2 distance between input and target.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "relu",
 "comment" : "\nRelu Activation Operator.\n\n$y = \\max(x, 0)$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Relu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Relu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "fetch",
 "comment" : "\nFetch Operator.\n\nIt should not be configured by users directly.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input of fetch op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of fetch op",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "col",
   "type" : "int",
   "comment" : "(int) The column of fetch",
   "generated" : 0
 } ] 
},{
 "type" : "while",
 "comment" : "\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "A set of variables, which are required by operators inside the block of While Op.",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "Condition",
   "comment" : "(Bool) A scalar. When it is False, the While Op terminates.",
   "duplicable" : 1,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "A set of variables, which will be assigned with values generated by the operators inside the block of While Op.",
   "duplicable" : 1,
   "intermediate" : 0
 }, { 
   "name" : "StepScopes",
   "comment" : "(StepScopeVar) A vector of local scopes whose size equals the number of steps of the While Op. The i-th scope stores the temporary variables generated in the i-th step.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "step_block",
   "type" : "block id",
   "comment" : "The step block inside WhileOp",
   "generated" : 0
 } ] 
},{
 "type" : "proximal_adagrad",
 "comment" : "\nProximal Adagrad Optimizer.\n\nOptimizer that implements the proximal adagrad algorithm:\n\n$$\nmoment = moment + grad * grad \\\\\nprox\\_param = param - learning\\_rate * grad * (1 / \\sqrt{moment}) \\\\\nparam = sign(prox\\_param) / (1 + learning\\_rate * l2) *\n        \\max(|prox\\_param| - learning\\_rate * l1 , 0)\n$$\n\nThe paper that proposed Proximal GD: \n(http://papers.nips.cc/paper/3793-efficient-learning-using-forward-backward-splitting.pdf)\nHere, we use the adagrad learning rate as specified here: \n(http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf)\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor, default Tensor<float>) Input parameter that has to be updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Moment",
   "comment" : "(Tensor, default Tensor<float>) Moment parameter that has to be updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor, default Tensor<float>) Input gradient of the parameter.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor, default Tensor<float>) The learning rate should be a tensor of size 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output updated parameter value.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MomentOut",
   "comment" : "(Tensor) Output updated moment value.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "l1",
   "type" : "float",
   "comment" : "(float, default 0.0) L1 regularization strength.",
   "generated" : 0
 }, { 
   "name" : "l2",
   "type" : "float",
   "comment" : "(float, default 0.0) L2 regularization strength.",
   "generated" : 0
 } ] 
},{
 "type" : "minus",
 "comment" : "\nMinus Operator.\n\nEquation:\n\n    $Out = X - Y$\n\nBoth the input `X` and `Y` can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input `X`.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The left tensor of minus operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The right tensor of minus operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output tensor of minus operator.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "cos_sim",
 "comment" : "\nCosine Similarity Operator.\n\n$Out = X^T * Y / (\\sqrt{X^T * X} * \\sqrt{Y^T * Y})$\n\nThe input X and Y must have the same shape, except that the 1st dimension\nof input Y could be just 1 (different from input X), which will be\nbroadcast to match the shape of input X before computing their cosine\nsimilarity.\n\nBoth the input X and Y can carry the LoD (Level of Details) information,\nor not. But the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The 1st input of cos_sim op.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "The 2nd input of cos_sim op.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The output of cos_sim op.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "XNorm",
   "comment" : "Norm of the first input, reduced along the 1st dimension.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "YNorm",
   "comment" : "Norm of the second input, reduced along the 1st dimension.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [  ] 
},{
 "type" : "precision_recall",
 "comment" : "\nPrecision Recall Operator.\n\nWhen given Input(Indices) and Input(Labels), this operator can be used\nto compute various metrics including:\n1. macro average precision\n2. macro average recall\n3. macro f1 score\n4. micro average precision\n5. micro average recall\n6. micro f1 score\n\nTo compute the above metrics, the operator counts true positives,\nfalse positives and false negatives. The count of true negatives is not\nnecessary, but it may be useful and its cost is\ntrivial, so the operator also provides the count of true negatives.\n\nWe define state as a 2-D tensor with shape [class_number, 4]. Each row of a\nstate contains the statistic variables for the corresponding class. The layout of each row\nis: TP(true positives), FP(false positives), TN(true negatives),\nFN(false negatives). If Input(Weights) is provided, TP, FP, TN, FN will be\ncalculated from the given weights instead of the instance count.\n\nThis operator also supports computing metrics across batches. To\nachieve this, Input(StatesInfo) should be provided. The state of the current batch\ndata will be accumulated to Input(StatesInfo) and Output(AccumStatesInfo)\nis the accumulated state.\n\nOutput(BatchMetrics) holds the metrics of the current batch data while\nOutput(AccumMetrics) holds the metrics of the accumulated data.\n\n",
 "inputs" : [ 
 { 
   "name" : "MaxProbs",
   "comment" : "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, where N is the batch size. Each row contains the max probability of an instance, which is computed by the previous top_k (k=1) operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Indices",
   "comment" : "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, where N is the batch size. Each row contains the corresponding index, which is computed by the previous top_k (k=1) operator.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Labels",
   "comment" : "(Tensor, default Tensor<int>) A 2-D tensor with shape N x 1, where N is the batch size. Each element is a label and the value should be in [0, class_number - 1].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weights",
   "comment" : "(Tensor, default Tensor<float>) A 2-D tensor with shape N x 1, where N is the batch size. This input is optional. If provided, weight of instance would be considered when computing metrics.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "StatesInfo",
   "comment" : "(Tensor, default Tensor<int>) A 2-D tensor with shape D x 4, where D is the number of classes. This input is optional. If provided, current state will be accumulated to this state and the accumulation state will be the output state.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "BatchMetrics",
   "comment" : "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. This output tensor contains metrics for current batch data. The layout is [macro average precision, macro average recall, macro f1 score, micro average precision, micro average recall, micro f1 score].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AccumMetrics",
   "comment" : "(Tensor, default Tensor<float>) A 1-D tensor with shape {6}. This output tensor contains metrics for accumulated data. The layout is [macro average precision, macro average recall, macro f1 score, micro average precision, micro average recall, micro f1 score].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "AccumStatesInfo",
   "comment" : "(Tensor, default Tensor<float>) A 2-D tensor with shape D x 4, where D is equal to class number. This output tensor contains accumulated state variables used to compute metrics. The layout for each class is [true positives, false positives, true negatives, false negatives].",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "class_number",
   "type" : "int",
   "comment" : "(int) Number of classes to be evaluated.",
   "generated" : 0
 } ] 
},{
 "type" : "batch_norm",
 "comment" : "\nBatch Normalization.\n\nBatch Norm has been implemented as discussed in the paper:\nhttps://arxiv.org/pdf/1502.03167.pdf\nCan be used as a normalizer function for conv2d and fully_connected operations.\nThe required data format for this layer is one of the following:\n1. NHWC `[batch, in_height, in_width, in_channels]`\n2. NCHW `[batch, in_channels, in_height, in_width]`\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "The input tensor",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Scale",
   "comment" : "Scale is a 1-dimensional tensor of size C that is applied to the output",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "Bias is a 1-dimensional tensor of size C that is applied to the output",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Mean",
   "comment" : "The global mean (for training) or estimated mean (for testing)",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Variance",
   "comment" : "The global variance (for training) or estimated Variance (for testing)",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "result after normalization",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "MeanOut",
   "comment" : "Share memory with Mean. Store the global mean when training",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "VarianceOut",
   "comment" : "Share memory with Variance. Store the global Variance when training",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SavedMean",
   "comment" : "Mean of the current mini batch, will apply to output when training",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "SavedVariance",
   "comment" : "Variance of the current mini batch, will apply to output when training",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "is_test",
   "type" : "bool",
   "comment" : "",
   "generated" : 0
 }, { 
   "name" : "momentum",
   "type" : "float",
   "comment" : "",
   "generated" : 0
 }, { 
   "name" : "epsilon",
   "type" : "float",
   "comment" : "",
   "generated" : 0
 }, { 
   "name" : "tensor_format",
   "type" : "string",
   "comment" : "",
   "generated" : 0
 } ] 
},{
 "type" : "read_from_array",
 "comment" : "\nReadFromArray Operator.\n\nRead a LoDTensor from a LoDTensor Array.\n\nAssume $T$ is LoDTensor, $i$ is the subscript of the array, and $A$ is the array. The\nequation is\n\n$$T = A[i]$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(TensorArray) The array to read from.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "I",
   "comment" : "(Tensor) The subscript index into the tensor array. The number of elements should be 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor) The tensor read from the array.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "softplus",
 "comment" : "\nSoftplus Activation Operator.\n\n$y = \\ln(1 + e^{x})$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Softplus operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Softplus operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "accuracy",
 "comment" : "\nAccuracy Operator. \n\nIt computes the accuracy rate for classification.\nThe accuracy is calculated as follows:\n\n$$accuracy = \\frac{NumOfCorrectPredicts}{NumOfAllSamples}$$\n\nBoth the input Out and Label can carry the LoD (Level of Details)\ninformation, or not. But the output only shares the LoD information \nwith the input Out(Inference).\n\n",
 "inputs" : [ 
 { 
   "name" : "Out",
   "comment" : "The network output of topk (inferences)",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Indices",
   "comment" : "The network output of topk (indices)",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "Label of the training data",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Accuracy",
   "comment" : "The accuracy of current batch",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Correct",
   "comment" : "The correct samples count of current batch",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Total",
   "comment" : "The samples count of current batch",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "conv_shift",
 "comment" : "\nConvShift Operator.\n\nA layer for circular convolution of two vectors,\nas used in the Neural Turing Machine: https://arxiv.org/abs/1410.5401\n\nThe equation is:\n\n$$Out[i] = \\sum_{j=-(N-1)/2}^{(N-1)/2} X_{i+j} * Y_{j}$$\n\nwhere X's index is computed modulo M, and Y's index is computed modulo N.\n\nBoth inputs X and Y can carry LoD (Level of Details) information.\nHowever, the output only shares the LoD information with input X.\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape B x M, where B is the batch size and M is the data dimension.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Y",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape B x N, where B is the batch size and N is the data dimension. N must be odd.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor, default Tensor<float>), a 2-D tensor with shape B x M, i.e., the same shape as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "nce",
 "comment" : "\nCompute and return the noise-contrastive estimation training loss.\nSee [Noise-contrastive estimation: A new estimation principle for unnormalized statistical models](http://www.jmlr.org/proceedings/papers/v9/gutmann10a/gutmann10a.pdf).\nBy default this operator uses a uniform distribution for sampling.\n",
 "inputs" : [ 
 { 
   "name" : "Input",
   "comment" : "(Tensor) A tensor of shape [batch_size, dim].",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(Tensor) A tensor of shape [batch_size, num_true_class]. 'num_true_class' is the number of target classes in each sample. The number of target classes per sample should be the same. If you have a variable number of target classes, you can pad them out to a constant number by either repeating them or by padding with an otherwise unused class.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Weight",
   "comment" : "(Tensor) A tensor of shape [num_class, dim]. 'num_class' is the total number of classes.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Bias",
   "comment" : "(Tensor) A tensor of shape [num_class, 1]. 'num_class' is the total number of classes. It is a dispensable input.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SampleWeight",
   "comment" : "(Tensor) A tensor of shape [batch_size, 1] storing a weight for each sample. It is a dispensable input. The default weight of each sample is 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Cost",
   "comment" : "(Tensor) A tensor of shape [batch_size, 1]. Cost of samples.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SampleLogits",
   "comment" : "An intermediate tensor of shape [batch_size, num_neg_samples + num_pos_samples]. This tensor is the output of the forward kernel and is used in the backward kernel to compute grads. Given X, the dot product of the input tensor and the sampled labels' weights, 'SampleLogits' is sigmoid(X).",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "SampleLabels",
   "comment" : "An intermediate tensor of shape [batch_size, num_neg_samples + num_pos_samples]. This tensor is the output of the forward kernel and is used in the backward kernel to compute grads.",
   "duplicable" : 0,
   "intermediate" : 1
 } ], 
 "attrs" : [ 
 { 
   "name" : "num_total_classes",
   "type" : "int",
   "comment" : "Total number of classes in all samples.",
   "generated" : 0
 }, { 
   "name" : "num_neg_samples",
   "type" : "int",
   "comment" : "The number of negative classes. The default value is 10.",
   "generated" : 0
 }, { 
   "name" : "custom_neg_classes",
   "type" : "int array",
   "comment" : "This attribute is only used in unit tests. Classes in this list will be used as negative classes for every sample. Under normal conditions, users should avoid setting this attribute.",
   "generated" : 0
 } ] 
},{
 "type" : "linear_chain_crf",
 "comment" : "\nLinearChainCRF Operator.\n\nConditional Random Field defines an undirected probabilistic graph with nodes\ndenoting random variables and edges denoting dependencies between these\nvariables. CRF learns the conditional probability $P(Y|X)$, where\n$X = (x_1, x_2, ... , x_n)$ are structured inputs and\n$Y = (y_1, y_2, ... , y_n)$ are labels for the inputs.\n\nLinear chain CRF is a special case of CRF that is useful for sequence labeling\ntasks. Sequence labeling tasks do not assume a lot of conditional\nindependences among inputs. The only constraint they impose is that the input\nand output must be linear sequences. Thus, the graph of such a CRF is a simple\nchain or a line, which results in the linear chain CRF.\n\nThis operator implements the Forward-Backward algorithm for the linear chain\nCRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and\nhttp://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.\n\nEquation:\n1. Denote Input(Emission) to this operator as $x$ here.\n2. The first D values of Input(Transition) to this operator are for starting\nweights, denoted as $a$ here.\n3. The next D values of Input(Transition) of this operator are for ending\nweights, denoted as $b$ here.\n4. The remaining values of Input(Transition) are for transition weights,\ndenoted as $w$ here.\n5. Denote Input(Label) as $s$ here.\n\nThe probability of a sequence $s$ of length $L$ is defined as:\n$$P(s) = (1/Z) \\exp(a_{s_1} + b_{s_L}\n                + \\sum_{l=1}^L x_{s_l}\n                + \\sum_{l=2}^L w_{s_{l-1},s_l})$$\n\nwhere $Z$ is a normalization value so that the sum of $P(s)$ over\nall possible sequences is 1, and $x$ is the emission feature weight\nto the linear chain CRF.\n\nFinally, the linear chain CRF operator outputs the logarithm of the conditional\nlikelihood of each training sample in a mini-batch.\n\nNOTE:\n1. The feature function for a CRF is made up of the emission features and the\ntransition features. The emission feature weights are NOT computed in\nthis operator. They MUST be computed first before this operator is called.\n\n2. Because this operator performs global normalization over all possible\nsequences internally, it expects UNSCALED emission feature weights.\nPlease do not call this op with the emission feature being output of any\nnonlinear activation.\n\n3. The 2nd dimension of Input(Emission) MUST be equal to the tag number.\n\n",
 "inputs" : [ 
 { 
   "name" : "Emission",
   "comment" : "(LoDTensor, default LoDTensor<float>) A 2-D LoDTensor with shape [N x D], where N is the size of the mini-batch and D is the total tag number. The unscaled emission weight matrix for the linear chain CRF. ",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Transition",
   "comment" : "(Tensor, default Tensor<float>) A 2-D Tensor with shape [(D + 2) x D]. The learnable parameter for the linear_chain_crf operator. See more details in the operator's comments.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Label",
   "comment" : "(LoDTensor, default LoDTensor<int64_t>) A LoDTensor with shape [N x 1], where N is the total element number in a mini-batch. The ground truth.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Alpha",
   "comment" : "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. The forward vectors for the entire batch. Denote it as $\\alpha$. $\\alpha$ is a memo table used to calculate the normalization factor in CRF. $\\alpha[k, v]$ stores the unnormalized probabilities of all possible unfinished sequences of tags that end at position $k$ with tag $v$. For each $k$, $\\alpha[k, v]$ is a vector of length $D$ with a component for each tag value $v$. This vector is called a forward vector and will also be used in backward computations.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "EmissionExps",
   "comment" : "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. The exponentials of Input(Emission). This is an intermediate computational result in forward computation, and will be reused in backward computation.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "TransitionExps",
   "comment" : "(Tensor, default Tensor<float>) A 2-D Tensor with shape [(D + 2) x D]. The exponentials of Input(Transition). This is an intermediate computational result in forward computation, and will be reused in backward computation.",
   "duplicable" : 0,
   "intermediate" : 1
 }, { 
   "name" : "LogLikelihood",
   "comment" : "(Tensor, default Tensor<float>) The logarithm of the conditional likelihood of each training sample in a mini-batch. This is a 2-D tensor with shape [S x 1], where S is the number of sequences in a mini-batch. Note: the output is no longer a LoDTensor.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "logsigmoid",
 "comment" : "\nLogsigmoid Activation Operator\n\n$$y = \\log \\frac{1}{1 + e^{-x}}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of LogSigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of LogSigmoid operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "row_conv",
 "comment" : "\nRow-convolution Operator.\n\nThe row convolution is called lookahead convolution.  This operator was \nintroduced in the following paper for DeepSpeech2:\nhttp://www.cs.cmu.edu/~dyogatam/papers/wang+etal.iclrworkshop2016.pdf \n\nThe main motivation is that a bidirectional RNN, useful in DeepSpeech \nlike speech models, learns representation for a sequence by performing a \nforward and a backward pass through the entire sequence. However, unlike \nunidirectional RNNs, bidirectional RNNs are challenging to deploy in an online\nand low-latency setting. The lookahead convolution incorporates information \nfrom future subsequences in a computationally efficient manner to improve \nunidirectional recurrent neural networks. The row convolution operator is \ndifferent from the 1D sequence convolution, and is computed as follows:\n\nGiven an input sequence $in$ of length $t$ and input dimension $d$, \nand a filter ($W$) of size $context \\times d$, \nthe output sequence is convolved as:\n\n$$\nout_{i, :} = \\sum_{j=i}^{i + context} in_{j,:} \\cdot W_{j-i, :}\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(LoDTensor), the input(X) is a LoDTensor, which supports variable time-length input sequences. The underlying tensor in this LoDTensor is a matrix with shape (T x N), where T is the total number of time steps in this mini-batch and N is the input data dimension.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Filter",
   "comment" : "(Tensor), the input(Filter) is a learnable parameter. It is a 2-D tensor with shape (future_context x N), where future_context is the future context length and N is the data dimension.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(LoDTensor), the output(Out) is a LoDTensor, which supports variable time-length input sequences. The underlying tensor in this LoDTensor is a matrix with shape T x N, i.e., the same shape as X.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "exp",
 "comment" : "\nExp Activation Operator.\n\n$y = e^x$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Exp operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Exp operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "soft_relu",
 "comment" : "\nSoftRelu Activation Operator.\n\n$y = \\ln(1 + \\exp(\\max(\\min(x, threshold), -threshold)))$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of SoftRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of SoftRelu operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "threshold",
   "type" : "float",
   "comment" : "The threshold value of SoftRelu",
   "generated" : 0
 } ] 
},{
 "type" : "softshrink",
 "comment" : "\nSoftshrink Activation Operator.\n\n$$\ny = \\begin{cases} \n    x - \\lambda, \\text{if } x > \\lambda \\\\\n    x + \\lambda, \\text{if } x < -\\lambda \\\\\n    0,  \\text{otherwise}\n    \\end{cases}\n$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Softshrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Softshrink operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "lambda",
   "type" : "float",
   "comment" : "non-negative offset",
   "generated" : 0
 } ] 
},{
 "type" : "maxout",
 "comment" : "\nMaxOut Operator.\n\nAssume the input shape is (N, Ci, H, W).\nThe output shape is (N, Co, H, W).\nThen $Co = Ci / groups$ and the operator formula is as follows:\n\n$$\ny_{si+j} = \\max_k x_{gsi + sk + j} \\\\\ng = groups \\\\\ns = \\frac{input.size}{num\\_channels} \\\\\n0 \\le i < \\frac{num\\_channels}{groups} \\\\\n0 \\le j < s \\\\\n0 \\le k < groups\n$$\n\nPlease refer to the papers:\n  - Maxout Networks: http://www.jmlr.org/proceedings/papers/v28/goodfellow13.pdf\n  - Multi-digit Number Recognition from Street View \\\n    Imagery using Deep Convolutional Neural Networks: \\\n    https://arxiv.org/pdf/1312.6082v4.pdf\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "(Tensor) The input tensor of maxout operator. The format of the input tensor is NCHW, where N is the batch size, C is the number of channels, and H and W are the height and width of the feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Out",
   "comment" : "(Tensor) The output tensor of maxout operator.The format of output tensor is also NCHW.Where N is batch size, C is the number of channels, H and W is the height and width of feature.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "groups",
   "type" : "int",
   "comment" : "\"Specifies how many groups the input tensor will be split\"\n        \"in the channel dimension. And the number of output channel is \"\n        \"the number of channels divided by groups..\"\n        ",
   "generated" : 0
 } ] 
},{
 "type" : "ftrl",
 "comment" : "\nFTRL (Follow The Regularized Leader) Operator.\n\nOptimizer that implements the FTRL algorithm:\n\n$$\nnew\\_accum = squared\\_accum + grad^2 \\\\\nif (lr\\_power == -0.5) {\n   linear\\_accum += grad - (\\surd(new\\_accum) - \\surd(squared\\_accum)) /\n                   (learning\\_rate * param) \\\\\n} else {\n   linear\\_accum += grad -\n                  (new\\_accum^{-lr\\_power} - accum^{-lr\\_power}) /\n                  (learning\\_rate * param) \\\\\n}\n\nx = (l1 * sign(linear\\_accum) - linear\\_accum)\nif (lr\\_power == -0.5) {\n   y = \\frac{\\surd(new\\_accum)}{learning\\_rate} + (2 * l2) \\\\\n   pre\\_shrink = \\frac{x}{y} \\\\\n   param = (abs(linear\\_accum) > l1).select(pre\\_shrink, 0.0) \\\\\n} else {\n   y = \\frac{new\\_accum^{-lr\\_power}}{learning\\_rate} + (2 * l2) \\\\\n   pre\\_shrink = \\frac{x}{y} \\\\\n   param = (abs(linear\\_accum) > l1).select(pre\\_shrink, 0.0) \\\\\n}\nsquared\\_accum += grad^2;\n$$\n\nThe paper that proposed Follow The Regularized Leader (FTRL):\n(https://www.eecs.tufts.edu/~dsculley/papers/ad-click-prediction.pdf)\n\n",
 "inputs" : [ 
 { 
   "name" : "Param",
   "comment" : "(Tensor, default Tensor<float>) Input parameter value that has to be updated.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SquaredAccumulator",
   "comment" : "(Tensor, default Tensor<float>) Accumulator that accumulates squared gradients.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LinearAccumulator",
   "comment" : "(Tensor, default Tensor<float>) Accumulator that accumulates linear gradients.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "Grad",
   "comment" : "(Tensor, default Tensor<float>) Input gradient of the parameter.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LearningRate",
   "comment" : "(Tensor, default Tensor<float>) The learning rate should be a tensor of size 1.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "ParamOut",
   "comment" : "(Tensor) Output updated parameter value.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "SquaredAccumOut",
   "comment" : "(Tensor) Output accumulated squared gradients.",
   "duplicable" : 0,
   "intermediate" : 0
 }, { 
   "name" : "LinearAccumOut",
   "comment" : "(Tensor) Output accumulated linear gradients.",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [ 
 { 
   "name" : "l1",
   "type" : "float",
   "comment" : "(float, default 0.0) L1 regularization strength.",
   "generated" : 0
 }, { 
   "name" : "l2",
   "type" : "float",
   "comment" : "(float, default 0.0) L2 regularization strength.",
   "generated" : 0
 }, { 
   "name" : "lr_power",
   "type" : "float",
   "comment" : "(float, default -0.5f) Learning Rate Power.",
   "generated" : 0
 } ] 
},{
 "type" : "round",
 "comment" : "\nRound Activation Operator.\n\n$y = [x]$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Round operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Round operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
},{
 "type" : "softsign",
 "comment" : "\nSoftsign Activation Operator.\n\n$$y = \\frac{x}{1 + |x|}$$\n\n",
 "inputs" : [ 
 { 
   "name" : "X",
   "comment" : "Input of Softsign operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "outputs" : [ 
 { 
   "name" : "Y",
   "comment" : "Output of Softsign operator",
   "duplicable" : 0,
   "intermediate" : 0
 } ], 
 "attrs" : [  ] 
}]