Unverified commit ce60bbf5. Author: Yu Yang. Committer: GitHub

Merge pull request #11314 from typhoonzero/fix_api_reference_docs

Fix api reference docs
...@@ -91,32 +91,31 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -91,32 +91,31 @@ class ChunkEvalOpMaker : public framework::OpProtoAndCheckerMaker {
"(int64_t). The number of chunks both in Inference and Label on the " "(int64_t). The number of chunks both in Inference and Label on the "
"given mini-batch."); "given mini-batch.");
AddAttr<int>("num_chunk_types", AddAttr<int>("num_chunk_types",
"(int). The number of chunk type. See below for details."); "The number of chunk type. See the description for details.");
AddAttr<std::string>( AddAttr<std::string>("chunk_scheme",
"chunk_scheme", "The labeling scheme indicating "
"(string, default IOB). The labeling scheme indicating " "how to encode the chunks. Must be IOB, IOE, IOBES or "
"how to encode the chunks. Must be IOB, IOE, IOBES or plain. See below " "plain. See the description"
"for details.") "for details.")
.SetDefault("IOB"); .SetDefault("IOB");
AddAttr<std::vector<int>>("excluded_chunk_types", AddAttr<std::vector<int>>("excluded_chunk_types",
"(list<int>) A list including chunk type ids " "A list including chunk type ids "
"indicating chunk types that are not counted. " "indicating chunk types that are not counted. "
"See below for details.") "See the description for details.")
.SetDefault(std::vector<int>{}); .SetDefault(std::vector<int>{});
AddComment(R"DOC( AddComment(R"DOC(
For some basics of chunking, please refer to For some basics of chunking, please refer to
‘Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>’. 'Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>'.
ChunkEvalOp computes the precision, recall, and F1-score of chunk detection,
CheckEvalOp computes the precision, recall, and F1-score of chunk detection,
and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes. and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.
Here is a NER example of labeling for these tagging schemes: Here is a NER example of labeling for these tagging schemes:
Li Ming works at Agricultural Bank of China in Beijing. Li Ming works at Agricultural Bank of China in Beijing.
IO: I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC IO I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
IOB: B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC IOB B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
IOE: I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC IOE I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
IOBES: B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC IOBES B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC
There are three chunk types(named entity types) including PER(person), ORG(organization) There are three chunk types(named entity types) including PER(person), ORG(organization)
and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chunk type>. and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chunk type>.
...@@ -124,31 +123,31 @@ and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chun ...@@ -124,31 +123,31 @@ and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chun
Since the calculations actually use label ids rather than labels, extra attention Since the calculations actually use label ids rather than labels, extra attention
should be paid when mapping labels to ids to make CheckEvalOp work. The key point should be paid when mapping labels to ids to make CheckEvalOp work. The key point
is that the listed equations are satisfied by ids. is that the listed equations are satisfied by ids.
tag_type = label % num_tag_type tag_type = label % num_tag_type
chunk_type = label / num_tag_type chunk_type = label / num_tag_type
where `num_tag_type` is the num of tag types in the tagging scheme, `num_chunk_type` where `num_tag_type` is the num of tag types in the tagging scheme, `num_chunk_type`
is the num of chunk types, and `tag_type` get its value from the following table. is the num of chunk types, and `tag_type` get its value from the following table.
Scheme Begin Inside End Single Scheme Begin Inside End Single
plain 0 - - - plain 0 - - -
IOB 0 1 - - IOB 0 1 - -
IOE - 0 1 - IOE - 0 1 -
IOBES 0 1 2 3 IOBES 0 1 2 3
Still use NER as example, assuming the tagging scheme is IOB while chunk types are ORG, Still use NER as example, assuming the tagging scheme is IOB while chunk types are ORG,
PER and LOC. To satisfy the above equations, the label map can be like this: PER and LOC. To satisfy the above equations, the label map can be like this:
B-ORG 0 B-ORG 0
I-ORG 1 I-ORG 1
B-PER 2 B-PER 2
I-PER 3 I-PER 3
B-LOC 4 B-LOC 4
I-LOC 5 I-LOC 5
O 6 O 6
Its not hard to verify the equations noting that the num of chunk types It's not hard to verify the equations noting that the num of chunk types
is 3 and the num of tag types in IOB scheme is 2. For example, the label is 3 and the num of tag types in IOB scheme is 2. For example, the label
id of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of id of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of
I-LOC is 2, which consistent with the results from the equations. I-LOC is 2, which consistent with the results from the equations.
......
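Review note: the label-id equations in the ChunkEvalOp comment can be checked mechanically. The sketch below is only an illustration: `decode_label` and the dictionary names are hypothetical, and only the two equations and the IOB label map come from the docstring. Integer division (`//` in Python 3) is assumed for `chunk_type`.

```python
# Hypothetical helper; only the two equations come from the docstring.
def decode_label(label, num_tag_types):
    """Split a label id into (tag_type, chunk_type)."""
    tag_type = label % num_tag_types
    chunk_type = label // num_tag_types  # integer division is assumed here
    return tag_type, chunk_type


# Label map copied from the docstring (IOB scheme; chunk types ORG, PER, LOC).
label_map = {
    "B-ORG": 0, "I-ORG": 1,
    "B-PER": 2, "I-PER": 3,
    "B-LOC": 4, "I-LOC": 5,
    "O": 6,
}
num_tag_types = 2  # IOB has Begin (0) and Inside (1)

decoded = {name: decode_label(label_id, num_tag_types)
           for name, label_id in label_map.items()}

for name, (tag_type, chunk_type) in decoded.items():
    print(name, "tag_type =", tag_type, "chunk_type =", chunk_type)
```

Under these equations `I-LOC` decodes to tag type 1 and chunk type 2, matching the docstring. `O` decodes to chunk type 3, which equals the number of chunk types; reading that value as "outside any chunk" is an interpretation, not something the docstring states.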
...@@ -156,7 +156,7 @@ Parameters(strides, paddings) are two elements. These two elements represent hei ...@@ -156,7 +156,7 @@ Parameters(strides, paddings) are two elements. These two elements represent hei
and width, respectively. and width, respectively.
The input(X) size and output(Out) size may be different. The input(X) size and output(Out) size may be different.
Example: For an example:
Input: Input:
Input shape: $(N, C_{in}, H_{in}, W_{in})$ Input shape: $(N, C_{in}, H_{in}, W_{in})$
Filter shape: $(C_{in}, C_{out}, H_f, W_f)$ Filter shape: $(C_{in}, C_{out}, H_f, W_f)$
......
...@@ -76,9 +76,9 @@ class CosSimOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -76,9 +76,9 @@ class CosSimOpMaker : public framework::OpProtoAndCheckerMaker {
.AsIntermediate(); .AsIntermediate();
AddComment(R"DOC( AddComment(R"DOC(
Cosine Similarity Operator. **Cosine Similarity Operator**
$Out = X^T * Y / (\sqrt{X^T * X} * \sqrt{Y^T * Y})$ $Out = \frac{X^T * Y}{(\sqrt{X^T * X} * \sqrt{Y^T * Y})}$
The input X and Y must have the same shape, except that the 1st dimension The input X and Y must have the same shape, except that the 1st dimension
of input Y could be just 1 (different from input X), which will be of input Y could be just 1 (different from input X), which will be
......
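Review note: a plain-Python sketch of the cosine similarity formula in the new docstring, for rows of two 2-D inputs. The function names are hypothetical and this is not the operator's implementation. Reusing a single-row `Y` for every row of `X` is an assumption based on the truncated sentence about the first dimension of `Y`.

```python
import math


def cos_sim(x, y):
    """Cosine similarity of two equal-length vectors: x.y / (|x| * |y|)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def cos_sim_rows(X, Y):
    """Row-wise similarity; a single-row Y is reused for every row of X (assumed)."""
    if len(Y) == 1:
        Y = Y * len(X)
    return [cos_sim(x, y) for x, y in zip(X, Y)]


print(cos_sim_rows([[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0]]))
```

The sketch does not guard against zero vectors, where the denominator is zero.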
...@@ -53,21 +53,18 @@ sequence of observed tags. ...@@ -53,21 +53,18 @@ sequence of observed tags.
The output of this operator changes according to whether Input(Label) is given: The output of this operator changes according to whether Input(Label) is given:
1. Input(Label) is given: 1. Input(Label) is given:
This happens in training. This operator is used to co-work with the chunk_eval
This happens in training. This operator is used to co-work with the chunk_eval operator.
operator. When Input(Label) is given, the crf_decoding operator returns a row vector
with shape [N x 1] whose values are fixed to be 0, indicating an incorrect
When Input(Label) is given, the crf_decoding operator returns a row vector prediction, or 1 indicating a tag is correctly predicted. Such an output is the
with shape [N x 1] whose values are fixed to be 0, indicating an incorrect input to chunk_eval operator.
prediction, or 1 indicating a tag is correctly predicted. Such an output is the
input to chunk_eval operator.
2. Input(Label) is not given: 2. Input(Label) is not given:
This is the standard decoding process.
This is the standard decoding process.
The crf_decoding operator returns a row vector with shape [N x 1] whose values The crf_decoding operator returns a row vector with shape [N x 1] whose values
range from 0 to maximum tag number - 1. Each element indicates an index of a range from 0 to maximum tag number - 1, Each element indicates an index of a
predicted tag. predicted tag.
)DOC"); )DOC");
} }
......
...@@ -68,15 +68,16 @@ class IOUSimilarityOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -68,15 +68,16 @@ class IOUSimilarityOpMaker : public framework::OpProtoAndCheckerMaker {
"representing pairwise iou scores."); "representing pairwise iou scores.");
AddComment(R"DOC( AddComment(R"DOC(
IOU Similarity Operator. **IOU Similarity Operator**
Computes intersection-over-union (IOU) between two box lists. Computes intersection-over-union (IOU) between two box lists.
Box list 'X' should be a LoDTensor and 'Y' is a common Tensor, Box list 'X' should be a LoDTensor and 'Y' is a common Tensor,
boxes in 'Y' are shared by all instance of the batched inputs of X. boxes in 'Y' are shared by all instance of the batched inputs of X.
Given two boxes A and B, the calculation of IOU is as follows: Given two boxes A and B, the calculation of IOU is as follows:
$$ $$
IOU(A, B) = IOU(A, B) =
\frac{area(A\cap B)}{area(A)+area(B)-area(A\cap B)} \\frac{area(A\\cap B)}{area(A)+area(B)-area(A\\cap B)}
$$ $$
)DOC"); )DOC");
......
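Review note: the IOU formula in the docstring can be illustrated for a single pair of boxes. This is a minimal sketch, assuming boxes are given as `(xmin, ymin, xmax, ymax)`; that layout and the function names are assumptions, not taken from the diff.

```python
def box_area(box):
    """Area of a box (xmin, ymin, xmax, ymax); empty boxes have area 0."""
    xmin, ymin, xmax, ymax = box
    return max(xmax - xmin, 0.0) * max(ymax - ymin, 0.0)


def iou(a, b):
    """area(A ∩ B) / (area(A) + area(B) - area(A ∩ B))."""
    intersection = box_area((max(a[0], b[0]), max(a[1], b[1]),
                             min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - intersection
    return intersection / union if union > 0 else 0.0


print(iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

Two 2x2 boxes overlapping in a 1x1 square give an intersection of 1 and a union of 4 + 4 - 1 = 7.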
...@@ -84,6 +84,7 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and ...@@ -84,6 +84,7 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details. http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.
Equation: Equation:
1. Denote Input(Emission) to this operator as $x$ here. 1. Denote Input(Emission) to this operator as $x$ here.
2. The first D values of Input(Transition) to this operator are for starting 2. The first D values of Input(Transition) to this operator are for starting
weights, denoted as $a$ here. weights, denoted as $a$ here.
...@@ -106,6 +107,7 @@ Finally, the linear chain CRF operator outputs the logarithm of the conditional ...@@ -106,6 +107,7 @@ Finally, the linear chain CRF operator outputs the logarithm of the conditional
likelihood of each training sample in a mini-batch. likelihood of each training sample in a mini-batch.
NOTE: NOTE:
1. The feature function for a CRF is made up of the emission features and the 1. The feature function for a CRF is made up of the emission features and the
transition features. The emission feature weights are NOT computed in transition features. The emission feature weights are NOT computed in
this operator. They MUST be computed first before this operator is called. this operator. They MUST be computed first before this operator is called.
......
...@@ -184,34 +184,32 @@ Long-Short Term Memory (LSTM) Operator. ...@@ -184,34 +184,32 @@ Long-Short Term Memory (LSTM) Operator.
The defalut implementation is diagonal/peephole connection The defalut implementation is diagonal/peephole connection
(https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows: (https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:
$$ $$ i_t = \\sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i) $$
i_t = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i) \\
f_t = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f) \\ $$ f_t = \\sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f) $$
\tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c) \\ $$ \\tilde{c_t} = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c) $$
o_t = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o) \\ $$ o_t = \\sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o) $$
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c_t} \\ $$ c_t = f_t \\odot c_{t-1} + i_t \\odot \\tilde{c_t} $$
h_t = o_t \odot act_h(c_t) $$ h_t = o_t \\odot act_h(c_t) $$
$$
where the W terms denote weight matrices (e.g. $W_{xi}$ is the matrix - W terms denote weight matrices (e.g. $W_{xi}$ is the matrix
of weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$ of weights from the input gate to the input), $W_{ic}, W_{fc}, W_{oc}$
are diagonal weight matrices for peephole connections. In our implementation, are diagonal weight matrices for peephole connections. In our implementation,
we use vectors to reprenset these diagonal weight matrices. The b terms we use vectors to reprenset these diagonal weight matrices.
denote bias vectors ($b_i$ is the input gate bias vector), $\sigma$ - The b terms denote bias vectors ($b_i$ is the input gate bias vector).
is the non-line activations, such as logistic sigmoid function, and - $\sigma$ is the non-line activations, such as logistic sigmoid function.
$i, f, o$ and $c$ are the input gate, forget gate, output gate, - $i, f, o$ and $c$ are the input gate, forget gate, output gate,
and cell activation vectors, respectively, all of which have the same size as and cell activation vectors, respectively, all of which have the same size as
the cell output activation vector $h$. the cell output activation vector $h$.
- The $\odot$ is the element-wise product of the vectors.
The $\odot$ is the element-wise product of the vectors. $act_g$ and $act_h$ - $act_g$ and $act_h$ are the cell input and cell output activation functions
are the cell input and cell output activation functions and `tanh` is usually and `tanh` is usually used for them.
used for them. $\tilde{c_t}$ is also called candidate hidden state, - $\tilde{c_t}$ is also called candidate hidden state,
which is computed based on the current input and the previous hidden state. which is computed based on the current input and the previous hidden state.
Set `use_peepholes` False to disable peephole connection. The formula Set `use_peepholes` False to disable peephole connection. The formula
is omitted here, please refer to the paper is omitted here, please refer to the paper
......
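Review note: the six LSTM equations can be followed step by step in a scalar (hidden size 1) sketch. Everything here is illustrative: `lstm_step`, `PARAM_NAMES` and the parameter dictionary are hypothetical, sigmoid is used for the gates and `tanh` for `act_g` and `act_h`, as the docstring says is usual. Note that `o_t` depends on the new cell state `c_t`, so `c_t` has to be computed first.

```python
import math

# Hypothetical parameter names mirroring the symbols in the docstring.
PARAM_NAMES = ["W_ix", "W_ih", "W_ic", "b_i",
               "W_fx", "W_fh", "W_fc", "b_f",
               "W_cx", "W_ch", "b_c",
               "W_ox", "W_oh", "W_oc", "b_o"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One peephole LSTM step for scalar inputs and states."""
    i = sigmoid(p["W_ix"] * x + p["W_ih"] * h_prev + p["W_ic"] * c_prev + p["b_i"])
    f = sigmoid(p["W_fx"] * x + p["W_fh"] * h_prev + p["W_fc"] * c_prev + p["b_f"])
    c_tilde = math.tanh(p["W_cx"] * x + p["W_ch"] * h_prev + p["b_c"])
    c = f * c_prev + i * c_tilde          # new cell state
    o = sigmoid(p["W_ox"] * x + p["W_oh"] * h_prev + p["W_oc"] * c + p["b_o"])
    h = o * math.tanh(c)                  # new hidden state
    return h, c


zero_params = {name: 0.0 for name in PARAM_NAMES}
print(lstm_step(1.0, 0.0, 1.0, zero_params))
```

With all parameters set to zero, every gate evaluates to 0.5 and the candidate state to 0, so the cell state is simply halved.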
...@@ -139,7 +139,20 @@ class ROIPoolOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -139,7 +139,20 @@ class ROIPoolOpMaker : public framework::OpProtoAndCheckerMaker {
"The pooled output width.") "The pooled output width.")
.SetDefault(1); .SetDefault(1);
AddComment(R"DOC( AddComment(R"DOC(
ROIPool operator **ROIPool Operator**
Region of interest pooling (also known as RoI pooling) is to perform
is to perform max pooling on inputs of nonuniform sizes to obtain
fixed-size feature maps (e.g. 7*7).
The operator has three steps:
1. Dividing each region proposal into equal-sized sections with
the pooled_width and pooled_height
2. Finding the largest value in each section
3. Copying these max values to the output buffer
ROI Pooling for Faster-RCNN. The link below is a further introduction: ROI Pooling for Faster-RCNN. The link below is a further introduction:
https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn https://stackoverflow.com/questions/43430056/what-is-roi-layer-in-fast-rcnn
......
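Review note: the three steps listed in the ROIPool docstring can be sketched for one channel and one region. This is an assumption-laden illustration, not the operator's kernel: the function name is hypothetical, the ROI is taken as inclusive integer coordinates `(x1, y1, x2, y2)` that lie inside the feature map, and the floor/ceil bin boundaries are one common convention.

```python
import math


def roi_max_pool(feature, roi, pooled_height, pooled_width):
    """Max-pool one ROI of a 2-D feature map into a fixed-size grid."""
    x1, y1, x2, y2 = roi
    roi_h = max(y2 - y1 + 1, 1)
    roi_w = max(x2 - x1 + 1, 1)
    bin_h = roi_h / pooled_height
    bin_w = roi_w / pooled_width

    out = []
    for ph in range(pooled_height):
        # Step 1: section boundaries along the height
        h_start = y1 + math.floor(ph * bin_h)
        h_end = y1 + math.ceil((ph + 1) * bin_h)
        row = []
        for pw in range(pooled_width):
            w_start = x1 + math.floor(pw * bin_w)
            w_end = x1 + math.ceil((pw + 1) * bin_w)
            # Step 2: largest value in the section
            section = [feature[h][w]
                       for h in range(h_start, h_end)
                       for w in range(w_start, w_end)]
            # Step 3: copy it to the output
            row.append(max(section))
        out.append(row)
    return out


feature_map = [[0, 1, 2, 3],
               [4, 5, 6, 7],
               [8, 9, 10, 11],
               [12, 13, 14, 15]]
print(roi_max_pool(feature_map, (0, 0, 3, 3), 2, 2))
```

Pooling the whole 4x4 map into a 2x2 grid splits it into four 2x2 sections, each contributing its maximum.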
...@@ -41,13 +41,13 @@ class ScaleOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -41,13 +41,13 @@ class ScaleOpMaker : public framework::OpProtoAndCheckerMaker {
AddInput("X", "(Tensor) Input tensor of scale operator."); AddInput("X", "(Tensor) Input tensor of scale operator.");
AddOutput("Out", "(Tensor) Output tensor of scale operator."); AddOutput("Out", "(Tensor) Output tensor of scale operator.");
AddComment(R"DOC( AddComment(R"DOC(
Scale operator **Scale operator**
Multiply the input tensor with a float scalar to scale the input tensor.
$$Out = scale*X$$ $$Out = scale*X$$
)DOC"); )DOC");
AddAttr<float>("scale", AddAttr<float>("scale", "The scaling factor of the scale operator.")
"(float, default 1.0)"
"The scaling factor of the scale operator.")
.SetDefault(1.0); .SetDefault(1.0);
} }
}; };
......
...@@ -109,10 +109,35 @@ class BlockGuardServ(BlockGuard): ...@@ -109,10 +109,35 @@ class BlockGuardServ(BlockGuard):
class ListenAndServ(object): class ListenAndServ(object):
""" """
ListenAndServ class. **ListenAndServ Layer**
ListenAndServ is used to create a rpc server bind and listen
on specific TCP port, this server will run the sub-block when
received variables from clients.
Args:
endpoint(string): IP:port string which the server will listen on.
inputs(list): a list of variables that the server will get from clients.
fan_in(int): how many client are expected to report to this server, default: 1.
optimizer_mode(bool): whether to run the server as a parameter server, default: True.
Examples:
.. code-block:: python
ListenAndServ class is used to wrap listen_and_serv op to create a server with fluid.program_guard(main):
which can receive variables from clients and run a block. serv = layers.ListenAndServ(
"127.0.0.1:6170", ["X"], optimizer_mode=False)
with serv.do():
x = layers.data(
shape=[32, 32],
dtype='float32',
name="X",
append_batch_size=False)
fluid.initializer.Constant(value=1.0)(x, main.global_block())
layers.scale(x=x, scale=10.0, out=out_var)
exe = fluid.Executor(place)
exe.run(main)
""" """
def __init__(self, endpoint, inputs, fan_in=1, optimizer_mode=True): def __init__(self, endpoint, inputs, fan_in=1, optimizer_mode=True):
......
...@@ -49,6 +49,13 @@ _single_dollar_pattern_ = re.compile(r"\$([^\$]+)\$") ...@@ -49,6 +49,13 @@ _single_dollar_pattern_ = re.compile(r"\$([^\$]+)\$")
_two_bang_pattern_ = re.compile(r"!!([^!]+)!!") _two_bang_pattern_ = re.compile(r"!!([^!]+)!!")
def escape_math(text):
return _two_bang_pattern_.sub(
r'$$\1$$',
_single_dollar_pattern_.sub(r':math:`\1`',
_two_dollar_pattern_.sub(r"!!\1!!", text)))
def _generate_doc_string_(op_proto): def _generate_doc_string_(op_proto):
""" """
Generate docstring by OpProto Generate docstring by OpProto
...@@ -60,12 +67,6 @@ def _generate_doc_string_(op_proto): ...@@ -60,12 +67,6 @@ def _generate_doc_string_(op_proto):
str: the document string str: the document string
""" """
def escape_math(text):
return _two_bang_pattern_.sub(
r'$$\1$$',
_single_dollar_pattern_.sub(
r':math:`\1`', _two_dollar_pattern_.sub(r"!!\1!!", text)))
if not isinstance(op_proto, framework_pb2.OpProto): if not isinstance(op_proto, framework_pb2.OpProto):
raise TypeError("OpProto should be `framework_pb2.OpProto`") raise TypeError("OpProto should be `framework_pb2.OpProto`")
...@@ -233,9 +234,6 @@ def autodoc(comment=""): ...@@ -233,9 +234,6 @@ def autodoc(comment=""):
return __impl__ return __impl__
_inline_math_single_dollar = re.compile(r"\$([^\$]+)\$")
def templatedoc(op_type=None): def templatedoc(op_type=None):
""" """
Decorator of layer function. It will use the docstring from the layer Decorator of layer function. It will use the docstring from the layer
...@@ -253,9 +251,6 @@ def templatedoc(op_type=None): ...@@ -253,9 +251,6 @@ def templatedoc(op_type=None):
def trim_ending_dot(msg): def trim_ending_dot(msg):
return msg.rstrip('.') return msg.rstrip('.')
def escape_inline_math(msg):
return _inline_math_single_dollar.sub(repl=r':math:`\1`', string=msg)
def __impl__(func): def __impl__(func):
if op_type is None: if op_type is None:
op_type_name = func.__name__ op_type_name = func.__name__
...@@ -269,7 +264,7 @@ def templatedoc(op_type=None): ...@@ -269,7 +264,7 @@ def templatedoc(op_type=None):
for line in comment_lines: for line in comment_lines:
line = line.strip() line = line.strip()
if len(line) != 0: if len(line) != 0:
comment += escape_inline_math(line) comment += escape_math(line)
comment += " " comment += " "
elif len(comment) != 0: elif len(comment) != 0:
comment += "\n \n " comment += "\n \n "
......
...@@ -267,6 +267,7 @@ def embedding(input, ...@@ -267,6 +267,7 @@ def embedding(input,
return tmp return tmp
@templatedoc(op_type="lstm")
def dynamic_lstm(input, def dynamic_lstm(input,
size, size,
h_0=None, h_0=None,
...@@ -281,56 +282,11 @@ def dynamic_lstm(input, ...@@ -281,56 +282,11 @@ def dynamic_lstm(input,
dtype='float32', dtype='float32',
name=None): name=None):
""" """
**Dynamic LSTM Layer** ${comment}
The defalut implementation is diagonal/peephole connection
(https://arxiv.org/pdf/1402.1128.pdf), the formula is as follows:
.. math::
i_t & = \sigma(W_{ix}x_{t} + W_{ih}h_{t-1} + W_{ic}c_{t-1} + b_i)
f_t & = \sigma(W_{fx}x_{t} + W_{fh}h_{t-1} + W_{fc}c_{t-1} + b_f)
\\tilde{c_t} & = act_g(W_{cx}x_t + W_{ch}h_{t-1} + b_c)
o_t & = \sigma(W_{ox}x_{t} + W_{oh}h_{t-1} + W_{oc}c_t + b_o)
c_t & = f_t \odot c_{t-1} + i_t \odot \\tilde{c_t}
h_t & = o_t \odot act_h(c_t)
where the :math:`W` terms denote weight matrices (e.g. :math:`W_{xi}` is
the matrix of weights from the input gate to the input), :math:`W_{ic}, \
W_{fc}, W_{oc}` are diagonal weight matrices for peephole connections. In
our implementation, we use vectors to reprenset these diagonal weight
matrices. The :math:`b` terms denote bias vectors (:math:`b_i` is the input
gate bias vector), :math:`\sigma` is the non-linear activations, such as
logistic sigmoid function, and :math:`i, f, o` and :math:`c` are the input
gate, forget gate, output gate, and cell activation vectors, respectively,
all of which have the same size as the cell output activation vector :math:`h`.
The :math:`\odot` is the element-wise product of the vectors. :math:`act_g`
and :math:`act_h` are the cell input and cell output activation functions
and `tanh` is usually used for them. :math:`\\tilde{c_t}` is also called
candidate hidden state, which is computed based on the current input and
the previous hidden state.
Set `use_peepholes` to `False` to disable peephole connection. The formula
is omitted here, please refer to the paper
http://www.bioinf.jku.at/publications/older/2604.pdf for details.
Note that these :math:`W_{xi}x_{t}, W_{xf}x_{t}, W_{xc}x_{t}, W_{xo}x_{t}`
operations on the input :math:`x_{t}` are NOT included in this operator.
Users can choose to use fully-connect layer before LSTM layer.
Args: Args:
input(Variable): The input of dynamic_lstm layer, which supports input (Variable): ${input_comment}
variable-time length input sequence. The underlying size (int): 4 * hidden size.
tensor in this Variable is a matrix with shape
(T X 4D), where T is the total time steps in this
mini-batch, D is the hidden size.
size(int): 4 * hidden size.
h_0(Variable): The initial hidden state is an optional input, default is zero. h_0(Variable): The initial hidden state is an optional input, default is zero.
This is a tensor with shape (N x D), where N is the This is a tensor with shape (N x D), where N is the
batch size and D is the hidden size. batch size and D is the hidden size.
...@@ -345,32 +301,26 @@ def dynamic_lstm(input, ...@@ -345,32 +301,26 @@ def dynamic_lstm(input,
W_{fh}, W_{oh}`} W_{fh}, W_{oh}`}
- The shape is (D x 4D), where D is the hidden - The shape is (D x 4D), where D is the hidden
size. size.
bias_attr(ParamAttr|None): The bias attribute for the learnable bias bias_attr (ParamAttr|None): The bias attribute for the learnable bias
weights, which contains two parts, input-hidden weights, which contains two parts, input-hidden
bias weights and peephole connections weights if bias weights and peephole connections weights if
setting `use_peepholes` to `True`. setting `use_peepholes` to `True`.
1. `use_peepholes = False` 1. `use_peepholes = False`
- Biases = {:math:`b_c, b_i, b_f, b_o`}. - Biases = {:math:`b_c, b_i, b_f, b_o`}.
- The shape is (1 x 4D). - The shape is (1 x 4D).
2. `use_peepholes = True` 2. `use_peepholes = True`
- Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic}, \ - Biases = { :math:`b_c, b_i, b_f, b_o, W_{ic}, \
W_{fc}, W_{oc}`}. W_{fc}, W_{oc}`}.
- The shape is (1 x 7D). - The shape is (1 x 7D).
use_peepholes(bool): Whether to enable diagonal/peephole connections, use_peepholes (bool): ${use_peepholes_comment}
default `True`. is_reverse (bool): ${is_reverse_comment}
is_reverse(bool): Whether to compute reversed LSTM, default `False`. gate_activation (str): ${gate_activation_comment}
gate_activation(str): The activation for input gate, forget gate and cell_activation (str): ${cell_activation_comment}
output gate. Choices = ["sigmoid", "tanh", "relu", candidate_activation (str): ${candidate_activation_comment}
"identity"], default "sigmoid". dtype (str): Data type. Choices = ["float32", "float64"], default "float32".
cell_activation(str): The activation for cell output. Choices = ["sigmoid", name (str|None): A name for this layer(optional). If set None, the layer
"tanh", "relu", "identity"], default "tanh". will be named automatically.
candidate_activation(str): The activation for candidate hidden state.
Choices = ["sigmoid", "tanh", "relu", "identity"],
default "tanh".
dtype(str): Data type. Choices = ["float32", "float64"], default "float32".
name(str|None): A name for this layer(optional). If set None, the layer
will be named automatically.
Returns: Returns:
tuple: The hidden state, and cell state of LSTM. The shape of both \ tuple: The hidden state, and cell state of LSTM. The shape of both \
...@@ -889,11 +839,19 @@ def crf_decoding(input, param_attr, label=None): ...@@ -889,11 +839,19 @@ def crf_decoding(input, param_attr, label=None):
Args: Args:
input(${emission_type}): ${emission_comment} input(${emission_type}): ${emission_comment}
param_attr(ParamAttr): The parameter attribute for training. param_attr(ParamAttr): The parameter attribute for training.
label(${label_type}): ${label_comment} label(${label_type}): ${label_comment}
Returns: Returns:
${viterbi_path_comment} Variable: ${viterbi_path_comment}
Examples:
.. code-block:: python
crf_decode = layers.crf_decoding(
input=hidden, param_attr=ParamAttr(name="crfw"))
""" """
helper = LayerHelper('crf_decoding', **locals()) helper = LayerHelper('crf_decoding', **locals())
transition = helper.get_parameter(param_attr.name) transition = helper.get_parameter(param_attr.name)
...@@ -908,14 +866,14 @@ def crf_decoding(input, param_attr, label=None): ...@@ -908,14 +866,14 @@ def crf_decoding(input, param_attr, label=None):
return viterbi_path return viterbi_path
@templatedoc()
def cos_sim(X, Y): def cos_sim(X, Y):
""" """
This function performs the cosine similarity between two tensors ${comment}
X and Y and returns that as the output.
Args: Args:
X (Variable): The input X. X (Variable): ${x_comment}.
Y (Variable): The input Y. Y (Variable): ${y_comment}.
Returns: Returns:
Variable: the output of cosine(X, Y). Variable: the output of cosine(X, Y).
...@@ -1113,9 +1071,70 @@ def chunk_eval(input, ...@@ -1113,9 +1071,70 @@ def chunk_eval(input,
num_chunk_types, num_chunk_types,
excluded_chunk_types=None): excluded_chunk_types=None):
""" """
**Chunk Evaluator**
This function computes and outputs the precision, recall and This function computes and outputs the precision, recall and
F1-score of chunk detection. F1-score of chunk detection.
For some basics of chunking, please refer to
'Chunking with Support Vector Machines <https://aclanthology.info/pdf/N/N01/N01-1025.pdf>'.
ChunkEvalOp computes the precision, recall, and F1-score of chunk detection,
and supports IOB, IOE, IOBES and IO (also known as plain) tagging schemes.
Here is a NER example of labeling for these tagging schemes:
.. code-block:: python
====== ====== ====== ===== == ============ ===== ===== ===== == =========
Li Ming works at Agricultural Bank of China in Beijing.
====== ====== ====== ===== == ============ ===== ===== ===== == =========
IO I-PER I-PER O O I-ORG I-ORG I-ORG I-ORG O I-LOC
IOB B-PER I-PER O O B-ORG I-ORG I-ORG I-ORG O B-LOC
IOE I-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O E-LOC
IOBES B-PER E-PER O O I-ORG I-ORG I-ORG E-ORG O S-LOC
====== ====== ====== ===== == ============ ===== ===== ===== == =========
There are three chunk types(named entity types) including PER(person), ORG(organization)
and LOC(LOCATION), and we can see that the labels have the form <tag type>-<chunk type>.
Since the calculations actually use label ids rather than labels, extra attention
should be paid when mapping labels to ids to make CheckEvalOp work. The key point
is that the listed equations are satisfied by ids.
.. code-block:: python
tag_type = label % num_tag_type
chunk_type = label / num_tag_type
where `num_tag_type` is the num of tag types in the tagging scheme, `num_chunk_type`
is the num of chunk types, and `tag_type` get its value from the following table.
.. code-block:: python
Scheme Begin Inside End Single
plain 0 - - -
IOB 0 1 - -
IOE - 0 1 -
IOBES 0 1 2 3
Still use NER as example, assuming the tagging scheme is IOB while chunk types are ORG,
PER and LOC. To satisfy the above equations, the label map can be like this:
.. code-block:: python
B-ORG 0
I-ORG 1
B-PER 2
I-PER 3
B-LOC 4
I-LOC 5
O 6
It's not hard to verify the equations noting that the num of chunk types
is 3 and the num of tag types in IOB scheme is 2. For example, the label
id of I-LOC is 5, the tag type id of I-LOC is 1, and the chunk type id of
I-LOC is 2, which consistent with the results from the equations.
Args: Args:
input (Variable): prediction output of the network. input (Variable): prediction output of the network.
label (Variable): label of the test data set. label (Variable): label of the test data set.
...@@ -1124,9 +1143,22 @@ def chunk_eval(input, ...@@ -1124,9 +1143,22 @@ def chunk_eval(input,
excluded_chunk_types (list): ${excluded_chunk_types_comment} excluded_chunk_types (list): ${excluded_chunk_types_comment}
Returns: Returns:
tuple: tuple containing: (precision, recall, f1_score, tuple: tuple containing: precision, recall, f1_score,
num_infer_chunks, num_label_chunks, num_infer_chunks, num_label_chunks,
num_correct_chunks) num_correct_chunks
Examples:
.. code-block:: python
crf = fluid.layers.linear_chain_crf(
input=hidden, label=label, param_attr=ParamAttr(name="crfw"))
crf_decode = fluid.layers.crf_decoding(
input=hidden, param_attr=ParamAttr(name="crfw"))
fluid.layers.chunk_eval(
input=crf_decode,
label=label,
chunk_scheme="IOB",
num_chunk_types=(label_dict_len - 1) / 2)
""" """
helper = LayerHelper("chunk_eval", **locals()) helper = LayerHelper("chunk_eval", **locals())
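Review note: the returned tuple mixes three ratios with the three counts they are derived from. A minimal sketch of the standard definitions follows; the function name is hypothetical, and returning 0.0 when a denominator is zero is an assumed convention rather than documented behavior of the operator.

```python
def chunk_metrics(num_infer_chunks, num_label_chunks, num_correct_chunks):
    """Precision, recall and F1 from chunk counts."""
    precision = (num_correct_chunks / num_infer_chunks
                 if num_infer_chunks else 0.0)
    recall = (num_correct_chunks / num_label_chunks
              if num_label_chunks else 0.0)
    f1 = (2 * precision * recall / (precision + recall)
          if num_correct_chunks else 0.0)
    return precision, recall, f1


# 4 predicted chunks, 5 labeled chunks, 3 of them matching
print(chunk_metrics(4, 5, 3))
```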
...@@ -3390,6 +3422,7 @@ def edit_distance(input, label, normalized=True, ignored_tokens=None): ...@@ -3390,6 +3422,7 @@ def edit_distance(input, label, normalized=True, ignored_tokens=None):
def ctc_greedy_decoder(input, blank, name=None): def ctc_greedy_decoder(input, blank, name=None):
""" """
This op is used to decode sequences by greedy policy by below steps: This op is used to decode sequences by greedy policy by below steps:
1. Get the indexes of max value for each row in input. a.k.a. 1. Get the indexes of max value for each row in input. a.k.a.
numpy.argmax(input, axis=0). numpy.argmax(input, axis=0).
2. For each sequence in result of step1, merge repeated tokens between two 2. For each sequence in result of step1, merge repeated tokens between two
...@@ -3673,8 +3706,6 @@ def nce(input, ...@@ -3673,8 +3706,6 @@ def nce(input,
def transpose(x, perm, name=None): def transpose(x, perm, name=None):
""" """
**transpose Layer**
Permute the dimensions of `input` according to `perm`. Permute the dimensions of `input` according to `perm`.
The `i`-th dimension of the returned tensor will correspond to the The `i`-th dimension of the returned tensor will correspond to the
...@@ -4059,8 +4090,9 @@ def one_hot(input, depth): ...@@ -4059,8 +4090,9 @@ def one_hot(input, depth):
def autoincreased_step_counter(counter_name=None, begin=1, step=1): def autoincreased_step_counter(counter_name=None, begin=1, step=1):
""" """
NOTE: The counter will be automatically increased by 1 every mini-batch Create an auto-increase variable
Return the run counter of the main program, which is started with 1. which will be automatically increased by 1 every mini-batch
Return the run counter of the main program, default is started from 1.
Args: Args:
counter_name(str): The counter name, default is '@STEP_COUNTER@'. counter_name(str): The counter name, default is '@STEP_COUNTER@'.
...@@ -4069,6 +4101,12 @@ def autoincreased_step_counter(counter_name=None, begin=1, step=1): ...@@ -4069,6 +4101,12 @@ def autoincreased_step_counter(counter_name=None, begin=1, step=1):
Returns: Returns:
Variable: The global run counter. Variable: The global run counter.
Examples:
.. code-block:: python
global_step = fluid.layers.autoincreased_step_counter(
counter_name='@LR_DECAY_COUNTER@', begin=begin, step=1)
""" """
helper = LayerHelper('global_step_counter') helper = LayerHelper('global_step_counter')
if counter_name is None: if counter_name is None:
...@@ -4476,34 +4514,20 @@ def label_smooth(label, ...@@ -4476,34 +4514,20 @@ def label_smooth(label,
return smooth_label return smooth_label
@templatedoc()
def roi_pool(input, rois, pooled_height=1, pooled_width=1, spatial_scale=1.0): def roi_pool(input, rois, pooled_height=1, pooled_width=1, spatial_scale=1.0):
""" """
Region of interest pooling (also known as RoI pooling) is to perform ${comment}
is to perform max pooling on inputs of nonuniform sizes to obtain
fixed-size feature maps (e.g. 7*7).
The operator has three steps:
1. Dividing each region proposal into equal-sized sections with
the pooled_width and pooled_height
2. Finding the largest value in each section
3. Copying these max values to the output buffer
Args: Args:
input (Variable): The input for ROI pooling. input (Variable): ${x_comment}
rois (Variable): ROIs (Regions of Interest) to pool over. It should rois (Variable): ROIs (Regions of Interest) to pool over.
be a 2-D one level LoTensor of shape [num_rois, 4]. pooled_height (integer): ${pooled_height_comment} Default: 1
The layout is [x1, y1, x2, y2], where (x1, y1) pooled_width (integer): ${pooled_width_comment} Default: 1
is the top left coordinates, and (x2, y2) is the spatial_scale (float): ${spatial_scale_comment} Default: 1.0
bottom right coordinates. The num_rois is the
total number of ROIs in this batch data.
pooled_height (integer): The pooled output height. Default: 1
pooled_width (integer): The pooled output width. Default: 1
spatial_scale (float): Multiplicative spatial scale factor. To
translate ROI coords from their input scale
to the scale used when pooling. Default: 1.0
Returns: Returns:
pool_out (Variable): The output is a 4-D tensor of the shape Variable: ${out_comment}.
(num_rois, channels, pooled_h, pooled_w).
Examples: Examples:
.. code-block:: python .. code-block:: python
......
...@@ -68,6 +68,7 @@ __all__ = [ ...@@ -68,6 +68,7 @@ __all__ = [
'slice', 'slice',
'polygon_box_transform', 'polygon_box_transform',
'shape', 'shape',
'iou_similarity',
'maxout', 'maxout',
] + __activations__ ] + __activations__
......