未验证 提交 2c471db2 编写于 作者: C Cao Ying 提交者: GitHub

Merge pull request #5884 from lcy-seso/fix_latex

fix LaTeX syntax in three operators' comments.
...@@ -32,19 +32,19 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker { ...@@ -32,19 +32,19 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
"[(D + 2) x D]. The learnable parameter for the linear_chain_crf " "[(D + 2) x D]. The learnable parameter for the linear_chain_crf "
"operator. See more details in the operator's comments."); "operator. See more details in the operator's comments.");
AddInput("Label", AddInput("Label",
"(LoDTensor, default LoDTensor<int>) A LoDTensor with shape " "(LoDTensor, default LoDTensor<int64_t>) A LoDTensor with shape "
"[N x 1], where N is the total element number in a mini-batch. " "[N x 1], where N is the total element number in a mini-batch. "
"The ground truth."); "The ground truth.");
AddOutput( AddOutput(
"Alpha", "Alpha",
"(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. " "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
"The forward vectors for the entire batch. Denote it as \f$\alpha\f$. " "The forward vectors for the entire batch. Denote it as $\alpha$. "
"\f$\alpha$\f is a memo table used to calculate the normalization " "$\alpha$ is a memo table used to calculate the normalization "
"factor in CRF. \f$\alpha[k, v]$\f stores the unnormalized " "factor in CRF. $\alpha[k, v]$ stores the unnormalized "
"probabilites of all possible unfinished sequences of tags that end at " "probabilites of all possible unfinished sequences of tags that end at "
"position \f$k$\f with tag \f$v$\f. For each \f$k$\f, " "position $k$ with tag $v$. For each $k$, "
"\f$\alpha[k, v]$\f is a vector of length \f$D$\f with a component for " "$\alpha[k, v]$ is a vector of length $D$ with a component for "
"each tag value \f$v$\f. This vector is called a forward vecotr and " "each tag value $v$. This vector is called a forward vecotr and "
"will also be used in backward computations.") "will also be used in backward computations.")
.AsIntermediate(); .AsIntermediate();
AddOutput( AddOutput(
...@@ -73,9 +73,9 @@ LinearChainCRF Operator. ...@@ -73,9 +73,9 @@ LinearChainCRF Operator.
Conditional Random Field defines an undirected probabilistic graph with nodes Conditional Random Field defines an undirected probabilistic graph with nodes
denoting random variables and edges denoting dependencies between these denoting random variables and edges denoting dependencies between these
variables. CRF learns the conditional probability \f$P(Y|X)\f$, where variables. CRF learns the conditional probability $P(Y|X)$, where
\f$X = (x_1, x_2, ... , x_n)\f$ are structured inputs and $X = (x_1, x_2, ... , x_n)$ are structured inputs and
\f$Y = (y_1, y_2, ... , y_n)\f$ are labels for the inputs. $Y = (y_1, y_2, ... , y_n)$ are labels for the inputs.
Linear chain CRF is a special case of CRF that is useful for sequence labeling Linear chain CRF is a special case of CRF that is useful for sequence labeling
task. Sequence labeling tasks do not assume a lot of conditional task. Sequence labeling tasks do not assume a lot of conditional
...@@ -88,21 +88,22 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and ...@@ -88,21 +88,22 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details. http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.
Equation: Equation:
1. Denote Input(Emission) to this operator as \f$x\f$ here. 1. Denote Input(Emission) to this operator as $x$ here.
2. The first D values of Input(Transition) to this operator are for starting 2. The first D values of Input(Transition) to this operator are for starting
weights, denoted as \f$a\f$ here. weights, denoted as $a$ here.
3. The next D values of Input(Transition) of this operator are for ending 3. The next D values of Input(Transition) of this operator are for ending
weights, denoted as \f$b\f$ here. weights, denoted as $b$ here.
4. The remaning values of Input(Transition) are for transition weights, 4. The remaning values of Input(Transition) are for transition weights,
denoted as \f$w\f$ here. denoted as $w$ here.
5. Denote Input(Label) as \f$s\f$ here. 5. Denote Input(Label) as $s$ here.
The probability of a sequence \f$s\f$ of length \f$L\f$ is defined as: The probability of a sequence $s$ of length $L$ is defined as:
\f$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L} $$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L}
+ \sum_{l=1}^L x_{s_l} + \sum_{l=1}^L x_{s_l}
+ \sum_{l=2}^L w_{s_{l-1},s_l})\f$ + \sum_{l=2}^L w_{s_{l-1},s_l})$$
where \f$Z\f$ is a normalization value so that the sum of \f$P(s)\f$ over
all possible sequences is \f$1\f$, and \f$x\f$ is the emission feature weight where $Z$ is a normalization value so that the sum of $P(s)$ over
all possible sequences is 1, and $x$ is the emission feature weight
to the linear chain CRF. to the linear chain CRF.
Finally, the linear chain CRF operator outputs the logarithm of the conditional Finally, the linear chain CRF operator outputs the logarithm of the conditional
......
...@@ -59,7 +59,7 @@ Then the ratio of the exponential of the given dimension and the sum of ...@@ -59,7 +59,7 @@ Then the ratio of the exponential of the given dimension and the sum of
exponential values of all the other dimensions is the output of the softmax exponential values of all the other dimensions is the output of the softmax
operator. operator.
For each row `i` and each column `j` in input X, we have: For each row $i$ and each column $j$ in Input(X), we have:
$$Y[i, j] = \frac{\exp(X[i, j])}{\sum_j(exp(X[i, j])}$$ $$Y[i, j] = \frac{\exp(X[i, j])}{\sum_j(exp(X[i, j])}$$
)DOC"); )DOC");
......
...@@ -67,15 +67,15 @@ The equation is as follows: ...@@ -67,15 +67,15 @@ The equation is as follows:
1) Hard label (one-hot label, so every sample has exactly one class) 1) Hard label (one-hot label, so every sample has exactly one class)
$$Loss_j = \f$ -\text{Logit}_{Label_j} + $$Loss_j = -\text{Logit}_{Label_j} +
\log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right), \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right),
j = 1, ..., K $\f$$ j = 1,..., K$$
2) Soft label (each sample can have a distribution over all classes) 2) Soft label (each sample can have a distribution over all classes)
$$Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i\left(\text{Logit}_i - $$Loss_j = -\sum_{i=0}^{K}\text{Label}_i \left(\text{Logit}_i -
\log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right), \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right),
j = 1,...,K $\f$$ j = 1,...,K$$
)DOC"); )DOC");
} }
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册