PaddlePaddle / Paddle
Commit 8ba62a5f
Authored Nov 23, 2017 by caoying03
fix LaTeX syntax in liear_chain_crf op.
Parent: e800c0d3
Showing 3 changed files with 28 additions and 27 deletions.
paddle/operators/linear_chain_crf_op.cc  +23 -22
paddle/operators/softmax_op.cc  +1 -1
paddle/operators/softmax_with_cross_entropy_op.cc  +4 -4
paddle/operators/linear_chain_crf_op.cc
@@ -32,19 +32,19 @@ class LinearChainCRFOpMaker : public framework::OpProtoAndCheckerMaker {
              "[(D + 2) x D]. The learnable parameter for the linear_chain_crf "
              "operator. See more details in the operator's comments.");
     AddInput("Label",
-             "(LoDTensor, default LoDTensor<int>) A LoDTensor with shape "
+             "(LoDTensor, default LoDTensor<int64_t>) A LoDTensor with shape "
              "[N x 1], where N is the total element number in a mini-batch. "
              "The ground truth.");
     AddOutput(
         "Alpha",
         "(Tensor, default Tensor<float>) A 2-D Tensor with shape [N x D]. "
-        "The forward vectors for the entire batch. Denote it as \f$\alpha\f$. "
-        "\f$\alpha$\f is a memo table used to calculate the normalization "
-        "factor in CRF. \f$\alpha[k, v]$\f stores the unnormalized "
+        "The forward vectors for the entire batch. Denote it as $\alpha$. "
+        "$\alpha$ is a memo table used to calculate the normalization "
+        "factor in CRF. $\alpha[k, v]$ stores the unnormalized "
         "probabilites of all possible unfinished sequences of tags that end at "
-        "position \f$k$\f with tag \f$v$\f. For each \f$k$\f, "
-        "\f$\alpha[k, v]$\f is a vector of length \f$D$\f with a component for "
-        "each tag value \f$v$\f. This vector is called a forward vecotr and "
+        "position $k$ with tag $v$. For each $k$, "
+        "$\alpha[k, v]$ is a vector of length $D$ with a component for "
+        "each tag value $v$. This vector is called a forward vecotr and "
         "will also be used in backward computations.")
         .AsIntermediate();
     AddOutput(
@@ -73,9 +73,9 @@ LinearChainCRF Operator.
 Conditional Random Field defines an undirected probabilistic graph with nodes
 denoting random variables and edges denoting dependencies between these
-variables. CRF learns the conditional probability \f$P(Y|X)\f$, where
-\f$X = (x_1, x_2, ... , x_n)\f$ are structured inputs and
-\f$Y = (y_1, y_2, ... , y_n)\f$ are labels for the inputs.
+variables. CRF learns the conditional probability $P(Y|X)$, where
+$X = (x_1, x_2, ... , x_n)$ are structured inputs and
+$Y = (y_1, y_2, ... , y_n)$ are labels for the inputs.
 Linear chain CRF is a special case of CRF that is useful for sequence labeling
 task. Sequence labeling tasks do not assume a lot of conditional
@@ -88,21 +88,22 @@ CRF. Please refer to http://www.cs.columbia.edu/~mcollins/fb.pdf and
 http://cseweb.ucsd.edu/~elkan/250Bwinter2012/loglinearCRFs.pdf for details.
 Equation:
-1. Denote Input(Emission) to this operator as \f$x\f$ here.
+1. Denote Input(Emission) to this operator as $x$ here.
 2. The first D values of Input(Transition) to this operator are for starting
-weights, denoted as \f$a\f$ here.
+weights, denoted as $a$ here.
 3. The next D values of Input(Transition) of this operator are for ending
-weights, denoted as \f$b\f$ here.
+weights, denoted as $b$ here.
 4. The remaning values of Input(Transition) are for transition weights,
-denoted as \f$w\f$ here.
-5. Denote Input(Label) as \f$s\f$ here.
-The probability of a sequence \f$s\f$ of length \f$L\f$ is defined as:
-\f$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L}
+denoted as $w$ here.
+5. Denote Input(Label) as $s$ here.
+The probability of a sequence $s$ of length $L$ is defined as:
+$$P(s) = (1/Z) \exp(a_{s_1} + b_{s_L}
     + \sum_{l=1}^L x_{s_l}
-    + \sum_{l=2}^L w_{s_{l-1},s_l})\f$
+    + \sum_{l=2}^L w_{s_{l-1},s_l})$$
-where \f$Z\f$ is a normalization value so that the sum of \f$P(s)\f$ over
-all possible sequences is \f$1\f$, and \f$x\f$ is the emission feature weight
+where $Z$ is a normalization value so that the sum of $P(s)$ over
+all possible sequences is 1, and $x$ is the emission feature weight
 to the linear chain CRF.
 Finally, the linear chain CRF operator outputs the logarithm of the conditional
paddle/operators/softmax_op.cc
@@ -59,7 +59,7 @@ Then the ratio of the exponential of the given dimension and the sum of
 exponential values of all the other dimensions is the output of the softmax
 operator.
-For each row `i` and each column `j` in input X, we have:
+For each row $i$ and each column $j$ in Input(X), we have:
 $$Y[i, j] = \frac{\exp(X[i, j])}{\sum_j(exp(X[i, j])}$$
 )DOC");
paddle/operators/softmax_with_cross_entropy_op.cc
@@ -67,15 +67,15 @@ The equation is as follows:
 1) Hard label (one-hot label, so every sample has exactly one class)
-$$Loss_j = \f$ -\text{Logit}_{Label_j} +
+$$Loss_j = -\text{Logit}_{Label_j} +
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right),
-j = 1,..., K $\f$$
+j = 1,..., K$$
 2) Soft label (each sample can have a distribution over all classes)
-$$Loss_j = \f$ -\sum_{i=0}^{K}\text{Label}_i \left(\text{Logit}_i -
+$$Loss_j = -\sum_{i=0}^{K}\text{Label}_i \left(\text{Logit}_i -
 \log\left(\sum_{i=0}^{K}\exp(\text{Logit}_i)\right)\right),
-j = 1,...,K $\f$$
+j = 1,...,K$$
 )DOC");
 }