refine crf tutorial.

9429e5f8 · caoying03 · 58a6d233
4 changed files
--- a/06.label_semantic_roles/README.en.md
+++ b/06.label_semantic_roles/README.en.md
@@ -343,23 +343,23 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```

- We will concatenate the output of the top LSTM unit with its input, and project the result into a hidden layer. Then, we put a fully connected layer on top to get the final feature vector representation.
-
- ```python
- feature_out = paddle.layer.mixed(
- size=label_dict_len,
- bias_attr=std_default,
- input=[
-     paddle.layer.full_matrix_projection(
-         input=input_tmp[0], param_attr=hidden_para_attr),
-     paddle.layer.full_matrix_projection(
-         input=input_tmp[1], param_attr=lstm_para_attr)
- ], )
- ```
-
- At the end of the network, we use CRF as the cost function; the parameter of CRF cost will be named `crfw`.
+- In PaddlePaddle, the state features and transition features of a CRF are learned separately, by a fully connected layer and a CRF layer respectively. A fully connected layer with linear activation learns the state features; here we use `paddle.layer.mixed` (`paddle.layer.fc` can be used as well). The CRF layer, `paddle.layer.crf`, learns only the transition features. It is a cost layer and the last layer of the network: given the input sequence, it outputs the negative log probability of the true tag sequence as the cost, so it requires the true tag sequence as the target during training.

 ```python
+# The fully connected layer learns the state features.
+# The output of the top LSTM unit and its input are concatenated
+# and then fed into a fully connected layer
+# whose size equals the number of tag labels.
+
+feature_out = paddle.layer.mixed(
+    size=label_dict_len,
+    bias_attr=std_default,
+    input=[
+        paddle.layer.full_matrix_projection(
+            input=input_tmp[0], param_attr=hidden_para_attr),
+        paddle.layer.full_matrix_projection(
+            input=input_tmp[1], param_attr=lstm_para_attr)], )
+
 crf_cost = paddle.layer.crf(
    size=label_dict_len,
    input=feature_out,

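Review note on the hunk above: the split between state features (the fully connected layer) and transition features (the CRF layer) may be easier to follow with a small standalone sketch. The code below is not PaddlePaddle code and does not reproduce how `paddle.layer.crf` is implemented; it is a minimal pure-Python linear-chain CRF cost, assuming hypothetical inputs `emissions` (per-position state scores, playing the role of `feature_out`) and `transitions` (tag-to-tag scores, playing the role of the CRF layer's weights). It also leaves out start and end scores, which a real CRF layer may include.

```python
import math


def _logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def sequence_score(emissions, transitions, tags):
    """Unnormalized score of one tag sequence: state scores plus transition scores."""
    state = sum(emissions[t][tag] for t, tag in enumerate(tags))
    trans = sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return state + trans


def log_partition(emissions, transitions):
    """Log of the summed exp-scores of all tag sequences (forward algorithm)."""
    num_tags = len(transitions)
    alpha = list(emissions[0])
    for emit in emissions[1:]:
        alpha = [
            emit[j] + _logsumexp(
                [alpha[i] + transitions[i][j] for i in range(num_tags)])
            for j in range(num_tags)
        ]
    return _logsumexp(alpha)


def crf_nll(emissions, transitions, tags):
    """Negative log probability of `tags` given the scores (the training cost)."""
    return log_partition(emissions, transitions) - sequence_score(
        emissions, transitions, tags)


# Hypothetical toy input: 3 positions, 2 tags.
emissions = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]]
transitions = [[0.5, -0.2], [-0.1, 0.8]]
cost = crf_nll(emissions, transitions, [0, 1, 1])
```

In this sketch only `transitions` belongs to the CRF itself; `emissions` would come from whatever layer sits below it, which is the division of labour the new paragraph describes.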
--- a/06.label_semantic_roles/README.md
+++ b/06.label_semantic_roles/README.md
@@ -320,9 +320,11 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```

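- Take the output of the last stacked LSTM and the input of that LSTM unit, project them to a hidden layer, and pass them through a fully connected layer that maps to the dimension of the tag dictionary to obtain the final feature vector representation.
+- In PaddlePaddle, the state features and transition features of a CRF are learned separately by a fully connected layer and a CRF layer. In this example, we use paddle.layer.mixed with linear activation to learn the CRF state features (paddle.layer.fc can be used as well), while paddle.layer.crf learns only the transition features. paddle.layer.crf is a cost layer at the end of the network; given the input sequence, it outputs the negative log probability of the tag sequence as the cost. During training, this layer requires the correct tag sequence as the learning target.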

 ```python
+
+# learn the state features of the CRF
 feature_out = paddle.layer.mixed(
 size=label_dict_len,
 bias_attr=std_default,
@@ -332,11 +334,8 @@ input=[
    paddle.layer.full_matrix_projection(
        input=input_tmp[1], param_attr=lstm_para_attr)
 ], )
-```

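- At the end of the network, a CRF layer is defined to compute the cost; its parameter is named `crfw`, and the layer requires the correct labels (target) as input.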
-
-```python
+# learn the transition features of the CRF
 crf_cost = paddle.layer.crf(
    size=label_dict_len,
    input=feature_out,
@@ -347,7 +346,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```

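- The CRF decoding layer has the same parameter name as the CRF layer, i.e., they share weights. If the correct labels (target) are provided, it counts the number of incorrect labels, which can be used to evaluate the model. If no correct labels are provided, the layer infers the optimal solution, which can be used for prediction.
+- The CRF decoding layer uses the same parameter name as the CRF layer, i.e., it loads the parameters learned by paddle.layer.crf. During training, paddle.layer.crf_decoding is given the correct tag sequence (target) and outputs whether each tag is correct; evaluator.sum is used to compute the tagging error rate over the sequence, which can be used to evaluate the model. During decoding, no correct labels are provided, and the layer decodes the tagging result by finding the tag sequence with the highest probability.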

 ```python
 crf_dec = paddle.layer.crf_decoding(

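Review note on the hunk above: the decoding behaviour described in the new paragraph (find the highest-scoring tag sequence, and flag per-position errors when the correct tags are supplied) can be sketched with a plain Viterbi search. This is an illustrative pure-Python sketch, not the implementation of `paddle.layer.crf_decoding`; the names `viterbi_decode`, `tag_errors`, `emissions`, and `transitions` are hypothetical, and start and end scores are omitted.

```python
def viterbi_decode(emissions, transitions):
    """Return (best_path, best_score) for a linear-chain model.

    emissions[t][j]   : state score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step_ptr, step_score = [], []
        for j in range(num_tags):
            # Best previous tag for ending at tag j on this step.
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            step_ptr.append(best_i)
            step_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(step_ptr)
        score = step_score
    # Trace the best path backwards from the best final tag.
    best_last = max(range(num_tags), key=lambda j: score[j])
    path = [best_last]
    for step_ptr in reversed(backpointers):
        path.append(step_ptr[path[-1]])
    path.reverse()
    return path, score[best_last]


def tag_errors(predicted, gold):
    """1 where the predicted tag differs from the gold tag, else 0."""
    return [int(p != g) for p, g in zip(predicted, gold)]


# Hypothetical toy input: 3 positions, 2 tags.
emissions = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.1]]
transitions = [[0.5, -0.2], [-0.1, 0.8]]
path, best = viterbi_decode(emissions, transitions)
errors = tag_errors(path, [0, 1, 1])
```

Summing `errors` and dividing by the sequence length gives a per-sequence tagging error rate, which is the kind of quantity the paragraph says is accumulated for evaluation.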
--- a/06.label_semantic_roles/index.en.html
+++ b/06.label_semantic_roles/index.en.html
@@ -385,23 +385,23 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```

- We will concatenate the output of the top LSTM unit with its input, and project the result into a hidden layer. Then, we put a fully connected layer on top to get the final feature vector representation.
-
- ```python
- feature_out = paddle.layer.mixed(
- size=label_dict_len,
- bias_attr=std_default,
- input=[
-     paddle.layer.full_matrix_projection(
-         input=input_tmp[0], param_attr=hidden_para_attr),
-     paddle.layer.full_matrix_projection(
-         input=input_tmp[1], param_attr=lstm_para_attr)
- ], )
- ```
-
- At the end of the network, we use CRF as the cost function; the parameter of CRF cost will be named `crfw`.
+- In PaddlePaddle, the state features and transition features of a CRF are learned separately, by a fully connected layer and a CRF layer respectively. A fully connected layer with linear activation learns the state features; here we use `paddle.layer.mixed` (`paddle.layer.fc` can be used as well). The CRF layer, `paddle.layer.crf`, learns only the transition features. It is a cost layer and the last layer of the network: given the input sequence, it outputs the negative log probability of the true tag sequence as the cost, so it requires the true tag sequence as the target during training.

 ```python
+# The fully connected layer learns the state features.
+# The output of the top LSTM unit and its input are concatenated
+# and then fed into a fully connected layer
+# whose size equals the number of tag labels.
+
+feature_out = paddle.layer.mixed(
+    size=label_dict_len,
+    bias_attr=std_default,
+    input=[
+        paddle.layer.full_matrix_projection(
+            input=input_tmp[0], param_attr=hidden_para_attr),
+        paddle.layer.full_matrix_projection(
+            input=input_tmp[1], param_attr=lstm_para_attr)], )
+
 crf_cost = paddle.layer.crf(
    size=label_dict_len,
    input=feature_out,

--- a/06.label_semantic_roles/index.html
+++ b/06.label_semantic_roles/index.html
@@ -362,9 +362,11 @@ for i in range(1, depth):
    input_tmp = [mix_hidden, lstm]
 ```

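- Take the output of the last stacked LSTM and the input of that LSTM unit, project them to a hidden layer, and pass them through a fully connected layer that maps to the dimension of the tag dictionary to obtain the final feature vector representation.
+- In PaddlePaddle, the state features and transition features of a CRF are learned separately by a fully connected layer and a CRF layer. In this example, we use paddle.layer.mixed with linear activation to learn the CRF state features (paddle.layer.fc can be used as well), while paddle.layer.crf learns only the transition features. paddle.layer.crf is a cost layer at the end of the network; given the input sequence, it outputs the negative log probability of the tag sequence as the cost. During training, this layer requires the correct tag sequence as the learning target.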

 ```python
+
+# learn the state features of the CRF
 feature_out = paddle.layer.mixed(
 size=label_dict_len,
 bias_attr=std_default,
@@ -374,11 +376,8 @@ input=[
    paddle.layer.full_matrix_projection(
        input=input_tmp[1], param_attr=lstm_para_attr)
 ], )
-```

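- At the end of the network, a CRF layer is defined to compute the cost; its parameter is named `crfw`, and the layer requires the correct labels (target) as input.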
-
-```python
+# learn the transition features of the CRF
 crf_cost = paddle.layer.crf(
    size=label_dict_len,
    input=feature_out,
@@ -389,7 +388,7 @@ crf_cost = paddle.layer.crf(
        learning_rate=mix_hidden_lr))
 ```

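- The CRF decoding layer has the same parameter name as the CRF layer, i.e., they share weights. If the correct labels (target) are provided, it counts the number of incorrect labels, which can be used to evaluate the model. If no correct labels are provided, the layer infers the optimal solution, which can be used for prediction.
+- The CRF decoding layer uses the same parameter name as the CRF layer, i.e., it loads the parameters learned by paddle.layer.crf. During training, paddle.layer.crf_decoding is given the correct tag sequence (target) and outputs whether each tag is correct; evaluator.sum is used to compute the tagging error rate over the sequence, which can be used to evaluate the model. During decoding, no correct labels are provided, and the layer decodes the tagging result by finding the tag sequence with the highest probability.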

 ```python
 crf_dec = paddle.layer.crf_decoding(