提交 ded305e9 编写于 作者: L livc

Merge branch 'develop' of https://github.com/PaddlePaddle/book into fix_English_version_error

#!/bin/bash
cur_path="$(cd "$(dirname "$0")" && pwd -P)"
cd $cur_path/../
#convert md to ipynb
.tools/convert-markdown-into-ipynb-and-test.sh
paddle_version=0.10.0rc2
#generate docker file
if [ ${USE_UBUNTU_REPO_MIRROR} ]; then
UPDATE_MIRROR_CMD="sed 's@http:\/\/archive.ubuntu.com\/ubuntu\/@mirror:\/\/mirrors.ubuntu.com\/mirrors.txt@' -i /etc/apt/sources.list && \\"
else
UPDATE_MIRROR_CMD="\\"
fi
mkdir -p build
cat > build/Dockerfile <<EOF
FROM paddlepaddle/paddle:${paddle_version}
MAINTAINER PaddlePaddle Authors <paddle-dev@baidu.com>
RUN ${UPDATE_MIRROR_CMD}
apt-get install locales
RUN localedef -f UTF-8 -i en_US en_US.UTF-8
RUN apt-get -y install gcc && \
apt-get -y clean
RUN pip install -U matplotlib jupyter numpy requests scipy
COPY . /book
RUN rm -rf /book/build
EXPOSE 8888
CMD ["sh", "-c", "jupyter notebook --ip=0.0.0.0 --no-browser --NotebookApp.token='' --NotebookApp.disable_check_xsrf=True /book/"]
EOF
#build docker image
echo "paddle_version:"$paddle_version
docker build --no-cache -t paddlepaddle/book:${paddle_version} -t paddlepaddle/book:latest -f ./build/Dockerfile .
......@@ -5,9 +5,9 @@ if [ $? -ne 0 ]; then
exit 1
fi
GOPATH=/tmp/go go get -u github.com/wangkuiyi/ipynb/markdown-to-ipynb
GOPATH=~/.go go get -u github.com/wangkuiyi/ipynb/markdown-to-ipynb
cur_path=$(dirname $(readlink -f $0))
cur_path="$(cd "$(dirname "$0")" && pwd -P)"
cd $cur_path/../
#convert md to ipynb
......
......@@ -334,7 +334,7 @@ def event_handler(event):
sys.stdout.write('.')
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
result = trainer.test(reader=test_reader, reader_dict=reader_dict)
result = trainer.test(reader=test_reader, feeding=feeding)
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```
......
......@@ -376,7 +376,7 @@ def event_handler(event):
sys.stdout.write('.')
sys.stdout.flush()
if isinstance(event, paddle.event.EndPass):
result = trainer.test(reader=test_reader, reader_dict=reader_dict)
result = trainer.test(reader=test_reader, feeding=feeding)
print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)
```
......
# Semantic Role Labeling
Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
The source code of this chapter is live on [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
## Background
Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is one way for Shallow Semantic Analysis. A predicate of a sentence is a property that a subject possesses or is characterized, such as what it does, what it is or how it is, which mostly corresponds to the core of an event. The noun associated with a predicate is called Argument. Semantic roles express the abstract roles that arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal and Source, etc.
Natural language analysis techniques consist of lexical, syntactic, and semantic analysis. **Semantic Role Labeling (SRL)** is an instance of **Shallow Semantic Analysis**.
In the following example, “遇到” (encounters) is a Predicate (“Pred”),“小明” (Ming) is an Agent,“小红” (Hong) is a Patient,“昨天” (yesterday) indicates the Time, and “公园” (park) is the Location.
In a sentence, a **predicate** states a property or a characterization of a *subject*, such as what it does and what it is like. The predicate represents the core of an event, whereas the words accompanying the predicate are **arguments**. A **semantic role** refers to the abstract role an argument of a predicate take on in the event, including *agent*, *patient*, *theme*, *experiencer*, *beneficiary*, *instrument*, *location*, *goal*, and *source*.
$$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mbox{Time}\mbox{在[公园]}_{\mbox{Location}}\mbox{[遇到]}_{\mbox{Predicate}}\mbox{了[小红]}_{\mbox{Patient}}\mbox{。}$$
In the following example of a Chinese sentence, "to encounter" is the predicate (*pred*); "Ming" is the *agent*; "Hong" is the *patient*; "yesterday" and "evening" are the *time*; finally, "the park" is the *location*.
Instead of in-depth analysis on semantic information, the goal of Semantic Role Labeling is to identify the relation of predicate and other constituents, e.g., predicate-argument structure, as specific semantic roles, which is an important intermediate step in a wide range of natural language understanding tasks (Information Extraction, Discourse Analysis, DeepQA etc). Predicates are always assumed to be given; the only thing is to identify arguments and their semantic roles.
$$\mbox{[小明 Ming]}_{\mbox{Agent}}\mbox{[昨天 yesterday]}_{\mbox{Time}}\mbox{[晚上 evening]}_\mbox{Time}\mbox{在[公园 a park]}_{\mbox{Location}}\mbox{[遇到 to encounter]}_{\mbox{Predicate}}\mbox{了[小红 Hong]}_{\mbox{Patient}}\mbox{。}$$
Standard SRL system mostly builds on top of Syntactic Analysis and contains five steps:
Instead of analyzing the semantic information, **Semantic Role Labeling** (**SRL**) identifies the relation between the predicate and the other constituents surrounding it. The predicate-argument structures are labeled as specific semantic roles. A wide range of natural language understanding tasks, including *information extraction*, *discourse analysis*, and *deepQA*. Research usually assumes a predicate of a sentence to be specified; the only task is to identify its arguments and their semantic roles.
1. Construct a syntactic parse tree, as shown in Fig. 1
2. Identity candidate arguments of given predicate from constructed syntactic parse tree.
3. Prune most unlikely candidate arguments.
4. Identify arguments, often by a binary classifier.
5. Multi-class semantic role labeling. Steps 2-3 usually introduce hand-designed features based on Syntactic Analysis (step 1).
Conventional SRL systems mostly build on top of syntactic analysis, usually consisting of five steps:
1. Construct a syntax tree, as shown in Fig. 1
2. Identity the candidate arguments of the given predicate on the tree.
3. Prune the most unlikely candidate arguments.
4. Identify the real arguments, often by a binary classifier.
5. Multi-classify on results from step 4 to label the semantic roles. Steps 2 and 3 usually introduce hand-designed features based on syntactic analysis (step 1).
<div align="center">
<img src="image/dependency_parsing_en.png" width = "80%" align=center /><br>
Fig 1. Syntactic parse tree
Fig 1. Syntax tree
</div>
However, complete syntactic analysis requires identifying the relation among all constitutes and the performance of SRL is sensitive to the precision of syntactic analysis, which makes SRL a very challenging task. To reduce the complexity and obtain some syntactic structure information, we often use shallow syntactic analysis. Shallow Syntactic Analysis is also called partial parsing or chunking. Unlike complete syntactic analysis which requires the construction of the complete parsing tree, Shallow Syntactic Analysis only need to identify some independent components with relatively simple structure, such as verb phrases (chunk). To avoid difficulties in constructing a syntactic tree with high accuracy, some work\[[1](#Reference)\] proposed semantic chunking based SRL methods, which convert SRL as a sequence tagging problem. Sequence tagging tasks classify syntactic chunks using BIO representation. For syntactic chunks forming a chunk of type A, the first chunk receives the B-A tag (Begin), the remaining ones receive the tag I-A (Inside), and all chunks outside receive the tag O-A.
However, a complete syntactic analysis requires identifying the relation among all constituents. Thus, the accuracy of SRL is sensitive to the preciseness of the syntactic analysis, making SRL challenging. To reduce its complexity and obtain some information on the syntactic structures, we often use *shallow syntactic analysis* a.k.a. partial parsing or chunking. Unlike complete syntactic analysis, which requires the construction of the complete parsing tree, *Shallow Syntactic Analysis* only requires identifying some independent constituents with relatively simple structures, such as verb phrases (chunk). To avoid difficulties in constructing a syntax tree with high accuracy, some work\[[1](#Reference)\] proposed semantic chunking-based SRL methods, which reduces SRL into a sequence tagging problem. Sequence tagging tasks classify syntactic chunks using **BIO representation**. For syntactic chunks forming role A, its first chunk receives the B-A tag (Begin) and the remaining ones receive the tag I-A (Inside); in the end, the chunks left out receive the tag O.
The BIO representation of above example is shown in Fig.1.
<div align="center">
<img src="image/bio_example_en.png" width = "90%" align=center /><br>
Fig 2. BIO represention
Fig 2. BIO representation
</div>
This example illustrates the simplicity of sequence tagging because (1) shallow syntactic analysis reduces the precision requirement of syntactic analysis; (2) pruning candidate arguments is removed; 3) argument identification and tagging are finished at the same time. Such unified methods simplify the procedure, reduce the risk of accumulating errors and boost the performance further.
This example illustrates the simplicity of sequence tagging, since
1. It only relies on shallow syntactic analysis, reduces the precision requirement of syntactic analysis;
2. Pruning the candidate arguments is no longer necessary;
3. Arguments are identified and tagged at the same time. Simplifying the workflow reduces the risk of accumulating errors; oftentimes, methods that unify multiple steps boost performance.
In this tutorial, our SRL system is built as an end-to-end system via a neural network. We take only text sequences, without using any syntactic parsing results or complex hand-designed features. We give public dataset [CoNLL-2004 and CoNLL-2005 Shared Tasks](http://www.cs.upc.edu/~srlconll/) as an example to illustrate: given a sentence with predicates marked, identify the corresponding arguments and their semantic roles by sequence tagging method.
......@@ -48,15 +54,19 @@ Recurrent Neural Networks are important tools for sequence modeling and have bee
### Stacked Recurrent Neural Network
Deep Neural Networks allows extracting hierarchical representations. Higher layers can form more abstract/complex representations on top of lower layers. LSTMs, when unfolded in time, is a deep feed-forward neural network, because a computational path between the input at time $k < t$ to the output at time $t$ crosses several nonlinear layers. However, the computation carried out at each time-step is only linear transformation, which makes LSTMs a shallow model. Deep LSTMs are typically constructed by stacking multiple LSTM layers on top of each other and taking the output from lower LSTM layer at time $t$ as the input of upper LSTM layer at time $t$. Deep, hierarchical neural networks can be much efficient at representing some functions and modeling varying-length dependencies\[[2](#Reference)\].
*Deep Neural Networks* can extract hierarchical representations. The higher layers can form relatively abstract/complex representations, based on primitive features discovered through the lower layers. Unfolding LSTMs through time results in a deep feed-forward neural network. This is because any computational path between the input at time $k < t$ to the output at time $t$ crosses several nonlinear layers. On the other hand, due to parameter sharing over time, LSTMs are also *shallow*; the computation carried out at each time-step is just a linear transformation. Deep LSTM networks are typically constructed by stacking multiple LSTM layers on top of each other and taking the output from lower LSTM layer at time $t$ as the input of upper LSTM layer at time $t$. Deep, hierarchical neural networks can be efficient at representing some functions and modeling varying-length dependencies\[[2](#Reference)\].
However, a deep LSTM network increases the number of nonlinear steps the gradient has to traverse when propagated back in depth. As a result, while LSTMs of 4 layers can be trained properly, those with 4-8 have much worse performance. Conventional LSTMs prevent backpropagated errors from vanishing and exploding by introducing shortcut connections to skip the intermediate nonlinear layers. Therefore, deep LSTMs can consider shortcut connections in depth as well.
However, deep LSTMs increases the number of nonlinear steps the gradient has to traverse when propagated back in depth. For example, four layer LSTMs can be trained properly, but the performance becomes worse as the number of layers up to 4-8. Conventional LSTMs prevent backpropagated errors from vanishing and exploding by introducing shortcut connections to skip the intermediate nonlinear layers. Therefore, deep LSTMs can consider shortcut connections in depth as well.
A single LSTM cell has three operations:
The operation of a single LSTM cell contain 3 parts: (1) input-to-hidden: map input $x$ to the input of the forget gates, input gates, memory cells and output gates by linear transformation (i.e., matrix mapping); (2) hidden-to-hidden: calculate forget gates, input gates, output gates and update memory cell, this is the main part of LSTMs; (3)hidden-to-output: this part typically involves an activation operation on hidden states. Based on the stacked LSTMs, we add a shortcut connection: take the input-to-hidden from the previous layer as a new input and learn another linear transformation.
1. input-to-hidden: map input $x$ to the input of the forget gates, input gates, memory cells and output gates by linear transformation (i.e., matrix mapping);
2. hidden-to-hidden: calculate forget gates, input gates, output gates and update memory cell, this is the main part of LSTMs;
3. hidden-to-output: this part typically involves an activation operation on hidden states. Based on the stacked LSTMs, we add a shortcut connection: take the input-to-hidden from the previous layer as a new input and learn another linear transformation.
Fig.3 illustrate the final stacked recurrent neural networks.
Fig.3 illustrates the final stacked recurrent neural networks.
<p align="center">
<img src="./image/stacked_lstm_en.png" width = "40%" align=center><br>
......
......@@ -42,45 +42,51 @@
<div id="markdown" style='display:none'>
# Semantic Role Labeling
Source code of this chapter is in [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
The source code of this chapter is live on [book/label_semantic_roles](https://github.com/PaddlePaddle/book/tree/develop/label_semantic_roles).
For instructions on getting started with PaddlePaddle, see [PaddlePaddle installation guide](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/getstarted/build_and_install/docker_install_en.rst).
## Background
Natural Language Analysis contains three components: Lexical Analysis, Syntactic Analysis, and Semantic Analysis. Semantic Role Labelling (SRL) is one way for Shallow Semantic Analysis. A predicate of a sentence is a property that a subject possesses or is characterized, such as what it does, what it is or how it is, which mostly corresponds to the core of an event. The noun associated with a predicate is called Argument. Semantic roles express the abstract roles that arguments of a predicate can take in the event, such as Agent, Patient, Theme, Experiencer, Beneficiary, Instrument, Location, Goal and Source, etc.
Natural language analysis techniques consist of lexical, syntactic, and semantic analysis. **Semantic Role Labeling (SRL)** is an instance of **Shallow Semantic Analysis**.
In the following example, “遇到” (encounters) is a Predicate (“Pred”),“小明” (Ming) is an Agent,“小红” (Hong) is a Patient,“昨天” (yesterday) indicates the Time, and “公园” (park) is the Location.
In a sentence, a **predicate** states a property or a characterization of a *subject*, such as what it does and what it is like. The predicate represents the core of an event, whereas the words accompanying the predicate are **arguments**. A **semantic role** refers to the abstract role an argument of a predicate take on in the event, including *agent*, *patient*, *theme*, *experiencer*, *beneficiary*, *instrument*, *location*, *goal*, and *source*.
$$\mbox{[小明]}_{\mbox{Agent}}\mbox{[昨天]}_{\mbox{Time}}\mbox{[晚上]}_\mbox{Time}\mbox{在[公园]}_{\mbox{Location}}\mbox{[遇到]}_{\mbox{Predicate}}\mbox{了[小红]}_{\mbox{Patient}}\mbox{。}$$
In the following example of a Chinese sentence, "to encounter" is the predicate (*pred*); "Ming" is the *agent*; "Hong" is the *patient*; "yesterday" and "evening" are the *time*; finally, "the park" is the *location*.
Instead of in-depth analysis on semantic information, the goal of Semantic Role Labeling is to identify the relation of predicate and other constituents, e.g., predicate-argument structure, as specific semantic roles, which is an important intermediate step in a wide range of natural language understanding tasks (Information Extraction, Discourse Analysis, DeepQA etc). Predicates are always assumed to be given; the only thing is to identify arguments and their semantic roles.
$$\mbox{[小明 Ming]}_{\mbox{Agent}}\mbox{[昨天 yesterday]}_{\mbox{Time}}\mbox{[晚上 evening]}_\mbox{Time}\mbox{在[公园 a park]}_{\mbox{Location}}\mbox{[遇到 to encounter]}_{\mbox{Predicate}}\mbox{了[小红 Hong]}_{\mbox{Patient}}\mbox{。}$$
Standard SRL system mostly builds on top of Syntactic Analysis and contains five steps:
Instead of analyzing the semantic information, **Semantic Role Labeling** (**SRL**) identifies the relation between the predicate and the other constituents surrounding it. The predicate-argument structures are labeled as specific semantic roles. A wide range of natural language understanding tasks, including *information extraction*, *discourse analysis*, and *deepQA*. Research usually assumes a predicate of a sentence to be specified; the only task is to identify its arguments and their semantic roles.
1. Construct a syntactic parse tree, as shown in Fig. 1
2. Identity candidate arguments of given predicate from constructed syntactic parse tree.
3. Prune most unlikely candidate arguments.
4. Identify arguments, often by a binary classifier.
5. Multi-class semantic role labeling. Steps 2-3 usually introduce hand-designed features based on Syntactic Analysis (step 1).
Conventional SRL systems mostly build on top of syntactic analysis, usually consisting of five steps:
1. Construct a syntax tree, as shown in Fig. 1
2. Identity the candidate arguments of the given predicate on the tree.
3. Prune the most unlikely candidate arguments.
4. Identify the real arguments, often by a binary classifier.
5. Multi-classify on results from step 4 to label the semantic roles. Steps 2 and 3 usually introduce hand-designed features based on syntactic analysis (step 1).
<div align="center">
<img src="image/dependency_parsing_en.png" width = "80%" align=center /><br>
Fig 1. Syntactic parse tree
Fig 1. Syntax tree
</div>
However, complete syntactic analysis requires identifying the relation among all constitutes and the performance of SRL is sensitive to the precision of syntactic analysis, which makes SRL a very challenging task. To reduce the complexity and obtain some syntactic structure information, we often use shallow syntactic analysis. Shallow Syntactic Analysis is also called partial parsing or chunking. Unlike complete syntactic analysis which requires the construction of the complete parsing tree, Shallow Syntactic Analysis only need to identify some independent components with relatively simple structure, such as verb phrases (chunk). To avoid difficulties in constructing a syntactic tree with high accuracy, some work\[[1](#Reference)\] proposed semantic chunking based SRL methods, which convert SRL as a sequence tagging problem. Sequence tagging tasks classify syntactic chunks using BIO representation. For syntactic chunks forming a chunk of type A, the first chunk receives the B-A tag (Begin), the remaining ones receive the tag I-A (Inside), and all chunks outside receive the tag O-A.
However, a complete syntactic analysis requires identifying the relation among all constituents. Thus, the accuracy of SRL is sensitive to the preciseness of the syntactic analysis, making SRL challenging. To reduce its complexity and obtain some information on the syntactic structures, we often use *shallow syntactic analysis* a.k.a. partial parsing or chunking. Unlike complete syntactic analysis, which requires the construction of the complete parsing tree, *Shallow Syntactic Analysis* only requires identifying some independent constituents with relatively simple structures, such as verb phrases (chunk). To avoid difficulties in constructing a syntax tree with high accuracy, some work\[[1](#Reference)\] proposed semantic chunking-based SRL methods, which reduces SRL into a sequence tagging problem. Sequence tagging tasks classify syntactic chunks using **BIO representation**. For syntactic chunks forming role A, its first chunk receives the B-A tag (Begin) and the remaining ones receive the tag I-A (Inside); in the end, the chunks left out receive the tag O.
The BIO representation of above example is shown in Fig.1.
<div align="center">
<img src="image/bio_example_en.png" width = "90%" align=center /><br>
Fig 2. BIO represention
Fig 2. BIO representation
</div>
This example illustrates the simplicity of sequence tagging because (1) shallow syntactic analysis reduces the precision requirement of syntactic analysis; (2) pruning candidate arguments is removed; 3) argument identification and tagging are finished at the same time. Such unified methods simplify the procedure, reduce the risk of accumulating errors and boost the performance further.
This example illustrates the simplicity of sequence tagging, since
1. It only relies on shallow syntactic analysis, reduces the precision requirement of syntactic analysis;
2. Pruning the candidate arguments is no longer necessary;
3. Arguments are identified and tagged at the same time. Simplifying the workflow reduces the risk of accumulating errors; oftentimes, methods that unify multiple steps boost performance.
In this tutorial, our SRL system is built as an end-to-end system via a neural network. We take only text sequences, without using any syntactic parsing results or complex hand-designed features. We give public dataset [CoNLL-2004 and CoNLL-2005 Shared Tasks](http://www.cs.upc.edu/~srlconll/) as an example to illustrate: given a sentence with predicates marked, identify the corresponding arguments and their semantic roles by sequence tagging method.
......@@ -90,15 +96,19 @@ Recurrent Neural Networks are important tools for sequence modeling and have bee
### Stacked Recurrent Neural Network
Deep Neural Networks allows extracting hierarchical representations. Higher layers can form more abstract/complex representations on top of lower layers. LSTMs, when unfolded in time, is a deep feed-forward neural network, because a computational path between the input at time $k < t$ to the output at time $t$ crosses several nonlinear layers. However, the computation carried out at each time-step is only linear transformation, which makes LSTMs a shallow model. Deep LSTMs are typically constructed by stacking multiple LSTM layers on top of each other and taking the output from lower LSTM layer at time $t$ as the input of upper LSTM layer at time $t$. Deep, hierarchical neural networks can be much efficient at representing some functions and modeling varying-length dependencies\[[2](#Reference)\].
*Deep Neural Networks* can extract hierarchical representations. The higher layers can form relatively abstract/complex representations, based on primitive features discovered through the lower layers. Unfolding LSTMs through time results in a deep feed-forward neural network. This is because any computational path between the input at time $k < t$ to the output at time $t$ crosses several nonlinear layers. On the other hand, due to parameter sharing over time, LSTMs are also *shallow*; the computation carried out at each time-step is just a linear transformation. Deep LSTM networks are typically constructed by stacking multiple LSTM layers on top of each other and taking the output from lower LSTM layer at time $t$ as the input of upper LSTM layer at time $t$. Deep, hierarchical neural networks can be efficient at representing some functions and modeling varying-length dependencies\[[2](#Reference)\].
However, a deep LSTM network increases the number of nonlinear steps the gradient has to traverse when propagated back in depth. As a result, while LSTMs of 4 layers can be trained properly, those with 4-8 have much worse performance. Conventional LSTMs prevent backpropagated errors from vanishing and exploding by introducing shortcut connections to skip the intermediate nonlinear layers. Therefore, deep LSTMs can consider shortcut connections in depth as well.
However, deep LSTMs increases the number of nonlinear steps the gradient has to traverse when propagated back in depth. For example, four layer LSTMs can be trained properly, but the performance becomes worse as the number of layers up to 4-8. Conventional LSTMs prevent backpropagated errors from vanishing and exploding by introducing shortcut connections to skip the intermediate nonlinear layers. Therefore, deep LSTMs can consider shortcut connections in depth as well.
A single LSTM cell has three operations:
The operation of a single LSTM cell contain 3 parts: (1) input-to-hidden: map input $x$ to the input of the forget gates, input gates, memory cells and output gates by linear transformation (i.e., matrix mapping); (2) hidden-to-hidden: calculate forget gates, input gates, output gates and update memory cell, this is the main part of LSTMs; (3)hidden-to-output: this part typically involves an activation operation on hidden states. Based on the stacked LSTMs, we add a shortcut connection: take the input-to-hidden from the previous layer as a new input and learn another linear transformation.
1. input-to-hidden: map input $x$ to the input of the forget gates, input gates, memory cells and output gates by linear transformation (i.e., matrix mapping);
2. hidden-to-hidden: calculate forget gates, input gates, output gates and update memory cell, this is the main part of LSTMs;
3. hidden-to-output: this part typically involves an activation operation on hidden states. Based on the stacked LSTMs, we add a shortcut connection: take the input-to-hidden from the previous layer as a new input and learn another linear transformation.
Fig.3 illustrate the final stacked recurrent neural networks.
Fig.3 illustrates the final stacked recurrent neural networks.
<p align="center">
<img src="./image/stacked_lstm_en.png" width = "40%" align=center><br>
......
......@@ -399,7 +399,7 @@ for param in parameters.keys():
```python
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=1e-3))
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
......@@ -423,7 +423,7 @@ for param in parameters.keys():
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
num_passes=10000,
num_passes=2,
feeding=feeding)
```
......
......@@ -361,7 +361,7 @@ for param in parameters.keys():
```python
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=1e-3))
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
......@@ -388,7 +388,7 @@ for param in parameters.keys():
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
num_passes=10000,
num_passes=2,
feeding=feeding)
```
......
......@@ -107,7 +107,7 @@ def main():
# define optimize method and trainer
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=1e-3))
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(
cost=cost, parameters=parameters, update_equation=optimizer)
......@@ -137,7 +137,7 @@ def main():
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
num_passes=10000,
num_passes=2,
feeding=feeding)
......
......@@ -441,7 +441,7 @@ for param in parameters.keys():
```python
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=1e-3))
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
......@@ -465,7 +465,7 @@ for param in parameters.keys():
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
num_passes=10000,
num_passes=2,
feeding=feeding)
```
......
......@@ -403,7 +403,7 @@ for param in parameters.keys():
```python
optimizer = paddle.optimizer.Adam(
learning_rate=5e-5,
regularization=paddle.optimizer.L2Regularization(rate=1e-3))
regularization=paddle.optimizer.L2Regularization(rate=8e-4))
trainer = paddle.trainer.SGD(cost=cost,
parameters=parameters,
update_equation=optimizer)
......@@ -430,7 +430,7 @@ for param in parameters.keys():
trainer.train(
reader=wmt14_reader,
event_handler=event_handler,
num_passes=10000,
num_passes=2,
feeding=feeding)
```
......
......@@ -87,6 +87,7 @@ The raw `MoiveLens` contains movie ratings, relevant features from both movies a
For instance, one movie's feature could be:
```python
import paddle.v2 as paddle
movie_info = paddle.dataset.movielens.movie_info()
print movie_info.values()[0]
```
......
......@@ -129,6 +129,7 @@ The raw `MoiveLens` contains movie ratings, relevant features from both movies a
For instance, one movie's feature could be:
```python
import paddle.v2 as paddle
movie_info = paddle.dataset.movielens.movie_info()
print movie_info.values()[0]
```
......
# Deep Learning with PaddlePaddle
1. [Fit a Line](http://book.paddlepaddle.org/fit_a_line/index.en.html)
1. [Recognize Digits](http://book.paddlepaddle.org/recognize_digits/index.en.html)
1. [Image Classification](http://book.paddlepaddle.org/image_classification/index.en.html)
1. [Word to Vector](http://book.paddlepaddle.org/word2vec/index.en.html)
1. [Understand Sentiment](http://book.paddlepaddle.org/understand_sentiment/index.en.html)
1. [Label Semantic Roles](http://book.paddlepaddle.org/label_semantic_roles/index.en.html)
1. [Machine Translation](http://book.paddlepaddle.org/machine_translation/index.en.html)
1. [Recommender System](http://book.paddlepaddle.org/recommender_system/index.en.html)
1. [Fit a Line](http://book.paddlepaddle.org/01.fit_a_line/index.en.html)
1. [Recognize Digits](http://book.paddlepaddle.org/02.recognize_digits/index.en.html)
1. [Image Classification](http://book.paddlepaddle.org/03.image_classification/index.en.html)
1. [Word to Vector](http://book.paddlepaddle.org/04.word2vec/index.en.html)
1. [Understand Sentiment](http://book.paddlepaddle.org/05.understand_sentiment/index.en.html)
1. [Label Semantic Roles](http://book.paddlepaddle.org/06.label_semantic_roles/index.en.html)
1. [Machine Translation](http://book.paddlepaddle.org/07.machine_translation/index.en.html)
1. [Recommender System](http://book.paddlepaddle.org/08.recommender_system/index.en.html)
## Running the Book
This book you are reading is interactive -- each chapter can run as a Jupyter Notebook.
We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker image. So you don't need to install anything except Docker. If you are using Windows, please follow [this installation guide](https://www.docker.com/docker-windows). If you are running Mac, please follow [this](https://www.docker.com/docker-mac). For various Linux distros, please refer to https://www.docker.com. If you are using Windows or Mac, you might want to give Docker [more memory and CPUs/cores](http://stackoverflow.com/a/39720010/724872).
Just type
```bash
docker run -d -p 8888:8888 paddlepaddle/book
```
This command will download the pre-built Docker image from DockerHub.com and run it in a container. Please direct your Web browser to http://localhost:8888 to read the book.
If you are living in somewhere slow to access DockerHub.com, you might try our mirror server docker.paddlepaddle.org:
```bash
docker run -d -p 8888:8888 docker.paddlepaddle.org/book
```
## Contribute
Your contribution is welcome! Please feel free to file Pull Requests to add your chapter as a directory under `/pending`. Once it is going stable, the community would like to move it to `/`.
To write, run, and debug your chapters, you will need Python 2.x, Go >1.5. You can build the Docker image using [this script](https://github.com/PaddlePaddle/book/blob/develop/.tools/convert-markdown-into-ipynb-and-test.sh).
This tutorial is contributed by <a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a>, and licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>.
# 深度学习入门
1. [新手入门](http://book.paddlepaddle.org/fit_a_line)
1. [识别数字](http://book.paddlepaddle.org/recognize_digits)
1. [图像分类](http://book.paddlepaddle.org/image_classification)
1. [词向量](http://book.paddlepaddle.org/word2vec)
1. [情感分析](http://book.paddlepaddle.org/understand_sentiment)
1. [语义角色标注](http://book.paddlepaddle.org/label_semantic_roles)
1. [机器翻译](http://book.paddlepaddle.org/machine_translation)
1. [个性化推荐](http://book.paddlepaddle.org/recommender_system)
1. [新手入门](http://book.paddlepaddle.org/01.fit_a_line)
1. [识别数字](http://book.paddlepaddle.org/02.recognize_digits)
1. [图像分类](http://book.paddlepaddle.org/03.image_classification)
1. [词向量](http://book.paddlepaddle.org/04.word2vec)
1. [情感分析](http://book.paddlepaddle.org/05.understand_sentiment)
1. [语义角色标注](http://book.paddlepaddle.org/06.label_semantic_roles)
1. [机器翻译](http://book.paddlepaddle.org/07.machine_translation)
1. [个性化推荐](http://book.paddlepaddle.org/08.recommender_system)
## 运行这本书
您现在在看的这本书是一本“交互式”电子书 —— 每一章都可以运行在一个
Jupyter Notebook 里。
我们把 Jupyter、PaddlePaddle、以及各种被依赖的软件都打包进一个 Docker
image 了。所以您不需要自己来安装各种软件,只需要安装 Docker 即可。如果
您使用 Windows,可以参
[这里](https://www.docker.com/docker-windows)。如果您使用 Mac,可以
参考[这里](https://www.docker.com/docker-mac)。 对于各种 Linux 发行版,
请参考https://www.docker.com 。如果您使用 Windows 或者 Mac,可以通过如
下方法给 Docker 更多内存和CPU资源
(http://stackoverflow.com/a/39720010/724872)。
只需要在命令行窗口里运行:
```bash
docker run -d -p 8888:8888 paddlepaddle/book
```
这个命令会从 DockerHub.com 下载本书的 Docker image 并且运行之。请在浏
览器里访问 http://localhost:8888 即可阅读和在线编辑本书。
如果您访问 DockerHub.com 很慢,可以试试我们的另一个镜像
docker.paddlepaddle.org:
```bash
docker run -d -p 8888:8888 docker.paddlepaddle.org/book
```
## 贡献内容
您要是能贡献新的章节那就太好了!请发 Pull Requests 把您写的章节加入到
`/pending` 下面的一个子目录里。当这一章稳定下来,我们一起把您的目录挪
到根目录。
为了写作、运行、调试,您需要安装 Python 2.x, Go >1.5. 你可以用这
[脚本程序](https://github.com/PaddlePaddle/book/blob/develop/.tools/convert-markdown-into-ipynb-and-test.sh)
生成 Docker image。
<a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/"><img alt="知识共享许可协议" style="border-width:0" src="https://i.creativecommons.org/l/by-nc-sa/4.0/88x31.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" href="http://purl.org/dc/dcmitype/Text" property="dct:title" rel="dct:type">本教程</span><a xmlns:cc="http://creativecommons.org/ns#" href="http://book.paddlepaddle.org" property="cc:attributionName" rel="cc:attributionURL">PaddlePaddle</a> 创作,采用 <a rel="license" href="http://creativecommons.org/licenses/by-nc-sa/4.0/">知识共享 署名-非商业性使用-相同方式共享 4.0 国际 许可协议</a>进行许可。
......@@ -103,35 +103,35 @@
<div class="card-block pl-0 pr-0 pt-0 pb-0">
<div class="list-group ">
<a href="./fit_a_line/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./01.fit_a_line/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Linear Regression
</a>
<a href="./recognize_digits/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./02.recognize_digits/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Recognize Digits
</a>
<a href="./image_classification/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./03.image_classification/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Image Classification
</a>
<a href="./word2vec/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./04.word2vec/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Word2Vec
</a>
<a href="./understand_sentiment/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./05.understand_sentiment/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Sentiment Analysis
</a>
<a href="./label_semantic_roles/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./06.label_semantic_roles/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Semantic Role Labeling
</a>
<a href="./machine_translation/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./07.machine_translation/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Machine Translation
</a>
<a href="./recommender_system/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./08.recommender_system/index.en.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
Personalized Recommendation
</a>
......@@ -142,7 +142,7 @@
</div>
</div>
<div class="col">
<iframe src="./fit_a_line/index.en.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
<iframe src="./01.fit_a_line/index.en.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
</iframe>
</div>
</div>
......
......@@ -3,35 +3,35 @@
"chapters": [
{
"name": "Linear Regression",
"link": "./fit_a_line/index.en.html"
"link": "./01.fit_a_line/index.en.html"
},
{
"name": "Recognize Digits",
"link": "./recognize_digits/index.en.html"
"link": "./02.recognize_digits/index.en.html"
},
{
"name": "Image Classification",
"link": "./image_classification/index.en.html"
"link": "./03.image_classification/index.en.html"
},
{
"name": "Word2Vec",
"link": "./word2vec/index.en.html"
"link": "./04.word2vec/index.en.html"
},
{
"name": "Sentiment Analysis",
"link": "./understand_sentiment/index.en.html"
"link": "./05.understand_sentiment/index.en.html"
},
{
"name": "Semantic Role Labeling",
"link": "./label_semantic_roles/index.en.html"
"link": "./06.label_semantic_roles/index.en.html"
},
{
"name": "Machine Translation",
"link": "./machine_translation/index.en.html"
"link": "./07.machine_translation/index.en.html"
},
{
"name": "Personalized Recommendation",
"link": "./recommender_system/index.en.html"
"link": "./08.recommender_system/index.en.html"
}
]
}
......@@ -107,35 +107,35 @@
<div class="card-block pl-0 pr-0 pt-0 pb-0">
<div class="list-group ">
<a href="./fit_a_line/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./01.fit_a_line/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
新手入门
</a>
<a href="./recognize_digits/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./02.recognize_digits/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
识别数字
</a>
<a href="./image_classification/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./03.image_classification/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
图像分类
</a>
<a href="./word2vec/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./04.word2vec/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
词向量
</a>
<a href="./understand_sentiment/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./05.understand_sentiment/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
情感分析
</a>
<a href="./label_semantic_roles/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./06.label_semantic_roles/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
语义角色标注
</a>
<a href="./machine_translation/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./07.machine_translation/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
机器翻译
</a>
<a href="./recommender_system/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
<a href="./08.recommender_system/index.html" target="content_iframe" class="list-group-item list-group-item-action" style="border: 0px; border-bottom: 2px solid #dddfe3;">
个性化推荐
</a>
......@@ -146,7 +146,7 @@
</div>
</div>
<div class="col">
<iframe src="./fit_a_line/index.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
<iframe src="./01.fit_a_line/index.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
</iframe>
</div>
</div>
......
......@@ -3,35 +3,35 @@
"chapters": [
{
"name": "新手入门",
"link": "./fit_a_line/index.html"
"link": "./01.fit_a_line/index.html"
},
{
"name": "识别数字",
"link": "./recognize_digits/index.html"
"link": "./02.recognize_digits/index.html"
},
{
"name": "图像分类",
"link": "./image_classification/index.html"
"link": "./03.image_classification/index.html"
},
{
"name": "词向量",
"link": "./word2vec/index.html"
"link": "./04.word2vec/index.html"
},
{
"name": "情感分析",
"link": "./understand_sentiment/index.html"
"link": "./05.understand_sentiment/index.html"
},
{
"name": "语义角色标注",
"link": "./label_semantic_roles/index.html"
"link": "./06.label_semantic_roles/index.html"
},
{
"name": "机器翻译",
"link": "./machine_translation/index.html"
"link": "./07.machine_translation/index.html"
},
{
"name": "个性化推荐",
"link": "./recommender_system/index.html"
"link": "./08.recommender_system/index.html"
}
]
}
......@@ -118,7 +118,7 @@
</div>
</div>
<div class="col">
<iframe src="./fit_a_line/index{% if is_en %}.en{% endif %}.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
<iframe src="./01.fit_a_line/index{% if is_en %}.en{% endif %}.html" style="border: none; overflow-y : hidden" width="100%" height="100%" name="content_iframe" id="content_iframe">
</iframe>
</div>
</div>
......
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册