This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
@@ -313,8 +313,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
...
@@ -313,8 +313,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
@@ -355,8 +355,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
...
@@ -355,8 +355,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
@@ -87,7 +87,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
...
@@ -87,7 +87,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
Fig 4. Bidirectional LSTMs
</p>
</p>
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md)
### Conditional Random Field (CRF)
### Conditional Random Field (CRF)
...
@@ -118,7 +118,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
...
@@ -118,7 +118,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
### Deep Bidirectional LSTM (DB-LSTM) SRL model
...
@@ -211,7 +211,6 @@ Here we fetch the dictionary, and print its size:
...
@@ -211,7 +211,6 @@ Here we fetch the dictionary, and print its size:
@@ -129,7 +129,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
...
@@ -129,7 +129,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
Fig 4. Bidirectional LSTMs
</p>
</p>
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md)
### Conditional Random Field (CRF)
### Conditional Random Field (CRF)
...
@@ -160,7 +160,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
...
@@ -160,7 +160,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
### Deep Bidirectional LSTM (DB-LSTM) SRL model
...
@@ -253,7 +253,6 @@ Here we fetch the dictionary, and print its size:
...
@@ -253,7 +253,6 @@ Here we fetch the dictionary, and print its size:
```python
```python
import math
import math
import numpy as np
import numpy as np
import gzip
import paddle.v2 as paddle
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
import paddle.v2.evaluator as evaluator
...
@@ -508,7 +507,7 @@ def event_handler(event):
...
@@ -508,7 +507,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
if isinstance(event, paddle.event.EndPass):
# save parameters
# save parameters
with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
result = trainer.test(reader=reader, feeding=feeding)
decoder_size=512# hidden layer size of GRU in decoder
decoder_size=512# hidden layer size of GRU in decoder
beam_size=3# expand width in beam search
beam_size=3# expand width in beam search
max_length=250# a stop condition of sequence generation
max_length=250# a stop condition of sequence generation
```
```
2. Implement Encoder as follows:
2. Implement Encoder as follows:
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
...
@@ -289,8 +290,7 @@ is_generating = False
...
@@ -289,8 +290,7 @@ is_generating = False
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
out += paddle.layer.full_matrix_projection(input=gru_step)
input=gru_step)
returnout
returnout
```
```
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
decoder_size = 512 # hidden layer size of GRU in decoder
decoder_size = 512 # hidden layer size of GRU in decoder
beam_size = 3 # expand width in beam search
beam_size = 3 # expand width in beam search
max_length = 250 # a stop condition of sequence generation
max_length = 250 # a stop condition of sequence generation
```
```
2. Implement Encoder as follows:
2. Implement Encoder as follows:
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
...
@@ -331,8 +332,7 @@ is_generating = False
...
@@ -331,8 +332,7 @@ is_generating = False
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
out += paddle.layer.full_matrix_projection(input=gru_step)
input=gru_step)
return out
return out
```
```
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
@@ -22,7 +22,8 @@ We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker i
...
@@ -22,7 +22,8 @@ We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker i
Just type
Just type
```bash
```bash
docker run -d-p 8888:8888 paddlepaddle/book:0.10.0
docker run -d-p 8888:8888 paddlepaddle/book
```
```
This command will download the pre-built Docker image from DockerHub.com and run it in a container. Please direct your Web browser to http://localhost:8888 to read the book.
This command will download the pre-built Docker image from DockerHub.com and run it in a container. Please direct your Web browser to http://localhost:8888 to read the book.
...
@@ -30,7 +31,8 @@ This command will download the pre-built Docker image from DockerHub.com and run
...
@@ -30,7 +31,8 @@ This command will download the pre-built Docker image from DockerHub.com and run
If you are living in somewhere slow to access DockerHub.com, you might try our mirror server docker.paddlepaddle.org:
If you are living in somewhere slow to access DockerHub.com, you might try our mirror server docker.paddlepaddle.org:
```bash
```bash
docker run -d-p 8888:8888 docker.paddlepaddle.org/book:0.10.0
docker run -d-p 8888:8888 docker.paddlepaddle.org/book
```
```
### Training with GPU
### Training with GPU
...
@@ -40,13 +42,15 @@ By default we are using CPU for training, if you want to train with GPU, the ste
...
@@ -40,13 +42,15 @@ By default we are using CPU for training, if you want to train with GPU, the ste
To make sure GPU can be successfully used from inside container, please install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Then run:
To make sure GPU can be successfully used from inside container, please install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Then run:
```bash
```bash
nvidia-docker run -d-p 8888:8888 paddlepaddle/book:0.10.0-gpu
nvidia-docker run -d-p 8888:8888 paddlepaddle/book:latest-gpu
```
```
Or you can use the image registry mirror in China:
Or you can use the image registry mirror in China:
```bash
```bash
nvidia-docker run -d-p 8888:8888 docker.paddlepaddle.org/book:0.10.0-gpu
nvidia-docker run -d-p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
```
```
Change the code in the chapter that you are reading from
Change the code in the chapter that you are reading from