This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
This chapter introduces *Linear Regression* and how to train and test this model with PaddlePaddle, using the UCI Housing Data Set. Because a large number of more complex models and techniques are derived from linear regression, it is important to understand its underlying theory and limitation.
@@ -313,8 +313,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
@@ -355,8 +355,6 @@ Next, we will begin the training process. `paddle.dataset.imikolov.train()` and
`paddle.batch` takes reader as input, outputs a **batched reader**: In PaddlePaddle, a reader outputs a single data instance at a time but batched reader outputs a minibatch of data instances.
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
对卷积神经网络来说,首先使用卷积处理输入的词向量序列,产生一个特征图(feature map),对特征图采用时间维度上的最大池化(max pooling over time)操作得到此卷积核对应的整句话的特征,最后,将所有卷积核得到的特征拼接起来即为文本的定长向量表示,对于文本分类问题,将其连接至softmax即构建出完整的模型。在实际应用中,我们会使用多个卷积核来处理句子,窗口大小相同的卷积核堆叠起来形成一个矩阵,这样可以更高效的完成运算。另外,我们也可使用窗口大小不同的卷积核来处理句子,[推荐系统](https://github.com/PaddlePaddle/book/tree/develop/05.recommender_system)一节的图3作为示意画了四个卷积核,不同颜色表示不同大小的卷积核操作。
@@ -87,7 +87,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
</p>
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md)
### Conditional Random Field (CRF)
...
...
@@ -118,7 +118,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
...
...
@@ -211,7 +211,6 @@ Here we fetch the dictionary, and print its size:
@@ -129,7 +129,7 @@ To address, we can design a bidirectional recurrent neural network by making a m
Fig 4. Bidirectional LSTMs
</p>
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/machine_translation/README.en.md)
Note that, this bidirectional RNNs is different with the one proposed by Bengio et al. in machine translation tasks \[[3](#Reference), [4](#Reference)\]. We will introduce another bidirectional RNNs in the following tasks [machine translation](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md)
### Conditional Random Field (CRF)
...
...
@@ -160,7 +160,7 @@ where $\omega$ are the weights to the feature function that the CRF learns. Whil
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/07.machine_translation/README.en.md#Beam%20Search%20Algorithm)).
This objective function can be solved via back-propagation in an end-to-end manner. While decoding, given input sequences $X$, search for sequence $\bar{Y}$ to maximize the conditional probability $\bar{P}(Y|X)$ via decoding methods (such as *Viterbi*, or [Beam Search Algorithm](https://github.com/PaddlePaddle/book/blob/develop/08.machine_translation/README.md#beam-search-algorithm)).
### Deep Bidirectional LSTM (DB-LSTM) SRL model
...
...
@@ -253,7 +253,6 @@ Here we fetch the dictionary, and print its size:
```python
import math
import numpy as np
import gzip
import paddle.v2 as paddle
import paddle.v2.dataset.conll05 as conll05
import paddle.v2.evaluator as evaluator
...
...
@@ -508,7 +507,7 @@ def event_handler(event):
if isinstance(event, paddle.event.EndPass):
# save parameters
with gzip.open('params_pass_%d.tar.gz' % event.pass_id, 'w') as f:
with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
parameters.to_tar(f)
result = trainer.test(reader=reader, feeding=feeding)
decoder_size=512# hidden layer size of GRU in decoder
beam_size=3# expand width in beam search
max_length=250# a stop condition of sequence generation
```
```
2. Implement Encoder as follows:
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
...
...
@@ -289,8 +290,7 @@ is_generating = False
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
out += paddle.layer.full_matrix_projection(input=gru_step)
out=paddle.layer.fc(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=gru_step)
returnout
```
```
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
decoder_size = 512 # hidden layer size of GRU in decoder
beam_size = 3 # expand width in beam search
max_length = 250 # a stop condition of sequence generation
```
```
2. Implement Encoder as follows:
- Input is a sequence of words represented by an integer word index sequence. So we define data layer of data type `integer_value_sequence`. The value range of each element in the sequence is `[0, source_dict_dim)`
- Use a non-linear transformation of the last hidden state of the backward GRU on the source language sentence as the initial state of the decoder RNN $c_0=h_T$
- Define the computation in each time step for the decoder RNN, i.e., according to the current context vector $c_i$, hidden state for the decoder $z_i$ and the $i$-th word $u_i$ in the target language to predict the probability $p_{i+1}$ for the $i+1$-th word.
...
...
@@ -331,8 +332,7 @@ is_generating = False
- Softmax normalization is used in the end to computed the probability of words, i.e., $p\left ( u_i|u_{<i},\mathbf{x} \right )=softmax(W_sz_i+b_z)$. The output is returned.
out += paddle.layer.full_matrix_projection(input=gru_step)
out = paddle.layer.fc(
size=target_dict_dim,
bias_attr=True,
act=paddle.activation.Softmax(),
input=gru_step)
return out
```
```
4. Define the name for the decoder and the first two input for `gru_decoder_with_attention`. Note that `StaticInput` is used for the two inputs. Please refer to [StaticInput Document](https://github.com/PaddlePaddle/Paddle/blob/develop/doc/howto/deep_model/rnn/recurrent_group_cn.md#输入) for more details.
@@ -22,7 +22,8 @@ We packed this book, Jupyter, PaddlePaddle, and all dependencies into a Docker i
Just type
```bash
docker run -d-p 8888:8888 paddlepaddle/book:0.10.0
docker run -d-p 8888:8888 paddlepaddle/book
```
This command will download the pre-built Docker image from DockerHub.com and run it in a container. Please direct your Web browser to http://localhost:8888 to read the book.
...
...
@@ -30,7 +31,8 @@ This command will download the pre-built Docker image from DockerHub.com and run
If you are living in somewhere slow to access DockerHub.com, you might try our mirror server docker.paddlepaddle.org:
```bash
docker run -d-p 8888:8888 docker.paddlepaddle.org/book:0.10.0
docker run -d-p 8888:8888 docker.paddlepaddle.org/book
```
### Training with GPU
...
...
@@ -40,13 +42,15 @@ By default we are using CPU for training, if you want to train with GPU, the ste
To make sure GPU can be successfully used from inside container, please install [nvidia-docker](https://github.com/NVIDIA/nvidia-docker). Then run:
```bash
nvidia-docker run -d-p 8888:8888 paddlepaddle/book:0.10.0-gpu
nvidia-docker run -d-p 8888:8888 paddlepaddle/book:latest-gpu
```
Or you can use the image registry mirror in China:
```bash
nvidia-docker run -d-p 8888:8888 docker.paddlepaddle.org/book:0.10.0-gpu
nvidia-docker run -d-p 8888:8888 docker.paddlepaddle.org/book:latest-gpu
```
Change the code in the chapter that you are reading from