@@ -4,13 +4,13 @@ PaddlePaddle's RNN doesn't require that all instances have the same length. To
...
@@ -4,13 +4,13 @@ PaddlePaddle's RNN doesn't require that all instances have the same length. To
## Challenge of Variable-length Inputs
## Challenge of Variable-length Inputs
People usually represent a mini-batch by a Tensor. For example, a mini-batch of 32 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor.
People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor.
Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.
Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.
## LoD as a Solution
## LoD as a Solution
### Mini-Batch of variable-length sentenses
### Mini-Batch of variable-length sentences
Let's imagine a mini-batch of 3 variable lengths sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
Let's imagine a mini-batch of 3 variable lengths sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information:
...
@@ -51,17 +51,17 @@ The many 1's on the second level seem duplicated. For this particular case of 2
...
@@ -51,17 +51,17 @@ The many 1's on the second level seem duplicated. For this particular case of 2
In summary, as long as that the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:
In summary, as long as that the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:
- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and
-the first dimension size L has an additon property -- a LoD index as a nested vector:
-The first dimension size L has an additonal property -- a LoD index as a nested vector:
```c++
```c++
typedefstd::vector<std::vector>>LoD;
typedefstd::vector<std::vector>>LoD;
```
```
- The LoD index can is not necessary when there are only two levels and all elements of the second level have length 1.
- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
## Slicing of LoD Tensor
## Slicing of LoD Tensor
Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 4 level LoD Tensor, for example,
Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 3 level LoD Tensor, for example,
```
```
3
3
...
@@ -90,8 +90,9 @@ and the <1,2>-slice of above example is
...
@@ -90,8 +90,9 @@ and the <1,2>-slice of above example is
Let's go on slicing this slice. Its <1,1>-slice is
Let's go on slicing this slice. Its <1,1>-slice is
```
```
3
1
|||
1
|
```
```
### The Slicing Algorithm
### The Slicing Algorithm
...
@@ -128,7 +129,7 @@ Suppose that we want to retrieve the <1,2>-slice
...
@@ -128,7 +129,7 @@ Suppose that we want to retrieve the <1,2>-slice
we will need to find out the starting position of this slice by summing over all leaf nodes in `LoD` to the left of the slice, i.e., 3 + 2 + 4 + 1 = 10.
we will need to find out the starting position of this slice by summing over all leaf nodes in `LoD` to the left of the slice, i.e., 3 + 2 + 4 + 1 = 10.
To avoid the traversal of the LoD tree at slcing time, we can do it at the construction time -- instead of saving the lengths of the next level in the LoD tree, we can save the starting offset of the next level. For example, above LoD Tensor can be transformed into
To avoid the traversal of the LoD tree at slicing time, we can do it at the construction time -- instead of saving the lengths of the next level in the LoD tree, we can save the starting offset of the next level. For example, above LoD Tensor can be transformed into