# LoD (Level-of-Detail) Tensor

PaddlePaddle's RNN doesn't require that all instances in a mini-batch have the same length.  To support variable-length instances, we introduce an extension to Tensor, namely, the LoD Tensor.

## Challenge of Variable-length Inputs

People usually represent a mini-batch by a Tensor.  For example, a mini-batch of 32 images, each of size 32x32, is a 32x32x32 Tensor.  So a transformation, T, of all images can be a matrix multiplication of the (32x32)xO-dimensional matrix T and the 32x32x32 Tensor.

Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector.  If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor.  However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.

## LoD as a Solution

### Mini-Batch of variable-length sentences

Let's imagine a mini-batch of 3 variable-length sentences, containing 3, 1, and 2 words respectively.  We can represent it by a (3+1+2)xD tensor plus some index information:

```
   3
3   1 2
||| | ||
```

Each `|` represents a D-dimensional word vector.  The number 3 on top indicates 3 sentences, and the numbers 3, 1, and 2 on the second level represent the number of words in each sentence.
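
In code, such an index could be stored as nested vectors of per-sequence lengths.  A minimal sketch, assuming this simplified length-based representation (formalized in the design summary below):

```c++
std::vector<std::vector<int>> lod = {
    {3},       // top level: the mini-batch holds 3 sentences
    {3, 1, 2}  // second level: the number of words in each sentence
};
// The underlying data is a (3+1+2) x D tensor of word vectors.
```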

### Mini-Batch of variable-length videos

This approach generalizes to the case where elements are not words but higher-dimensional objects, like images.  Suppose that a mini-batch contains videos of the same frame size, 640x480, and that it contains 3 videos of 3, 1, and 2 frames respectively.  The underlying tensor is then of size (3+1+2)x640x480.  The index information is illustrated as:

```
     3
3     1  2
口口口 口 口口
```

where each `口` represents an image.

### Mini-Batch of fixed-size images

Let's get back to a typical example, image classification, where each mini-batch has M fixed-size images.  The LoD Tensor representation is

```
     M
1 1 1 1     1
口口口口 ... 口
```

The many 1's on the second level seem redundant.  For this particular case of 2 levels, where the second level always has length 1, we can ignore the LoD index.
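
In code, such a degenerate index would look like the following sketch, with a hypothetical M = 4:

```c++
// Degenerate index for a mini-batch of 4 fixed-size images: the second
// level is all 1's, carries no information, and can be dropped entirely.
std::vector<std::vector<int>> lod = {{4}, {1, 1, 1, 1}};
```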

### Design and summarization

In summary, as long as the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor:

- The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements (see the sketch after this list for computing L), and
- the first dimension size L has an additional property -- a LoD index as a nested vector:

  ```c++
  typedef std::vector<std::vector<int>> LoD;
  ```

- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
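
As noted in the first bullet, the length L of the first dimension is the total of the bottom-level lengths.  A minimal sketch, assuming the `LoD` typedef above (the helper `TotalLength` is hypothetical):

```c++
// Hypothetical helper: L is the sum of the lengths on the deepest level.
int TotalLength(const LoD& lod) {
  int L = 0;
  for (int len : lod.back()) L += len;
  return L;
}
// For the sentence mini-batch above, TotalLength({{3}, {3, 1, 2}}) == 6,
// so the underlying tensor has the shape 6 x D.
```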

## Slicing of LoD Tensor

Consider a network with three levels of RNN: the top level handles articles, the second level handles sentences, and the basic level handles words.  This network requires that mini-batches be represented by a 3-level LoD Tensor, for example,

```
         3
3           1  2
3   2  4    1  2  3
||| || |||| |  || |||
```
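
In the length-based representation sketched above, this mini-batch's index would be:

```c++
LoD lod = {{3},                  // 3 articles
           {3, 1, 2},            // sentences per article
           {3, 2, 4, 1, 2, 3}};  // words per sentence
// The underlying tensor holds 3+2+4+1+2+3 = 15 word vectors.
```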

To allow each level of RNN to handle its input, we define **the slicing of a LoD Tensor as getting the j-th sequence on level i, or the <i,j>-slice**.

For example, the <2,1>-slice of the above example is

```
2
||
```

and the <1,2>-slice of the above example is

```
2
2  3
|| |||
```

Let's go on slicing this slice.  Its <1,1>-slice is

```
3
|||
```

### The General Slicing Algorithm

The algorithm, with an over-simplified data structure, can be sketched as follows.  Here we assume the length-based index described above -- `lod_[i][j]` is the length of the j-th sequence on level i, counting the top row as level 0 -- and, for simplicity, each tensor element is a single float:

```c++
typedef vector<vector<int> > LoD;  // lod[i][j]: length of the j-th sequence on level i

struct LoDTensor {
  LoD lod_;
  float* tensor_;  // over-simplified: one float per element
};

LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) {
  LoDTensor s;
  int begin = sequence, end = sequence + 1;  // entry range on the current level
  for (size_t i = level; i < lodt.lod_.size(); ++i) {
    // Keep the selected lengths, then expand the range one level down.
    s.lod_.emplace_back(lodt.lod_[i].begin() + begin, lodt.lod_[i].begin() + end);
    int b = 0, e = 0;
    for (int k = 0; k < begin; ++k) b += lodt.lod_[i][k];
    e = b;
    for (int k = begin; k < end; ++k) e += lodt.lod_[i][k];
    begin = b; end = e;
  }
  s.tensor_ = lodt.tensor_ + begin;  // on exit, begin/end are element offsets
  return s;
}
```
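
To check this sketch against the <1,2>-slice example above, a hypothetical test driver (assuming the definitions above, plus `#include <vector>`, `#include <cstdio>`, and `using std::vector;`):

```c++
int main() {
  float words[15];
  for (int i = 0; i < 15; ++i) words[i] = i;  // 15 word 'vectors' of size 1
  // The 3-article mini-batch from the beginning of this section.
  LoDTensor batch{{{3}, {3, 1, 2}, {3, 2, 4, 1, 2, 3}}, words};
  LoDTensor s = Slice(batch, 1, 2);  // the <1,2>-slice: the third article
  // Now s.lod_ == {{2}, {2, 3}} and s.tensor_ points at the last 5 words.
  std::printf("%g\n", s.tensor_[0]);  // prints 10
  return 0;
}
```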

### Slicing the Top Level

Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs.  In the length-based sketch above, level 0 records only the sequence count, so the top-level sequences are those on level 1:

```c++
LoDTensor Slice(const LoDTensor& lodt, int sequence) {
  // The sequence-th top-level sequence is the <1, sequence>-slice.
  return Slice(lodt, 1, sequence);
}
```
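
Continuing the hypothetical driver above, an article-level RNN would step over the three top-level slices:

```c++
// Hypothetical usage: the step inputs of the article-level RNN.
LoDTensor a0 = Slice(batch, 0);  // LoD {{3}, {3, 2, 4}}, words 0..8
LoDTensor a1 = Slice(batch, 1);  // LoD {{1}, {1}},       word  9
LoDTensor a2 = Slice(batch, 2);  // LoD {{2}, {2, 3}},    words 10..14
```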