# Design Doc: LoD (Level-of-Detail) Tensor PaddlePaddle's RNN doesn't require that all instances have the same length. To do so, we introduce an extension to Tensor, namely, LoD Tensor. ## Challenge of Variable-length Inputs People usually represent a mini-batch by a Tensor. For example, a mini-batch of 32 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor. Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths. ## LoD as a Solution ### Mini-Batch of variable-length sentenses Let's imagine a mini-batch of 3 variable lengths sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information: ``` 3 3 1 2 ||| | || ``` Each `|` represents a D-dimensional word vectors. The number 3 on top indicate 3 sentences, and numbers 3, 1, and 2 on the second level represent the number of words in each sentence. ### Mini-Batch of variable-length videos This approach generalizes to the case where elements are not words, but higher dimensional objects, like images. Suppose that a mini-batch contains videos of the same frame size 640x480. If a mini-batch contains 3 videos of 3, 1, and 2 frames respectively. The underlying tensor is of size (3+1+2)x640x480. The index information illustrates as: ``` 3 3 1 2 口口口 口 口口 ``` where each `口` represents an image. ### Mini-Batch of fixed-size images Let's get back to a typical example, image classification, where each mini-batch has M fixed-sized images. The LoD Tensor representation is ``` M 1 1 1 1 1 口口口口 ... 口 ``` The many 1's on the second level seem duplicated. For this particular case of 2 levels and the second level always have length 1, we can ignore the LoD index. ### Design and summarization In summary, as long as that the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor: - The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and - the first dimension size L has an additon property -- a LoD index as a nested vector: ```c++ typedef std::vector > LoD; ``` - The LoD index can is not necessary when there are only two levels and all elements of the second level have length 1. ## Slicing of LoD Tensor Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 4 level LoD Tensor, for example, ``` 3 3 1 2 3 2 4 1 2 3 ||| || |||| | || ||| ``` To allow each level of RNN to handle its input, we define **the slicing of a LoD Tensor is defined as getting the j-th sequence on level i, or the -slice** For example, the <2,1>-slice of above slice is ``` 2 || ``` and the <1,2>-slice of above example is ``` 2 2 3 || ||| ``` Let's go on slicing this slice. Its <1,1>-slice is ``` 3 ||| ``` ### The General Slicing Algorithm The algorithm, with over-simplified data structure, is defined as ```c++ typedef vector > LoD; struct LoDTensor { LoD lod_; float* tensor_; }; LoDTensor Slice(const LoDTensor& lodt, int level, int sequence) { } ``` ### Slicing the Top Level Please be aware that an RNN operator only slices the top level of a LoD Tensor to get the step inputs. ```c++ LoDTensor Slice(const LoDTensor& lodt, int sequence) { } ```