diff --git a/paddle/framework/lod_tensor.md b/paddle/framework/lod_tensor.md index 769b61f175a2f462258c1242d027c04c0abd12a9..39b27d9b9583fe520a5d5d1254a1ea04e5f59ce2 100644 --- a/paddle/framework/lod_tensor.md +++ b/paddle/framework/lod_tensor.md @@ -4,13 +4,13 @@ PaddlePaddle's RNN doesn't require that all instances have the same length. To ## Challenge of Variable-length Inputs -People usually represent a mini-batch by a Tensor. For example, a mini-batch of 32 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor. +People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor. Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths. ## LoD as a Solution -### Mini-Batch of variable-length sentenses +### Mini-Batch of variable-length sentences Let's imagine a mini-batch of 3 variable lengths sentences, containing 3, 1, and 2 words respectively. We can represent it by a (3+1+2)xD tensor plus some index information: @@ -51,17 +51,17 @@ The many 1's on the second level seem duplicated. For this particular case of 2 In summary, as long as that the essential elements (words or images) have the same size, we can represent mini-batches by a LoD Tensor: - The underlying tensor has size LxD1xD2x..., where D1xD2... is the size of the essential elements, and -- the first dimension size L has an additon property -- a LoD index as a nested vector: +- The first dimension size L has an additonal property -- a LoD index as a nested vector: ```c++ typedef std::vector > LoD; ``` -- The LoD index can is not necessary when there are only two levels and all elements of the second level have length 1. +- The LoD index is not necessary when there are only two levels and all elements of the second level have length 1. ## Slicing of LoD Tensor -Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 4 level LoD Tensor, for example, +Consider that we have a network with three levels of RNN: the top level one handles articles, the second level one handles sentences, and the basic level one handles words. This network requires that mini-batches represented by 3 level LoD Tensor, for example, ``` 3 @@ -90,8 +90,9 @@ and the <1,2>-slice of above example is Let's go on slicing this slice. Its <1,1>-slice is ``` -3 -||| +1 +1 +| ``` ### The Slicing Algorithm @@ -128,7 +129,7 @@ Suppose that we want to retrieve the <1,2>-slice we will need to find out the starting position of this slice by summing over all leaf nodes in `LoD` to the left of the slice, i.e., 3 + 2 + 4 + 1 = 10. -To avoid the traversal of the LoD tree at slcing time, we can do it at the construction time -- instead of saving the lengths of the next level in the LoD tree, we can save the starting offset of the next level. For example, above LoD Tensor can be transformed into +To avoid the traversal of the LoD tree at slicing time, we can do it at the construction time -- instead of saving the lengths of the next level in the LoD tree, we can save the starting offset of the next level. For example, above LoD Tensor can be transformed into ``` 0