diff --git a/doc/design/api.md b/doc/design/api.md
index 8185d2af0ea264a2e7b4e28b9ed05279e4a22014..e6a4638d9100d9b07c3ee6b92b530a17eae1c162 100644
--- a/doc/design/api.md
+++ b/doc/design/api.md
@@ -3,7 +3,7 @@
 ## Ingredients

 As our design principle is starting from the essence: how could we
-allow users to express and solve their problems at neural networks.
+allow users to express and solve their problems as neural networks.
 Some essential concepts that our API have to provide include:

 1. A *topology* is an expression of *layers*.
@@ -233,7 +233,7 @@ paddle.dist_train(model,
                   num_parameter_servers=15)
 ```

-The pseudo code if `paddle.dist_train` is as follows:
+The pseudo code of `paddle.dist_train` is as follows:

 ```python
 def dist_train(topology, parameters, trainer, reader, ...):
diff --git a/doc/design/auto_gradient_check.md b/doc/design/auto_gradient_check.md
index 1f4d4ec16f7c395005e610751d95c10f5f3adf52..f9991541bc51c6e13ffce4e9cec60f73dc800121 100644
--- a/doc/design/auto_gradient_check.md
+++ b/doc/design/auto_gradient_check.md
@@ -1,17 +1,17 @@
 ## Auto Gradient Checker Design

 ## Backgraound:
-- Operator forward computing is easy to check if the result is right because it has a clear definition. **But** backpropagation is a notoriously difficult algorithm to debug and get right:
-  - 1. you should get the right backpropagation formula according to the forward computation.
-  - 2. you should implement it right in CPP.
-  - 3. it's difficult to prepare test data.
+- Generally, it is easy to check whether the forward computation of an Operator is correct or not. However, backpropagation is a notoriously difficult algorithm to debug and get right:
+  1. you should get the right backpropagation formula according to the forward computation.
+  2. you should implement it right in CPP.
+  3. it's difficult to prepare test data.

-- Auto gradient check gets a numeric gradient by forward Operator and use it as a reference of the backward Operator's result. It has several advantages:
-  - 1. numeric gradient checker only need forward operator.
-  - 2. user only need to prepare the input data for forward Operator.
+- Auto gradient checking gets a numerical gradient by forward Operator and use it as a reference of the backward Operator's result. It has several advantages:
+  1. numerical gradient checker only need forward operator.
+  2. user only need to prepare the input data for forward Operator.

 ## Mathematical Theory
-The following two document from stanford has a detailed explanation of how to get numeric gradient and why it's useful.
+The following two document from Stanford has a detailed explanation of how to get numerical gradient and why it's useful.

 - [Gradient checking and advanced optimization(en)](http://deeplearning.stanford.edu/wiki/index.php/Gradient_checking_and_advanced_optimization)
 - [Gradient checking and advanced optimization(cn)](http://ufldl.stanford.edu/wiki/index.php/%E6%A2%AF%E5%BA%A6%E6%A3%80%E9%AA%8C%E4%B8%8E%E9%AB%98%E7%BA%A7%E4%BC%98%E5%8C%96)
@@ -20,7 +20,7 @@ The following two document from stanford has a detailed explanation of how to ge
 ## Numeric Gradient Implementation
 ### Python Interface
 ```python
-def get_numeric_gradient(op,
+def get_numerical_gradient(op,
                          input_values,
                          output_name,
                          input_to_check,
@@ -30,13 +30,13 @@ def get_numeric_gradient(op,
     Get Numeric Gradient for an operator's input.

     :param op: C++ operator instance, could be an network
-    :param input_values: The input variables. Should be an dictionary, key is
-        variable name. Value is numpy array.
+    :param input_values: The input variables. Should be an dictionary, whose key is
+        variable name, and value is numpy array.
     :param output_name: The final output variable name.
-    :param input_to_check: The input variable need to get gradient.
+    :param input_to_check: The input variable with respect to which to compute the gradient.
     :param delta: The perturbation value for numeric gradient method.
         The smaller delta is, the more accurate result will get. But if that delta is
-         too small, it could occur numerical stability problem.
+         too small, it will suffer from numerical stability problem.
     :param local_scope: The local scope used for get_numeric_gradient.
     :return: The gradient array in numpy format.
     """
@@ -45,28 +45,28 @@ def get_numeric_gradient(op,
 ### Explaination:

 - Why need `output_name`
-  - One Operator may have multiple Output, you can get independent gradient from each Output. So user should set one output to calculate.
+  - An Operator may have multiple Output, one can get independent gradient from each Output. So caller should specify the name of the output variable.

 - Why need `input_to_check`
-  - One operator may have multiple inputs. Gradient Op can calculate the gradient of these Inputs at the same time. But Numeric Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times.
+  - One operator may have multiple inputs. Gradient Op can calculate the gradient of these inputs at the same time. But Numeric Gradient needs to calculate them one by one. So `get_numeric_gradient` is designed to calculate the gradient for one input. If you need to compute multiple inputs, you can call `get_numeric_gradient` multiple times.

 ### Core Algorithm Implementation

 ```python
-    # we only compute gradient of one element each time.
-    # we use a for loop to compute the gradient of every element.
+    # we only compute gradient of one element a time.
+    # we use a for loop to compute the gradient of each element.
     for i in xrange(tensor_size):
-        # get one input element throw it's index i.
+        # get one input element by its index i.
         origin = tensor_to_check.get_float_element(i)

-        # add delta to it, run op and then get the sum of the result tensor.
+        # add delta to it, run op and then get the new value of the result tensor.
         x_pos = origin + delta
         tensor_to_check.set_float_element(i, x_pos)
         y_pos = get_output()

-        # plus delta to this element, run op and get the sum of the result tensor.
+        # plus delta to this element, run op and get the new value of the result tensor.
         x_neg = origin - delta
         tensor_to_check.set_float_element(i, x_neg)
         y_neg = get_output()
@@ -85,15 +85,15 @@ def get_numeric_gradient(op,

 Each Operator Kernel has three kinds of Gradient:

-- 1. Numeric Gradient
-- 2. CPU Operator Gradient
-- 3. GPU Operator Gradient(if supported)
+1. Numerical gradient
+2. CPU kernel gradient
+3. GPU kernel gradient (if supported)

-Numeric Gradient Only relies on forward Operator. So we use Numeric Gradient as the reference value.
+The numerical gradient only relies on forward Operator. So we use the numerical gradient as the reference value. And the gradient checking is performed in the following three steps:

-- 1. calculate the numeric gradient.
-- 2. calculate CPU kernel Gradient with the backward Operator and compare it with the numeric gradient.
-- 3. calculate GPU kernel Gradient with the backward Operator and compare it with the numeric gradient.(if support GPU)
+1. calculate the numerical gradient
+2. calculate CPU kernel gradient with the backward Operator and compare it with the numerical gradient
+3. calculate GPU kernel gradient with the backward Operator and compare it with the numeric gradient (if supported)

 #### Python Interface

@@ -110,8 +110,8 @@ Numeric Gradient Only relies on forward Operator. So we use Numeric Gradient as
     :param forward_op: used to create backward_op
     :param input_vars: numpy value of input variable. The following
         computation will use these variables.
-    :param inputs_to_check: inputs var names that should check gradient.
-    :param output_name: output name that used to
+    :param inputs_to_check: the input variable with respect to which to compute the gradient.
+    :param output_name: The final output variable name.
     :param max_relative_error: The relative tolerance parameter.
     :param no_grad_set: used when create backward ops
     :param only_cpu: only compute and check gradient on cpu kernel.
@@ -120,24 +120,24 @@ Numeric Gradient Only relies on forward Operator. So we use Numeric Gradient as
 ```

 ### How to check if two numpy array is close enough?
-if `abs_numeric_grad` is nearly zero, then use abs error for numeric_grad, not relative
+if `abs_numerical_grad` is nearly zero, then use abs error for numerical_grad

 ```python
-numeric_grad = ...
+numerical_grad = ...
 operator_grad = numpy.array(scope.find_var(grad_var_name(name)).get_tensor())

-abs_numeric_grad = numpy.abs(numeric_grad)
-# if abs_numeric_grad is nearly zero, then use abs error for numeric_grad, not relative
+abs_numerical_grad = numpy.abs(numerical_grad)
+# if abs_numerical_grad is nearly zero, then use abs error for numeric_grad, not relative
 # error.
-abs_numeric_grad[abs_numeric_grad < 1e-3] = 1
+abs_numerical_grad[abs_numerical_grad < 1e-3] = 1

-diff_mat = numpy.abs(abs_numeric_grad - operator_grad) / abs_numeric_grad
+diff_mat = numpy.abs(abs_numerical_grad - operator_grad) / abs_numerical_grad
 max_diff = numpy.max(diff_mat)
 ```


 #### Notes:
-1,The Input data for auto gradient checker should be reasonable to avoid numeric problem.
+The Input data for auto gradient checker should be reasonable to avoid numerical stability problem.


 #### Refs:
diff --git a/doc/design/functions_operators_layers.md b/doc/design/functions_operators_layers.md
index d23ba56b5773a36d448a99e4abdebc1475ed789c..984b59f4c6971dfb6f46dfe342f2751f392c0e88 100644
--- a/doc/design/functions_operators_layers.md
+++ b/doc/design/functions_operators_layers.md
@@ -53,12 +53,12 @@ Let's explain using an example. Suppose that we are going to compose the FC usi
 ```python
 def operator.mul(X1, X2):
     O = Var()
-    paddle.cpp.create_operator("mul", input={X1, Y1], output=O)
+    paddle.cpp.create_operator("mul", input={X1, Y1}, output=O)
     return O

 def operator.add(X1, X2):
     O = Var()
-    paddle.cpp.create_operator("add", input={X1, X2], output=O)
+    paddle.cpp.create_operator("add", input={X1, X2}, output=O)
     return O
 ```

diff --git a/doc/design/graph.md b/doc/design/graph.md
index 51b7f87638f8ddff752328a562fe0dd0fe56cfd1..7519a65df835a39fe14f6ef45530afff170191ff 100644
--- a/doc/design/graph.md
+++ b/doc/design/graph.md
@@ -56,7 +56,7 @@ For each parameter, like W and b created by `layer.fc`, marked as double circles

 ## Block and Graph

-The word block and graph are interchangable in the desgin of PaddlePaddle. A [Block[(https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphore of the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block.
+The word block and graph are interchangable in the desgin of PaddlePaddle. A [Block](https://github.com/PaddlePaddle/Paddle/pull/3708) is a metaphore of the code and local variables in a pair of curly braces in programming languages, where operators are like statements or instructions. A graph of operators and variables is a representation of the block.

 A Block keeps operators in an array `BlockDesc::ops`

@@ -67,4 +67,4 @@ message BlockDesc {
 }
 ```

-in the order that there appear in user programs, like the Python program at the beginning of this article. We can imagine that in `ops`, we have some forward operators, followed by some gradient operators, and then some optimization operators.
+in the order that they appear in user programs, like the Python program at the beginning of this article. We can imagine that in `ops`, we have some forward operators, followed by some gradient operators, and then some optimization operators.
diff --git a/doc/design/parameters_in_cpp.md b/doc/design/parameters_in_cpp.md
index b6f99bc7d9d6fafacb0a4bcff806b65d9aef98cc..a7ac3f17c44ca94a669a8f1e283b291bceb42317 100644
--- a/doc/design/parameters_in_cpp.md
+++ b/doc/design/parameters_in_cpp.md
@@ -1,19 +1,19 @@
 # Design Doc: The C++ Class `Parameters`

-`Parameters` is a concept we designed in Paddle V2 API. `Parameters` is a container of parameters, and make Paddle can shared parameter between topologies. We described usages of `Parameter` in [api.md](./api.md).
+`Parameters` is a concept we designed in PaddlePaddle V2 API. `Parameters` is a container of parameters, which makes PaddlePaddle capable of sharing parameter between topologies. We described usages of `Parameter` in [api.md](./api.md).

-We used Python to implement Parameters when designing V2 API before. There are several defects for current implementation:
+We used Python to implement Parameters when designing V2 API before. There are several defects for the current implementation:
 * We just use `memcpy` to share Parameters between topologies, but this is very inefficient.
-* We did not implement share Parameters while training. We just trigger `memcpy` when start training.
+* We did not support sharing Parameters while training. We just trigger `memcpy` when start training.

-It is necessary that we implement Parameters in CPP side. However, it could be a code refactoring for Paddle, because Paddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current Paddle implementation, there are three concepts associated with `Parameters`:
+It is necessary that we implement Parameters in CPP side. However, it could result a code refactoring for PaddlePaddle, because PaddlePaddle was designed for training only one topology before, i.e., each GradientMachine contains its Parameter as a data member. In current PaddlePaddle implementation, there are three concepts associated with `Parameters`:

 1. `paddle::Parameter`. A `Parameters` is a container for `paddle::Parameter`.
 It is evident that we should use `paddle::Parameter` when developing `Parameters`.
 However, the `Parameter` class contains many functions and does not have a clear interface.
 It contains `create/store Parameter`, `serialize/deserialize`, `optimize(i.e SGD)`, `randomize/zero`.
 When we developing `Parameters`, we only use `create/store Parameter` functionality.
-We should extract functionalities of Parameter into many classes to clean Paddle CPP implementation.
+We should extract functionalities of Parameter into many classes to clean PaddlePaddle CPP implementation.

 2. `paddle::GradientMachine` and its sub-classes, e.g., `paddle::MultiGradientMachine`, `paddle::NeuralNetwork`.
 We should pass `Parameters` to `paddle::GradientMachine` when `forward/backward` to avoid `memcpy` between topologies.
@@ -24,7 +24,7 @@ Also, we should handle multi-GPU/CPU training, because `forward` and `backward`
 So `Parameters` should be used by `paddle::ParameterUpdater`, and `paddle::ParameterUpdater` should optimize `Parameters` (by SGD).


-The step by step approach for implementation Parameters in Paddle C++ core is listed below. Each step should be a PR and could be merged into Paddle one by one.
+The step by step approach for implementation Parameters in PaddlePaddle C++ core is listed below. Each step should be a PR and could be merged into PaddlePaddle one by one.

 1. Clean `paddle::Parameter` interface. Extract the functionalities of `paddle::Parameter` to prepare for the implementation of Parameters.

diff --git a/doc/design/reader/README.md b/doc/design/reader/README.md
index f21f7af520df5171798326818ecb97c3bcd14a12..320dccec3ddc7bfe6042f4e65b2518ea7b1ad24a 100644
--- a/doc/design/reader/README.md
+++ b/doc/design/reader/README.md
@@ -52,7 +52,7 @@ Here are valid outputs:
 # a mini batch of three data items, each data item is a list (single column).
 [([1,1,1],),
 ([2,2,2],),
-([3,3,3],),
+([3,3,3],)]
 ```

 Please note that each item inside the list must be a tuple, below is an invalid output:
diff --git a/doc/design/refactorization.md b/doc/design/refactorization.md
index e105861e926411a269b0b52dd4688744912c9ab3..ad801ca421ca31c84b0a6b0a18d1d625c87e0de5 100644
--- a/doc/design/refactorization.md
+++ b/doc/design/refactorization.md
@@ -15,7 +15,7 @@ The goal of refactorizaiton include:

 1. Users write Python programs to describe the graphs and run it (locally or remotely).

-1. A graph is composed of *variabels* and *operators*.
+1. A graph is composed of *variables* and *operators*.

 1. The description of graphs must be able to be serialized/deserialized, so it

@@ -140,7 +140,7 @@ Compile Time -> IR -> Runtime
     * `thrust` has the same API as C++ standard library. Using `transform` can quickly implement a customized elementwise kernel.
     * `thrust` has more complex API, like `scan`, `reduce`, `reduce_by_key`.
 * Hand-writing `GPUKernel` and `CPU` code
-    * Do not write `.h`. CPU Kernel should be in `.cc`. CPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.)
+    * Do not write `.h`. CPU Kernel should be in `.cc`. GPU kernel should be in `.cu`. (`GCC` cannot compile GPU code.)
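 ---
 # Operator Register
diff --git a/doc/design/releasing_process.md b/doc/design/releasing_process.md
index 0c10e782808ca6456347ec54cb5e921162731ede..62ff8f3229bbbb5bc82e4da29259baffc30c2c87 100644
--- a/doc/design/releasing_process.md
+++ b/doc/design/releasing_process.md
@@ -1,8 +1,8 @@
-# Paddle Release Specification
+# PaddlePaddle Release Specification

-Paddle uses the git-flow branching model for branch management and the [Semantic Versioning](http://semver.org/) standard for Paddle version numbers.
+PaddlePaddle uses the git-flow branching model for branch management and the [Semantic Versioning](http://semver.org/) standard for PaddlePaddle version numbers.

-Each new Paddle release follows the process below:
+Each new PaddlePaddle release follows the process below:

 1. Create a new branch from the `develop` branch, named `release/VERSION`, for example `release/0.10.0`.
 2. Tag the version on the new branch; the tag is `VERSIONrc.PATCH`. The first tag is `0.10.0rc1`, the second is `0.10.0rc2`, and so on.
@@ -27,14 +27,14 @@ Each new Paddle release follows the process below:

 Please note:

-* Once the `release/VERSION` branch is created, merging from `develop` into `release/VERSION` is generally no longer allowed. This keeps the feature set of the `release/VERSION` branch closed, which makes it easier for testers to test Paddle's behavior.
+* Once the `release/VERSION` branch is created, merging from `develop` into `release/VERSION` is generally no longer allowed. This keeps the feature set of the `release/VERSION` branch closed, which makes it easier for testers to test PaddlePaddle's behavior.
 * While the `release/VERSION` branch exists, any bugfix branch must be merged into all three branches: `master`, `develop`, and `release/VERSION`.

-# Paddle Branching Specification
+# PaddlePaddle Branching Specification

-Paddle development follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, with some differences to suit GitHub's features.
+PaddlePaddle development follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, with some differences to suit GitHub's features.

-* The main Paddle repository follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, where:
+* The main PaddlePaddle repository follows the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, where:
   * The `master` branch is the stable branch. Every version on `master` has passed unit tests and regression tests.
   * The `develop` branch is the development branch. Every version on `develop` has passed unit tests, but not regression tests.
   * The `release/VERSION` branch is a temporary branch created for each release. Code at this stage is undergoing regression testing.
@@ -42,18 +42,18 @@ Paddle development follows the [git-flow](http://nvie.com/posts/a-successful-git-branch
 * Forked repositories of other users do not need to strictly follow the [git-flow](http://nvie.com/posts/a-successful-git-branching-model/) branching model, but every branch in a fork is effectively a feature branch.
   * It is recommended that a developer's fork use its `develop` branch to stay in sync with the `develop` branch of the main repository.
   * It is recommended that, within the fork, developers create their own feature branches based on `develop`.
-  * When a feature branch is complete, submit a `Pull Reuqest` to the main Paddle repository for code review.
+  * When a feature branch is complete, submit a `Pull Reuqest` to the main PaddlePaddle repository for code review.
     * During review, developers can keep committing changes to their own feature branch.
 * BugFix branches are also maintained in the developer's own fork. Unlike feature branches, a BugFix branch must open a `Pull Request` against each of the main repository's `master`, `develop`, and (if present) `release/VERSION` branches at the same time.

-# Paddle Regression Test List
+# PaddlePaddle Regression Test List

-This list describes the features that must be tested before each Paddle release.
+This list describes the features that must be tested before each PaddlePaddle release.

-## All Chapters of Paddle Book
+## All Chapters of PaddlePaddle Book

-For every release, Paddle must first ensure that all chapters of Paddle Book work correctly. This includes verifying that models train correctly both with Paddle's current `paddle_trainer` and with pure `Python` training.
+For every release, PaddlePaddle must first ensure that all chapters of PaddlePaddle Book work correctly. This includes verifying that models train correctly both with PaddlePaddle's current `paddle_trainer` and with pure `Python` training.

 | | Getting Started | Recognize Digits | Image Classification | Word Vectors | Sentiment Analysis | Semantic Role Labeling | Machine Translation | Personalized Recommendation |
 | --- | --- | --- | --- | --- | --- | --- | --- | --- |
diff --git a/doc/design/scope.md b/doc/design/scope.md
index c9e0be716b606f6c7bf0373e0c6e632647e07a6f..b1f9bb4378eb5ec6926f1e53f7c1f4fd5674064c 100644
--- a/doc/design/scope.md
+++ b/doc/design/scope.md
@@ -17,7 +17,7 @@ Scope is an association of a name to variable. All variables belong to `Scope`.

 1. Scope only contains a map of a name to variable.

-    All parameters, data, states in a Net should be variables and stored inside a scope. Each op should get inputs and outputs to do computation from a scope, such as data buffer, state(momentum) etc.
+    All parameters, data, states in a Net should be variables and stored inside a scope. Each op should get inputs and outputs to do computation from a scope, such as data buffer, state (momentum) etc.

 1. Variable can only be created by Scope and a variable can only be got from Scope. User cannot create or get a variable outside a scope. This is a constraints of our framework, and will keep our framework simple and clear.

@@ -32,7 +32,7 @@ Scope is an association of a name to variable. All variables belong to `Scope`.

 1. Scope should destruct all Variables inside it when itself is destructed. User can never store `Variable` pointer somewhere else.
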
-    Because Variable can only be got from Scope. When destroying Scope, we also need to destroy all the Variables in it. If user store `Variable` pointer to private data member or some global variable, the pointer will be a invalid pointer when associated `Scope` is destroyed.
+    Because Variable can only be got from Scope. When destroying Scope, we also need to destroy all the Variables in it. If user store `Variable` pointer to private data member or some global variable, the pointer will be an invalid pointer when associated `Scope` is destroyed.

 ```cpp
 class Scope {
@@ -50,7 +50,7 @@ class Scope {

 Just like [scope](https://en.wikipedia.org/wiki/Scope_(computer_science)) in programming languages, `Scope` in the neural network can also be a local scope. There are two attributes about local scope.

-1. We can create local variables in a local scope. When that local scope are destroyed, all local variables should also be destroyed.
+1. We can create local variables in a local scope. When that local scope is destroyed, all local variables should also be destroyed.
 2. Variables in a parent scope can be retrieved from local scopes of that parent scope, i.e., when user get a variable from a scope, it will try to search this variable in current scope. If there is no such variable in the local scope, `scope` will keep searching from its parent, until the variable is found or there is no parent.

 ```cpp
@@ -121,4 +121,4 @@ Also, as the parent scope is a `shared_ptr`, we can only `Create()` a scope shar

 ## Orthogonal interface

-`FindVar` will return `nullptr` when `name` is not found. It can be used as `Contains` method. `NewVar` will return a `Error` when there is a name conflict locally. Combine `FindVar` and `NewVar`, we can implement `NewVar` easily.
+`FindVar` will return `nullptr` when `name` is not found. It can be used as `Contains` method. `NewVar` will return an `Error` when there is a name conflict locally. Combine `FindVar` and `NewVar`, we can implement `NewVar` easily.
diff --git a/doc/design/simple_op_design.md b/doc/design/simple_op_design.md
index fded4a68612396a262121a5a886a8ae573dfa662..c7aeed7f9b4637e1c29d530f37b42d12500af82f 100644
--- a/doc/design/simple_op_design.md
+++ b/doc/design/simple_op_design.md
@@ -6,9 +6,9 @@ The Interaction between Python and C++ can be simplified as two steps:

 1. C++ tells Python how many Ops there are, and what parameter do users need to offer to initialize a new Op. Python then builds API for each Op at compile time.

-2. Users invoke APIs built by Python and provide necessary parameters. These parameters will be sent to C++ fo finish Op construction task.
+2. Users invoke APIs built by Python and provide necessary parameters. These parameters will be sent to C++ for finishing the Op construction task.

-### Message form C++ to Python
+### Message from C++ to Python

 We define a Protobuf message class `OpProto` to hold message needed in the first step. What should an `OpProto` contain? This question is equivalent to “What message do we need to offer, to build a Python API which is legal and user oriented and can use to describe a whole Op.”

@@ -193,7 +193,7 @@ def fc_layer(input, size, with_bias, activation):
     elif:
         # ...
     return act_output;
-```
+```

 ### Low Leval API

diff --git a/doc/design/var_desc.md b/doc/design/var_desc.md
index 86a95c10d5729704f86c285c9fe92db0cf2158be..bfbbdd0578ebc69ea4b49ade9b041573a9e9ad55 100644
--- a/doc/design/var_desc.md
+++ b/doc/design/var_desc.md
@@ -1,7 +1,7 @@
 ## Background
 PaddlePaddle divides the description of neural network computation graph into two stages: compile time and runtime.

-PaddlePaddle use proto message to describe compile time graph for
+PaddlePaddle use proto message to describe compile time graph because

 1. Computation graph should be able to be saved to a file.
 1. In distributed training, the graph will be serialized and send to multiple workers.
diff --git a/paddle/framework/lod_tensor.md b/paddle/framework/lod_tensor.md
index 39b27d9b9583fe520a5d5d1254a1ea04e5f59ce2..07bbdf9416c432052b3222757a61ac4bfd70fe14 100644
--- a/paddle/framework/lod_tensor.md
+++ b/paddle/framework/lod_tensor.md
@@ -4,7 +4,7 @@ PaddlePaddle's RNN doesn't require that all instances have the same length. To

 ## Challenge of Variable-length Inputs

-People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 32x32xO-dimensional tensor T and the 10x32x32 Tensor.
+People usually represent a mini-batch by a Tensor. For example, a mini-batch of 10 images, each of size 32x32, is a 10x32x32 Tensor. So a transformation, T, of all images can be a matrix multiplication of the 10xOx32-dimensional tensor T and the 10x32x32 Tensor.

 Another example is that each mini-batch contains 32 sentences, where each word is a D-dimensional one-hot vector. If all sentences have the same length L, we can represent this mini-batch by a 32xLxD tensor. However, in most cases, sentences have variable lengths, and we will need an index data structure to record these variable lengths.

@@ -54,7 +54,7 @@ In summary, as long as that the essential elements (words or images) have the s

 - The first dimension size L has an additonal property -- a LoD index as a nested vector:

   ```c++
-  typedef std::vector > LoD;
+  typedef std::vector> LoD;
   ```

 - The LoD index is not necessary when there are only two levels and all elements of the second level have length 1.
@@ -100,7 +100,7 @@ Let's go on slicing this slice. Its <1,1>-slice is
 The algorithm, with over-simplified data structure, is defined as

 ```c++
-typedef vector > LoD;
+typedef std::vector> LoD;

 struct LoDTensor {
   LoD lod_;